Site Reliability Engineering Manager

Art of Problem Solving · San Diego, CA

Software Engineering
Education
$154,000 - $187,000 Per Year
Posted 4 weeks ago

Postgres
Security
PHP
SQL
Node.js

As the Site Reliability Engineering Manager, you’ll play a key role in supporting and scaling the technology that helps us discover, inspire, and train the great problem solvers of the next generation. In this position, you will lead our cloud modernization efforts and maintain existing infrastructure across all of our products and services, supporting a growing user base currently numbering around one million. This position is ideal for a detail-oriented and strategic engineering leader who will set and execute our cloud infrastructure strategy alongside their team of two Site Reliability Engineers. This is a hybrid full-time position based at our headquarters in San Diego, CA.

The Site Reliability Engineering Manager:

  • Manages a team of Site Reliability Engineers, including hiring, evaluating, training, and developing their team members, as well as building a collaborative and productive team culture.
  • Owns and maintains company cloud infrastructure strategy and SRE team roadmap.
  • Implements and evaluates reliability metrics for our products and services, and advocates for projects that reduce our exposure to reliability risks or improve our understanding of them.
  • Runs, evaluates, and improves SRE processes and procedures such as task workflow, reviews, and launches, while managing regular team responsibilities and leading the maintenance of team documentation.
  • Provides technical expertise by collaborating with stakeholders to make high-level decisions related to their team, providing technical direction to team members, and being a knowledge base of information for their team.
  • Allocates team resources by mapping team members to tasks and projects, helping estimate time for their team members to complete projects, and advocating for engineering resources as needed.
  • Drives continuous improvement in the SRE space and the broader Engineering Department by proposing and advocating for projects that will improve reliability, security, and/or maintainability, improve development workflow, remove operational bottlenecks, or otherwise improve engineering department bandwidth. 
  • Is accountable for the overall risk management and reduction practices and contributes to risk management practices in other engineering teams.
  • Communicates cross-team by being the main point of contact between the SRE team and other engineering teams, and between their team and company stakeholders. Facilitates connections between their team members and other teams, and regularly works with engineering managers, engineering team leads, and project managers.
  • Performs all the duties of a Site Reliability Engineer.

The ideal candidate has:

  • Expert-level experience planning, designing, implementing, securing, and monitoring scalable infrastructure for web applications in the AWS ecosystem
  • Experience leading technical strategy and execution in projects
  • Experience deploying and managing Infrastructure-as-Code with Terraform
  • Familiarity with Node.js (preferred) and/or PHP
  • Familiarity with MariaDB, PostgreSQL, Redis, Apache, and nginx or similar technologies
  • Prior full-stack or backend software engineering experience is preferred
  • Prior people management experience, especially in an SRE or DevOps role, is preferred

Why Join AoPS:

The full salary range for this position is $154,000 to $187,000 per year, with a 6% year-end bonus. Here are some things you can look forward to:

  • Impact: The opportunity to drive the reliability and scalability of our infrastructure, supporting our growing number of customers
  • Culture: Work and collaborate with an organization filled with builders and life-long learners who strive to discover, inspire, and train the great problem solvers of the next generation
  • Flexibility: Casual work environment with a hybrid work week and flexible scheduling
  • Benefits: Multiple options for Medical, Dental and Vision plans   
  • Future Planning: 401K with company match
  • Quality of Life: PTO Plan and supportive leadership that gives you the work-life balance you deserve
  • Ease of Transition: Relocation bonus (if currently located outside of San Diego)

Background Check: 

Please note that employment is contingent on the successful completion of a background check.

About AoPS:

Art of Problem Solving (AoPS) is on a mission to discover, inspire, and train the great problem solvers of the next generation. Since 2003, we have trained hundreds of thousands of the country’s top students, including nearly all the members of the US International Math Olympiad team, through our online school, in-person academies, textbooks, and online learning systems. While math has been our primary focus for most of our history, over the years we have expanded our unique problem-solving curriculum into more subjects, such as language arts, science, and computer science.
