Where you will make an impact
- Provisioning, managing, scaling, and monitoring SaaS applications hosted in the AWS cloud.
- Maintaining an uptime SLA of 99.95% or higher by advocating for SaaS operations best practices and analyzing proposed changes to minimize risk.
- Identifying root causes for outages and implementing solutions so incidents don’t repeat.
- Participating in problem resolution, trouble ticketing, and service requests.
- Automating repeatable tasks with CI/CD tooling to reduce manual labor and errors.
- Creating self-service tools that our internal and external customers can use to remove dependencies and reduce time to resolution.
- Continuously optimizing AWS resources to reduce cloud spend.
- Helping product teams make sound architectural and other technical decisions.
- Creating and reviewing CI/CD code using best practices.
- Participating in an on-call rotation, because our applications run 24x7.
What you will bring
- 2+ years of experience with Linux systems administration.
- 2+ years of hands-on operational experience with Amazon Web Services (AWS).
- Experience with relational databases (PostgreSQL preferred).
- Working knowledge of the AWS platform and services, specifically RDS, ECS, EC2, and Lambda.
- Scripting/automation or software engineering experience (Python preferred).
- Exceptional problem-solving skills for information technology systems.
- Experience with DevOps tools such as Ansible, Terraform, and Jenkins.
- Experience configuring and monitoring SaaS/cloud-based applications and infrastructure, and optimizing performance.
- Experience with ticketing systems like ServiceNow, Jira, or Remedy.
The successful candidate will understand how to operate in a high-availability, 24x7 environment and will think critically about engineering solutions with an eye towards scale-out. We solve problems for hundreds or thousands of endpoints, not a handful!
Official notification