Lead the design, deployment, automation, and integration of scripting solutions to enhance capabilities, visibility, and efficiency.
Collaborate with leaders across technical platforms and partners to engineer automated, integrated solutions that improve tool, service, and team interactions, increasing availability, reliability, and performance.
Ensure that both internal and external SLAs are consistently met or exceeded.
Continuously review and refine SRE standards, processes, and practices, particularly in incident response and toil reduction.
Manage a team of engineers participating in a 24/7 on-call rotation to support our production infrastructure.
Join incident calls that exceed acceptable duration.
Ensure comprehensive post-mortem analysis of production incidents, driving continuous improvement initiatives.
Required Knowledge/Skills, Education, and Experience
7+ years of professional experience in SRE or DevOps, with 3+ years of experience in a leadership role.
Proven experience with automation via scripting and API development
2+ years of experience with observability tools (Datadog, CloudWatch, CloudTrail, Elastic Stack, Grafana, or equivalent tools)
2+ years of experience with containerization, specifically Kubernetes
2+ years of experience with Amazon Web Services (AWS) services
2+ years of experience with Terraform, CloudFormation, Ansible, or equivalent tools
2+ years of experience with issue/incident tracking tools
Preferred Knowledge/Skills, Education, and Experience
Familiarity with Agile methodologies and experience working in an Agile/Scrum environment.
Desired certifications: Datadog, Kubernetes, AWS, or Azure
2+ years of experience as a Site Reliability Engineer or in an equivalent role (ServiceNow, ServiceDesk, Jira, or equivalent tools)
2+ years of experience with log management tools (e.g., ELK Stack)
2+ years of experience in an enterprise IT setting with distributed environments
Senior-level system administration experience, including troubleshooting, support, mentorship/training, and oversight