Site Reliability Engineer (6+ years)
Lilly | 1 day ago | Hyderabad

What You’ll Do

* Automation First: Identify repetitive manual work and design automation frameworks, self-service tooling, and auto-healing systems.

* Observability & Monitoring: Build end-to-end monitoring, logging, and alerting systems to ensure visibility and proactive issue resolution.

* Incident Response: Lead troubleshooting and root cause analysis for complex incidents, and drive blameless postmortems.

* CI/CD & Infrastructure: Enhance CI/CD pipelines and use Infrastructure as Code (IaC) to provision, configure, and manage cloud resources.

* Collaboration: Partner with dev teams to embed reliability into design and development, not just after deployment.

* Innovation: Continuously evaluate emerging tools and technologies, keeping the stack modern and efficient.

* On-Call: Participate in the on-call rotation and improve processes to minimize human intervention.

 

What We’re Looking For

* 6–9 years of hands-on experience as a Site Reliability Engineer.

* Strong expertise in at least one major cloud platform (AWS, Azure, or GCP).

* Deep knowledge of Linux/Unix systems, networking, and distributed systems.

* Proficiency in programming/scripting (Python, Go, or similar).

* Advanced skills with containers and orchestration (Docker, Kubernetes at scale).

* Proven experience with CI/CD pipelines and Infrastructure as Code (Terraform, Ansible, Helm, etc.).

* Expertise with observability platforms (Prometheus, Grafana, ELK, Datadog, Splunk).

* Strong background in incident management, disaster recovery, and capacity planning.

* Familiarity with SRE practices (SLIs, SLOs, error budgets, blameless postmortems).

* Excellent problem-solving, debugging, and performance optimization skills.
