Principal Site Reliability Engineer (NM+)

Lilly | 138 days ago | Hyderabad

What You’ll Be Doing

Lead the SRE team responsible for the reliability and performance of applications deployed on a cloud-native internal platform.

Design, implement, and maintain automation frameworks, self-service tooling, and auto-healing systems to eliminate manual toil.

Build and enhance end-to-end observability, monitoring, logging, and alerting systems for proactive issue detection and resolution.

Ensure Uptime: Take ultimate ownership of our production environment's stability. Lead end-to-end incident management, from escalation to Root Cause Analysis (RCA). Manage patching, upgrades, and disaster recovery processes.

Champion Infrastructure as Code (IaC) and CI/CD best practices to ensure consistent, repeatable, and secure deployments.

Collaborate with development and product teams to embed reliability and scalability into application design and architecture.

Continuously evaluate and introduce emerging tools and technologies to keep the SRE stack modern and efficient.

Mentor and guide SRE engineers, fostering a culture of ownership, innovation, and continuous improvement.

Implement AIOps frameworks to streamline operational tasks and strengthen system self-healing capabilities.

Participate in and optimise the on-call rotation, striving to minimise human intervention through automation.

Drive capacity planning, disaster recovery, and business continuity initiatives.

Support onboarding, documentation, and knowledge sharing for platform services and operational best practices.
