Site Reliability Engineer 5 (12+ years of experience)

Adobe | 93 days ago | Bangalore

System Architecture & Technical Strategy

Define and drive the long-term reliability and scalability strategy for the Adobe Pass platform, aligning with product and business goals.
Architect large-scale, distributed, and multi-region systems designed for resiliency, observability, and self-healing.
Anticipate systemic risks and design proactive mitigation strategies — ensuring zero single points of failure across critical services.
Partner with software architecture and infrastructure teams to evolve the platform toward greater reliability, efficiency, and cost optimization.

Automation, Observability & Reliability Engineering

Build and champion advanced automation frameworks that enable zero-touch operations across deployment, recovery, and scaling workflows.
Introduce AI/ML-based predictive monitoring and anomaly detection systems to anticipate failures before they impact users.
Lead organization-wide reliability initiatives — such as chaos engineering, error budgets, and SLO adoption — driving measurable reliability improvements.
Continuously refine observability architecture (metrics, traces, logs) to ensure comprehensive, actionable insights into production health.

Incident Response & Operational Excellence

Serve as a technical authority during high-impact incidents, guiding cross-functional teams through real-time mitigation and long-term prevention.
Establish and enforce best-in-class incident management frameworks that improve MTTR and MTBF and reduce incident recurrence rates.
Lead blameless postmortems and translate findings into actionable reliability roadmaps.
Drive reliability reviews and operational readiness assessments for all major product launches.

Performance, Scalability & Cost Efficiency

Lead large-scale performance tuning and capacity engineering efforts, ensuring optimal resource utilization and cost efficiency across environments.
Identify architectural bottlenecks, drive performance benchmarking, and influence platform evolution for better scalability and elasticity.
Partner with FinOps and CloudOps to optimize spend while maintaining reliability SLAs and SLOs.

Cross-Team Leadership & Mentorship

Mentor and coach SREs and software engineers, cultivating deep reliability-first thinking across teams.
Serve as a thought leader in reliability engineering by driving best practices, evangelizing an automation-first culture, and influencing technical standards across multiple teams.
Collaborate with engineering leaders, PMs, and operations to align priorities, set strategic goals, and deliver on high-impact reliability initiatives.
Lead technical deep dives and design reviews, ensuring all systems are built to scale securely and reliably.

Qualifications

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
12+ years of experience in site reliability, production engineering, or large-scale distributed system operations.
Proven track record of designing and managing highly available, globally distributed systems in cloud-native environments (AWS, Azure, GCP).
Expert-level proficiency in one or more programming/scripting languages (Python, Go, Java, Bash) for automation and tooling.
Deep understanding of Kubernetes, microservices, and service mesh architectures.
Advanced experience with Infrastructure as Code (Terraform, CloudFormation) and CI/CD automation frameworks.
Mastery of observability and monitoring stacks (Prometheus, Grafana, Datadog, OpenTelemetry).
Strong expertise in networking, storage, and distributed databases (both SQL and NoSQL).
Demonstrated ability to influence architectural decisions and drive reliability strategy across organizations.
Exceptional communication, leadership, and stakeholder management skills.

