Principal Software Engineer - SRE (15+)

GE HealthCare | 95 days ago | Bengaluru

Key Responsibilities:

Lead Platform Reliability Initiatives: Design and optimize multi-region, highly available cloud architectures using services like container orchestration, compute instances, managed databases, and object storage to meet SLIs/SLOs and error budgets that target better than 99.99% availability.

Drive Automation and IaC: Build and maintain Infrastructure as Code (IaC) pipelines with tools like CDK, Terraform, or CloudFormation; automate deployments via CI/CD tools and serverless functions to accelerate delivery while minimizing operational overhead.

Reliability, Availability & Resilience: Establish, track, and enforce SLIs, SLOs, and error budgets. Ensure system availability, latency, and throughput meet targets. Build strategies for redundancy, high availability, multi-AZ/multi-region failover, backups, and disaster recovery.

Enhance Observability and Monitoring: Implement comprehensive monitoring stacks with cloud-native metrics, open-source monitoring, and visualization tools; define alerting thresholds, conduct root cause analyses (RCAs), and optimize performance for distributed systems including message brokers, caching layers, and relational databases.

Champion Security and Compliance: Enforce cloud best practices for identity and access management, encryption, networking, and policy-as-code with tools like OPA; integrate security into CI/CD pipelines to protect sensitive data in regulated environments.

Innovate on Scalability: Evaluate and implement advanced cloud features like serverless architectures, service meshes, and autoscaling solutions to support growing user demands and reduce latency.

Operational Excellence: Participate in and lead incident response for production issues, and continuously improve processes to balance feature velocity with system reliability.

Cost & Performance: Monitor and optimize cloud spend and resource usage through rightsizing, discount strategies, and waste elimination.

Mentor and Influence: Guide junior engineers through design reviews, incident post-mortems, and adoption of SRE practices; collaborate with stakeholders to shape cloud strategy, cost optimization, and capacity planning for enterprise-scale workloads.

Educational Qualification:

Bachelor's degree or equivalent in Computer Science or another STEM major (Science, Technology, Engineering, and Math).

Technical Skills:

15+ years in software engineering, site reliability engineering, or cloud platform roles, with significant exposure to AWS production systems.

Deep hands-on expertise with core cloud services including container orchestration, compute, databases, storage, monitoring, identity management, serverless, and networking.

Expert-level skill in Infrastructure as Code: Terraform, CloudFormation, AWS CDK, or similar.

Proficiency in programming languages like Python, Go, or Java for automation, scripting, and building tools.

Deep understanding of observability tooling: metrics, logging, distributed tracing, and alerting (e.g., CloudWatch, Prometheus, Grafana, ELK).

Strong experience with incident management: debugging, performance tuning, root cause analysis.

Proven track record of cost optimization in cloud environments.

Security mindset: knowledge of AWS security services, governance, and compliance standards.

Proven track record in implementing SRE practices: SLIs/SLOs, error budgets, monitoring/alerting, and incident management.

Strong communication and collaboration abilities to influence without authority and translate technical concepts for non-technical stakeholders.
