Principal Software Engineer - SRE (15+)
gehealthcare | 4 days ago | Bengaluru

Key Responsibilities: 

  • Lead Platform Reliability Initiatives: Design and optimize multi-region, highly available cloud architectures built on container orchestration, compute instances, managed databases, and object storage, meeting SLIs, SLOs, and error budgets that support availability above 99.99%. 

  • Drive Automation and IaC: Build and maintain Infrastructure as Code (IaC) pipelines with tools like CDK, Terraform, or CloudFormation; automate deployments via CI/CD tools and serverless functions to accelerate delivery while minimizing operational overhead. 

  • Reliability, Availability & Resilience: Establish, track, and enforce SLIs, SLOs, and error budgets. Ensure that system availability, latency, and throughput meet targets. Build strategies for redundancy, high availability, multi-AZ/multi-region failover, backups, and disaster recovery. 

  • Enhance Observability and Monitoring: Implement comprehensive monitoring stacks with cloud-native metrics, open-source monitoring, and visualization tools; define alerting thresholds, conduct root cause analyses (RCAs), and optimize performance for distributed systems including message brokers, caching layers, and relational databases. 

  • Champion Security and Compliance: Enforce cloud best practices for identity and access management, encryption, networking, and policy-as-code with tools like OPA; integrate security into CI/CD pipelines to protect sensitive data in regulated environments. 

  • Innovate on Scalability: Evaluate and implement advanced cloud features like serverless architectures, service meshes, and autoscaling solutions to support growing user demands and reduce latency. 

  • Operational Excellence: Participate in and lead incident response for production issues, and continuously improve processes to balance feature velocity with system reliability. 

  • Cost & Performance: Monitor and optimize cloud spend and resource usage through rightsizing, discount strategies, and waste elimination. 

  • Mentor and Influence: Guide junior engineers through design reviews, incident post-mortems, and adoption of SRE practices; collaborate with stakeholders to shape cloud strategy, cost optimization, and capacity planning for enterprise-scale workloads. 

 

Educational Qualification:

  • Bachelor's degree or equivalent in Computer Science or a STEM major (Science, Technology, Engineering, and Math)

 

Technical skills:

  • 15+ years in software engineering, site reliability engineering, or cloud platform roles, with significant exposure to AWS production systems. 

  • Deep hands-on expertise with core cloud services including container orchestration, compute, databases, storage, monitoring, identity management, serverless, and networking. 

  • Expert-level skill in Infrastructure as Code: Terraform, CloudFormation, AWS CDK, or similar. 

  • Proficiency in programming languages like Python, Go, or Java for automation, scripting, and building tools. 

  • Deep understanding of observability tooling: metrics, logging, distributed tracing, and alerting (e.g., CloudWatch, Prometheus, Grafana, ELK). 

  • Strong experience with incident management: debugging, performance tuning, root cause analysis. 

  • Proven track record of cost optimization in cloud environments. 

  • Security mindset: knowledge of AWS security services, governance, and compliance standards. 

  • Proven track record in implementing SRE practices: SLIs/SLOs, error budgets, monitoring/alerting, and incident management. 

  • Strong communication and collaboration abilities to influence without authority and translate technical concepts for non-technical stakeholders. 
