Senior Site Reliability Engineer (NM+)
Siemens | 2 days ago | Pune

Responsibilities

 

  • Incident management and Game Day coordination
  • Create and drive metrics/observability solutions and reviews
  • Support production readiness reviews
  • Act as a cross-division role model to advance the SRE practice at Siemens
  • Maintain full technical command of automation methods, codification of operational activities, microservice architecture, and platform engineering, so that changes, updates, and technical advancements are in place for a product
  • Ensure the team can provide the design, deployment, automation, and scripting solutions to drive new capabilities, visibility, and efficiency
  • Simplify highly complex ideas, architectures and concepts to encourage achievable adoption
  • Collaborate with other technical platforms and partners to engineer automated, integrated solutions between tools, services, and teams that increase availability, reliability, and performance
  • Own internal and external SLAs and ensure they meet or exceed expectations
  • Be part of maintaining a 24x7, global, highly available SaaS environment
  • Participate in an on-call rotation that supports our production infrastructure
  • Troubleshoot production availability incidents that often span multiple teams and services
  • Ensure the SRE team can coordinate production incident post-mortems and contribute to solutions that prevent recurrence, with the goal of automated response to all non-exceptional service conditions
  • Communicate with business and technical partners about incidents as they occur, when they impact system performance or availability at a critical level

 

Required Knowledge/Skills, Education, and Experience

 

  • Bachelor’s degree or equivalent experience

  • Proven experience as a Site Reliability Engineer or equivalent role

  • Experience working in a large organization through an SRE transformation where existing applications were adapted to contemporary targets

  • Proven experience with automation via scripting and API development

  • Experience with software development in the cloud

  • Experience with monitoring tools (Datadog, CloudWatch, CloudTrail, Cloudability, or equivalent tools)

  • Proven experience with containerization, specifically Kubernetes

  • Experience with Amazon Web Services (AWS) services and Terraform, CloudFormation, Ansible, or equivalent tools

 

 

Preferred Knowledge/Skills, Education, and Experience

 

  • Desired certifications: Datadog, Kubernetes, security, and AWS

  • Understanding of ITIL

  • Deep understanding of SRE and incident management strategies

  • Experience with issue/incident tracking tools (ServiceNow, ServiceDesk, Jira, or equivalent tools) and open source tools (Linux, Python, Git, Ansible)

  • Experience in enterprise IT with distributed environments

  • Knowledge of networking concepts, including firewalls, VPN, routing, load balancers, security, and DNS

  • Senior-level system administration experience, including troubleshooting, support, mentorship/training, and oversight
