Senior Site Reliability Engineer (NM+)

Siemens | 2 days ago | Pune

Responsibilities

Incident management and Game Day coordination
Create and drive metrics/observability solutions and reviews
Support production readiness reviews
Act as a cross-division role model to advance the SRE practice at Siemens
Take full technical ownership of automation methods, the codification of optional activities, microservice architecture, and platform engineering, so that changes, updates, and technical advancements are in place for a product
Ensure the team can provide the design, deployment, automation, and scripting solutions to drive new capabilities, visibility, and efficiency
Simplify highly complex ideas, architectures and concepts to encourage achievable adoption
Collaborate with other technical platforms and partners to engineer automated, integrated solutions across tools, services, and teams that increase availability, reliability, and performance
Own the internal and external SLAs and ensure they meet or exceed expectations
Be part of maintaining a 24x7, global, highly available SaaS environment
Participate in an on-call rotation that supports our production infrastructure
Troubleshoot production availability incidents that often span multiple teams and services
Ensure the SRE team can coordinate production incident post-mortems and contribute to solutions that prevent problem recurrence, with the goal of automated response to all non-exceptional service conditions
Communicate with business and technical partners about incidents as they occur when they critically impact system performance or availability

Required Knowledge/Skills, Education, and Experience

Bachelor’s degree or equivalent experience
Proven experience as a Site Reliability Engineer or in an equivalent role
Experience working in a large organization through an SRE transformation where existing applications were adapted to contemporary targets
Proven experience with automation via scripting & API development
Experience with software development in the cloud
Experience with monitoring tools (Datadog, CloudWatch, CloudTrail, Cloudability, or equivalent tools)
Proven experience with containerization, specifically Kubernetes
Experience with Amazon Web Services (AWS) services and Terraform, CloudFormation, Ansible, or equivalent tools

Preferred Knowledge/Skills, Education, and Experience

Desired certifications include Datadog, Kubernetes, security, and AWS
Understanding of ITIL
Deep understanding of SRE and incident management strategies
Experience with issue/incident tracking tools (ServiceNow, ServiceDesk, Jira, or equivalent tools) and open source tools (Linux, Python, Git, Ansible)
Experience in distributed enterprise IT environments
Knowledge of networking concepts, including firewalls, VPNs, routing, load balancers, security, and DNS
Senior-level system administration experience, including troubleshooting, support, mentorship/training, and oversight
