wbd | 18 days ago | Bangalore

Roles & Responsibilities:

You will develop, test, document, and implement SRE capabilities to improve the reliability and scalability of application services running on public cloud infrastructure.
You will continuously look for opportunities to automate and develop self-healing solutions, leading to operational efficiency and cost optimization of public cloud infrastructure.
You will contribute to the development and enhancement of the SRE operations toolkit so that service teams can execute SRE functions during incidents.
You will partner with service engineering and other operations engineering teams to understand customer user journeys and their impact on the business, and implement best practices that improve service reliability and scalability.
You will contribute to the performance, stability, scalability, and support of systems that have been shipped to customers in production.
You will support and troubleshoot production issues by reviewing source code, logs, operational metrics, stack traces, etc., to pinpoint a specific problem and then resolve it.

What to Bring:

A software engineer with 3-5 years of experience and in-depth knowledge of software engineering fundamentals, SDLC, automation, and managing applications on public cloud infrastructure.
Good understanding of SRE concepts like SLAs, SLOs, SLIs, error budgets, MTTR, MTTD, etc.
Hands-on experience in deploying and managing applications on Kubernetes, with knowledge of pod and container lifecycle management, service and ingress resource management, and persistent storage solutions.
Knowledge of the observability ecosystem, including metrics, logging, tracing, and tools such as Prometheus, Grafana, Elastic Stack, Datadog, or New Relic.
Proficiency in at least one programming language (e.g. Python, Bash, Java, Go) with respect to designing, coding, testing, and software delivery.
Effective cross-functional collaboration skills to develop tools for secure, scalable, and reliable systems.
Experience in troubleshooting service/infrastructure issues in production using one or more industry-standard observability tools.
Familiarity with Agile methodologies and SRE practices.
You deliver high-quality results the first time and improve code, documentation, and results with each iteration. Your team trusts your work.

What We Offer:
