Site Reliability Engineer (4+)
Infosys | 1 day ago | Bangalore

Roles and Responsibilities:
• Design and implement services across their full lifecycle, from conception through system design, build, and deployment
• Develop software solutions that enable the operability of large-scale distributed systems handling millions of transactions and petabytes of data
• Manage capacity and performance to help scale infrastructure on public and private clouds around the world
• Define and implement standards and best practices for system architecture, deployment, metrics, and operational tasks
• Support services through activities such as monitoring availability and system health, and responding to incidents
• Improve system performance, application delivery, and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis
• Communicate effectively across all areas of the organization
• Troubleshoot and monitor production systems to maintain the highest possible uptime
• Support and improve existing high-availability architecture solutions, and manage the associated operational activity
• Integrate Generative AI (GenAI) and AIOps tools to automate incident detection, root cause analysis, and resolution workflows (e.g., self-healing scripts, intelligent runbooks), reducing manual toil and accelerating response times
• Apply prompt engineering techniques to improve interactions with AI-based observability and automation platforms, increasing the accuracy and efficiency of AI responses
• Leverage platform-specific AI capabilities (e.g., AWS Bedrock, Azure OpenAI, GCP Vertex AI) to architect intelligent SRE solutions tailored to each cloud environment
• Design, implement, and maintain AI/ML-driven monitoring and alerting systems that proactively detect anomalies and predict potential failures, enabling preemptive remediation
• Develop and train machine learning models on operational telemetry (logs, metrics, events, traces) to support predictive analytics and intelligent automation
• Evaluate and deploy AIOps platforms (e.g., Moogsoft, Dynatrace, Splunk, BigPanda, Datadog, Elastic) to enhance observability, reduce alert noise, and accelerate incident resolution
• Experience with one or more high-level programming languages such as Python, Ruby, or Go, and familiarity with object-oriented programming

