Observability Engineer (NM+)

Deloitte | 114 days ago | Hyderabad

Work you’ll do:

Lead discussions with business and functional analysts to understand requirements and assess integration impacts on business architecture.
Prepare technical architecture and design, clarify requirements, and resolve ambiguities.
Develop solutions in line with established technical designs, integration standards, and quality processes.
Create and enforce technical design and development guides, templates, and standards.
Facilitate daily scrum meetings, manage deliverables, and prepare weekly status reports for leadership review.
Conduct detailed deliverable reviews and provide technical guidance to team members.
Collaborate with onsite clients, coordinators, analysts, and cross-functional teams.
Design templates or scripts to automate routine development or operational tasks.

Observability & Monitoring  

Implement monitoring and dashboarding solutions using Dynatrace, ensuring real-time visibility into applications, infrastructure, and services. 
Set up log monitoring using Dynatrace Grail, ensuring comprehensive log analysis and correlation. 
Define and configure custom and default alerts in Dynatrace to detect anomalies and system issues proactively. 
Develop static and dynamic alerting mechanisms to minimize noise while ensuring prompt incident detection. 
Integrate monitoring solutions with BigPanda, enabling AI-driven event correlation and incident response.

Incident & Service Management

Establish robust incident management workflows, ensuring seamless detection, triaging, and resolution of issues. 
Work with ServiceNow for ITSM integration, ensuring structured incident tracking and resolution. 
Configure Opsgenie for on-call rotations, escalations, and automated incident notifications. 
Drive post-incident reviews to identify root causes, implement corrective actions, and improve system resilience.

Cloud & Infrastructure Automation  

Deploy, manage, and scale AWS services, including Lambda, S3, RDS, DynamoDB, EKS, IAM, Security Groups, VPC, and Route 53. 
Automate infrastructure provisioning using Terraform and ensure best practices in Infrastructure as Code (IaC). 
Implement GitHub Actions and Harness CI/CD pipelines for automating deployments and ensuring release stability. 
Utilize Docker and Kubernetes for containerized workloads and manage service mesh solutions for traffic routing and security.

Site Reliability & Performance Optimization

Define SLOs (Service Level Objectives), SLIs (Service Level Indicators), and error budgets to enhance service reliability. 
Analyze performance bottlenecks using Dynatrace and suggest optimizations to improve system efficiency. 
Implement self-healing mechanisms, auto-scaling policies, and failover strategies to enhance system resilience. 
Champion best practices for cloud security, cost optimization, and governance across AWS environments.

