Site Reliability Engineer (SRE) (6+ years)
NetApp | 5 days ago | Bengaluru

Job Requirements

o    Collaborate with external customers and partners to ensure their success with Google Cloud NetApp Volumes.
o    Respond to, troubleshoot, and drive root cause analysis (RCA) of complex live production incidents, including cross-platform issues involving OS, networking, and databases in cloud-based SaaS/IaaS environments, following and implementing SRE best practices.
o    Continuously monitor, analyze, and measure system health, availability, and latency using tools like Prometheus, Google Cloud Monitoring, Elasticsearch, Grafana, and SolarWinds. Develop and implement steps to improve system and application performance, availability, and reliability.
o    Document system knowledge, create runbooks, and ensure critical system information is readily available.
o    Stay up to date with security trends and proactively identify, diagnose, and resolve complex security issues.
o    Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure.
o    Automate manual tasks and any system components that would benefit from automation.
o    Utilize Atlassian Jira to track issues to resolution based on their priority.
o    Engage in incident management processes and resolve issues within agreed SLAs/SLOs.

o    Extensive experience in storage technologies and incident management processes.
o    Advanced knowledge of Linux operating systems (e.g., Ubuntu, CentOS).
o    Proficiency in container-based architecture (e.g., Kubernetes).
o    Intermediate to advanced knowledge of automation tools and scripting languages such as Ansible, Python, Bash, Go, and PowerShell.
o    Solid understanding of algorithms, data structures, and databases (SQL/NoSQL).
o    Intermediate knowledge of networking concepts.
o    Hands-on experience with cloud environments, particularly GCP.
o    Exceptional debugging skills across various platforms and technologies.
o    Familiarity with site reliability engineering principles and best practices.

Education

BE in Computer Science or a related field, or 6+ years of professional experience in a relevant role. 
