Architect - Site Reliability Engineer (9+)
pepsicojobs | 15 days ago | Hyderabad

Responsibilities

  • Monitor and Respond: Proactively monitor compute infrastructure health and performance, identify potential issues, and respond quickly to incidents.
  • Automate and Optimize: Develop and implement automation tools to streamline compute operations, improve efficiency, and reduce manual intervention.
  • Collaborate and Troubleshoot: Work closely with software engineering, platform, and other teams to troubleshoot complex compute problems and implement solutions.
  • Capacity Planning: Analyze compute resource usage and trends to forecast capacity needs and ensure sufficient resources are available to meet demand.
  • Document and Communicate: Maintain accurate and up-to-date documentation of compute configurations, procedures, and incidents.
  • Participate in On-Call Rotation: Provide 24/7 on-call support for critical compute incidents.

Qualifications

  • Experience: 9+ years of experience in systems engineering or operations, with a focus on SRE principles and practices.
  • Technical Skills: Deep understanding of operating systems (Linux, Windows), virtualization technologies, storage and backup systems, and container and orchestration platforms (Docker, Kubernetes).
  • Problem-Solving: Strong analytical and problem-solving skills, with the ability to identify and resolve complex compute issues.
  • Communication: Excellent written and verbal communication skills, with the ability to collaborate effectively with cross-functional teams.  
  • Adaptability: Ability to thrive in a fast-paced, dynamic environment, and adapt to changing priorities.