Site Reliability Engineer (3+ years)
Citi | 1 day ago | Chennai

We are seeking a highly motivated and skilled Site Reliability Engineer (SRE) to join our team. As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our production systems. You will work closely with development and operations teams to automate tasks, improve monitoring and observability, and drive continuous improvement in our infrastructure and processes. This role offers a unique opportunity to combine software engineering expertise with operational focus, making a direct impact on the availability and performance of our services.

Responsibilities:

Automate repetitive tasks in the production environment using scripting languages like Python and configuration management tools to improve efficiency and reduce manual effort. Quantify the impact of automation efforts on process improvement and man-hour savings.

Develop and maintain monitoring and observability tools, integrating production applications with platforms like Splunk, ELK, AppDynamics, Evolven, or ITRS. Configure alerts and dashboards to proactively identify and address potential issues, ensuring comprehensive system visibility.

Collaborate with development and operations teams to identify opportunities for automation and reduction of manual tasks, driving a culture of automation and continuous improvement.

Conduct thorough root cause analysis of production incidents, identifying patterns and recommending temporary or permanent fixes. Proactively identify potential issues and implement preventative measures.

Champion SRE best practices within the organization, advocating for improvements in monitoring, alerting, automation, and incident response processes.

Continuously learn and stay up to date with the latest technologies and trends in SRE.

Qualifications:

BSC/BE/ME/MSC/MCA in Computer Science, a related field, or equivalent practical experience.

3+ years of experience in a hands-on Site Reliability Engineering role, with demonstrated accountability for delivering results.

Proficiency in Python or Angular.

Experience with at least one database technology: MongoDB, or an RDBMS such as Oracle.

Experience with monitoring and observability tools such as Splunk, ELK, AppDynamics, Evolven, ITRS, or similar platforms. Proven ability to configure alerts, dashboards, and onboard applications.

Strong scripting and automation skills, with a proven track record of automating tasks in a production environment.

Excellent problem-solving, analytical, and debugging skills, with a keen eye for identifying patterns and root causes.

Excellent communication and collaboration skills, with the ability to effectively challenge existing practices and advocate for improvements.

Good to have:

Experience with Git-based version control platforms like GitHub or Bitbucket.

Experience with container and orchestration technologies like Docker, OpenShift, or Kubernetes.
