Site Reliability Engineer (3+)
thomsonreuters | 21 days ago | Hyderabad

In this opportunity as Site Reliability Engineer, you will:

  • Work with application teams to bring applications into production and to manage and support them there
  • Continuously improve the ongoing support model, including release and change management, to maintain the strategic environments (e.g., production and non-production)
  • Provide well-written documentation and technical presentations on projects supported by the team.
  • Provide problem management services by utilizing diagnostic and debugging tools to aid in troubleshooting efforts, including 24x7 rotating pager support.
  • Coordinate the implementation of application monitoring, establish support documentation, and provide training on products and procedures.
  • Provide technical assistance with troubleshooting and performance tuning of the supported environment(s)

 

About You
You're a fit for the role of Site Reliability Engineer if your background includes:

  • 3-5 years of experience in an enterprise-level operations support, SRE, or DevOps role.
  • Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks)
  • Expertise in observability and monitoring tools, like Datadog, AppDynamics, Splunk, etc.
  • Deep understanding of application performance monitoring (APM) and user monitoring.
  • Knowledge of Infrastructure as Code (IaC): AWS CloudFormation, Ansible, Terraform, etc., and the ability to apply cloud compliance standards to application design to achieve reliability
  • Experience in site reliability engineering for .NET, Java, Kubernetes, and database platforms (such as Postgres)
  • Experience with load balancers and AWS services such as ECS, EMR, Step Functions (state machines), CloudFormation, CloudWatch, Lambda, SQS, ECR, Fargate, and Elasticsearch, as well as networking concepts
  • Sound knowledge of ITSM processes, SLI/SLO/SLA management, incident resolution, and automation techniques
  • Strong IP networking fundamentals and experience with usage of standard application protocols and messages (e.g., TCP/IP, HTTP, SOAP, RESTful APIs, XML/JSON, JDBC, JMS/MQ)
  • Ability to analyze application and server logs and interpret errors.
  • Incident response and recovery: SREs are responsible for responding to incidents and implementing processes for incident response, monitoring, and automated recovery.
  • Scripting knowledge in PowerShell, Bash, or other shell scripting languages
  • Ability to code in at least one programming language (e.g., Java, C#, Python, JavaScript)
  • Working knowledge of ITIL Change and Incident management processes.
  • Excellent written and verbal communication skills and strong collaboration skills.