Senior Site Reliability Engineer (7+)
Equifax | 1 day ago | Pune

What you'll need

  • Operations experience in supporting highly scalable systems.

  • Ability to operate in a 24x7 environment encompassing global time zones.

  • Desired: experience designing and implementing an efficient CI/CD flow that takes code from dev to prod with high quality and minimal manual effort.

  • Kubernetes: Design, deploy, and manage Kubernetes clusters in production, optimizing for performance and reliability.

  • Cloud Infrastructure: Build and maintain scalable infrastructure on GCP (or other cloud providers), leveraging automation tools like Terraform.

  • Performance Engineering: Identify and analyze performance bottlenecks in applications and infrastructure, then develop and implement performance optimizations.

  • Observability: Implement comprehensive monitoring and logging solutions to proactively detect and resolve issues.

  • Incident Response: Participate in on-call rotations, troubleshooting and resolving production incidents with a focus on minimizing downtime.

  • Collaboration: Work closely with product development teams to promote reliability best practices and ensure smooth deployments.

  • Manage system uptime across cloud-native (AWS, GCP) and hybrid architectures.

  • Build infrastructure as code (IaC) patterns that meet security and engineering standards using one or more technologies (Terraform, scripting with cloud CLI, and programming with cloud SDK).

  • Build CI/CD pipelines for build, test and deployment of application and cloud architecture patterns, using platform (Jenkins) and cloud-native toolchains.

  • Build automated tooling to deploy service requests and push changes into production.

  • Troubleshoot and triage problems across complex distributed architectures and their service maps.

  • Build comprehensive, detailed runbooks to detect, remediate, and restore services.

  • Lead blameless availability postmortems and own the action items that prevent recurrence.

  • Serve on call for high-severity application incidents and improve runbooks to reduce MTTR.

  • Participate in a 24/7 team of first responders under a follow-the-sun operating model for incident and problem management.

  • Effectively communicate to technical peers and team members in both written and verbal formats.

What experience you need

  • Bachelor's degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent job experience.

  • 7+ years of experience working with containers (Docker, Kubernetes).

  • 5+ years of experience working with public cloud environments (GCP preferred).

  • Strong system administration skills, including automation and orchestration on Linux.

  • Strong Kubernetes knowledge and hands-on production administration skills.

  • Programming experience in one or more languages such as Python, Bash, Java, Go, Groovy or similar languages.

  • Proficiency in identifying and analyzing performance bottlenecks in applications and infrastructure.

  • Proficiency with continuous integration and continuous delivery (CI/CD) using tools like Jenkins, Git.

  • 5+ years of experience monitoring infrastructure and application performance.

  • Solid understanding of application design principles and trade-offs.

  • Knowledge of network infrastructure and security basics (DNS, subnets, firewalls, load balancers).
