GCP Infrastructure Engineer (5+ years)
UPS | 10 days ago | CHENNAI

Cloud Infrastructure & Platform Engineering

  • Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.

  • Deploy and manage containerized workloads using Docker and Kubernetes (GKE).

  • Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.

  • Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.

  • Ensure business continuity through backup, disaster recovery, and multi-region deployments.

Automation & Reliability

  • Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.

  • Adopt GitOps practices (Flux) for infrastructure lifecycle management.

  • Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.

  • Apply SRE principles (SLIs, SLOs, SLAs) to guarantee platform reliability and uptime.

Security, Governance & Compliance

  • Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.

  • Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards (HIPAA, SOX, GDPR, FedRAMP).

  • Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.

Monitoring, Observability & Cost Optimization

  • Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infrastructure health and ML-specific metrics (model drift, data anomalies).

  • Define KPIs to monitor system health, performance, and adoption across AI workloads.

  • Improve cost efficiency for GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring.

Collaboration & Enablement

  • Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.

  • Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.

  • Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration.

Required Education

Bachelor’s or master’s degree in computer science, software engineering, or a related field.

Required Experience

  • 5+ years of experience in cloud infrastructure engineering, DevOps, or platform engineering.

  • Experience with GenAI use cases (chatbots, content generation, code assistants, etc.).

  • Strong hands-on expertise with Google Cloud Platform (GCP), especially Vertex AI.

  • Experience with IBM Watsonx for AI application deployment and management.

  • Proven skills in Docker, Kubernetes (GKE), and container orchestration at scale.

  • Proficiency in Python, Bash, or other relevant scripting languages.

  • Strong understanding of cloud networking, IAM, and security best practices.

  • Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins) and IaC tools (Terraform, Pulumi, Ansible, Deployment Manager).

  • Familiarity with data pipelines and integration tools (Dataflow, Apache Beam, Pub/Sub, Kafka).

  • Excellent problem-solving, debugging, and communication skills.
