GCP Infrastructure Engineer (5+)
UPS | 1 day ago | Chennai

Cloud Infrastructure & Platform Engineering

  • Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.

  • Deploy and manage containerized workloads using Docker and Kubernetes (GKE).

  • Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.

  • Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.

  • Ensure business continuity through backup, disaster recovery, and multi-region deployments.

Automation & Reliability

  • Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.

  • Adopt GitOps practices (Flux) for infrastructure lifecycle management.

  • Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.

  • Apply SRE principles (SLIs, SLOs, SLAs) to guarantee platform reliability and uptime.

Security, Governance & Compliance

  • Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.

  • Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards (HIPAA, SOX, GDPR, FedRAMP).

  • Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.

Monitoring, Observability & Cost Optimization

  • Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infra health and ML-specific metrics (model drift, data anomalies).

  • Define KPIs to monitor system health, performance, and adoption across AI workloads.

  • Optimize cloud cost efficiency for GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring.

Collaboration & Enablement

  • Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.

  • Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.

  • Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration.

Required Education

Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.
