GCP Infrastructure Engineer (5+ years)
UPS | 12 days ago | Chennai

Cloud Infrastructure & Platform Engineering

    • Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.

    • Deploy and manage containerized workloads using Docker and Kubernetes (GKE).

    • Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.

    • Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.

    • Ensure business continuity through backup, disaster recovery, and multi-region deployments.

Automation & Reliability

    • Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.

    • Adopt GitOps practices (Flux) for infrastructure lifecycle management.

    • Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.

    • Apply SRE principles (SLIs, SLOs, SLAs) to ensure platform reliability and uptime.

Security, Governance & Compliance

    • Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.

    • Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards such as HIPAA, SOX, GDPR, and FedRAMP.

    • Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.

Monitoring, Observability & Cost Optimization

    • Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) covering both infrastructure health and ML-specific metrics (model drift, data anomalies).

    • Define KPIs to monitor system health, performance, and adoption across AI workloads.

    • Improve cost efficiency for GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring.

Collaboration & Enablement

    • Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.

    • Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.
