Cloud Infrastructure & Platform Engineering
Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.
Deploy and manage containerized workloads using Docker and Kubernetes (GKE).
Configure and optimize the Vertex AI and IBM watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.
Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.
Ensure business continuity through backup, disaster recovery, and multi-region deployments.
Automation & Reliability
Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.
Adopt GitOps practices (Flux) for infrastructure lifecycle management.
Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.
Apply SRE principles (SLIs, SLOs, SLAs) to define and meet platform reliability and uptime targets.
Security, Governance & Compliance
Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.
Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards (HIPAA, SOX, GDPR, FedRAMP).
Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.
Monitoring, Observability & Cost Optimization
Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) covering both infrastructure health and ML-specific metrics (model drift, data anomalies).
Define KPIs to monitor system health, performance, and adoption across AI workloads.
Improve the cost efficiency of GPU/TPU-intensive workloads through autoscaling, preemptible (Spot) instances, and utilization monitoring.
Collaboration & Enablement
Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.
Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.
Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration.
Required Education
Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.