Key Responsibilities
Infrastructure Reliability & Performance
• Monitor, maintain, and optimize VMware clusters, ESXi hosts, and Oracle Linux servers
• Ensure high availability and disaster recovery readiness for virtualized environments
• Troubleshoot and resolve incidents impacting virtualization and Linux platforms
Automation & Tooling
• Design and implement automation for patching, configuration management, and routine operational tasks using tools such as Chef, Ansible, and Jenkins, and languages such as Python
• Develop scripts and pipelines to reduce manual effort and improve operational agility
Capacity & Configuration Management
• Manage resource allocation across VMware clusters and Oracle Linux systems
• Implement standardization and compliance for OS configurations and security baselines
Monitoring & Alerting
• Configure and maintain monitoring solutions (e.g., vROps, Splunk, Prometheus) for proactive issue detection
• Optimize alerting thresholds to reduce noise and improve incident response times
Incident & Problem Management
• Lead root cause analysis for critical incidents and implement permanent fixes
• Collaborate with cross-functional teams to resolve complex infrastructure issues
Security & Compliance
• Ensure timely patching of VMware and Oracle Linux environments to address vulnerabilities
• Maintain compliance with enterprise security standards and regulatory requirements
All About You: Required Skills & Qualifications
• Bachelor's degree in Computer Science, Information Technology, or a related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience
• Strong analytical, problem-solving, and planning skills
• Ability to organize, multitask, and prioritize work based on current business needs
• Strong verbal and written communication skills
• Strong relationship-building, collaboration, and customer service skills
• Interest in designing, analyzing, and troubleshooting large-scale distributed systems
• We need team members with an appetite for change and for pushing the boundaries of what automation can do. Experience working across development, operations, and engineering teams to prioritize needs and build relationships is a must
• We support many different stakeholders. Experience handling difficult situations and making decisions with a sense of urgency is needed
• Ability to work with little or no supervision
• Strong experience with VMware vSphere, ESXi, vCenter, and related virtualization technologies.
• Proficiency in Oracle Linux administration, including kernel tuning and patching.
• Hands-on experience with automation tools (Chef, Ansible, Jenkins) and scripting (Python, Bash).
• Familiarity with monitoring and logging tools (vROps, Splunk, Prometheus).
• Knowledge of networking fundamentals, storage (vSAN), and virtualization best practices.
• Experience with incident management, root cause analysis, and performance optimization.
• Understanding of cloud platforms (AWS, Azure) and container technologies (Docker, Kubernetes) is a plus.
Preferred Qualifications
• Certifications: VMware Certified Professional (VCP), Oracle Linux Certified Administrator
• Experience applying Site Reliability Engineering (SRE) principles (SLIs, SLOs, error budgets)
• Strong collaboration and communication skills for cross-team engagement