We are looking for a **Cluster Systems Engineer** to support the deployment, configuration, and maintenance of our AI inference clusters. This role focuses on automation, OS provisioning, and cluster reliability.
##### **Key Responsibilities**
- Provision and manage Linux-based servers in datacenter environments.
- Implement OS lifecycle management and patching strategies.
- Develop automation scripts and playbooks using Python and Ansible.
- Integrate Redfish for out-of-band management operations.
- Manage and troubleshoot **Kubernetes and Slurm clusters**.
- Contribute to observability solutions using Prometheus and OpenTelemetry.
##### **Required Qualifications**
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.
- 3–5 years of experience in systems engineering or HPC environments.
- Strong Linux administration experience.
- Good understanding of datacenter networking fundamentals.
- Experience with Redfish, IPMI, SNMP, and other hardware management protocols.
- Proficiency in **Python and Shell scripting**.
- Experience with automation tools (**Ansible preferred**).
- **Hands-on experience managing Kubernetes and Slurm clusters**.
- Solid software engineering fundamentals (version control, testing).
- Exposure to cloud platforms (AWS, Azure, GCP) and hybrid deployments.
- Familiarity with AI workloads is desirable.
##### **Minimum Qualifications**
- One of the following:
  - Bachelor's degree in Engineering, Information Systems, Computer Science, or a related field and 2+ years of software engineering or related work experience; or
  - Master's degree in Engineering, Information Systems, Computer Science, or a related field and 1+ year of software engineering or related work experience; or
  - PhD in Engineering, Information Systems, Computer Science, or a related field.
- 2+ years of academic or work experience with a programming language such as C, C++, Java, or Python.