Your role and responsibilities
As an Entry-Level Site Reliability Engineer (SRE) on the OneIT Lab Engineering Team,
you will join a team dedicated to ensuring the reliability, scalability, and performance of IBM systems
and infrastructure. This role is critical to advancing IBM Power System development
initiatives, and you will gain hands-on experience with both physical hardware and software environments.
Your responsibilities will include, but are not limited to:
· Assisting in the setup, configuration, and maintenance of IBM Power servers and related infrastructure.
· Supporting software-related reliability initiatives such as automation, monitoring, performance tuning, and system optimization.
· Participating in incident response, diagnostics, and root-cause analysis for both hardware and software issues.
· Collaborating with cross-functional teams to ensure smooth integration between physical systems and application environments.
· Supporting projects related to lab analytics—gathering, analyzing, and interpreting data to help guide better business and operational decisions.
· Contributing to the deployment, scaling, and ongoing maintenance of production and test systems.
· Writing clear, concise documentation for processes, configurations, and troubleshooting steps.
· Learning and applying best practices in systems reliability, observability, and infrastructure operations.
You will be expected to grow into a well-rounded SRE capable of tackling challenges in both the physical, data-center-like environment and the software layer that powers our services. Mentorship and hands-on training will be provided to help you develop the skills to excel in both domains.
Required education
Bachelor's Degree
Preferred education
Bachelor's Degree
Required technical and professional expertise
· Passion for eliminating repetitive manual processes using automation.
· Strong attention to detail and excellent analytical capabilities.
· Excellent troubleshooting, problem solving, and debugging skills.
· Proficiency in programming concepts and frameworks.
· Proficiency in scripting and coding for automation using Python, shell scripting (Bash, etc.), Ansible, and related tools and languages.
· Familiarity with server operations, virtualization, and related infrastructure concepts.
· Fundamental understanding of computer networks.
· Fundamental understanding of data science and analytics frameworks.
· An automation mindset: wherever possible, you should use scripting and automation.
· Ability to work independently and as part of a team to achieve the SRE agenda.
· Ability to complete project work, both supervised and unsupervised.
· Ability to effectively prioritize and execute tasks in a high-pressure environment.
· Good written, oral, and interpersonal communication skills.