• Proven experience (4+ years) in software engineering or reliability roles, with expertise in defining and measuring SLOs, SLIs, and error budgets
• Strong ability to design scalable, reliable systems and evaluate architectural choices for latency and performance
• Proficient in building automation tools and services using Python, Go, or similar languages, including self-healing and auto-scaling capabilities
• Skilled in maintaining high-quality documentation and runbooks through code generation and automation
• Advanced competency in monitoring system performance, optimizing resource usage, and improving detection and recovery times using metrics, tracing, and logs