Primary Responsibilities:
- Regularly examine multiple monitoring systems for unexpected deviations in any of the application layers.
- React to alerts with well-defined procedures, escalate problems to the appropriate people, follow up until resolution, and complete incident reporting.
- Set up and monitor alerts on ops tools and monitoring applications such as Zabbix, Grafana, and the ELK stack.
- Create shell/Python script-based reports and schedule them with cron to support periodic reporting.
- Adhere to defined processes and be ready for ad hoc and unexpected incidents.
- Help your coworkers by creating documentation and detailed knowledge sharing for continuous improvement.
- Communicate clearly, with strong communication skills and clarity in reporting.
- Troubleshoot live-site production issues by correlating different components.
- Perform day-to-day maintenance of the application systems in operation, including identifying and troubleshooting application issues and resolving or escalating them.
Desired Skills:
- 3-6 years of relevant experience in a 24x7 AWS cloud-based Linux production environment.
- Ability to monitor diverse architectures, troubleshoot problems, analyze impact, and escalate.
- Willingness to work precise schedules, including night shifts and weekends, on a rotational basis to support our 24x7 systems.
- Basic Linux command skills are a must; experience in any scripting language (Shell/Python) is a plus.
- Basic knowledge of Web/Internet concepts, e.g. DNS, common protocols, ports, cookies, Firebug.
- Hands-on experience in L2 debugging, such as finding errors/exceptions in logs.
- Basic knowledge of SQL queries.
- Ability to work well in a busy team, learn quickly, and deal with a wide range of issues.
- Prior experience with ELK, Zabbix, or Grafana would be an added advantage.
- Knowledge of the AWS Cloud environment is a huge plus.