The team:
AI & Data offers a full spectrum of solutions for designing, developing, and operating cutting-edge Data and AI platforms, products, insights, and services. Our offering helps clients innovate and enhance their data, AI, and analytics capabilities, ensuring they can mature and scale effectively.
AI & Data professionals will work with our clients to:
- Design and modernize large-scale data and analytics platforms, including data management, governance, and the integration of structured and unstructured data to generate insights, leveraging cloud, edge, and AI/ML technologies, platforms, and methodologies for storage and processing
- Leverage automation, cognitive, and AI-based solutions to manage data, predict scenarios, and prescribe actions
- Drive operational efficiency by managing and upgrading their data ecosystems and application platforms, applying analytics expertise and providing as-a-service models to ensure continuous insights and enhancements.
Qualifications
Must Have Skills/Project Experience/Certifications:
- 6-9 years of experience designing and implementing the migration of enterprise legacy systems to a Big Data ecosystem for data warehousing projects.
- Excellent knowledge of Apache Spark and strong Python programming experience
- Deep technical understanding of distributed computing and broad awareness of different Spark versions
- Strong UNIX operating system concepts and shell scripting knowledge
- Hands-on experience using Spark & Python
- Deep experience developing data processing tasks in PySpark, such as reading data from external sources, merging datasets, performing data enrichment, and loading into target data destinations.
- External certification (foundational or advanced) in one of the cloud services: AWS, GCP, Azure, Snowflake, or Databricks
- Experience deploying and operationalizing code; knowledge of scheduling tools such as Airflow or Control-M is preferred
- Experience creating visualizations in Tableau, Power BI, Qlik, Looker, or other reporting tools
- Good knowledge of Hadoop, Hive, and the Cloudera/Hortonworks Data Platform
- Exposure to Jenkins or an equivalent CI/CD tool and Git repositories
- Experience handling CDC (change data capture) operations on large data volumes
- Understanding of and operating experience with the Agile delivery model
- Experience with Spark-related performance tuning
- Well versed in design documents such as HLD (high-level design) and TDD (technical design document)
- Well versed in historical data loads and overall framework concepts
- Experience with different kinds of testing, such as unit testing, system testing, and user acceptance testing
Good to Have Skills/Project Experience/Certifications:
- Exposure to PySpark, Cloudera/Hortonworks, Hadoop, and Hive
- Exposure to AWS S3/EC2 and Apache Airflow
- Participation in client interactions/meetings is desirable
- Participation in code tuning is desirable