- Understand requirements/use cases and build efficient ETL solutions using Apache Spark, Python, Kafka, and Hive, targeting the Cloudera Data Platform.
- Analyze requirements/use cases, convert functional requirements into concrete technical tasks, and provide reasonable effort estimates.
- Work closely with data analysts/modelers and business users to understand data requirements.
- Convert requirements into high-level design, low-level design, and source-to-target mapping documents.
- Design, develop, and schedule data pipelines that handle large volumes of data within SLA.
- Work with solution architects, technical managers, and admins to understand SLAs and system limitations, and provide efficient solutions.
- Apply expertise in processing large-volume data aggregations using Spark; know a range of performance tuning techniques and lead the team on optimization.
- Develop efficient data ingestion and data governance frameworks per specification.
- Improve the performance of existing Spark-based data ingestion and aggregation pipelines to meet SLAs.
- Work proactively and independently with global teams to address project requirements, and articulate issues/challenges with enough lead time to mitigate delivery risks.
- Plan production implementation activities, execute change requests, and resolve issues during production implementation.
- Plan and execute large-scale data migration and historical data rebuild activities.
- Conduct code reviews/optimizations and test case reviews; demonstrate troubleshooting skill in resolving technical issues and bugs.
- Demonstrate ownership and initiative; bring in best practices and solutions that best fit the client's problem and environment.
Salary Range: $100,000-$130,000 a year