Job Title: Lead Big Data Developer
Location: Washington, DC
Duration: Long Term
Job Responsibilities
· Understand complex business requirements
· Design and develop ETL pipelines for collecting, validating, and transforming data according to specifications
· Develop automated unit, functional, and performance tests
· Maintain optimal data pipeline architecture
· Design ETL jobs for optimal execution in the AWS cloud environment
· Reduce processing time and cost of ETL workloads
· Lead peer reviews and design/code review meetings
· Provide support to the production operations team
· Implement data quality checks
· Identify areas where machine learning can be used to detect data anomalies
Experience & Qualifications
· 7+ years of experience programming in Java or Scala
· 7+ years of experience in ETL projects
· 5+ years of experience in big data projects
· 3+ years of experience with API development (REST APIs)
· Believes in Scrum/Agile and has deep experience delivering software on teams that use Scrum/Agile methodology
· Strong, creative analytical and problem-solving skills
Required Technical Skills & Knowledge
- Strong experience in Java or Scala
- Strong experience with big data technologies such as AWS EMR, AWS EKS, and Apache Spark
- Strong experience with serverless technologies such as AWS DynamoDB and AWS Lambda
- Strong experience processing JSON and CSV files
- Must be able to write complex SQL queries
- Experience in performance tuning and optimization
- Familiar with columnar storage formats (ORC, Parquet) and various compression techniques
- Experience in writing Unix shell scripts
- Unit testing using JUnit or ScalaTest
- Git/Maven/Gradle
- Code Reviews
- Experience with CI/CD pipelines
- Agile
The following skills are a plus:
- AWS Cloud
- BPM / AWS Step Functions
- Python scripting
- Performance testing tools like Gatling or JMeter