Data Engineer
New Jersey
Our client is a global leader in consumer products, specializing in the production, distribution, and provision of household, health care, and personal care products. With a rich history spanning more than 200 years, they are committed to improving the quality of life for consumers worldwide. They are seeking a highly skilled Data Engineer to join their dynamic team and contribute to their mission of sustainability and innovation.
Responsibilities:
- Design, architect, and implement a comprehensive data lake solution, integrating a wide range of data sources.
- Develop and maintain Directed Acyclic Graphs (DAGs) using Apache Airflow for efficient data workflows.
- Extract, transform, and load (ETL) data from various sources, including REST APIs, AWS S3 buckets, GCP BigQuery, and other on-prem/cloud databases.
- Manage and optimize Snowflake databases to ensure high performance and reliability.
- Design data streams for both structured datasets (e.g., clinical measurements, lab results) and unstructured datasets (e.g., images, instrument data).
- Collaborate with cross-functional teams to understand data requirements and deliver solutions that meet business needs.
- Ensure data quality, integrity, and security across all data processes and platforms.
- Utilize GitHub and GitHub Actions for version control and CI/CD pipelines.
- Implement containerization and use container tools for efficient deployment and scaling of data solutions.
Required Skills:
- Proven experience designing and implementing data lake solutions in the cloud (GCP/AWS).
- Strong proficiency in Snowflake database management.
- Hands-on experience with Apache Airflow and DAG development.
- Ability to extract, transform, and load (ETL) data from diverse sources such as REST APIs, AWS S3, GCP BigQuery, and other databases.
- Experience working with both structured and unstructured datasets.
- Expertise in database management, ETL processes, and data architecture.
- Proven track record of designing and implementing robust data solutions.
- Proficiency with GitHub and GitHub Actions for version control and CI/CD.
- Experience with containerization and container tools (e.g., Docker, Kubernetes).
Preferred Skills:
- Clinical data background with experience in managing and analyzing clinical datasets.
- Experience with Veeva Vault integration for managing clinical data and documents.
- Experience implementing or enhancing datasets for machine learning applications.
- Professional Data Engineer certification or equivalent experience.