Data Engineer
New Jersey
Our client is a global leader in consumer products, specializing in the production, distribution, and provision of household, health care, and personal care products. With a rich history spanning more than 200 years, they are committed to improving the quality of life for consumers worldwide. They are seeking a highly skilled Data Engineer to join their dynamic team and contribute to their mission of sustainability and innovation.
Responsibilities:
- Design, architect, and implement a comprehensive data lake solution, integrating a wide range of data sources.
- Develop and maintain Directed Acyclic Graphs (DAGs) using Apache Airflow for efficient data workflows.
- Extract, transform, and load (ETL) data from various sources, including REST APIs, AWS S3 buckets, GCP BigQuery, and other on-prem/cloud databases.
- Manage and optimize Snowflake databases to ensure high performance and reliability.
- Design data streams for both structured datasets (e.g., clinical measurements, lab results) and unstructured datasets (e.g., images, instrument data).
- Collaborate with cross-functional teams to understand data requirements and deliver solutions that meet business needs.
- Ensure data quality, integrity, and security across all data processes and platforms.
- Utilize GitHub and GitHub Actions for version control and CI/CD pipelines.
- Implement containerization and use container tools for efficient deployment and scaling of data solutions.
Required Skills:
- Proven experience designing and implementing data lake solutions in the cloud (GCP/AWS).
- Strong proficiency in Snowflake database management.
- Hands-on experience with Apache Airflow and DAG development.
- Ability to extract, transform, and load (ETL) data from diverse sources such as REST APIs, AWS S3, GCP BigQuery, and other databases.
- Experience working with both structured and unstructured datasets.
- Expertise in database management, ETL processes, and data architecture.
- Proven track record of designing and implementing robust data solutions.
- Proficiency with GitHub and GitHub Actions for version control and CI/CD.
- Experience with containerization and container tools (e.g., Docker, Kubernetes).
Preferred Skills:
- Clinical data background with experience in managing and analyzing clinical datasets.
- Experience with Veeva Vault integration for managing clinical data and documents.
- Experience implementing or enhancing datasets for machine learning applications.
- Professional Data Engineer certification or equivalent experience.