Job Title: Data Engineer
Location: Downtown San Francisco, CA / Remote
Type: Full-time/Contract
Job Summary:
We are seeking a highly skilled Data Engineer to join our client's team. This role involves designing, building, and maintaining robust data solutions. You will use a diverse set of technologies and tools to enhance the client's data infrastructure and enable high-performance analytics across the organization. The ideal candidate has experience with cloud data warehousing, ETL pipeline development, workflow orchestration, data governance, and master data management.
Key Responsibilities:
- Develop scalable and cost-effective data warehouse solutions in Snowflake.
- Leverage AWS cloud infrastructure to ensure optimal performance and scalability.
- Design, develop, and deploy high-performance data pipelines using tools like dbt™, Fivetran, and Matillion Cloud ETL.
- Ensure data pipelines are robust, maintainable, and scalable.
- Utilize Apache Airflow to orchestrate and automate data workflows.
- Ensure efficient scheduling and execution of data tasks.
- Develop and implement data privacy policies covering encryption, anonymization, masking, and access controls.
- Establish and enforce data governance frameworks covering data classification, lineage, quality, and retention.
- Design and build highly redundant and scalable cloud infrastructure for data lakes and data labs using AWS services such as EMR, SageMaker, and S3.
- Manage and optimize data storage and processing solutions.
- Develop Python scripts for data enrichment and master data management (MDM).
- Integrate data from various B2B sources to maintain a consistent and accurate master dataset.
- Build and maintain CI/CD pipelines to streamline deployment processes.
- Ensure smooth integration of new features and updates into production environments.
- Work closely with cross-functional teams, including data scientists and analysts, to meet data requirements.
- Lead projects from conception to production, breaking down tasks and managing implementation effectively.
Preferred Technical Stack:
- Cloud Data Warehouse: Snowflake
- ETL Tools: Matillion Cloud ETL, Fivetran, dbt™ Cloud
- Programming Languages: Python, SQL
- Workflow Orchestration: Apache Airflow
- Cloud Platforms: AWS (including EMR, SageMaker, S3, and data lake services)
- Infrastructure as Code: Terraform
- Big Data Technologies: Apache Spark, Apache Hudi
- Data Storage: NoSQL databases like MongoDB and DynamoDB
- Data Visualization: Tableau
- Data Engineering and Operations: Data Marts, Data Warehousing, Data Governance, Data Privacy, MLOps
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- Proven experience in data engineering, with a strong focus on cloud-based data solutions.
- Proficiency in Python and SQL.
- Hands-on experience with Snowflake and AWS services.
- Strong knowledge of ETL tools like Matillion, Fivetran, and dbt™.
- Experience with workflow orchestration tools like Apache Airflow.
- Familiarity with data governance and data privacy best practices.
- Experience with CI/CD pipeline creation and management.
- Excellent problem-solving skills and attention to detail.
- Strong communication and leadership skills.