LeadStack Inc. is an award-winning, certified minority-owned (MBE) provider of contingent workforce staffing services and one of the nation's fastest-growing staffing companies. As a recognized industry leader in contingent workforce solutions and a Certified Great Place to Work, we're proud to partner with some of the most admired Fortune 500 brands in the world.
TITLE: ETL Tester
LOCATION: Chicago, IL / Boca Raton, FL / Cincinnati, OH / Charlotte, NC / Portland, OR / San Jose, CA (initially remote; the role will become hybrid in the future, so local candidates only)
DURATION: 6-month contract-to-hire (CTH)
Job ID: 24-01829
Pay range: $60-$65/hour
Top Skills:
- GCP (BigQuery, GCS, and Apache Airflow)
- Databricks for data processing and analytics
- Developing testing scripts and automation in Python
- Validating ETL processes
Job Description:
We are seeking a detail-oriented ETL Tester with expertise in Python, Databricks, and Google Cloud Platform (GCP) to join our data engineering team. The ideal candidate will be responsible for validating ETL processes, ensuring data quality, and supporting data integration initiatives. You will work closely with data engineers, analysts, and stakeholders to ensure that our data pipelines are robust, accurate, and reliable.
Required Qualifications:
- Strong knowledge of Google Cloud Platform, specifically BigQuery, GCS, and Airflow.
- 4+ years of experience with Databricks for data processing and analysis.
- Proficiency in Python for developing testing scripts and automating testing processes.
Key Responsibilities:
Validate and verify ETL processes implemented in GCP, ensuring data integrity during extraction, transformation, and loading.
Develop and execute comprehensive test cases to confirm that data transformations meet business requirements.
Data Quality Assurance:
Conduct data profiling and perform quality checks to identify and resolve discrepancies in datasets.
Monitor data quality metrics and report on data integrity and quality issues.
Test Case Development:
Create and maintain detailed test plans and test cases based on ETL specifications and business needs.
Ensure full coverage of all ETL processes, including data extraction, transformation, and loading.
Collaboration:
Work closely with data engineers, data scientists, and other stakeholders to understand ETL workflows and data flows.
Participate in design reviews to provide input on testing strategies and best practices.
Automation:
Use Python to develop automated testing scripts for ETL validation and data quality checks.
Leverage Databricks notebooks for testing and validating ETL processes efficiently.
Workflow Management:
Utilize Apache Airflow for scheduling, monitoring, and managing ETL workflows.
Collaborate with teams to troubleshoot and optimize Airflow DAGs related to ETL processes.
Issue Tracking and Resolution:
Identify, document, and track defects and data quality issues throughout the ETL process.
Work with engineering teams to diagnose and resolve data-related problems quickly.
Documentation:
Maintain clear and comprehensive documentation of testing processes, test cases, and results.
Document data mappings, transformation rules, and data flow diagrams for reference.
Continuous Improvement:
Contribute to the enhancement of ETL testing methodologies and data management practices.
Stay updated on GCP, Databricks, and industry trends to continuously improve testing strategies.
To learn more about current opportunities at LeadStack, please visit us at https://leadstackinc.com/careers/
If you have any questions, please email waseem.ahmad@leadstackinc.com.