Job Title: Data Tester (Python, Databricks, GCP)
Duration: 6 months
Location: Remote
Required Pay Scale: $55/hour on W2
***Due to client requirements, this role is only open to USC or GC candidates***
Job Description
We are seeking a detail-oriented Data Tester with expertise in Python, Databricks, and Google Cloud Platform (GCP) to join our data engineering team. The ideal candidate will validate ETL processes, ensure data quality, and support data integration initiatives. You will work closely with data engineers, analysts, and stakeholders to ensure that our data pipelines are robust, accurate, and reliable.
Key Responsibilities
ETL Process Validation:
- Validate and verify ETL processes implemented in GCP, ensuring data integrity during extraction, transformation, and loading.
- Develop and execute comprehensive test cases to confirm that data transformations meet business requirements.
Data Quality Assurance:
- Conduct data profiling and perform quality checks to identify and resolve discrepancies in datasets.
- Monitor data quality metrics and report on data integrity and quality issues.
Test Case Development:
- Create and maintain detailed test plans and test cases based on ETL specifications and business needs.
- Ensure full coverage of all ETL processes, including data extraction, transformation, and loading.
Collaboration:
- Work closely with data engineers, data scientists, and other stakeholders to understand ETL workflows and data flows.
- Participate in design reviews to provide input on testing strategies and best practices.
Automation:
- Use Python to develop automated testing scripts for ETL validation and data quality checks.
- Leverage Databricks notebooks for testing and validating ETL processes efficiently.
Workflow Management:
- Utilize Apache Airflow for scheduling, monitoring, and managing ETL workflows.
- Collaborate with teams to troubleshoot and optimize Airflow DAGs related to ETL processes.
Issue Tracking and Resolution:
- Identify, document, and track defects and data quality issues throughout the ETL process.
- Work with engineering teams to diagnose and resolve data-related problems quickly.
Documentation:
- Maintain clear and comprehensive documentation of testing processes, test cases, and results.
- Document data mappings, transformation rules, and data flow diagrams for reference.
Continuous Improvement:
- Contribute to the enhancement of ETL testing methodologies and data management practices.
- Stay updated on GCP, Databricks, and industry trends to continuously improve testing strategies.