Summary
We are seeking an experienced Data Quality Engineer to join our dynamic team. The ideal candidate will be responsible for ensuring the accuracy, consistency, and reliability of our organization's data assets. As a Data Quality Engineer, you will play a critical role in developing and implementing data quality standards, processes, and tools, leveraging automation to uphold the integrity of our data throughout its lifecycle. This role requires a deep understanding of quality assurance and data quality principles, strong analytical and technical skills, and the ability to collaborate with cross-functional teams to identify and resolve data quality issues.
Responsibilities
1. Data Quality Framework Development:
o Design, implement, and maintain a robust data quality framework leveraging automation to assess, monitor, and enhance the quality of data across various systems and platforms.
o Design and implement end-to-end (E2E) data testing strategies, and execute test cases to ensure high quality across the data platforms.
o Build automation into the data quality testing framework using Databricks, PySpark/Spark, SQL, and Python (a sketch of such an automated check follows this list).
2. Data Profiling and Analysis:
o Conduct thorough data testing and data profiling to identify anomalies, inconsistencies, and inaccuracies in datasets.
3. Data Quality Standards and Policies:
o Define and enforce data quality standards and policies, collaborating with stakeholders to establish guidelines for data accuracy, completeness, and timeliness.
o Take a leadership role in defect life cycle management: logging, replication, triage, verification, etc.
4. Quality Assurance Testing:
o Develop and execute comprehensive data quality testing strategies and plans to validate data accuracy and integrity in various data sources.
5. Data Cleansing and Remediation:
o Implement data cleansing and remediation processes to address identified data quality issues, ensuring data is accurate and compliant with organizational standards.
6. Collaboration:
o Collaborate with data engineers, data scientists, and other cross-functional teams to integrate data quality checks into the data pipeline and maintain quality throughout.
7. Documentation:
o Create and maintain documentation related to data quality processes, standards, and issue resolution procedures.
8. Monitoring and Reporting:
o Establish monitoring mechanisms to proactively identify data quality issues, and generate regular reports on data quality metrics for management review.
o Acquire high volumes of test data to support test execution.
9. Continuous Improvement:
o Continuously assess and enhance data quality processes and methodologies to adapt to evolving business needs and industry best practices.
10. Training and Knowledge Sharing:
o Provide training and guidance to team members on data quality best practices and principles. Facilitate knowledge sharing sessions to promote a culture of data quality awareness.
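For illustration only, below is a minimal sketch of the kind of automated PySpark data quality check such a framework might run as a Databricks job. All table and column names (sales.orders, order_id, customer_id, amount) are hypothetical placeholders, not an actual schema.
```python
# Illustrative only: a minimal automated data quality check in PySpark.
# All names below (sales.orders, order_id, customer_id, amount) are
# hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

df = spark.table("sales.orders")  # hypothetical source table

# Rule 1: key columns must be non-null.
null_counts = df.select(
    *[F.sum(F.col(c).isNull().cast("int")).alias(c)
      for c in ("order_id", "customer_id")]
).first()

# Rule 2: the business key must be unique.
dup_count = df.groupBy("order_id").count().filter(F.col("count") > 1).count()

# Rule 3: amounts must be non-negative.
bad_amounts = df.filter(F.col("amount") < 0).count()

results = {
    "null_order_id": null_counts["order_id"],
    "null_customer_id": null_counts["customer_id"],
    "duplicate_order_id": dup_count,
    "negative_amount": bad_amounts,
}
failed = {rule: n for rule, n in results.items() if n > 0}
if failed:
    # Failing the run surfaces the issue for defect logging and triage.
    raise AssertionError(f"Data quality checks failed: {failed}")
print("All data quality checks passed:", results)
```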
Required
1. Technology & Tooling:
o Strong skills and experience in using PySpark/Spark to automate test cases.
o Apply strong SQL skills for querying with extensive join conditions and updating data based on specified conditions (see the sketch after this list).
o Strong Python skills, which are key to scripting automation into the quality process.
2. CI/CD Lifecycle:
o Contribute to and enhance the continuous integration and continuous delivery (CI/CD) lifecycle.
3. Testing:
o Design and implement unit, integration, and regression tests to ensure the reliability of data quality processes.
o Experience in defect life cycle management: logging, replication, triage, verification, etc.
o Experience in validating and testing workflows in Databricks, with a focus on data extraction, transformation, and loading into Delta tables. Proficient in using Databricks notebooks and Delta Lake features for implementing and testing data transformations.
o Skilled in leveraging PySpark/Spark SQL and Delta Lake optimizations to ensure data quality and performance.
o Experience in E2E product feature testing and business report testing.
4. Agile SCRUM Projects:
o Collaborate effectively within an Agile SCRUM framework, participating in sprint planning, reviews, and retrospectives.
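For illustration only, the sketch below shows the kind of join-heavy SQL validation and conditional Delta table update referenced in the skills above. Table and column names are hypothetical placeholders; MERGE is chosen because Delta Lake supports it natively for conditional updates.
```python
# Illustrative only: join-based validation plus conditional remediation on a
# Delta table via Spark SQL. All table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Join-heavy check: orders whose customer or product reference is unresolved.
orphans = spark.sql("""
    SELECT o.order_id
    FROM sales.orders o
    LEFT JOIN sales.customers c ON o.customer_id = c.customer_id
    LEFT JOIN sales.products  p ON o.product_id  = p.product_id
    WHERE c.customer_id IS NULL OR p.product_id IS NULL
""")
print("Orphaned orders found:", orphans.count())

# Conditional update: quarantine flagged rows. MERGE is used because Delta
# Lake supports it natively for updating rows that match a condition.
spark.sql("""
    MERGE INTO sales.orders t
    USING (
        SELECT o.order_id
        FROM sales.orders o
        LEFT JOIN sales.customers c ON o.customer_id = c.customer_id
        WHERE c.customer_id IS NULL
    ) s
    ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET t.dq_status = 'quarantined'
""")
```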
Highly Desired
1. Data Management:
o Utilize expert-level skills in Databricks, PySpark, Spark Structured Streaming, Delta Live Tables, and Delta Sharing for advanced data management.
2. Expert SQL Knowledge:
o Demonstrate expert-level knowledge of SQL for complex data querying and manipulation.
3. Python Development:
o Utilize expert-level Python skills to develop and maintain scripts, tools, and frameworks that support data quality.
4. Streaming Technologies:
o Apply experience with streaming technologies such as Apache Kafka, Azure Event Hubs, and Avro to enhance data processing capabilities (see the streaming sketch after this list).
5. DevOps Proficiency:
o Work within a DevOps environment, demonstrating expertise in Linux, GitHub, and Bash scripting.
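For illustration only, a minimal Structured Streaming sketch of the streaming skills listed above: reading events from a Kafka topic, applying an in-stream quality gate, and writing valid rows to a Delta table. The broker address, topic name, schema, and paths are hypothetical, and JSON decoding stands in for Avro for brevity.
```python
# Illustrative only: a Structured Streaming quality gate. Broker, topic,
# schema, and paths are hypothetical; JSON decoding stands in for Avro
# (pyspark.sql.avro.functions.from_avro would be the analogous call).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "orders")                     # hypothetical topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# In-stream quality gate: drop malformed or out-of-range records.
valid = events.filter(F.col("event_id").isNotNull() & (F.col("amount") >= 0))

query = (
    valid.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/orders_dq")  # hypothetical
    .start("/tmp/tables/orders_clean")                           # hypothetical
)
```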
Education and Experience
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- Proven experience as a Data Quality Engineer or in a similar role.
- Strong understanding of data quality concepts, methodologies, and best practices.
- Familiarity with data governance frameworks and practices.
- Excellent problem-solving and analytical skills.
- Strong communication and interpersonal skills, with the ability to collaborate effectively with cross-functional teams.
- Strong skills and experience with data quality tools and technologies.
- Relevant certifications (e.g., Certified Data Management Professional) are advantageous.
If you are a data quality enthusiast with a strong technical background, we invite you to join our team and contribute to the excellence of our data-driven solutions.