Description
We are searching for a Senior Data Engineer who possesses strong skills in developing, managing, and optimizing data workflows, handling complex data environments, and using contemporary data platforms. This position requires expertise in Databricks, Airflow, Snowflake, PySpark, Python, and SQL, as well as the ability to design and maintain data ecosystems and operational data systems.
Responsibilities
- Data Workflow Development: Design and implement efficient, scalable, and reliable data workflows using Databricks, PySpark, and SQL to handle large datasets.
- Airflow Optimization and Implementation: Lead efforts to set up, maintain, and optimize Airflow, and guide the engineering team on best practices.
- Environment Setup: Work closely with the environment manager to set up and maintain various environments (development, testing, and production) optimized for both performance and cost.
- Operational Data Systems: Collaborate on the design and development of operational data systems, ensuring they meet performance, scalability, and availability requirements.
- Optimization: Continuously monitor and tune data processes to optimize resource utilization and minimize delays in data handling.
- Best Practices: Develop and enforce best practices for coding, testing, documentation, and deployment within the data engineering team.
Requirements
- Experience: 5+ years of experience in data engineering, focusing on managing large-scale environments and building complex data workflows.
- Databricks: Extensive hands-on experience.
- Airflow: Strong expertise in job scheduling and automation with Airflow.
- Snowflake: Deep knowledge of Snowflake as a data warehousing tool.
- PySpark: Advanced proficiency in PySpark for distributed data operations.
- Python: Solid Python skills for scripting and building data workflows.
- SQL: Expert-level SQL for managing and querying data.
- Environment Management: Proven experience managing cloud-based environments.
- Operational Data Systems: Experience in creating and managing operational data systems.
- Education: Bachelor’s degree in Computer Science, Information Systems, or a related field, or equivalent work experience.
Preferred Skills
- Cloud Platforms: Experience with AWS, Azure, or GCP.
- Serverless Technologies: Familiarity with designing and implementing serverless solutions.
- Data Lakes: Hands-on experience with data lakes and Delta Lake.
- Event-Driven Pipelines: Experience in designing event-driven data workflows.
- CI/CD: Knowledge of continuous integration and deployment in a data engineering context.
- Big Data Tools: Experience with additional tools in the big data space is a plus.