Job Summary:
We are seeking a highly skilled Senior Data Engineer to design and implement scalable, reliable data pipelines that power our analytics and operational applications. As a key member of our data engineering team, you will work with cutting-edge technologies such as Apache Spark, Hadoop, and Kafka, as well as cloud services from AWS and Azure. Your expertise in data modeling, ETL workflows, and cloud-based data processing will be critical in supporting our data-driven initiatives and ensuring the highest standards of data quality, security, and performance.
Key Responsibilities:
- Data Pipeline Development: Design and implement scalable, reliable data pipelines using big data technologies like Apache Spark, Hadoop, and Kafka. Ingest, process, and store diverse data at scale to meet the needs of our analytics and operational teams (a minimal streaming-ingest sketch follows this list).
- Cloud Integration: Work within cloud environments such as AWS or Azure, leveraging services such as EC2, RDS, S3, and Lambda on AWS, or Azure Data Lake Storage on Azure, for efficient data handling and processing. Integrate cloud storage and compute services with Databricks for optimized data workflows.
- Data Modeling & Storage: Develop and optimize data models and storage solutions (SQL, NoSQL, Data Lakes) to support both operational and analytical applications. Ensure data quality, accessibility, and scalability.
- ETL/ELT Workflows: Use ETL tools and frameworks such as Apache Airflow or Talend to automate data workflows, ensuring timely and efficient data integration across the organization (see the Airflow sketch after this list).
- Collaboration with Data Science: Work closely with data scientists to provide the necessary data infrastructure and tools for complex analytical models. Leverage Python or R for data processing scripts to support machine learning and AI initiatives.
- Data Governance & Security: Ensure compliance with data governance and security policies by implementing best practices in data encryption, masking, and access controls within cloud environments (see the masking sketch after this list).
- Performance Monitoring & Optimization: Monitor and troubleshoot data pipelines and databases for performance issues. Apply tuning techniques to optimize data access and throughput, ensuring system reliability and efficiency.
- Continuous Improvement: Stay current with emerging technologies and methodologies in data engineering. Advocate for and implement improvements to enhance the data ecosystem and support the organization's evolving needs.
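To make the pipeline work above concrete, here is a minimal sketch of the kind of ingestion job this role builds, assuming Python with PySpark: it reads JSON events from a Kafka topic with Spark Structured Streaming and lands them in S3 as Parquet. The broker address, topic name, schema, and S3 paths are illustrative placeholders, not details of our stack.

```python
# A minimal sketch of a streaming ingestion pipeline: read JSON events from
# a Kafka topic with Spark Structured Streaming and land them in S3 as
# Parquet. Brokers, topic, schema, and paths below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("event-ingest").getOrCreate()

# Hypothetical event schema; a real pipeline would pull this from a registry.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
)

# Kafka delivers raw bytes; decode the value column and parse the JSON payload.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Write to S3 as Parquet; the checkpoint makes the file sink exactly-once.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/events/")  # placeholder path
    .option("checkpointLocation", "s3a://example-bucket/ckpt/")
    .start()
)
query.awaitTermination()
```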
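Similarly, a minimal Apache Airflow (2.x) sketch of the kind of scheduled ETL workflow described above; the DAG id and task bodies are hypothetical stubs standing in for real extract/transform/load logic.

```python
# A minimal Airflow 2.x sketch of a scheduled ETL workflow. The DAG id and
# task bodies are hypothetical stubs.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull the day's records from a source system (stubbed here).
    print("extracting for", context["ds"])


def transform(**context):
    print("transforming")


def load(**context):
    print("loading into the warehouse")


with DAG(
    dag_id="nightly_sales_etl",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow >= 2.4; use schedule_interval on older versions
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Linear dependency chain: extract -> transform -> load.
    t_extract >> t_transform >> t_load
```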
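And on the governance side, a small illustration of column-level masking in PySpark: pseudonymize a direct identifier and coarsen a quasi-identifier before exposing data to analysts. The column names and masking policy are assumptions for the example, not a statement of our actual controls.

```python
# A small illustration of column-level masking before exposing data to
# analysts. Column names and the masking policy are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat, lit, sha2

spark = SparkSession.builder.appName("masking-demo").getOrCreate()

df = spark.createDataFrame(
    [("alice@example.com", "10001", 42.5)],
    ["email", "zip_code", "amount"],
)

masked = df.select(
    # Salted SHA-256 keeps the column joinable without revealing emails.
    sha2(concat(col("email"), lit("static-salt")), 256).alias("email_hash"),
    # Keep only the 3-digit ZIP prefix to reduce re-identification risk.
    col("zip_code").substr(1, 3).alias("zip3"),
    col("amount"),
)
masked.show()
```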
Required Qualifications:
- Education: Bachelor’s degree in Computer Science, MIS, or a related field, or an equivalent combination of education and experience.
- Experience: 8+ years of experience in data engineering, with a proven track record in designing and operating large-scale data pipelines and architectures.
- Technical Expertise:
  - Expertise in developing ETL/ELT workflows using tools like Apache Airflow or Talend.
  - Comprehensive knowledge of platforms and services like Databricks, Dataiku, and AWS data offerings.
  - Solid experience with big data technologies such as Apache Spark, Hadoop, and Kafka.
  - Extensive experience with AWS and Azure cloud services, including hands-on experience integrating cloud storage and compute services with Databricks.
  - Proficiency in SQL and in programming languages relevant to data engineering (Python, Java, Scala).
  - Hands-on RDBMS experience, including data modeling, analysis, programming, and creating stored procedures.
- Additional Skills: Familiarity with machine learning model deployment and management practices is a plus.
- Soft Skills: Strong communication skills with the ability to collaborate effectively across both technical and non-technical teams.