Job Summary: As an AI Data Engineer, you will design, build, and maintain scalable data pipelines and architectures that support our AI initiatives. You will work closely with data scientists and other stakeholders to ensure efficient processing, transformation, and integration of data from a variety of sources. This role requires hands-on data engineering experience, a solid understanding of AI/ML models, and a strong background in cloud technologies.
Key Responsibilities:
- Design, develop, and maintain scalable ETL processes and data pipelines to support AI and machine learning initiatives.
- Work on AI-driven projects, specifically focusing on building and optimizing data models, data flows, and pipelines.
- Collaborate with data scientists to integrate and process event-based data for AI model training and inference.
- Utilize Python for data manipulation, ensuring data is clean, structured, and ready for analysis.
- Implement data processing solutions on cloud platforms, primarily Azure (AWS experience is also acceptable).
- Develop and maintain APIs for data access, integration, and real-time processing.
- Enhance and optimize existing data models and pipelines to improve performance and scalability.
- Apply a fundamental understanding of data science, LLMs, and AI models to contribute to the development of AI-driven solutions.
- Participate in the design and implementation of automated data processing workflows that reduce manual effort and improve data accuracy.
Must-Have Qualifications:
- Approximately 5 years of experience in data engineering, focusing on ETL and data pipeline architecture.
- Proven experience as a Data Engineer on at least one real-world AI project.
- Proficiency in Python for data manipulation and scripting.
- Strong experience with cloud platforms, particularly Azure (AWS experience is acceptable).
- Hands-on experience with API development and utilization.
- Experience in event-based data processing and building end-to-end data processes.
- Fundamental understanding of data science concepts, including LLMs and AI models.
- Strong problem-solving skills and the ability to work collaboratively in a team environment.
Nice-to-Have Qualifications:
- Master's degree in Computer Science, Data Science, or a related field (Bachelor's degree with significant experience is also acceptable).
- Experience with Apache Spark.
- Certifications in Azure or related cloud technologies.
- Proficiency in SQL and an understanding of how Power BI integrates with data pipelines.
- Experience in the financial or investment banking industry.
Soft Skills:
- Excellent communication skills, both written and verbal.
- Ability to work independently and manage multiple projects simultaneously.
- Strong analytical mindset and attention to detail.
- Team player with a proactive approach to problem-solving.
Day-to-Day Responsibilities:
- Focus on building data pipelines and models that support AI initiatives.
- Collaborate with the team to enhance and optimize data models and pipelines.
- Work on AI projects that use LLMs to automate processes and derive insights from large datasets.
- Manage a backlog of projects, such as inferring outcomes from email metadata to enrich Salesforce data.
- Support the automation of processes that help bankers cut turnaround times from weeks to hours using OpenAI technologies.