Required: Strong OOP Python skills for Data Engineering
Desirable: MongoDB and Redis
Everything else below is a bonus
W2 only - sponsorship available
No C2C
Job Description Summary:
Generative AI (GenAI) presents an exciting opportunity to derive valuable insights from data and drive revenue growth, efficiencies, and improved business processes. Technology will collaborate with Global Markets Sales & Trading, Quantitative Strategies & Data Group (QSDG), and Platform teams to design and build out its global GenAI platform.
The platform will cater to a rapidly growing number of use cases that harness the power of GenAI. Both proprietary and open-source Large Language Models, along with large structured and unstructured data sets, will be leveraged to produce insights for Global Markets and its clients.
- We are seeking a Data Engineer to build out data pipelines that source large volumes of structured (e.g., KDB) and unstructured data (e.g., research documents, term sheets), then classify and store that data to meet GenAI requirements.
- The Data Engineer will design and develop the platform for high performance and scalability.
Key Responsibilities:
- Collaborate with data scientists and software engineers to understand data requirements.
- Develop and maintain scalable data ingestion pipelines from various data sources, including large data stores (Hadoop, etc.), web services, and APIs.
- Analyze and organize raw data from internal and external sources.
- Implement data cleaning, transformation, and normalization processes to ensure data quality and consistency (see the pipeline sketch after this list).
- Implement automated unit tests and conduct reviews with other team members to make sure your code is rigorously designed, elegantly coded, and effectively tuned for performance.
- Automate reports and processes to run at varying frequencies.
- Monitor and troubleshoot data pipelines to ensure data accuracy and pipeline performance.
- Explore ways to enhance data quality, reliability, and efficiency of data pipelines.
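To make the strong-OOP-Python expectation concrete, here is a minimal sketch of the kind of composable cleaning/normalization pipeline described above. All class names, field names, and sample records are hypothetical illustrations of the pattern, not code from the platform:

    from abc import ABC, abstractmethod
    from typing import Iterable

    class PipelineStep(ABC):
        """One stage of an ingestion pipeline; subclasses implement run()."""
        @abstractmethod
        def run(self, records: Iterable[dict]) -> Iterable[dict]:
            ...

    class DropMissing(PipelineStep):
        """Cleaning step: discard records missing a required field."""
        def __init__(self, required: str) -> None:
            self.required = required
        def run(self, records):
            return (r for r in records if r.get(self.required) is not None)

    class NormalizeText(PipelineStep):
        """Normalization step: strip and lower-case a text field."""
        def __init__(self, field: str) -> None:
            self.field = field
        def run(self, records):
            for r in records:
                r[self.field] = str(r[self.field]).strip().lower()
                yield r

    class Pipeline:
        """Run steps in order; each consumes the previous step's output."""
        def __init__(self, steps: list[PipelineStep]) -> None:
            self.steps = steps
        def run(self, records):
            for step in self.steps:
                records = step.run(records)
            return list(records)

    if __name__ == "__main__":
        raw = [{"doc": " Term Sheet A "}, {"doc": None}, {"doc": "Research Note B"}]
        print(Pipeline([DropMissing("doc"), NormalizeText("doc")]).run(raw))
        # -> [{'doc': 'term sheet a'}, {'doc': 'research note b'}]

Because each step is an isolated class, the automated unit tests called for above can target each step independently.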
Experience required:
MUST HAVE Technical:
- Proficient in data engineering practices and in applying design and architectural patterns.
- 4+ years' experience as a Data Engineer or in a similar role, working with Extract, Transform, Load (ETL) and/or Extract, Load, Transform (ELT) processes.
- Proficiency in working with unstructured and structured data.
- Experience with data processing frameworks such as Hadoop, Spark, or similar technologies (a brief Spark sketch follows this list).
- Experience with data platforms such as SQL, HDFS, and NoSQL (MongoDB) databases.
- Extensive experience with object-oriented programming (OOP)/scripting languages (Python preferred).
- Technical expertise with data models, data mining, and segmentation techniques.
- Experience working in multiple technology deployment lanes (development through production).
- DevOps processes and CI/CD tooling (Jira, Git/Bitbucket, Jenkins, Datical, Artifactory, Ansible), orchestration & automation.
- Containerization technologies such as Docker and Kubernetes.
- Minimum of 5 years of similar experience.
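For the ETL and Spark items above, the following is a minimal PySpark sketch of the extract-transform-load shape involved; the file paths, column names, and filter condition are assumptions for illustration only:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Extract: read raw JSON records (path is illustrative).
    raw = spark.read.json("raw/trades/")

    # Transform: de-duplicate, parse dates, and drop invalid rows
    # (trade_id, trade_date, and notional are assumed column names).
    clean = (raw.dropDuplicates(["trade_id"])
                .withColumn("trade_date", F.to_date("trade_date"))
                .filter(F.col("notional") > 0))

    # Load: write curated data as Parquet for downstream use.
    clean.write.mode("overwrite").parquet("curated/trades/")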
Non-Technical:
- Ability to communicate effectively with a wide range of audiences (business stakeholders, developer & support teams).
- Detail-oriented & highly organized.
- Adaptable to shifting & competing priorities.
- Problem-solving skills to diagnose & resolve complex issues.
- Committed and proactive in ensuring a high quality of service.
Experience desired:
- Familiarity with AI & deep learning, modeling techniques, and the Generative AI application stack.
- Familiarity with containerization and orchestration technologies such as Docker, Kubernetes, and OpenShift.
- Experience with creating visualization dashboards (Tableau).
- Experience with vector databases such as Redis (a brief sketch follows).
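As a sketch of the Redis-as-vector-database item, the snippet below indexes a document embedding and runs a KNN search with redis-py. It assumes a RediSearch-enabled server (e.g., Redis Stack); the index name, key prefix, and toy 4-dimensional embedding are illustrative, and a real pipeline would store model-generated embeddings of research documents or term sheets:

    import numpy as np
    import redis
    from redis.commands.search.field import TextField, VectorField
    from redis.commands.search.indexDefinition import IndexDefinition, IndexType
    from redis.commands.search.query import Query

    r = redis.Redis(host="localhost", port=6379)

    # Create an HNSW vector index over hashes with keys prefixed "doc:".
    r.ft("doc_idx").create_index(
        [
            TextField("body"),
            VectorField("embedding", "HNSW",
                        {"TYPE": "FLOAT32", "DIM": 4, "DISTANCE_METRIC": "COSINE"}),
        ],
        definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
    )

    # Store one document with a toy 4-dim embedding.
    vec = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)
    r.hset("doc:1", mapping={"body": "sample term sheet", "embedding": vec.tobytes()})

    # KNN query: the 3 nearest neighbours to a query vector.
    q = (Query("*=>[KNN 3 @embedding $qv AS score]")
         .sort_by("score")
         .return_fields("body", "score")
         .dialect(2))
    res = r.ft("doc_idx").search(q, query_params={"qv": vec.tobytes()})
    for doc in res.docs:
        print(doc.body, doc.score)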