Required: Strong OOP Python skills for Data Engineering
Desirable: MongoDB and Redis
Everything else below is a bonus
W2 only - sponsorship available
No C2C
Job Description Summary:
Generative AI (GenAI) presents an exciting opportunity to derive valuable insights from data and drive revenue growth, efficiencies, and improved business processes. Technology will collaborate with Global Markets Sales & Trading, Quantitative Strategies & Data Group (QSDG), and Platform teams to design and build out its global GenAI platform.
The platform will cater to a rapidly growing number of use cases that harness the power of GenAI. Both proprietary and open-source Large Language Models, along with large structured and unstructured data sets, will be leveraged to produce insights for Global Markets and its clients.
- We are seeking a Data Engineer to build out data pipelines that source large volumes of structured (e.g., KDB) and unstructured data (e.g., research documents, term sheets), then classify and store that data to meet GenAI requirements.
- The Data Engineer will design and develop the platform for high performance and scalability.
Key Responsibilities:
- Collaborate with data scientists and software engineers to understand data requirements.
- Develop and maintain scalable data ingestion pipelines from various data sources, including large data stores (Hadoop, etc.), web services, and APIs.
- Analyze and organize raw data from internal and external sources.
- Implement data cleaning, transformation, and normalization processes to ensure data quality and consistency (see the pipeline sketch after this list).
- Implement automated unit tests and conduct reviews with other team members to make sure your code is rigorously designed, elegantly coded, and effectively tuned for performance.
- Automate reports and processes to run at varying frequencies.
- Monitor and troubleshoot data pipelines to ensure data accuracy and pipeline performance.
- Explore ways to enhance data quality, reliability, and efficiency of data pipelines.
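To make the strong-OOP-Python expectation concrete, here is a minimal sketch of the kind of composable cleaning/normalization pipeline described above. All class names, field names, and sample records are hypothetical illustrations of the pattern, not code from the platform:

    from abc import ABC, abstractmethod
    from typing import Iterable

    class PipelineStep(ABC):
        """One stage of an ingestion pipeline; subclasses implement run()."""
        @abstractmethod
        def run(self, records: Iterable[dict]) -> Iterable[dict]:
            ...

    class DropMissing(PipelineStep):
        """Cleaning step: discard records missing a required field."""
        def __init__(self, required: str) -> None:
            self.required = required
        def run(self, records):
            return (r for r in records if r.get(self.required) is not None)

    class NormalizeText(PipelineStep):
        """Normalization step: strip and lower-case a text field."""
        def __init__(self, field: str) -> None:
            self.field = field
        def run(self, records):
            for r in records:
                r[self.field] = str(r[self.field]).strip().lower()
                yield r

    class Pipeline:
        """Run steps in order; each consumes the previous step's output."""
        def __init__(self, steps: list[PipelineStep]) -> None:
            self.steps = steps
        def run(self, records):
            for step in self.steps:
                records = step.run(records)
            return list(records)

    if __name__ == "__main__":
        raw = [{"doc": " Term Sheet A "}, {"doc": None}, {"doc": "Research Note B"}]
        print(Pipeline([DropMissing("doc"), NormalizeText("doc")]).run(raw))
        # -> [{'doc': 'term sheet a'}, {'doc': 'research note b'}]

Because each step is an isolated class, the automated unit tests called for above can target each step independently.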
Experience required:
MUST HAVE Technical:
- Proficient in data engineering practices and in applying design and architectural patterns.
- 4+ years' experience as a Data Engineer or in a similar role, working with Extract, Transform, Load (ETL) and/or Extract, Load, Transform (ELT) processes.
- Proficiency in working with unstructured and structured data.
- Experience with data processing frameworks such as Hadoop, Spark, or similar technologies (a brief Spark sketch follows this list).
- Experience with data platforms such as SQL, HDFS, and NoSQL (MongoDB) databases.
- Extensive experience with object-oriented programming (OOP)/scripting languages (Python preferred).
- Technical expertise with data models, data mining, and segmentation techniques.
- Experience working in multiple technology deployment lanes (development through production).
- DevOps processes and CI/CD tooling (Jira, Git/Bitbucket, Jenkins, Datical, Artifactory, Ansible), orchestration & automation.
- Containerization technologies such as Docker and Kubernetes.
- Minimum of 5 years of similar experience.
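For the ETL and Spark items above, the following is a minimal PySpark sketch of the extract-transform-load shape involved; the file paths, column names, and filter condition are assumptions for illustration only:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

    # Extract: read raw JSON records (path is illustrative).
    raw = spark.read.json("raw/trades/")

    # Transform: de-duplicate, parse dates, and drop invalid rows
    # (trade_id, trade_date, and notional are assumed column names).
    clean = (raw.dropDuplicates(["trade_id"])
                .withColumn("trade_date", F.to_date("trade_date"))
                .filter(F.col("notional") > 0))

    # Load: write curated data as Parquet for downstream use.
    clean.write.mode("overwrite").parquet("curated/trades/")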
Non-Technical:
- Ability to communicate effectively with a wide range of audiences (business stakeholders, developer & support teams).
- Detail-oriented & highly organized.
- Adaptable to shifting & competing priorities.
- Problem-solving skills to diagnose & resolve complex issues.
- Committed and proactive in ensuring a high quality of service.
Experience desired:
- Familiarity with AI & deep learning, modeling techniques, and the Generative AI application stack.
- Familiarity with containerization and orchestration technologies such as Docker, Kubernetes, and OpenShift.
- Experience with creating visualization dashboards (Tableau).
- Experience with vector databases such as Redis (a brief sketch follows).
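As a sketch of the Redis-as-vector-database item, the snippet below indexes a document embedding and runs a KNN search with redis-py. It assumes a RediSearch-enabled server (e.g., Redis Stack); the index name, key prefix, and toy 4-dimensional embedding are illustrative, and a real pipeline would store model-generated embeddings of research documents or term sheets:

    import numpy as np
    import redis
    from redis.commands.search.field import TextField, VectorField
    from redis.commands.search.indexDefinition import IndexDefinition, IndexType
    from redis.commands.search.query import Query

    r = redis.Redis(host="localhost", port=6379)

    # Create an HNSW vector index over hashes with keys prefixed "doc:".
    r.ft("doc_idx").create_index(
        [
            TextField("body"),
            VectorField("embedding", "HNSW",
                        {"TYPE": "FLOAT32", "DIM": 4, "DISTANCE_METRIC": "COSINE"}),
        ],
        definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
    )

    # Store one document with a toy 4-dim embedding.
    vec = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)
    r.hset("doc:1", mapping={"body": "sample term sheet", "embedding": vec.tobytes()})

    # KNN query: the 3 nearest neighbours to a query vector.
    q = (Query("*=>[KNN 3 @embedding $qv AS score]")
         .sort_by("score")
         .return_fields("body", "score")
         .dialect(2))
    res = r.ft("doc_idx").search(q, query_params={"qv": vec.tobytes()})
    for doc in res.docs:
        print(doc.body, doc.score)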