We are seeking a highly skilled and motivated Data Scientist to join our dynamic Advanced Analytics and Solutions team. This role will bridge the gap between machine learning and DevOps, ensuring the seamless deployment, scalability, and maintenance of our machine learning models and data pipelines. The ideal candidate will have a strong background in software development, data infrastructure, and machine learning operations (MLOps).
Key Responsibilities:
· Design, build, and maintain predictive models.
· Design, build, and maintain robust, scalable, and efficient data pipelines to support machine learning workflows.
· Collaborate with data scientists and data engineers to streamline model development, deployment, and monitoring.
· Implement CI/CD pipelines for machine learning models to ensure rapid, reliable deployment and updates.
· Develop and maintain infrastructure for data storage, processing, and analysis, ensuring high availability and performance.
· Automate the provisioning, configuration, and monitoring of infrastructure to support data and machine learning applications.
· Monitor and troubleshoot issues in data pipelines, machine learning models, and production systems to ensure reliability and performance.
· Ensure security, compliance, and data privacy standards are met across all data and machine learning workflows.
· Document processes, configurations, and best practices to facilitate collaboration and knowledge sharing within the team.
Skills Required:
· Strong programming skills in Python and familiarity with other languages such as JavaScript, HTML, and CSS.
· Deep understanding of machine learning concepts, algorithms, and frameworks (e.g., TensorFlow, PyTorch, Scikit-learn).
· Expertise in data engineering tools and frameworks (e.g., Apache Spark, Synapse Analytics, PySpark).
· Proficiency in DevOps tools and practices, including CI/CD, containerization (Docker), and orchestration (Kubernetes).
· Experience with cloud platforms (Azure is preferred) and their machine learning and data services.
· Familiarity with infrastructure as code (IaC) tools such as Terraform.
· Strong knowledge of database systems (SQL and NoSQL).
· Experience with monitoring and logging tools (e.g., MLflow).
· Experience with version control software (e.g., Git and GitHub).
· Excellent problem-solving skills and the ability to troubleshoot complex data and infrastructure issues.
· Strong communication skills and the ability to work collaboratively in a team environment.
Preferred Qualifications:
· Experience with MLOps tools and platforms (e.g., MLflow).
· Experience with the Epic EMR system (certified in Cogito and Clarity).
· Experience with the Microsoft Azure platform (certified in Azure Data Science).
· Experience with real-time data processing and streaming applications.
· Advanced degree (master’s or PhD) in Computer Science, Data Science, Engineering, or a related field.
Programming and Software Required:
· Programming Languages: Python; familiarity with, or ability to learn, JavaScript, HTML, and CSS
· Machine Learning Frameworks: TensorFlow, PyTorch, Scikit-learn
· Data Engineering Tools: Apache Spark, Synapse Analytics, PySpark
· DevOps Tools: Docker, Kubernetes, Git
· Cloud Platforms: Azure
· Databases: SQL, NoSQL