Job Description: MLOps Engineer
Position Overview: We are seeking an experienced MLOps Engineer to scale training and inference for existing models across multi-GPU and CPU environments. The ideal candidate will have a strong foundation in Python, SQL, and Kubernetes, along with extensive experience deploying and managing machine learning models.
Key Responsibilities:
Design and implement scalable solutions for model training and inference using multi-GPU and CPU resources.
Collaborate with data scientists to optimize existing models for deployment in production.
Develop and maintain CI/CD pipelines for machine learning workflows.
Manage containerized applications using Kubernetes for efficient resource utilization and scaling.
Ensure robust monitoring, logging, and performance tracking of deployed models.
Optimize database interactions using SQL for efficient data retrieval and processing.
Troubleshoot and resolve issues related to model performance and deployment.
Qualifications:
Strong proficiency in Python and SQL.
Hands-on experience with Kubernetes and container orchestration.
Familiarity with ML frameworks such as TensorFlow or PyTorch.
Proven track record in deploying and scaling machine learning models in production environments.
Understanding of MLOps best practices and tools (e.g., MLflow, Kubeflow).
Strong problem-solving skills and ability to work in a fast-paced environment.
Preferred Skills:
Experience with cloud platforms (AWS, GCP, Azure).
Knowledge of data engineering and ETL processes.
Familiarity with version control systems (e.g., Git).
Education:
Bachelor’s degree in Computer Science, Data Science, or a related field (Master’s preferred).