We’re looking for a Staff MLOps Engineer to join our Machine Learning team. You’ll work closely with a team of engineers to build a platform that will be leveraged by virtually every other product and system we have built or will build in the future. You’ll be responsible for building and maintaining the infrastructure and tooling that enable our ML Engineers and Data Scientists to focus on model development and feature engineering.
Key Responsibilities
- Design, implement, and maintain robust MLOps platforms and tooling for both batch and streaming ML pipelines.
- Develop and manage monitoring and observability solutions for ML systems.
- Lead DevOps practices, including CI/CD pipelines and Infrastructure as Code (IaC).
- Architect and implement cloud-based solutions on AWS.
- Collaborate with ML Engineers and Data Scientists to develop, train, and deploy machine learning models.
- Engage in feature engineering and model optimization to improve ML system performance.
- Participate in the full ML lifecycle, from data preparation to model deployment and monitoring.
- Optimize and refactor existing systems for improved performance and reliability.
- Drive technical initiatives and best practices in both MLOps and ML Engineering.
Required Skills and Experience
- Strong Python Proficiency: Expert-level Python for developing, deploying, and maintaining our machine learning systems.
- Language Versatility: Experience with statically typed or JVM languages. Willingness to learn Scala is highly desirable.
- Cloud Engineering Skills: Extensive experience with Cloud Platforms & Services, ideally AWS (e.g., Lambda, ECS, ECR, CloudWatch, MSK, SNS, SQS).
- Infrastructure as Code: Proficiency in IaC, particularly Terraform.
- Kubernetes Expertise: Strong hands-on experience managing clusters and deploying services.
- Data Orchestration: Experience with ML orchestration tools (e.g., Flyte, Airflow, Kubeflow, Luigi, or Prefect).
- CI/CD: Expertise in building and maintaining CI/CD pipelines, especially with GitHub Actions and Jenkins.
- Networking: Solid understanding of networking concepts and hands-on experience implementing them.
- Streaming: Experience with Kafka and other streaming technologies.
- ML Monitoring: Familiarity with observability tools (e.g., Arize AI, Weights &amp; Biases).
- NLP/LLMs: Experience with NLP, LLMs, and RAG systems in production, or strong desire to learn.
- CLI & Shell Scripting: Proficiency in scripting and command-line tools.
- APIs: Experience with deploying and managing production APIs.
- Software Engineering Best Practices: Strong grasp of industry-standard development practices.
Preferred Qualifications
- AWS AI Services: Hands-on experience with AWS SageMaker and/or AWS Bedrock.
- Data Processing: Experience with high-volume, unstructured data processing.
- ML Applications: Familiarity with NLP, Computer Vision, and traditional ML applications.
- System Migration: Previous work in refactoring and migrating complex systems.
- AWS Certification: AWS Certified Solutions Architect (Associate or Professional).
- Advanced Degree: Master's degree in Machine Learning, AI, or Computer Science.
Personal Qualities
- Passionate about building developer-friendly platforms and tools.
- Thrives in a terminal-based development environment.
- Enthusiastic about creating production-grade, robust, reliable, and performant systems.
- Not afraid to dive into and improve complex existing solutions.
- Team player who works well with ML Engineers, Data Scientists, and management.
- Strong technical mentoring skills.
- Excellent problem-solving and communication skills.