A fast-growing Series B company in our portfolio is looking for a hands-on leader to head its infrastructure team. This team plays a critical role in the company's machine learning platform, designing, developing, and optimizing its core systems. The position is a strong fit for an infrastructure engineering expert with a solid technical background who thrives on mentoring and leading teams.
About the Company:
They're developing the most efficient, scalable, and reliable solution for running machine learning workloads, whether in their own cloud or the customer's.
In this role, you will have the opportunity to:
- Lead, manage, and mentor the infrastructure engineering team responsible for building the backbone of the ML platform.
- Define and execute the technical strategy for infrastructure, ensuring the performance, security, and scalability of critical systems.
- Collaborate with ML teams and cross-functional stakeholders to ensure seamless integration of models into production environments.
- Design and implement scalable solutions, including CI/CD pipelines, container orchestration, and cloud infrastructure (AWS, GCP, etc.).
- Optimize system performance by identifying and addressing infrastructure bottlenecks.
- Own end-to-end project management for infrastructure initiatives, from planning to execution and ongoing maintenance.
- Foster engineering best practices and a culture of continuous improvement within the team.
Qualifications:
- Bachelor’s, Master’s, or Ph.D. in Computer Science, Engineering, or a related field.
- 5+ years of professional experience in infrastructure or software engineering, with at least 2 years in a technical leadership role.
- Expertise in infrastructure design, including containerization (Docker), orchestration (Kubernetes), and cloud platforms (AWS, GCP).
- Strong experience with CI/CD pipelines, infrastructure as code (Terraform, Ansible), and monitoring systems.
- Solid understanding of networking, security, and high-availability infrastructure design.
- Experience managing and scaling infrastructure for machine learning or similar high-performance workloads.
- Proven track record of leading teams and delivering large-scale, production-level infrastructure solutions.
- Excellent problem-solving skills and the ability to drive technical projects from idea to completion.
Logistics:
Stage: Series B
Location: San Francisco/Hybrid
Reports to: CTO/Founder
Team Size: 8 People