What We're Developing
As we enter a new phase of expansion, we are partnering with commercial clients to tailor our advanced models to their specific business needs. Our track record of creating, aligning, and deploying state-of-the-art models, demonstrated in our highly empathetic consumer-facing chatbot, has laid a solid foundation for this work. With strong financial backing and abundant H100 capacity, we have built resilient infrastructure and efficient workflows to support top-tier finetuning. By joining our team, you'll apply your skills in a vibrant organization that values innovation and teamwork.
About Us
We are a small, interdisciplinary AI studio. We have trained several state-of-the-art language models, across multiple versions, and developed a personal assistant. Today, our studio is dedicated to finetuning and deploying models for our commercial clients' specific applications.
We believe that artificial intelligence marks the beginning of a period of exponential transformation. Our name reflects this moment of change, and our status as a public benefit corporation provides us with the legal framework to prioritize the well-being and happiness of our partners, users, and broader stakeholders above all else.
About the Position
Research Engineer, Member of Technical Staff (Inference)
As part of our commitment to deploying high-performance models for enterprise applications, our inference team ensures that these models run efficiently and reliably in real-world settings. Research engineers in this role optimize model inference, reducing response times and increasing throughput without sacrificing model quality.
This role is ideal for you if you:
- Have experience deploying and optimizing large language models for inference, in both cloud and on-premises environments.
- Are skilled in using tools and frameworks for model optimization and acceleration, such as ONNX, TensorRT, or TVM (see the sketch after this list).
- Enjoy diagnosing and resolving complex issues related to model performance and scalability.
- Have a strong understanding of the trade-offs involved in model inference, including hardware limitations and real-time processing requirements.
- Are proficient with PyTorch and familiar with infrastructure management tools like Docker and Kubernetes for deploying inference pipelines.
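To give a concrete flavor of this work, here is a minimal, hypothetical sketch of one common pattern: exporting a PyTorch module to ONNX and comparing eager-mode latency against ONNX Runtime. The toy model, file name, and provider choice are illustrative assumptions, not a description of our production stack.

```python
# Illustrative only: export a toy PyTorch module to ONNX and compare
# eager vs. ONNX Runtime latency. Model, shapes, and providers are
# hypothetical placeholders.
import time

import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort


class TinyMLP(nn.Module):
    """Stand-in for a real model; keeps the example self-contained."""

    def __init__(self, d_in=512, d_hidden=2048, d_out=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_out)
        )

    def forward(self, x):
        return self.net(x)


model = TinyMLP().eval()
example = torch.randn(1, 512)

# Export with a dynamic batch axis so the same graph serves any batch size.
torch.onnx.export(
    model,
    (example,),
    "tiny_mlp.onnx",
    input_names=["x"],
    output_names=["y"],
    dynamic_axes={"x": {0: "batch"}, "y": {0: "batch"}},
)

# CPUExecutionProvider is the portable default; swap in
# CUDAExecutionProvider or TensorrtExecutionProvider where available.
session = ort.InferenceSession("tiny_mlp.onnx", providers=["CPUExecutionProvider"])

batch = np.random.randn(32, 512).astype(np.float32)


def bench(fn, iters=100):
    fn()  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1e3  # ms per call


with torch.inference_mode():
    eager_ms = bench(lambda: model(torch.from_numpy(batch)))
ort_ms = bench(lambda: session.run(["y"], {"x": batch}))
print(f"eager: {eager_ms:.2f} ms  onnxruntime: {ort_ms:.2f} ms")
```

The same loop applies to much larger models, where execution-provider selection and batching strategy dominate the latency/throughput trade-off.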
We do not require a specific educational background or a set number of years of experience; we are eager to see what you have been building. Please send us examples of your best work, such as links to open-source contributions or personal projects, or a cover letter describing past projects you are proud of.
Keywords: advanced models, efficient workflows, finetuning, innovation, language models, LLMs, inference, high-performance models, enterprise applications, inference optimization, throughput, reliable deployment, cloud environments, model acceleration, ONNX, TensorRT, TVM, scalability, real-time processing, PyTorch, Docker, Kubernetes, inference pipelines