ML Research Engineer - Pretraining
Palo Alto, CA
What We're Building
We are entering a new growth phase focused on partnering with commercial entities to adapt and fine-tune our advanced models to meet their specific business needs. Our achievements in developing, aligning, and deploying cutting-edge models for our high-EQ consumer-facing chatbot have laid a solid foundation for continued success. With substantial resources and a strong infrastructure, we are well-equipped to support top-tier model fine-tuning. Joining our team offers an opportunity to bring your expertise to a dynamic, innovation-driven environment that values collaboration.
About Us
We are a small, interdisciplinary AI studio focused on training and fine-tuning state-of-the-art language models for specific commercial applications. Our mission is to leverage AI to drive significant, positive change. As a public benefit corporation, we prioritize the well-being and happiness of our partners, users, and broader stakeholders.
About the Role
Our pretraining team is responsible for creating and refining the foundational models that enable our AI capabilities for enterprise solutions. Research engineers in this role will focus on developing large-scale training datasets, optimizing training processes, and innovating model architectures to push the limits of what our models can achieve in enterprise settings.
This role is a good fit if you:
- Have experience training large-scale language models from scratch or on extensive datasets.
- Are skilled in managing and efficiently utilizing large compute resources for training.
- Have a strong background in modern deep learning techniques and architectures, particularly with transformer models, and are proficient in PyTorch.
- Enjoy experimenting with new training methodologies and hyperparameter tuning to achieve state-of-the-art results.
- Are familiar with distributed training frameworks and tools like Horovod or DeepSpeed.
Our Work Culture
We prioritize excellence and ownership, with an organizational structure that emphasizes individual responsibility over management hierarchies. We believe in the power of highly talented individual contributors, providing them with the resources and autonomy to deliver outstanding results. Teamwork, generosity, and a culture of constructive disagreement are at our core, fostering an environment where positive challenges and new ideas are encouraged. We also value strong communication, particularly in writing, and maintain a close feedback loop between user experience and AI development.
Engineering Approach
As a vertically integrated AI studio, we build and optimize our entire technology stack in-house, from large foundational model pretraining to the user interface. We are committed to scale as a driver of progress in AI, developing and deploying new AI generations on one of the largest supercomputers in the world. Our approach blurs the lines between engineering and research, with a continuous focus on innovation guided by user feedback.
Benefits
We offer generous benefits to ensure a positive, inclusive, and inspiring work environment, including:
- Unlimited paid time off
- Parental leave and flexibility for all parents and caregivers
- Comprehensive medical, dental, and vision plans for US employees
- Benefits that comply with country-specific requirements for non-US employees
- Visa sponsorship for new hires
- Opportunities for personal growth, such as coaching, conference attendance, or specific training
Diversity & Inclusion
We are committed to building personal AIs that serve everyone, and we strive to represent the full spectrum of human experience within our AI studio. We welcome individuals from all walks of life who possess the right skills, and we actively cultivate diverse candidate pools for all open roles.
Keywords: Research Engineer, Pretraining, AI Studio, Language Models, Fine-Tuning, Transformer Models, Deep Learning, PyTorch, Distributed Training, Horovod, DeepSpeed, Large-Scale Training, Compute Resources, Innovation, User Feedback, Vertical Integration, AI Development, Model Architecture, Engineering, Artificial Intelligence, Machine Learning, ML