Would you be interested in exploring a permanent, full-time role with our start-up in Palo Alto, CA, which specializes in building frontier and foundation LLMs?
As an Architect, you should have expertise in architecting scalable training methodologies and implementing state-of-the-art neural architectures (novel neural networks).
As an Engineer, you will efficiently train frontier and foundation multimodal large language models.
In this hands-on role, you will implement and optimize state-of-the-art neural architectures and robust training and inference infrastructure. Your work will take complex models with hundreds of billions to trillions of parameters to production while optimizing for low latency, high throughput, and cost efficiency.
Key Responsibilities:
- Architect Distributed Training Systems: Design and implement highly scalable distributed training pipelines for LLMs and frontier models, leveraging model parallelism (tensor, pipeline, expert) and data parallelism techniques.
- Optimize Performance: Utilize deep knowledge of CUDA, C++, and low-level optimizations to enhance model training speed and efficiency across diverse hardware configurations.
- Implement Novel Techniques: Research and apply cutting-edge optimization techniques such as FlashAttention to accelerate model training and reduce computational costs.
- Framework Expertise: Demonstrate proficiency in deep learning frameworks such as PyTorch, TensorFlow, and JAX, and tailor them for distributed training scenarios.
- Scale to Hundreds of Billions of Parameters: Work with massive models, ensuring stable and efficient training across distributed resources.
- Evaluate Scaling Laws: Design and conduct experiments to analyze the impact of model size, data, and computational resources on model performance.
- Collaborate: Partner closely with research scientists and engineers to integrate research findings into production-ready training systems.
Please reach out to Jia for more information about the role and clients.