Responsibilities
1. Contribute to system-level innovations for ML silicon products.
2. Create and maintain methodologies to evaluate a wide range of ML systems optimizations, such as kernel fusion, 4D parallelism, and dynamic batching, on prospective silicon products.
3. Collaborate with system and ML application software engineers to understand their constraints and trade-offs.
4. Collaborate with hardware engineers to optimize across the hardware/software boundary.
Requirements
1. MS or PhD degree with a focus on ML systems, or equivalent years of experience.
2. Understanding of transformer architectures and related optimizations at the model-architecture and system levels.
3. Experience with ML frameworks.
4. Strong communication skills.
Preferred qualifications
1. Familiarity with performance tuning tools and methodologies on GPGPU or ML accelerator hardware.
2. Knowledge of GPGPU or ML accelerator microarchitecture from a performance tuning perspective.
3. Proven track record of discovering and implementing ML system performance optimizations in industry or academia.
4. Knowledge of ML trends and experience conducting outreach to ML researchers and application developers.