ML Engineer (Inference)

PlayAI

Date: 22 hours ago

City: Palo Alto, CA

Salary: $150,000 - $220,000 per year

Contract type: Full time

About Us:

PlayAI (fka PlayHT, YC '23) is at the forefront of generative voice and conversational LLMs, reshaping how humans interact with technology. Our advanced Speech Synthesis and Voice Cloning models power hyperrealistic, human-like conversational experiences across industries.

We’re building the core infrastructure for conversational AI—enabling businesses, developers, and creators to easily build intelligent voice agents and interactive voice applications. Whether it's serving customers or powering creative projects, PlayAI helps bring talking, human-like AI to life.

Since finishing Y Combinator’s W23 batch, we’ve raised over $21M in seed funding, grown to 1.4M+ monthly active users and 500K+ developers, and are scaling revenue at 35% quarter-over-quarter.

What are we looking for?

We are in search of Machine Learning Engineers who are passionate about solving challenging problems in multimodal (voice, text, etc.) foundational model inference and enabling revolutionary experiences in human-AI interaction. By joining our team, you will have the opportunity to be a founding engineer and play a pivotal role in shaping the future of Conversational AI. If you're keen on pushing AI boundaries and making a significant impact, this role is for you.

Responsibilities:

Designing, building and optimizing multimodal foundational model inference frameworks.
Inventing and implementing novel algorithms and features for streaming inference of multimodal models.
Co-designing next-generation multimodal foundation model architectures with the training team to hit the Pareto frontier of quality and efficiency.

Qualifications:

Demonstrates a growth mindset and a passion for solving challenging problems.
Possesses previous academic or work experience in:

LLM inference frameworks (TensorRT-LLM, vLLM, SGLang, etc.)
LLM inference algorithms (quantization, sparse attention, speculative sampling, etc.)
LLM architecture and training (parallelization, MoE, etc.) is a plus
Low-level implementations (Flash Attention, GEMM, etc.) is a plus
General machine learning (diffusion, GAN, etc.) is a plus

Experience with PyTorch, Python, and C++. Mastery of CUDA is a plus.
Master's degree in a related technical field or Bachelor's degree from a top-tier university with relevant work experience (internships, full-time roles, or equivalent). Recent graduates and current students are encouraged to apply.

What We Offer:

Challenging problems to solve
Autonomous working environment
Competitive compensation
Flexible work hours
Health, dental, and vision insurance
Commuter benefits
Flexible PTO + holidays

Final offer amounts are determined by multiple factors, including experience, and may vary from the amounts listed above.

Compensation Range: $150K - $220K
