ML Engineer (Inference)

PlayAI


Date: 22 hours ago
City: Palo Alto, CA
Salary: $150,000 - $220,000 per year
Contract type: Full time

About Us:

PlayAI (fka PlayHT, YC '23) is at the forefront of generative voice and conversational LLMs, reshaping how humans interact with technology. Our advanced Speech Synthesis and Voice Cloning models power hyperrealistic, human-like conversational experiences across industries.

We’re building the core infrastructure for conversational AI—enabling businesses, developers, and creators to easily build intelligent voice agents and interactive voice applications. Whether it's serving customers or powering creative projects, PlayAI helps bring talking, human-like AI to life.

Since finishing Y Combinator’s W23 batch, we’ve raised over $21M in seed funding, grown to 1.4M+ monthly active users and 500K+ developers, and are scaling revenue at 35% quarter-over-quarter.

What are we looking for?

We are looking for Machine Learning Engineers who are passionate about solving challenging problems in multimodal (voice, text, etc.) foundation model inference and enabling revolutionary experiences in human-AI interaction. By joining our team, you will have the opportunity to be a founding engineer and play a pivotal role in shaping the future of conversational AI. If you're keen on pushing AI boundaries and making a significant impact, this role is for you.

Responsibilities:

  • Designing, building, and optimizing multimodal foundation model inference frameworks.
  • Inventing and implementing novel algorithms and features for streaming inference of multimodal models.
  • Co-designing next-generation multimodal foundation model architectures with the training team to reach the Pareto frontier of quality and efficiency.

Qualifications:

  • Demonstrates a growth mindset and a passion for solving challenging problems.
  • Possesses previous academic or work experience in:
    • LLM inference frameworks (TensorRT-LLM, vLLM, SGLang, etc.)
    • LLM inference algorithms (quantization, sparse attention, speculative sampling, etc.)
    • LLM architecture and training (parallelization, MoE, etc.) is a plus
    • Low-level implementations (Flash Attention, GEMM, etc.) is a plus
    • General machine learning (diffusion, GAN, etc.) is a plus
  • Experience with PyTorch, Python, and C++. Mastery of CUDA is a plus.
  • Master's degree in a related technical field or Bachelor's degree from a top-tier university with relevant work experience (internships, full-time roles, or equivalent). Recent graduates and current students are encouraged to apply.

What We Offer:

  • Challenging problems to solve
  • Autonomous working environment
  • Competitive compensation
  • Flexible work hours
  • Health, dental, and vision insurance
  • Commuter benefits
  • Flexible PTO + holidays

Final offer amounts are determined by multiple factors, including experience, and may vary from the amounts listed above.

Compensation Range: $150K - $220K
