ML-Infrastructure Engineer

Coval

Software Engineering, Other Engineering, Data Science

San Francisco, CA, USA

Posted on Apr 17, 2026

Simulation & Evaluation that scales voice and chat AI agents

ML-Infrastructure Engineer

$100K - $200K | 0.20% - 1.00% equity | San Francisco, CA, US
Job type: Full-time
Role: Engineering, Backend
Experience: 1+ years
Visa: US citizen/visa only
Skills: Torch/PyTorch, Python, CUDA, Amazon Web Services (AWS)
Brooke Hopkins
Founder

THE ROLE

Every simulation we run touches multiple models (LLMs, speech-to-text, text-to-speech), and our Fortune 500 customers need hundreds, sometimes thousands of these running concurrently. Making that fast, reliable, and cost-efficient is the job.

We've built the skeleton. Our team has done this before, operating compute at Waymo on the order of single-digit percentages of Google's total for massive workloads. The auto-scaling foundations, the queuing systems, and the monitoring patterns are in place. But we're at an inflection point. Demand is growing fast, and there's a ton of low-hanging fruit: optimizing how many workloads run on a single machine, tuning scaling algorithms, and deciding what to self-host versus what to keep as managed services.

You'll own our model infrastructure end to end:

• Scaling GPU and compute infrastructure. Architect and operate the auto-scaling systems that handle spikes of hundreds to thousands of concurrent simulations. Optimize how we provision, schedule, and monitor GPU instances.

• Making the hosting decisions. We use a mix of closed-source hosted models and open-source self-hosted models today. You'll evaluate the tradeoffs (cost, latency, quality) and make the calls on what to host, where, and how it connects to the rest of our pipeline.

• Making our pipelines go fast. You're obsessive about performance. You want to know exactly what compute we're using, where the bottlenecks are, and how to squeeze more throughput out of every machine. You live in monitoring dashboards and you love it.

• Staying on the frontier of models. Voice AI models are getting commoditized, and we get to experiment with all of them. You'll benchmark the latest models across the full voice stack, run comparisons, and help us stay ahead of what's coming next.

What makes this more interesting than a similar role at a bigger AI company: you're not scoped to a narrow set of tasks. You'll develop and architect large parts of our compute infrastructure, and you'll shape the decisions about which models we run and how.

WHAT WE'RE LOOKING FOR

• You've built and operated auto-scaling infrastructure for compute-heavy workloads, ideally involving GPUs and model serving.

• You're a hardware nerd at heart. You care about what instances we're running, how scaling policies are tuned, and whether we're leaving performance on the table.

• You're obsessive about monitoring and observability. You want to know when something is degrading before it becomes an incident.

• You can make pragmatic calls on build-vs-buy, self-host-vs-managed, open-source-vs-closed. You're excited about the latest open-source models but you know when paying for a service is the right move.

• You're curious about the full voice AI model stack (LLMs, STT, TTS) and you want to be immersed in how these models evolve month to month.

• You want to shape infrastructure at a company where the decisions aren't already made for you.

WHAT YOU'LL WORK WITH

You'll work in Python, building and operating auto-scaling compute infrastructure on AWS with GPU instances, containerized deployments, and modern observability tooling. You'll work across both self-hosted open-source models and managed API services.

About Coval

What Coval Does

Coval is the simulation and evaluation platform for voice AI. We help companies answer the question most can't: do their voice agents actually work? Not in a demo. At scale, in production, with real users.

Most teams building voice agents are flying blind. They ship a demo that works, deploy it, and discover weeks later that 40% of conversations are failing. No evaluation infrastructure. No way to catch regressions before users do.

We built Coval because we've seen this movie before. Brooke led evaluation infrastructure at Waymo, where it takes millions of simulated miles before a vehicle touches a public road. Voice agents need the same rigor. Right now, almost nobody has it.

Nine people, backed by YC, closing six-figure enterprise deals with Fortune 500 companies, growing revenue 10x year over year. The space is moving fast. We're at the center of it.

What It's Like Here

We work in-person in SoMa, San Francisco. The office is shoes-off, full of plants, and dog-friendly. We bike, hike, skate, and run half-marathons together.

The team comes from Waymo, Zoox, Apple, and Google. We're hard on our work but never hard on each other. We ship on Sundays because we want to, not because someone told us to. We move at AI speed: what used to take weeks happens in hours.

Our operating principle is "wholesome and unhinged." We are relentless about making things happen, but we don't take ourselves too seriously. If you thrive in ambiguity, move fast, and want to be one of the first people building the infrastructure foundations at a company that's already closing Fortune 500 deals, this is it.

Voice AI is where the market is going. We're already there. This is the best time to join.