Careers

Discover opportunities in our family of incredible companies and people.
MaC Venture Capital

AI Infrastructure Engineer

Cosmic Labs

Software Engineering, Other Engineering, Data Science
Posted on Dec 31, 2025

Location: San Francisco (Onsite)
Type: Full-time
Start Date: ASAP


What You'll Do

  • Design and build infrastructure for deploying, scaling, and managing AI/ML workloads
  • Develop automation for GPU cluster provisioning, configuration, and orchestration
  • Build systems for hardware-aware model deployment and inference optimization
  • Create tooling for AI infrastructure observability, debugging, and performance tuning
  • Integrate hardware intelligence with ML frameworks
  • Collaborate with customers deploying large-scale AI systems in production
  • Optimize resource utilization across heterogeneous compute (GPUs, TPUs, custom accelerators)

What You Bring

Strong experience with:

  • GPU cluster management and orchestration (SLURM, Kubernetes, Ray)
  • ML infrastructure and frameworks (PyTorch, TensorFlow, JAX, NVIDIA stack)
  • Distributed training and inference systems
  • Container orchestration for ML workloads (Docker, Kubernetes, Kubeflow)
  • Linux systems programming and performance optimization
  • Python and systems scripting

Familiarity with:

  • Hardware architectures for AI (NVIDIA GPUs, AMD GPUs, custom accelerators)
  • High-performance networking for distributed ML (NCCL, InfiniBand, RoCE)
  • Model serving infrastructure (Triton Inference Server, vLLM, TensorRT)
  • Storage systems for ML workloads (distributed filesystems, object storage)
  • Infrastructure as Code and GitOps workflows

What We're Looking For

We're looking for an AI infrastructure engineer who understands the full stack, from silicon to model serving, and can build systems that make AI deployment effortless.

You should have:

  • Deep understanding of what it takes to run AI workloads at scale
  • Experience with the operational challenges of GPU clusters and ML infrastructure
  • Ability to debug performance issues across hardware, networking, and software
  • Comfort working across infrastructure, ML frameworks, and developer experience
  • Excitement about building the foundational layer for physical AI systems

Requirements:

  • Bachelor's or Master's degree in Computer Science or Computer Engineering, or equivalent experience
  • 3+ years of experience in ML infrastructure, MLOps, or AI platform engineering
  • Willingness to work startup hours in person at our San Francisco office, including weekends
  • Work authorization in the United States

Why Join

We're building the intelligence layer for hardware: real-time systems that control physical machines with zero tolerance for latency or failure.

What we offer:

  • Startup-level equity and a highly competitive salary
  • Ownership over AI infrastructure that powers next-generation systems
  • Problems at the intersection of hardware intelligence and machine learning
  • Close collaboration with customers pushing the boundaries of AI deployment

How to Apply

Email: team@cosmiclabs.io
Subject line: AI Infrastructure / [Your Name]

Include in your email:

  1. Your name
  2. Why this role and why Cosmic Labs
  3. What you bring technically
  4. Earliest available start date
  5. GitHub or GitLab link
  6. Confirmation of work authorization in the U.S.
  7. Confirmation of willingness to work full-time and in person in San Francisco

Attach: PDF resume