About Us

At Pruna, we’re on a mission to build the fastest AI models in the world.

While foundation model labs push the quality frontier by scaling models to become larger, slower, and more expensive, we focus on the efficiency frontier: making AI faster, cheaper, and smaller.

After years of research on efficient ML, we decided that the best way to extend our impact was to take matters into our own hands. Each of us cares deeply about empowering people to maximize their impact while minimizing their carbon footprint.

Role Description

As an ML Engineer at Pruna AI, you will work at the cutting edge of model optimization and post-training. The models we build currently run more than 50 million times every month. They serve both general-purpose and specialized use cases, with the main goal of pushing the efficiency frontier.

What You’ll Do

Optimization and Post-Training

Use Pruna’s internal optimization tools to improve model speed and efficiency across different modalities and architectures.
Apply post-training techniques such as reinforcement learning, preference optimization, distillation, and fine-tuning to improve model performance for specific tasks.
Design and run experiments to measure the impact of optimization and post-training methods on quality, latency, throughput, memory usage, and costs.
Collaborate with the research team to test and validate new methods.
Continuously improve deployed models as research and hardware evolve.

Model Building & Deployment

Build and optimize models to be served to millions of users.
Ensure smooth integration into Pruna’s API.
Collaborate with the Software team to scale testing, deployment, and monitoring.
Work closely with customers and users of our optimized models to understand requests for current models and identify opportunities for new ones.
Use customer feedback to guide model development decisions and future improvements.

We Would Love to See

Background

B.Sc., M.Sc., or Ph.D. in Computer Science, Data Science, or a related field, or equivalent industry experience.
Exceptional academic performance.
Experience working with generative AI models, preferably in the visual domain.

Machine Learning Expertise

Strong foundations in deep learning.
Good understanding of generative modeling.
Expertise in PyTorch and Python.
Familiarity with model serving frameworks such as vLLM or SGLang is a plus.

Engineering and Deployment

Experience taking ML models from research to production in real-world environments.
Experience with containerization tools such as Cog or Docker.

Nice to Haves

Familiarity with benchmarking models for both quality and efficiency.
Understanding of performance benchmarking, profiling, and hardware-aware optimization.
Comfort with neo-cloud platforms such as Replicate, Runpod, or Modal.

Personal Attributes

Strong sense of ownership and accountability.
Ability to thrive in ambiguous, fast-moving environments.
Clear communication skills.

Bonus Points

Agentic coding experience.
Experience with compression methods such as quantization, pruning, distillation, or compilation.
Knowledge of lower-level optimization frameworks such as Triton or CUDA.
Prior experience in forward-deployed engineering or customer-facing ML roles.

ML Engineer

Join Pruna AI as an ML Engineer to turn cutting-edge models into fast, efficient, production-ready AI, making the state of the art accessible, affordable, and sustainable.