LLM inference

Common Compute vs Together AI.

Together is a fully managed LLM inference API with state-of-the-art chat throughput. Common Compute serves transformer chat too, but at a much higher per-token cost than Together, and we say so openly: Apple Silicon is the wrong hardware for it.

Last reviewed 2026-05-13

| | Common Compute | Together AI |
| --- | --- | --- |
| Optimised for | Specific workloads where Apple Silicon shines (audio, embeddings, image, small LLMs). | LLM chat at scale. H100-grade throughput on flagship open-weight models. |
| Chat per-token cost | Available in the catalog but priced honestly: much higher than Together's market rate. | Best-in-class for open-weight chat models. |
| Embeddings | Cheap (~$0.02 / M tokens via the embeddings runner). | Available, slightly pricier. |
| Audio (Whisper) | Cheap. ANE-accelerated on M-series. | Not the primary use case. |
| Image generation (SDXL) | Available on 32GB+ Macs. Solid throughput, batchable. | Available; competitive pricing. |
| Fine-tuning | Not supported. | Yes: managed LoRA on supported models. |
| Hosted multi-modal models | Limited to the catalog. | Wide catalog of open-weight models, served identically. |

Honest take

Where we win, where they win.

For pure LLM chat, even with small models, Together, Fireworks, and Groq are the right answer. Their hardware is purpose-built for transformer inference; ours isn't.

Chat is in the Common Compute catalog so customers can hit one API for everything. But if chat is the bulk of your bill, route it to Together and use us for the other workloads.

We'd rather tell you this up front than dress up Apple Silicon chat as competitive on a workload where it isn't.
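The split described above (chat to Together when it dominates the bill, everything else to Common Compute) could be sketched as a small routing helper. This is a minimal illustration, not either provider's SDK: the provider labels, the `Workload` enum, the `pick_provider` function, and the 50% threshold standing in for "the bulk of your bill" are all assumptions made for the example.

```python
from enum import Enum


class Workload(Enum):
    CHAT = "chat"
    EMBEDDINGS = "embeddings"
    AUDIO = "audio"
    IMAGE = "image"


# Hypothetical labels for illustration; these are not real endpoint names.
COMMON_COMPUTE = "common-compute"
TOGETHER = "together"


def pick_provider(
    workload: Workload,
    chat_share_of_bill: float,
    chat_threshold: float = 0.5,  # assumed cutoff for "the bulk of your bill"
) -> str:
    """Return the provider label a request of this workload type should go to.

    chat_share_of_bill is the fraction (0.0 to 1.0) of total spend that
    comes from chat traffic.
    """
    if not 0.0 <= chat_share_of_bill <= 1.0:
        raise ValueError("chat_share_of_bill must be between 0.0 and 1.0")

    # Chat moves to Together only once it dominates spend; small chat
    # volumes stay on the single Common Compute API for simplicity.
    if workload is Workload.CHAT and chat_share_of_bill >= chat_threshold:
        return TOGETHER

    # Audio, embeddings, and image workloads stay on Common Compute.
    return COMMON_COMPUTE
```

For example, `pick_provider(Workload.CHAT, 0.8)` returns the Together label, while `pick_provider(Workload.AUDIO, 0.8)` and `pick_provider(Workload.CHAT, 0.1)` both return the Common Compute label. Where to set the threshold depends on your own cost figures.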

Pick Common Compute when…

Your mix is mostly audio, embeddings, image, or fine-tuned small models.
You want one API across multiple modalities with predictable per-task cost.
You're already using us for a non-chat workload and chat volume is small.

Pick Together AI when…

Chat is your primary workload.
You need flagship open-weight models (Llama, Mixtral) at scale.
You need fine-tuning.
You're serving consumer-facing chat where token cost moves your gross margin.

Try us on a real workload.

$5 in credits on signup. No card required. Enough for a real evaluation across the catalog.
