Common Compute vs Together AI.
Together is a fully managed LLM inference API with state-of-the-art chat throughput. Common Compute serves transformer chat too, but we'll say it plainly: our per-token cost is much higher than Together's, because Apple Silicon is the wrong hardware for that workload.
Where we win, where they win.
For pure LLM chat, even with small models, Together, Fireworks, and Groq are the right answer. Their hardware is purpose-built for transformer inference; ours isn't.
Chat is in the Common Compute catalog so customers can hit one API for everything. But if chat is the bulk of your bill, route it to Together and use us for the other workloads.
We'd rather tell you this up front than dress up Apple Silicon as competitive on a workload where it isn't.
Where we win:
- Your mix is mostly audio, embeddings, image, or fine-tuned small models.
- You want one API across multiple modalities with predictable per-task cost.
- You're already using us for a non-chat workload and chat volume is small.
Where they win:
- Chat is your primary workload.
- You need flagship open-weight models (Llama, Mixtral) at scale.
- You need fine-tuning.
- You're serving consumer-facing chat where token cost moves your gross margin.
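The routing advice above can be sketched as a small decision function. This is a minimal illustration, not part of either product: the provider labels, the `pick_provider` and `chat_share` names, and the 50% cutoff for "the bulk of your bill" are all assumptions made for the example, and nothing here calls a real API.

```python
# Hypothetical provider labels; these are plain strings, not real clients or endpoints.
TOGETHER = "together"
COMMON_COMPUTE = "common-compute"

CHAT = "chat"


def chat_share(spend_by_workload: dict[str, float]) -> float:
    """Return the fraction of total spend that goes to chat (0.0 if there is no spend)."""
    total = sum(spend_by_workload.values())
    if total == 0:
        return 0.0
    return spend_by_workload.get(CHAT, 0.0) / total


def pick_provider(
    workload: str,
    spend_by_workload: dict[str, float],
    chat_threshold: float = 0.5,  # assumed meaning of "bulk of your bill"
) -> str:
    """Send chat to Together when chat dominates spend; keep everything else on one API."""
    if workload == CHAT and chat_share(spend_by_workload) >= chat_threshold:
        return TOGETHER
    return COMMON_COMPUTE


if __name__ == "__main__":
    # Illustrative monthly spend in dollars, chat-heavy.
    spend = {"chat": 80.0, "audio": 20.0}
    print(pick_provider("chat", spend))
    print(pick_provider("audio", spend))
```

With a chat-heavy bill like the one in the example, chat requests go to Together and audio stays on Common Compute. When chat is a small slice of spend, everything stays on one API, which matches the "chat volume is small" case in the list.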
Try us on a real workload.
$5 in credits on signup. No card required. Enough for a real evaluation across the catalog.