Migrate from AWS Batch
If you're running batch embeddings, transcription, OCR, or other GPU-heavy inference on AWS Batch + GPU instances, you can move the inference half to Common Compute in an afternoon. The orchestration patterns map cleanly; only the runtime changes.
Mental model — what maps to what
AWS Batch concepts → Common Compute concepts:
- Job queue → workload (we route to capacity, you don't pick a queue)
- Job definition → workload_id + model_id (declarative; no container to maintain)
- Compute environment → none (we own the fleet — Apple Silicon, sandboxed)
- submit_job → cc.jobs.submit() / OpenAI-compatible call (returns a job id)
- describe_jobs → cc.jobs.get(id) (same shape: state, attempts, result_uri)
- cancel_job / terminate_job → cc.jobs.cancel(id)
- S3 input bucket → R2 (presigned upload URL returned at submit time)
- CloudWatch Logs → /v1/me/logs (filterable by job id, time range)
- IAM role → API key with `inference` scope (per-key, revocable)
- Reserved capacity → priority tier (`batch` is the default)
Path 1 — Drop-in shim (zero refactor)
If your code calls boto3.client('batch').submit_job(...), install the optional shim and change one import line. The shim translates submit_job/describe_jobs/list_jobs/cancel_job/terminate_job into the equivalent Common Compute calls.
The shim is a thin adapter: it does not require boto3 to be installed and never calls AWS. It returns response dicts shaped exactly like AWS Batch responses (jobId, jobName, status, startedAt, stoppedAt, statusReason).
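To make that response shape concrete, here is a minimal sketch of the kind of translation such an adapter performs. It is not the shim's actual code. The Common Compute field names (`id`, `name`, `state`, `started_at`, `stopped_at`, `reason`) and every state name other than `dead_letter` are assumptions made for illustration. The AWS-side keys and status values are the ones AWS Batch's describe_jobs uses, with timestamps in epoch milliseconds.

```python
from datetime import datetime, timezone

# Hypothetical Common Compute state names mapped to AWS Batch status values.
# Only "dead_letter" appears in this guide; the other keys are assumed.
STATE_TO_BATCH_STATUS = {
    "queued": "RUNNABLE",
    "running": "RUNNING",
    "succeeded": "SUCCEEDED",
    "failed": "FAILED",
    "cancelled": "FAILED",
    "dead_letter": "FAILED",
}


def to_epoch_ms(iso_timestamp):
    """Convert an ISO 8601 timestamp to epoch milliseconds (AWS Batch's unit)."""
    parsed = datetime.fromisoformat(iso_timestamp)
    if parsed.tzinfo is None:
        # Treat naive timestamps as UTC rather than local time.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def to_batch_job(cc_job):
    """Reshape a (hypothetical) Common Compute job dict into a Batch-style dict."""
    job = {
        "jobId": cc_job["id"],
        "jobName": cc_job.get("name", cc_job["id"]),
        "status": STATE_TO_BATCH_STATUS[cc_job["state"]],
    }
    # AWS Batch omits these keys until they apply, so do the same here.
    if cc_job.get("started_at"):
        job["startedAt"] = to_epoch_ms(cc_job["started_at"])
    if cc_job.get("stopped_at"):
        job["stoppedAt"] = to_epoch_ms(cc_job["stopped_at"])
    if cc_job.get("reason"):
        job["statusReason"] = cc_job["reason"]
    return job
```

Code that already branches on `status == "FAILED"` or reads `statusReason` keeps working against a dict like this, which is the point of the drop-in path.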
Path 2 — Native SDK (cleanest)
If you can spend an hour, the native SDK is more idiomatic Python and gives you typed responses, streaming, and async support.
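As a rough sketch of what native usage can look like, the helper below polls a job until it finishes. It assumes only what the mapping table above states: a client object exposing `cc.jobs.get(id)` whose result carries `state`, `attempts`, and `result_uri`. How the client is constructed is not shown here, and the terminal state names other than `dead_letter` are assumptions, so adjust them to whatever the SDK actually reports.

```python
import time

# Assumed terminal states; only "dead_letter" is named in this guide.
TERMINAL_STATES = {"succeeded", "failed", "cancelled", "dead_letter"}


def wait_for_job(cc, job_id, poll_seconds=5.0, timeout_seconds=3600.0,
                 sleep=time.sleep):
    """Poll cc.jobs.get(job_id) until the job reaches a terminal state.

    `cc` is any client exposing jobs.get(id); `sleep` is injectable so the
    helper can be exercised without real delays.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = cc.jobs.get(job_id)
        if job.state in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {job.state!r} after {timeout_seconds}s"
            )
        sleep(poll_seconds)
```

A caller would typically submit with `cc.jobs.submit()`, pass the returned job id to `wait_for_job`, and then read `job.result_uri` once the state indicates success.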
Path 3 — One-call replace (per file)
If you only have a handful of submit_job call sites, you can call the SDK directly at each one, without the shim. Each call has the same shape as a request made with any other Python HTTP client.
Video pipelines (the most common AWS Batch + GPU use case)
Video transcoding and AI processing are the workloads that disproportionately drive AWS Batch + GPU instance bills. Common Compute handles them via two workloads: vt_transcode (hardware H.264/HEVC via Apple VideoToolbox) and the AI workload of your choice (whisper_ane for transcription, coreml_vision for frame analysis, mlx_image for generation, vision_ocr for screen-grab text).
AI batch jobs (embeddings, OCR, classification)
If your pipeline already uses AWS Batch to run embeddings or vision over a corpus on g5/p5 instances, the OpenAI-compatible path is even shorter than the shim.
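One way to prepare a corpus for that path is to split it into batches and build one request body per batch. The sketch below does only that and sends nothing. The body shape (`model` plus `input` as a list of strings) is the standard OpenAI embeddings format. Using `bge-large` as the model id and 64 as the batch size are assumptions, and the endpoint URL and API key handling are left out because they are not specified here.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embedding_request_bodies(texts, model="bge-large", batch_size=64):
    """Build one OpenAI-style embeddings request body (JSON) per batch.

    The model id and batch size defaults are illustrative assumptions.
    """
    return [
        json.dumps({"model": model, "input": batch})
        for batch in chunked(list(texts), batch_size)
    ]
```

Each returned string can be posted as the body of an embeddings request by whichever OpenAI-compatible client or HTTP library the pipeline already uses.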
What you don't have to manage anymore
- Container images — no ECR, no Dockerfile, no base-image patches
- Compute environments — no scaling policies, no spot interruption handling
- GPU instance availability — no waiting on p5/g5 reservations
- Cost dashboards — quotes are returned before the job runs
What changes about your pipelines
- S3 → R2: input bytes upload to a presigned URL we return; you don't manage buckets
- IAM → API keys with scopes; rotate via /app/api-keys
- VPC peering → not applicable; we expose a public HTTPS API only
- Failure semantics: tasks retry up to max_attempts (default 3), then dead_letter — same shape as AWS Batch state transitions
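The failure semantics in the last item can be modelled locally to check how downstream code should treat `attempts` and `dead_letter`. This is a sketch of the described behaviour, not the service's scheduler: `run_with_retries` is a hypothetical helper, and `succeeded` is an assumed state name.

```python
def run_with_retries(task, max_attempts=3):
    """Call `task` until it succeeds or `max_attempts` calls have failed.

    Returns (final_state, attempts). "dead_letter" is the terminal failure
    state named in this guide; "succeeded" is an assumed name.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            task()
        except Exception:
            # A failed attempt consumes one retry; try again if any remain.
            continue
        return "succeeded", attempt
    return "dead_letter", max_attempts
```

A task that fails twice and then succeeds ends as `("succeeded", 3)` under the default, while one that never succeeds ends as `("dead_letter", 3)`.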
Cost comparison (typical)
Embeddings (bge-large) on g5.xlarge through AWS Batch: ~$0.045 per 1M tokens, including instance idle time. Same workload on Common Compute: $0.009 per 1M tokens. The savings come from amortizing the work across otherwise idle Apple Silicon instead of paying for dedicated cloud GPUs.
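Taken at face value, those two figures are a 5x difference (0.045 / 0.009 = 5), or an 80% reduction. The sketch below scales them to a corpus size. The one-billion-token corpus in the usage note is an illustrative assumption, and the AWS figure is the approximate one quoted above, so treat the output as an estimate.

```python
from decimal import Decimal

# Per-1M-token prices quoted in this section (the AWS figure is approximate).
AWS_BATCH_USD_PER_M = Decimal("0.045")
COMMON_COMPUTE_USD_PER_M = Decimal("0.009")


def compare_costs(total_tokens):
    """Estimate the cost of embedding `total_tokens` tokens on each platform."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    millions = Decimal(total_tokens) / Decimal(1_000_000)
    aws = millions * AWS_BATCH_USD_PER_M
    cc = millions * COMMON_COMPUTE_USD_PER_M
    return {
        "aws_batch_usd": aws,
        "common_compute_usd": cc,
        "savings_usd": aws - cc,
        "savings_fraction": (aws - cc) / aws,
    }
```

For a corpus of 1,000,000,000 tokens, `compare_costs(1_000_000_000)` gives $45 on AWS Batch against $9 on Common Compute, a $36 difference.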