Production-ready inference endpoints powered by NVIDIA DGX Spark. OpenAI-compatible API.
Select a model to view pricing, try the playground, or copy ready-made API code.
Prices are derived from the latest overnight CI benchmark runs (vLLM on DGX Spark).
Compatible with the `openai`, `litellm`, and `langchain` SDKs.
OpenAI-compatible chat completions. Enable streaming with `stream: true`.
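Because the endpoint follows the OpenAI chat-completions shape, a request can be built with nothing but the standard library. A minimal sketch, assuming a hypothetical base URL, API key, and model id (replace all three with your own values); the parameter defaults mirror the table below:

```python
import json
import urllib.request

# Hypothetical values for illustration only.
BASE_URL = "https://api.example.com/v1"
API_KEY = "sk-..."

def build_chat_request(model, messages, max_tokens=512,
                       temperature=0.7, top_p=1.0, stream=False):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request(
    "example-org/example-model",  # placeholder model identifier
    [{"role": "user", "content": "Hello!"}],
)
# urllib.request.urlopen(req) would send it; omitted here.
```

The same payload works unchanged through the `openai` SDK by pointing its `base_url` at the service.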
| Parameter | Type | Description |
|---|---|---|
| model | string | Full model identifier |
| messages | array | Array of `{role, content}` message objects |
| max_tokens | integer | Max tokens to generate (default 512) |
| temperature | float | Sampling temperature 0–2 (default 0.7) |
| top_p | float | Nucleus sampling probability mass (default 1.0) |
| stream | boolean | Stream via SSE (default false) |
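With `stream: true`, OpenAI-compatible servers typically return server-sent events: each line is `data: {json}`, partial text arrives under `choices[0].delta.content`, and the stream ends with `data: [DONE]`. A sketch of reassembling the reply from those lines, shown here against canned chunks rather than a live connection:

```python
import json

def collect_stream(sse_lines):
    """Reassemble assistant text from OpenAI-style SSE chunk lines."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank separators
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Canned example chunks (the first delta carries only the role):
chunks = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print(collect_stream(chunks))  # prints "Hello"
```

In a real client the lines would come from iterating over the HTTP response body instead of a list.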