Start Building with TokenLabs

Production-ready inference endpoints powered by NVIDIA DGX Spark. OpenAI-compatible API.


Inference Endpoints

Select a model to view pricing, try the playground, or grab API code.


Recent Run

Latest overnight observability run results.

2026-04-15 · Overnight Observability Run — 15 Tasks
DCGM, OTel, SLO, and failure-simulation tasks on DGX Spark GB10. Top finding: chunked prefill is the highest-leverage fix for the 44.8s TTFT p95.
Results: SLO Tier 1 ✓ · E2E Traces ✗ · TTFT p95 44.8s
Model Specifications
Pricing · per 1M tokens

Input tokens           $0.00
Cached input tokens    $0.0000
Output tokens          $0.00

Prices derived from CI benchmarks (vLLM on DGX Spark).
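The three price components combine linearly, each billed per million tokens. A minimal sketch of a per-request cost calculation, using hypothetical prices rather than actual TokenLabs rates:

```python
def request_cost(input_tokens: int, cached_input_tokens: int,
                 output_tokens: int, prices_per_1m: dict) -> float:
    """Cost of a single request given per-1M-token prices."""
    return (
        input_tokens / 1e6 * prices_per_1m["input"]
        + cached_input_tokens / 1e6 * prices_per_1m["cached_input"]
        + output_tokens / 1e6 * prices_per_1m["output"]
    )

# Hypothetical prices for illustration only (USD per 1M tokens).
prices = {"input": 0.20, "cached_input": 0.05, "output": 0.60}
cost = request_cost(10_000, 2_000, 1_500, prices)
```

Cached input tokens are billed at a lower rate than fresh input, so reusing a long system prompt across requests reduces the input-side cost.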

API base URL

Compatible with the openai, litellm, and langchain client libraries.

Playground

Select a model (and optionally a second model to compare), set a system prompt, and start chatting.

Parameters (defaults): Temperature 0.70 · Max Tokens 512 · Top P 1.00 · Streaming on/off

Example prompts: Quantization benefits · LLM haiku · FP8 vs NVFP4 · About DGX Spark
OpenAI-compatible API
Endpoint: POST

OpenAI-compatible chat completions; enable streaming with stream: true.

Code Examples

Examples are provided in cURL, Python, Node.js, and LiteLLM.
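The example tabs render dynamically; as a stand-in, here is a minimal Python sketch using only the standard library. The base URL, API key, and model id are placeholders, and the /v1/chat/completions path is assumed from the OpenAI convention:

```python
import json
import urllib.request

BASE_URL = "https://YOUR_TOKENLABS_BASE_URL"  # placeholder, not a real URL
MODEL = "your-model-id"                       # placeholder model identifier

def build_payload(prompt: str, max_tokens: int = 512,
                  temperature: float = 0.7) -> dict:
    """Assemble a chat-completions request body (OpenAI wire format)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def chat(prompt: str) -> str:
    """POST one chat-completions request and return the assistant text."""
    req = urllib.request.Request(
        BASE_URL + "/v1/chat/completions",  # assumed standard OpenAI path
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer YOUR_API_KEY"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Say hello in one sentence."))
```

Because the wire format matches OpenAI's, the same payload works unchanged through the openai or litellm client libraries pointed at the base URL.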
Request Parameters

Parameter     Type     Description
model         string   Full model identifier
messages      array    Array of {role, content} objects
max_tokens    integer  Max tokens to generate (default 512)
temperature   float    Sampling temperature 0–2 (default 0.7)
top_p         float    Nucleus sampling (default 1.0)
stream        boolean  Stream via SSE (default false)
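With stream: true, responses arrive as Server-Sent Events carrying OpenAI-style delta chunks. A sketch of reassembling the streamed text, assuming the standard data: {...} / data: [DONE] framing:

```python
import json

def collect_stream(sse_lines):
    """Reassemble assistant text from OpenAI-style SSE chunk lines.

    Each event line looks like 'data: {...}'; the stream ends with
    'data: [DONE]'. Blank keep-alive lines and non-data lines are skipped.
    """
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line or not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Illustrative chunk sequence (shapes assumed from the OpenAI format):
chunks = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    '',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: [DONE]',
]
# collect_stream(chunks) → "Hello"
```

Note that the first chunk typically carries only the role, with no content key, so the parser must tolerate deltas without content.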