Start Building with TokenLabs

Production-ready inference endpoints powered by NVIDIA DGX Spark. OpenAI-compatible API, pay-per-token pricing.

Models: -
Context: 128K tokens
Input pricing: from $0.006 / 1M tokens
Uptime: 99.9%

Inference Endpoints

Select a model to view pricing, try the playground, or grab API code.


Model Specifications
Pricing · per 1M tokens

Input tokens: $0.00
Cached input tokens: $0.0000
Output tokens: $0.00

Prices derived from CI benchmarks (vLLM on DGX Spark). Free tier available.
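Per-token pricing makes request cost a simple linear function of token counts. A minimal sketch of the arithmetic, using the advertised $0.006 / 1M input floor and a hypothetical output price (the actual per-model prices above are loaded dynamically):

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Estimate request cost from token counts and per-1M-token prices."""
    return (input_tokens * input_price_per_1m
            + output_tokens * output_price_per_1m) / 1_000_000

# 12K-token prompt, 800-token completion, at the $0.006 / 1M input floor
# and a hypothetical $0.024 / 1M output price (illustrative only):
estimate = cost_usd(12_000, 800, 0.006, 0.024)
```

Substitute the per-model input, cached-input, and output prices shown above for a real estimate.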

API base URL

Compatible with the openai, litellm, and langchain client libraries.
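Because the API is OpenAI-compatible, any HTTP client can talk to it. A minimal stdlib sketch that builds a chat completions request against the base URL shown above (the URL and model id below are placeholders, and the standard OpenAI path /v1/chat/completions is assumed):

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: use the base URL shown above

def build_chat_request(model: str, messages: list, api_key: str,
                       **params) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completions request.

    Assumes the standard /v1/chat/completions path.
    """
    body = json.dumps({"model": model, "messages": messages, **params}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    "org/model-name",  # hypothetical: use a full model identifier from the list
    [{"role": "user", "content": "Hello"}],
    api_key="sk-...",
    max_tokens=512,
    temperature=0.7,
)
# urllib.request.urlopen(req) would send it; omitted here (needs a live endpoint).
```

The official openai SDK works the same way: point its base_url at the value above and pass your key as api_key.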

All Available Models

Select a primary model, optionally set a system prompt, and adjust the sampling parameters:

Temperature: 0.70
Max Tokens: 512
Top P: 1.00

Options: streaming on/off
Compare mode: select a second model to run alongside the primary.
Powered by vLLM · OpenAI-compatible API
Endpoint

POST

OpenAI-compatible chat completions. Enable streaming with stream: true. Pass the full model identifier in the model field.
Code Examples

Snippets are available for cURL, Python, Node.js, and LiteLLM.
Request Parameters

Parameter     Type      Description
model         string    Full model identifier
messages      array     Array of {role, content} objects
max_tokens    integer   Max tokens to generate (default 512)
temperature   float     Sampling temperature 0–2 (default 0.7)
top_p         float     Nucleus sampling (default 1.0)
stream        boolean   Stream via SSE (default false)
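With stream: true the server returns Server-Sent Events, each a data: line carrying a JSON chunk. A minimal parser sketch, assuming the chunks follow the standard OpenAI streaming format (delta objects under choices, with a [DONE] sentinel), since the API is OpenAI-compatible:

```python
import json

def sse_deltas(lines):
    """Yield content deltas from OpenAI-style SSE lines (stream: true)."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive / comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:  # first chunk may carry only the role
            yield delta["content"]

# Synthetic stream illustrating the assumed chunk shape:
stream = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
text = "".join(sse_deltas(stream))  # "Hello"
```

The openai and litellm SDKs handle this parsing for you; the sketch shows what arrives on the wire.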