Start Building with TokenLabs

Production-ready inference endpoints powered by NVIDIA DGX Spark. OpenAI-compatible API.


Inference Endpoints

Select a model to view pricing, try the playground, or grab API code.


Recent Run

Latest overnight observability run results.

2026-04-15 · Overnight Observability Run — 15 Tasks
DCGM, OTel, SLO, and failure-simulation tasks on DGX Spark GB10. Top finding: chunked prefill is the highest-leverage fix for the 44.8s TTFT p95.
Results: SLO Tier 1 ✓ · E2E Traces ✗ · TTFT p95 44.8s
Model Specifications
Pricing · per 1M tokens

Input tokens           $0.00
Cached input tokens    $0.0000
Output tokens          $0.00

Prices derived from CI benchmarks (vLLM on DGX Spark).
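The three price components combine linearly, each billed per million tokens. A minimal sketch of a per-request cost calculation, using hypothetical prices rather than actual TokenLabs rates:

```python
def request_cost(input_tokens: int, cached_input_tokens: int,
                 output_tokens: int, prices_per_1m: dict) -> float:
    """Cost of a single request given per-1M-token prices."""
    return (
        input_tokens / 1e6 * prices_per_1m["input"]
        + cached_input_tokens / 1e6 * prices_per_1m["cached_input"]
        + output_tokens / 1e6 * prices_per_1m["output"]
    )

# Hypothetical prices for illustration only (USD per 1M tokens).
prices = {"input": 0.20, "cached_input": 0.05, "output": 0.60}
cost = request_cost(10_000, 2_000, 1_500, prices)
```

Cached input tokens are billed at a lower rate than fresh input, so reusing a long system prompt across requests reduces the input-side cost.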

API base URL

Compatible with the openai, litellm, and langchain client libraries.

Playground

Select a model (and optionally a second model to compare), set a system prompt, and start chatting.

Parameters (defaults): Temperature 0.70 · Max Tokens 512 · Top P 1.00 · Streaming on/off

Example prompts: Quantization benefits · LLM haiku · FP8 vs NVFP4 · About DGX Spark
OpenAI-compatible API
Endpoint: POST

OpenAI-compatible chat completions; enable streaming with stream: true.

Code Examples

Examples are provided in cURL, Python, Node.js, and LiteLLM.
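The example tabs render dynamically; as a stand-in, here is a minimal Python sketch using only the standard library. The base URL, API key, and model id are placeholders, and the /v1/chat/completions path is assumed from the OpenAI convention:

```python
import json
import urllib.request

BASE_URL = "https://YOUR_TOKENLABS_BASE_URL"  # placeholder, not a real URL
MODEL = "your-model-id"                       # placeholder model identifier

def build_payload(prompt: str, max_tokens: int = 512,
                  temperature: float = 0.7) -> dict:
    """Assemble a chat-completions request body (OpenAI wire format)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def chat(prompt: str) -> str:
    """POST one chat-completions request and return the assistant text."""
    req = urllib.request.Request(
        BASE_URL + "/v1/chat/completions",  # assumed standard OpenAI path
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer YOUR_API_KEY"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Say hello in one sentence."))
```

Because the wire format matches OpenAI's, the same payload works unchanged through the openai or litellm client libraries pointed at the base URL.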
Request Parameters

Parameter     Type     Description
model         string   Full model identifier
messages      array    Array of {role, content} objects
max_tokens    integer  Max tokens to generate (default 512)
temperature   float    Sampling temperature 0–2 (default 0.7)
top_p         float    Nucleus sampling (default 1.0)
stream        boolean  Stream via SSE (default false)
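With stream: true, responses arrive as Server-Sent Events carrying OpenAI-style delta chunks. A sketch of reassembling the streamed text, assuming the standard data: {...} / data: [DONE] framing:

```python
import json

def collect_stream(sse_lines):
    """Reassemble assistant text from OpenAI-style SSE chunk lines.

    Each event line looks like 'data: {...}'; the stream ends with
    'data: [DONE]'. Blank keep-alive lines and non-data lines are skipped.
    """
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line or not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Illustrative chunk sequence (shapes assumed from the OpenAI format):
chunks = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    '',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: [DONE]',
]
# collect_stream(chunks) → "Hello"
```

Note that the first chunk typically carries only the role, with no content key, so the parser must tolerate deltas without content.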