Start Building with TokenLabs

Production-ready inference endpoints powered by NVIDIA DGX Spark. OpenAI-compatible API, pay-per-token pricing.

Models: -
Context: 128K tokens
Input pricing: from $0.006 / 1M tokens
Uptime: 99.9%

Inference Endpoints

Select a model to view pricing, try the playground, or grab API code.


Model Specifications
Pricing · per 1M tokens

Input tokens: $0.00
Cached input tokens: $0.0000
Output tokens: $0.00

Prices derived from CI benchmarks (vLLM on DGX Spark). Free tier available.
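Per-token pricing makes request cost a simple linear function of token counts. A minimal sketch of the arithmetic, using the advertised $0.006 / 1M input floor and a hypothetical output price (the actual per-model prices above are loaded dynamically):

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Estimate request cost from token counts and per-1M-token prices."""
    return (input_tokens * input_price_per_1m
            + output_tokens * output_price_per_1m) / 1_000_000

# 12K-token prompt, 800-token completion, at the $0.006 / 1M input floor
# and a hypothetical $0.024 / 1M output price (illustrative only):
estimate = cost_usd(12_000, 800, 0.006, 0.024)
```

Substitute the per-model input, cached-input, and output prices shown above for a real estimate.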

API base URL

Compatible with the openai, litellm, and langchain client libraries.
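Because the API is OpenAI-compatible, any HTTP client can talk to it. A minimal stdlib sketch that builds a chat completions request against the base URL shown above (the URL and model id below are placeholders, and the standard OpenAI path /v1/chat/completions is assumed):

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: use the base URL shown above

def build_chat_request(model: str, messages: list, api_key: str,
                       **params) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completions request.

    Assumes the standard /v1/chat/completions path.
    """
    body = json.dumps({"model": model, "messages": messages, **params}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    "org/model-name",  # hypothetical: use a full model identifier from the list
    [{"role": "user", "content": "Hello"}],
    api_key="sk-...",
    max_tokens=512,
    temperature=0.7,
)
# urllib.request.urlopen(req) would send it; omitted here (needs a live endpoint).
```

The official openai SDK works the same way: point its base_url at the value above and pass your key as api_key.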

All Available Models

Select a primary model, optionally set a system prompt, and adjust the sampling parameters:

Temperature: 0.70
Max Tokens: 512
Top P: 1.00

Options: streaming on/off
Compare mode: select a second model to run alongside the primary.
Powered by vLLM · OpenAI-compatible API
Endpoint

POST

OpenAI-compatible chat completions. Enable streaming with stream: true. Pass the full model identifier in the model field.
Code Examples

Snippets are available for cURL, Python, Node.js, and LiteLLM.
Request Parameters

Parameter     Type      Description
model         string    Full model identifier
messages      array     Array of {role, content} objects
max_tokens    integer   Max tokens to generate (default 512)
temperature   float     Sampling temperature 0–2 (default 0.7)
top_p         float     Nucleus sampling (default 1.0)
stream        boolean   Stream via SSE (default false)
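With stream: true the server returns Server-Sent Events, each a data: line carrying a JSON chunk. A minimal parser sketch, assuming the chunks follow the standard OpenAI streaming format (delta objects under choices, with a [DONE] sentinel), since the API is OpenAI-compatible:

```python
import json

def sse_deltas(lines):
    """Yield content deltas from OpenAI-style SSE lines (stream: true)."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive / comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:  # first chunk may carry only the role
            yield delta["content"]

# Synthetic stream illustrating the assumed chunk shape:
stream = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
text = "".join(sse_deltas(stream))  # "Hello"
```

The openai and litellm SDKs handle this parsing for you; the sketch shows what arrives on the wire.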