API Documentation

Token Sieve removes duplicate logs, HTML noise, repeated agent traces and other token waste from your prompts — deterministically, without calling external LLMs. Use it before sending context to Claude, GPT or Gemini.

Introduction

All requests and responses use JSON. Set Content-Type: application/json on every request with a body.

Base URL: https://api.tokensieve.com

The API exposes two core operations: analyze (inspect waste and cost) and trim (clean content and return savings). Use models to browse supported LLMs and live input/output pricing. Analyze and trim require an API key; the models catalog is public.

Quick start

  1. Get a free beta API key with POST /v1/public/api-keys. No login or credit card is required.
  2. Call GET /v1/models to check whether your LLM is listed and see current input/output pricing (no API key required).
  3. Call POST /v1/analyze with your content to see detected waste and estimated costs.
  4. Call POST /v1/trim to get cleaned content, then use trimmed_content in your LLM prompt.

Minimal trim request
curl -X POST https://api.tokensieve.com/v1/trim \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "claude-opus-4-6",
  "mode": "safe",
  "content": "your logs, HTML or agent trace here..."
}'

Authentication

Protected endpoints require a Bearer token in the Authorization header:

Header
Authorization: Bearer YOUR_API_KEY

Beta keys use the prefix ctx_beta_. Store your key securely — it is only shown once when created.
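
One way to keep the key out of source code is to read it from the environment and build the headers in one place. The sketch below is illustrative only: the helper names and the TOKEN_SIEVE_API_KEY variable name are assumptions, not part of the API. Only the Bearer scheme and the ctx_beta_ prefix come from this page.

```python
import os

KEY_PREFIX = "ctx_beta_"  # beta key prefix stated in the docs


def build_auth_headers(api_key: str) -> dict:
    """Return headers for protected endpoints, rejecting obviously wrong keys."""
    if not api_key or not api_key.startswith(KEY_PREFIX):
        raise ValueError(f"expected a beta key starting with {KEY_PREFIX!r}")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def load_api_key(env_var: str = "TOKEN_SIEVE_API_KEY") -> str:
    """Read the key from an environment variable (the variable name is hypothetical)."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"set {env_var} before calling the API")
    return key
```

The prefix check only catches a pasted key from another service; it does not tell you whether the key is active.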

Free beta limits

Limit | Value
Requests per month | 1,000
Requests per minute (burst) | 30
Max content size per request | 2,000,000 characters

Monthly limits reset at the start of each calendar month (UTC). Burst limits use a rolling 60-second window.
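
A client can stay under the burst limit by tracking its own request timestamps. The following is a minimal client-side sketch, assuming the server counts requests in the same rolling 60-second window described above. The class and function names are hypothetical, and the size check assumes "characters" corresponds to Python's len() on a str, which this page does not confirm.

```python
import time
from collections import deque

MAX_PER_MINUTE = 30            # burst limit from the table above
MAX_CONTENT_CHARS = 2_000_000  # max content size from the table above


class RollingWindowLimiter:
    """Client-side guard that mirrors a rolling 60-second window."""

    def __init__(self, limit=MAX_PER_MINUTE, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent = deque()  # timestamps of requests still inside the window

    def wait_time(self) -> float:
        """Seconds to wait before one more request fits in the window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self._sent.append(self.clock())


def fits_size_limit(content: str) -> bool:
    """True if the content is within the documented per-request size limit."""
    return len(content) <= MAX_CONTENT_CHARS
```

Before each call, sleep for wait_time() seconds and then call record(). This does not replace handling a 429 response, since the server's count is authoritative.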

Endpoints

GET /health

Health check. No authentication required.

Request
curl https://api.tokensieve.com/health
Response 200
{
  "status": "ok",
  "service": "token-sieve-api"
}

POST /v1/public/api-keys

Create a free beta API key. No authentication required.

Public · Rate limited (3 requests per IP per hour)

Request body

Field | Type | Required | Description
email | string | yes | Valid email address
use_case | string | no | How you plan to use the API (max 1,000 characters)
marketing_opt_in | boolean | no | Opt in to launch, pricing, and product announcement emails (default: false)
website | string | no | Honeypot field; leave empty
Request
curl -X POST https://api.tokensieve.com/v1/public/api-keys \
  -H "Content-Type: application/json" \
  -d '{
  "email": "you@example.com",
  "use_case": "Reduce token waste in AI agent logs",
  "marketing_opt_in": false,
  "website": ""
}'
Response 201
{
  "api_key": "ctx_beta_...",
  "email": "you@example.com",
  "plan": "free_beta",
  "message": "Your free beta API key has been created. Store it safely — it will only be shown once."
}
Errors: 409 if an active beta key already exists for this email. 429 if the IP rate limit is exceeded.
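
The field rules above can be checked locally before spending one of the three hourly requests. This is a sketch only: build_key_request is a hypothetical helper, and the "@" test is a crude stand-in for whatever email validation the server applies.

```python
import json

MAX_USE_CASE_CHARS = 1000  # documented limit for use_case


def build_key_request(email: str, use_case: str = "", marketing_opt_in: bool = False) -> str:
    """Serialize a request body for POST /v1/public/api-keys."""
    if "@" not in email:
        raise ValueError("email must be a valid address")
    if len(use_case) > MAX_USE_CASE_CHARS:
        raise ValueError("use_case is limited to 1,000 characters")
    body = {
        "email": email,
        "marketing_opt_in": marketing_opt_in,
        "website": "",  # honeypot field: always left empty
    }
    if use_case:
        body["use_case"] = use_case
    return json.dumps(body)
```

Send the returned string as the request body with Content-Type: application/json, and save api_key from the 201 response immediately, since it is shown only once.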

GET /v1/models

List supported LLM models with input and output pricing (USD per 1M tokens). Pricing is refreshed from OpenRouter at most once per hour.

Public · No API key required

Query parameters

Parameter | Type | Required | Description
q | string | no | Free-text search by model id or display name
Request
curl "https://api.tokensieve.com/v1/models?q=claude"
Response 200
{
  "cached_at": "2026-07-02T12:00:00Z",
  "source": "openrouter",
  "models": [
    {
      "id": "anthropic/claude-opus-4",
      "name": "Claude Opus 4",
      "context_length": 200000,
      "pricing": {
        "input_per_1m": 15.0,
        "output_per_1m": 75.0,
        "currency": "USD"
      }
    }
  ]
}
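
For scripted lookups, the search URL and the pricing fields can be handled with the standard library alone. The sketch below assumes the response shape shown above; the helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "https://api.tokensieve.com"


def models_url(query: Optional[str] = None) -> str:
    """Build the catalog URL, URL-encoding the optional q search term."""
    if query:
        return f"{BASE}/v1/models?{urlencode({'q': query})}"
    return f"{BASE}/v1/models"


def pricing_by_id(payload: dict) -> dict:
    """Map each model id to its (input, output) USD price per 1M tokens."""
    return {
        m["id"]: (m["pricing"]["input_per_1m"], m["pricing"]["output_per_1m"])
        for m in payload["models"]
    }
```

Because pricing is refreshed at most once per hour, a client can reasonably cache the result of pricing_by_id rather than refetching on every call.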

GET /v1/models/{model_id}

Look up a single model by id (e.g. anthropic/claude-opus-4) or a short alias used in analyze/trim (e.g. claude-opus-4-6).

Public · No API key required
Request
curl https://api.tokensieve.com/v1/models/anthropic/claude-opus-4
Response 200
{
  "cached_at": "2026-07-02T12:00:00Z",
  "source": "openrouter",
  "model": {
    "id": "anthropic/claude-opus-4",
    "name": "Claude Opus 4",
    "context_length": 200000,
    "pricing": {
      "input_per_1m": 15.0,
      "output_per_1m": 75.0,
      "currency": "USD"
    }
  }
}
Errors: 404 if the model is not in the catalog.
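
Since prices are quoted per 1M tokens, a rough cost figure follows from simple arithmetic on the pricing object. This sketch shows that arithmetic under the assumption that cost scales linearly with token count; how the API itself rounds or computes cost_estimate is not documented here, and estimate_cost_usd is a hypothetical helper.

```python
def estimate_cost_usd(pricing: dict, input_tokens: int, output_tokens: int) -> dict:
    """Estimate request cost from per-1M-token prices (linear pricing assumed)."""
    input_usd = input_tokens / 1_000_000 * pricing["input_per_1m"]
    output_usd = output_tokens / 1_000_000 * pricing["output_per_1m"]
    return {
        "input_usd": input_usd,
        "output_usd": output_usd,
        "total_usd": input_usd + output_usd,
    }
```

With the prices shown above (15.0 input, 75.0 output), 42 input tokens and 2,000 output tokens come to roughly 0.00063 + 0.15 USD.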

POST /v1/analyze

Analyze content for token waste and estimated LLM costs. Does not modify your content.

Requires API key

Request body

Field | Type | Required | Description
model | string | yes | Target LLM model id or alias; see GET /v1/models for supported models and pricing (e.g. anthropic/claude-opus-4, claude-opus-4-6)
content | string | yes | Text to analyze (logs, HTML, JSON, chat history, etc.)
estimated_output_tokens | integer | no | Expected output tokens for cost estimate (default: 1000)
content_type | string | no | Content type hint (default: auto). See Content types.
Request
curl -X POST https://api.tokensieve.com/v1/analyze \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "claude-opus-4-6",
  "content": "2024-01-01 INFO Started\n2024-01-01 INFO Started\nActual error details here.",
  "estimated_output_tokens": 2000,
  "content_type": "auto"
}'
Response 200
{
  "model": "claude-opus-4-6",
  "content_type_detected": "logs",
  "tokens": {
    "input_tokens": 42,
    "estimated_output_tokens": 2000,
    "total_tokens_estimated": 2042
  },
  "cost_estimate": {
    "input_usd": 0.0002,
    "output_usd": 0.05,
    "total_usd": 0.0502,
    "currency": "USD",
    "warning": null
  },
  "detected_waste": [
    {
      "type": "duplicate_lines",
      "estimated_tokens": 15,
      "description": "Repeated lines or near-identical log entries detected."
    }
  ],
  "recommendations": [
    "Use /v1/trim with mode=safe before sending this content to the LLM."
  ]
}
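
To decide whether trimming is worth a second request, the analyze response can be reduced to a single waste percentage. The sketch below assumes the response shape shown above and that detected_waste entries do not overlap, so their estimated_tokens can be summed; waste_summary is a hypothetical helper.

```python
def waste_summary(analysis: dict) -> dict:
    """Summarize an analyze response as total wasted tokens and share of input."""
    input_tokens = analysis["tokens"]["input_tokens"]
    items = analysis["detected_waste"]
    wasted = sum(item["estimated_tokens"] for item in items)
    # Guard against empty content to avoid dividing by zero
    percent = round(wasted / input_tokens * 100, 1) if input_tokens else 0.0
    return {
        "wasted_tokens": wasted,
        "waste_percent": percent,
        "types": [item["type"] for item in items],
    }
```

For the sample response above, this reports 15 wasted tokens out of 42.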

POST /v1/trim

Trim content and return cleaned text with before/after token counts and savings.

Requires API key

Request body

Field | Type | Required | Description
model | string | yes | Target LLM model for token counting and cost estimates; use GET /v1/models to verify your model and pricing
content | string | yes | Text to trim
estimated_output_tokens | integer | no | Expected output tokens (default: 1000)
content_type | string | no | Content type hint (default: auto)
mode | string | no | Trim aggressiveness: safe, balanced, or aggressive (default: safe)
Request
curl -X POST https://api.tokensieve.com/v1/trim \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "claude-opus-4-6",
  "content": "2024-01-01 INFO Started\n2024-01-01 INFO Started\n2024-01-01 INFO Started\nActual error details here.",
  "mode": "safe",
  "content_type": "logs"
}'
Response 200
{
  "model": "claude-opus-4-6",
  "content_type_detected": "logs",
  "trimmed_content": "2024-01-01 INFO Started\nActual error details here.",
  "before": {
    "input_tokens": 42,
    "estimated_total_cost_usd": 0.0502
  },
  "after": {
    "input_tokens": 18,
    "estimated_total_cost_usd": 0.0217
  },
  "savings": {
    "tokens_saved": 24,
    "percent": 57.1,
    "estimated_usd_saved": 0.0285
  },
  "actions_taken": [
    {
      "type": "removed_duplicate_lines",
      "tokens_removed_estimate": 24
    }
  ],
  "quality_risk": "low",
  "notes": [
    "Safe mode only removes obvious noise such as duplicate lines, HTML boilerplate and repeated log entries."
  ]
}

Use trimmed_content as the input to your LLM call instead of the original content.
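
The token figures in a trim response can be cross-checked from the before and after counts. This is a sketch under the assumption that savings.percent is tokens saved divided by the original input tokens, rounded to one decimal place, which matches the sample above but is not stated explicitly; recompute_savings is a hypothetical helper.

```python
def recompute_savings(result: dict) -> tuple:
    """Recompute (tokens_saved, percent) from the before/after token counts."""
    before = result["before"]["input_tokens"]
    after = result["after"]["input_tokens"]
    saved = before - after
    percent = round(saved / before * 100, 1) if before else 0.0
    return saved, percent
```

A mismatch between these values and the savings object is a cheap signal to log the response for review before relying on trimmed_content.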

Content types

Set content_type to help the API pick the right trimmers, or use auto to let Token Sieve detect the type from the content.

Value | Description
auto | Detect automatically from content (recommended)
text | Plain text with no special structure
html | HTML pages, scraped web content
json | JSON payloads, API responses
logs | Application or server log output
chat_history | Multi-turn chat with user/assistant roles
agent_trace | AI agent tool calls, steps and observations
markdown | Markdown documents and formatted text
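
When content comes from files, a client may already know the type from the file extension. The mapping below is a hypothetical convenience, not something the API defines; anything it does not recognize falls back to auto so that Token Sieve detects the type itself.

```python
from pathlib import Path

# Values accepted by content_type, as listed in the table above
CONTENT_TYPES = {
    "auto", "text", "html", "json", "logs",
    "chat_history", "agent_trace", "markdown",
}

# Hypothetical extension-to-hint mapping; extend or change to suit your files
EXTENSION_HINTS = {
    ".html": "html",
    ".htm": "html",
    ".json": "json",
    ".log": "logs",
    ".md": "markdown",
}


def content_type_for(path: str) -> str:
    """Pick a content_type hint from a file extension, defaulting to auto."""
    return EXTENSION_HINTS.get(Path(path).suffix.lower(), "auto")
```

Ambiguous extensions such as .txt are deliberately left out, because a .txt file may hold logs, an agent trace or plain text.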

Trim modes

Choose how aggressively Token Sieve removes content. Start with safe and increase only if you need more savings and can accept higher quality risk.

Mode | What it does | Quality risk
safe | Removes duplicate lines, HTML boilerplate, repeated logs and JSON whitespace | low
balanced | Everything in safe, plus truncation of large repetitive blocks marked with [TRIMMED: ...] | medium
aggressive | Everything in balanced, plus aggressive middle truncation for blocks over 5,000 characters | high
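
In balanced and aggressive modes it can be useful to see where content was cut. The sketch below scans trimmed_content for truncation markers, assuming each marker has the form [TRIMMED: ...] with no closing bracket inside it. The text after the colon is not specified on this page, so the pattern and the helper name are assumptions.

```python
import re

# Assumed marker shape: "[TRIMMED: <any text without a closing bracket>]"
TRIMMED_MARKER = re.compile(r"\[TRIMMED:[^\]]*\]")


def find_trim_markers(trimmed_content: str) -> list:
    """Return every truncation marker found in the trimmed content."""
    return TRIMMED_MARKER.findall(trimmed_content)
```

An empty list for a balanced or aggressive result suggests that nothing beyond the safe-mode removals was applied.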

Error handling

All errors return a JSON body with a detail field:

Error response
{
  "detail": "Missing API key. Provide Authorization: Bearer YOUR_API_KEY"
}
Status | Cause
400 | Invalid request body or validation error
401 | Missing Authorization header
403 | Invalid or inactive API key
409 | Active beta key already exists for this email
413 | Content exceeds maximum length (default: 2,000,000 characters)
429 | Rate limit, burst limit, or monthly usage limit exceeded
500 | Unexpected server error
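
Client code usually needs to separate errors worth retrying from errors that require a changed request or key. The sketch below wraps the documented detail field in an exception. Treating 429 and 500 as retryable is an assumption made for illustration, and the class and function names are hypothetical.

```python
import json

# Assumption: 429 and 500 may succeed on a later attempt; other statuses
# need a different request, key or content size.
RETRYABLE_STATUSES = {429, 500}


class TokenSieveError(Exception):
    """Error built from a non-2xx response (hypothetical helper class)."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail
        self.retryable = status in RETRYABLE_STATUSES


def parse_error(status: int, body_text: str) -> TokenSieveError:
    """Extract the detail field, falling back to the raw body if it is not JSON."""
    try:
        detail = json.loads(body_text).get("detail", "")
    except (ValueError, AttributeError):
        detail = body_text
    return TokenSieveError(status, str(detail))
```

Retrying a 429 only helps when the burst limit was hit; if the monthly limit is exhausted, the request will keep failing until the calendar month resets.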

Integration examples

Typical integration: read content from your app, trim it, then pass trimmed_content to your LLM.

curl

Analyze then trim
# 1. Analyze (optional)
curl -X POST https://api.tokensieve.com/v1/analyze \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-opus-4-6", "content": "...", "content_type": "auto"}'

# 2. Trim and use the result
curl -X POST https://api.tokensieve.com/v1/trim \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-opus-4-6", "content": "...", "mode": "safe"}'

Python

Python (requests)
import requests

BASE = "https://api.tokensieve.com"
API_KEY = "ctx_beta_..."

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

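# Read the content to trim; the context manager closes the file promptly
with open("agent_log.txt", encoding="utf-8") as f:
    content = f.read()

# Optional: inspect waste before trimming
analyze = requests.post(
    f"{BASE}/v1/analyze",
    headers=headers,
    json={"model": "claude-opus-4-6", "content": content, "content_type": "auto"},
    timeout=30,
)
analyze.raise_for_status()  # raise on 4xx/5xx instead of failing later on a missing key
print(analyze.json()["detected_waste"])

# Trim and get cleaned content
trim = requests.post(
    f"{BASE}/v1/trim",
    headers=headers,
    json={"model": "claude-opus-4-6", "content": content, "mode": "safe"},
    timeout=30,
)
trim.raise_for_status()
result = trim.json()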
clean_content = result["trimmed_content"]

print(f"Saved {result['savings']['tokens_saved']} tokens ({result['savings']['percent']}%)")

# Pass clean_content to your LLM instead of the original content

JavaScript

JavaScript (fetch)
import fs from "node:fs"; // Node.js ES module; provides fs.promises.readFile used below

const BASE = "https://api.tokensieve.com";
const API_KEY = "ctx_beta_...";

const headers = {
  Authorization: `Bearer ${API_KEY}`,
  "Content-Type": "application/json",
};

const content = await fs.promises.readFile("agent_log.txt", "utf8");

const trimRes = await fetch(`${BASE}/v1/trim`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    model: "claude-opus-4-6",
    content,
    mode: "safe",
    content_type: "auto",
  }),
});

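if (!trimRes.ok) {
  // Error responses carry a JSON body with a detail field
  const { detail } = await trimRes.json();
  throw new Error(`Trim failed (${trimRes.status}): ${detail}`);
}

const result = await trimRes.json();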
const cleanContent = result.trimmed_content;

console.log(`Saved ${result.savings.tokens_saved} tokens (${result.savings.percent}%)`);

// Pass cleanContent to your LLM call

Recommended workflow

Analyze → Trim → Send trimmed_content to LLM
  1. Analyze (optional): Call /v1/analyze to see what waste is detected and how much it costs before trimming.
  2. Trim: Call /v1/trim with mode: "safe" first. Review trimmed_content and quality_risk in the response.
  3. Integrate: Replace your original prompt content with trimmed_content in your existing LLM API call.
  4. Iterate: If savings are too low and quality is acceptable, try balanced or aggressive modes.
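
The trim and iterate steps above can be sketched as a loop that starts with safe and only moves to the next mode while savings stay below a target and quality risk stays acceptable. Everything here is illustrative: trim_with_escalation, the post callable, the 30% target and the risk ordering are assumptions, and the sketch assumes quality_risk in the response takes the values low, medium or high.

```python
from typing import Callable, Optional

MODES = ["safe", "balanced", "aggressive"]        # least to most aggressive
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}   # assumed quality_risk values


def trim_with_escalation(
    post: Callable[[str, dict], dict],
    model: str,
    content: str,
    min_percent: float = 30.0,
    max_risk: str = "medium",
) -> Optional[dict]:
    """Try modes in order and return the last result within the accepted risk.

    post(path, payload) must send a POST request and return the parsed JSON body.
    Returns None if even the first mode exceeds max_risk.
    """
    best = None
    for mode in MODES:
        result = post("/v1/trim", {"model": model, "content": content, "mode": mode})
        # Unknown risk labels are treated as high
        if RISK_ORDER.get(result["quality_risk"], 2) > RISK_ORDER[max_risk]:
            break
        best = result
        if result["savings"]["percent"] >= min_percent:
            break
    return best
```

Each mode tried is a separate request and counts against the monthly and burst limits, so a fixed mode is cheaper once you know which one suits your content.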

Need an API key? Create a free beta key with POST /v1/public/api-keys.