Documentation — Token Sieve API

Introduction

All requests and responses use JSON. Set Content-Type: application/json on every request with a body.

Base URL: https://api.tokensieve.com

The API exposes two core operations: analyze (inspect waste and cost) and trim (clean content and return savings). Use GET /v1/models to browse supported LLMs and their current input/output pricing. Analyze and trim require an API key; the models catalog is public.

Quick start

Create a free beta API key with POST /v1/public/api-keys (no login or credit card required).
Call GET /v1/models to check whether your LLM is listed and see current input/output pricing (no API key required).
Call POST /v1/analyze with your content to see detected waste and estimated costs.
Call POST /v1/trim to get cleaned content and use trimmed_content in your LLM prompt.

Minimal trim request

curl -X POST https://api.tokensieve.com/v1/trim \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "claude-opus-4-6",
  "mode": "safe",
  "content": "your logs, HTML or agent trace here..."
}'

Authentication

Protected endpoints require a Bearer token in the Authorization header:

Header

Authorization: Bearer YOUR_API_KEY

Beta keys use the prefix ctx_beta_. Store your key securely — it is only shown once when created.
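One way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named TOKENSIEVE_API_KEY. That name is hypothetical; the API does not require any particular one. The sketch only checks the documented ctx_beta_ prefix, so it would need adjusting for non-beta keys.

```python
import os

# Hypothetical variable name; Token Sieve does not prescribe one.
ENV_VAR = "TOKENSIEVE_API_KEY"
BETA_PREFIX = "ctx_beta_"


def load_api_key(environ=os.environ):
    """Read the API key from the environment and check the beta prefix."""
    key = environ.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} to your Token Sieve API key")
    if not key.startswith(BETA_PREFIX):
        raise ValueError(f"Expected a beta key starting with {BETA_PREFIX!r}")
    return key


def auth_headers(api_key):
    """Headers for protected endpoints such as /v1/analyze and /v1/trim."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```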

Free beta limits

Limit	Value
Requests per month	1,000
Requests per minute (burst)	30
Max content size per request	2,000,000 characters

Monthly limits reset at the start of each calendar month (UTC). Burst limits use a rolling 60-second window.
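If you send many requests in a batch, a small client-side limiter can keep you under the burst limit instead of relying on 429 responses. This is a minimal sketch of a rolling-window limiter sized to the documented 30 requests per 60 seconds. The class name is hypothetical. The sketch assumes the server lets a request leave the window exactly 60 seconds after it was sent, and it does not track the monthly quota.

```python
import time
from collections import deque


class RollingWindowLimiter:
    """Allow at most max_requests calls in any rolling window of `window` seconds."""

    def __init__(self, max_requests=30, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of requests still inside the window

    def _prune(self, now):
        # Drop timestamps that have aged out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()

    def wait_time(self):
        """Seconds until the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        self._prune(now)
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.sent[0] + self.window - now

    def acquire(self, sleep=time.sleep):
        """Block until a slot is free, then record the request."""
        delay = self.wait_time()
        while delay > 0:
            sleep(delay)
            delay = self.wait_time()
        self.sent.append(self.clock())
```

Call `limiter.acquire()` right before each POST to /v1/analyze or /v1/trim. The clock and sleep parameters are injectable so the limiter can be tested without real waiting.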

Endpoints

GET /health

Health check. No authentication required.

Request

curl https://api.tokensieve.com/health

Response 200

{
  "status": "ok",
  "service": "token-sieve-api"
}

POST /v1/public/api-keys

Create a free beta API key. No authentication required.

Public · Rate limited (3 requests per IP per hour)

Request body

Field	Type	Required	Description
`email`	string	yes	Valid email address
`use_case`	string	no	How you plan to use the API (max 1,000 characters)
`marketing_opt_in`	boolean	no	Opt in to launch, pricing, and product announcement emails (default: false)
`website`	string	no	Honeypot field — leave empty

Request

curl -X POST https://api.tokensieve.com/v1/public/api-keys \
  -H "Content-Type: application/json" \
  -d '{
  "email": "you@example.com",
  "use_case": "Reduce token waste in AI agent logs",
  "marketing_opt_in": false,
  "website": ""
}'

Response 201

{
  "api_key": "ctx_beta_...",
  "email": "you@example.com",
  "plan": "free_beta",
  "message": "Your free beta API key has been created. Store it safely — it will only be shown once."
}

Errors: 409 if an active beta key already exists for this email. 429 if the IP rate limit is exceeded.
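You can assemble and check the request body locally before sending it. This is a minimal sketch with hypothetical helper names. It enforces the 1,000-character use_case limit, always sends the website honeypot empty, and maps the documented status codes to short messages. The email check is deliberately loose; the server does the real validation.

```python
import json

MAX_USE_CASE_CHARS = 1000


def build_key_request(email, use_case=None, marketing_opt_in=False):
    """Return the JSON body for POST /v1/public/api-keys."""
    if "@" not in email:
        raise ValueError("email must be a valid address")
    if use_case is not None and len(use_case) > MAX_USE_CASE_CHARS:
        raise ValueError("use_case is limited to 1,000 characters")
    body = {
        "email": email,
        "marketing_opt_in": bool(marketing_opt_in),
        "website": "",  # honeypot field: always leave it empty
    }
    if use_case is not None:
        body["use_case"] = use_case
    return json.dumps(body)


def explain_key_status(status):
    """Map the documented status codes of the key endpoint to a message."""
    return {
        201: "created: store api_key now, it is shown only once",
        409: "an active beta key already exists for this email",
        429: "IP rate limit hit (3 requests per hour); try again later",
    }.get(status, f"unexpected status {status}")
```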

GET /v1/models

List supported LLM models with input and output pricing (USD per 1M tokens). Pricing is refreshed from OpenRouter at most once per hour.

Public · No API key required

Query parameters

Parameter	Type	Required	Description
`q`	string	no	Free-text search by model id or display name

Request

curl "https://api.tokensieve.com/v1/models?q=claude"

Response 200

{
  "cached_at": "2026-07-02T12:00:00Z",
  "source": "openrouter",
  "models": [
    {
      "id": "anthropic/claude-opus-4",
      "name": "Claude Opus 4",
      "context_length": 200000,
      "pricing": {
        "input_per_1m": 15.0,
        "output_per_1m": 75.0,
        "currency": "USD"
      }
    }
  ]
}
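Pricing is quoted in USD per 1M tokens, so each direction costs tokens × price / 1,000,000. Here is a sketch that uses a pricing object shaped like the one above; the function name is hypothetical.

```python
def estimate_cost_usd(pricing, input_tokens, output_tokens):
    """Estimate request cost from a /v1/models pricing object.

    input_per_1m and output_per_1m are USD per 1M tokens.
    """
    input_usd = input_tokens * pricing["input_per_1m"] / 1_000_000
    output_usd = output_tokens * pricing["output_per_1m"] / 1_000_000
    return {
        "input_usd": input_usd,
        "output_usd": output_usd,
        "total_usd": input_usd + output_usd,
    }
```

For example, with the Claude Opus 4 pricing shown above, 10,000 input tokens and 1,000 output tokens come to about $0.15 + $0.075 = $0.225.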

GET /v1/models/{model_id}

Look up a single model by id (e.g. anthropic/claude-opus-4) or a short alias used in analyze/trim (e.g. claude-opus-4-6).

Public · No API key required

Request

curl https://api.tokensieve.com/v1/models/anthropic/claude-opus-4

Response 200

{
  "cached_at": "2026-07-02T12:00:00Z",
  "source": "openrouter",
  "model": {
    "id": "anthropic/claude-opus-4",
    "name": "Claude Opus 4",
    "context_length": 200000,
    "pricing": {
      "input_per_1m": 15.0,
      "output_per_1m": 75.0,
      "currency": "USD"
    }
  }
}

Errors: 404 if the model is not in the catalog.
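Catalog ids contain a slash, and the request above sends it unescaped in the path. The sketch below builds that URL and treats a 404 as "not in the catalog", using only the Python standard library. The function names are hypothetical, and the sketch assumes the server accepts percent-encoding for any other unusual characters.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://api.tokensieve.com"


def model_url(model_id):
    """URL for GET /v1/models/{model_id}; keeps "/" as in the example above."""
    return f"{BASE}/v1/models/{quote(model_id, safe='/')}"


def get_model(model_id, timeout=10):
    """Return the "model" object, or None if the catalog has no such model."""
    try:
        with urlopen(model_url(model_id), timeout=timeout) as resp:
            return json.load(resp)["model"]
    except HTTPError as err:
        if err.code == 404:
            return None
        raise
```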

POST /v1/analyze

Analyze content for token waste and estimated LLM costs. Does not modify your content.

Requires API key

Request body

Field	Type	Required	Description
`model`	string	yes	Target LLM model id or alias — see GET /v1/models for supported models and pricing (e.g. `anthropic/claude-opus-4`, `claude-opus-4-6`)
`content`	string	yes	Text to analyze (logs, HTML, JSON, chat history, etc.)
`estimated_output_tokens`	integer	no	Expected output tokens for cost estimate (default: 1000)
`content_type`	string	no	Content type hint (default: `auto`). See content types.

Request

curl -X POST https://api.tokensieve.com/v1/analyze \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "claude-opus-4-6",
  "content": "2024-01-01 INFO Started\n2024-01-01 INFO Started\nActual error details here.",
  "estimated_output_tokens": 2000,
  "content_type": "auto"
}'

Response 200

{
  "model": "claude-opus-4-6",
  "content_type_detected": "logs",
  "tokens": {
    "input_tokens": 42,
    "estimated_output_tokens": 2000,
    "total_tokens_estimated": 2042
  },
  "cost_estimate": {
    "input_usd": 0.0002,
    "output_usd": 0.05,
    "total_usd": 0.0502,
    "currency": "USD",
    "warning": null
  },
  "detected_waste": [
    {
      "type": "duplicate_lines",
      "estimated_tokens": 15,
      "description": "Repeated lines or near-identical log entries detected."
    }
  ],
  "recommendations": [
    "Use /v1/trim with mode=safe before sending this content to the LLM."
  ]
}
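To decide whether trimming is worthwhile, you can total the detected_waste entries. This sketch uses a hypothetical function name. It assumes the entries do not overlap; if they do, the sum overstates the waste.

```python
def waste_summary(analysis):
    """Summarize detected_waste from a parsed /v1/analyze response."""
    input_tokens = analysis["tokens"]["input_tokens"]
    waste = analysis["detected_waste"]
    wasted = sum(item["estimated_tokens"] for item in waste)
    # Guard against empty content, where input_tokens can be 0.
    share = wasted / input_tokens if input_tokens else 0.0
    return {
        "wasted_tokens": wasted,
        "waste_share": share,
        "types": [item["type"] for item in waste],
    }
```

Applied to the sample response above, it reports 15 wasted tokens, about 36% of the 42 input tokens.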

POST /v1/trim

Trim content and return cleaned text with before/after token counts and savings.

Requires API key

Request body

Field	Type	Required	Description
`model`	string	yes	Target LLM model id or alias, used for token counting and cost estimates. Use GET /v1/models to verify your model and pricing
`content`	string	yes	Text to trim
`estimated_output_tokens`	integer	no	Expected output tokens (default: 1000)
`content_type`	string	no	Content type hint (default: `auto`)
`mode`	string	no	Trim aggressiveness: `safe`, `balanced`, or `aggressive` (default: `safe`)

Request

curl -X POST https://api.tokensieve.com/v1/trim \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "claude-opus-4-6",
  "content": "2024-01-01 INFO Started\n2024-01-01 INFO Started\n2024-01-01 INFO Started\nActual error details here.",
  "mode": "safe",
  "content_type": "logs"
}'

Response 200

{
  "model": "claude-opus-4-6",
  "content_type_detected": "logs",
  "trimmed_content": "2024-01-01 INFO Started\nActual error details here.",
  "before": {
    "input_tokens": 42,
    "estimated_total_cost_usd": 0.0252
  },
  "after": {
    "input_tokens": 18,
    "estimated_total_cost_usd": 0.0251
  },
  "savings": {
    "tokens_saved": 24,
    "percent": 57.1,
    "estimated_usd_saved": 0.0001
  },
  "actions_taken": [
    {
      "type": "removed_duplicate_lines",
      "tokens_removed_estimate": 24
    }
  ],
  "quality_risk": "low",
  "notes": [
    "Safe mode only removes obvious noise such as duplicate lines, HTML boilerplate and repeated log entries."
  ]
}

Use trimmed_content as the input to your LLM call instead of the original content.
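As a guard in code, you can fall back to the original content whenever quality_risk exceeds what you are willing to accept. This is a minimal sketch with a hypothetical helper. The low/medium/high ordering follows the Trim modes table, and any unrecognized value is treated as unacceptable.

```python
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def choose_prompt_content(original, trim_result, max_risk="low"):
    """Return trimmed_content if its quality_risk is acceptable, else original."""
    risk = trim_result.get("quality_risk")
    if risk not in RISK_ORDER:
        return original  # unknown or missing risk level: stay conservative
    if RISK_ORDER[risk] <= RISK_ORDER[max_risk]:
        return trim_result["trimmed_content"]
    return original
```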

Content types

Set content_type to help the API pick the right trimmers, or use auto to let Token Sieve detect the type from the content.

Value	Description
`auto`	Detect automatically from content (recommended)
`text`	Plain text with no special structure
`html`	HTML pages, scraped web content
`json`	JSON payloads, API responses
`logs`	Application or server log output
`chat_history`	Multi-turn chat with user/assistant roles
`agent_trace`	AI agent tool calls, steps and observations
`markdown`	Markdown documents and formatted text
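Client-side validation catches typos in content_type or mode before they cost a request. This sketch uses hypothetical names, and its allowed values are copied from the tables in this document. The 2,000,000-character check mirrors the free beta limit. It assumes the server counts characters the way Python's len() does (Unicode code points).

```python
import json

# Allowed values, copied from the Content types and Trim modes tables.
CONTENT_TYPES = {
    "auto", "text", "html", "json", "logs",
    "chat_history", "agent_trace", "markdown",
}
MODES = {"safe", "balanced", "aggressive"}
MAX_CONTENT_CHARS = 2_000_000  # free beta limit; larger content gets a 413


def build_trim_body(model, content, mode="safe", content_type="auto",
                    estimated_output_tokens=None):
    """Validate arguments locally and return the JSON body for POST /v1/trim."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content_type {content_type!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}")
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError("content exceeds 2,000,000 characters; split it first")
    body = {"model": model, "content": content,
            "mode": mode, "content_type": content_type}
    # Omit estimated_output_tokens to get the server default (1000).
    if estimated_output_tokens is not None:
        if estimated_output_tokens < 0:
            raise ValueError("estimated_output_tokens must be >= 0")
        body["estimated_output_tokens"] = int(estimated_output_tokens)
    return json.dumps(body)
```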

Trim modes

Choose how aggressively Token Sieve removes content. Start with safe and increase only if you need more savings and can accept higher quality risk.

Mode	What it does	Quality risk
`safe`	Removes duplicate lines, HTML boilerplate, repeated logs and JSON whitespace	low
`balanced`	Everything in safe, plus truncation of large repetitive blocks marked with `[TRIMMED: ...]`	medium
`aggressive`	Everything in balanced, plus aggressive middle truncation for blocks over 5,000 characters	high
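In balanced and aggressive modes, you can find what was cut by scanning trimmed_content for markers. This sketch assumes markers take the form [TRIMMED: ...] with no closing bracket inside the note. The text after the colon is not documented, so the example strings in the test are made up.

```python
import re

# Assumed marker shape: "[TRIMMED:" then any text without "]", then "]".
TRIMMED_MARKER = re.compile(r"\[TRIMMED:[^\]]*\]")


def trimmed_markers(text):
    """Return every [TRIMMED: ...] marker found in trimmed_content."""
    return TRIMMED_MARKER.findall(text)
```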

Error handling

All errors return a JSON body with a detail field:

Error response

{
  "detail": "Missing API key. Provide Authorization: Bearer YOUR_API_KEY"
}

Status	Cause
`400`	Invalid request body or validation error
`401`	Missing `Authorization` header
`403`	Invalid or inactive API key
`404`	Model not found in the catalog (GET /v1/models/{model_id})
`409`	Active beta key already exists for this email
`413`	Content exceeds maximum length (default: 2,000,000 characters)
`429`	Rate limit, burst limit, or monthly usage limit exceeded
`500`	Unexpected server error
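Only some of these errors are worth retrying. A 429 or 500 may clear on its own. A 400, 401, 403, 404, 409 or 413 needs a change on your side. Below is a sketch of that policy, with hypothetical names, capped exponential backoff, and extraction of the detail field. A 429 caused by the monthly quota will not clear until the next calendar month, so keep the number of attempts small.

```python
import json

RETRYABLE = {429, 500}


def should_retry(status):
    """True for errors that may succeed on a later attempt."""
    return status in RETRYABLE


def backoff_delays(attempts=4, base=1.0, cap=30.0):
    """Seconds to wait before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def error_message(status, body_text):
    """Build a readable message, preferring the JSON "detail" field."""
    try:
        detail = json.loads(body_text).get("detail")
    except (ValueError, AttributeError):  # not JSON, or JSON but not an object
        detail = None
    return f"HTTP {status}: {detail or body_text}"
```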

Integration examples

Typical integration: read content from your app, trim it, then pass trimmed_content to your LLM.

curl

Analyze then trim

# 1. Analyze (optional)
curl -X POST https://api.tokensieve.com/v1/analyze \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-opus-4-6", "content": "...", "content_type": "auto"}'

# 2. Trim and use the result
curl -X POST https://api.tokensieve.com/v1/trim \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-opus-4-6", "content": "...", "mode": "safe"}'

Python

Python (requests)

import requests

BASE = "https://api.tokensieve.com"
API_KEY = "ctx_beta_..."  # your beta key

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Read the content you want to clean up
with open("agent_log.txt", encoding="utf-8") as f:
    content = f.read()

# Optional: inspect waste before trimming
analyze = requests.post(
    f"{BASE}/v1/analyze",
    headers=headers,
    json={"model": "claude-opus-4-6", "content": content, "content_type": "auto"},
    timeout=30,
)
analyze.raise_for_status()  # raise on 4xx/5xx; error bodies carry a "detail" field
print(analyze.json()["detected_waste"])

# Trim and get cleaned content
trim = requests.post(
    f"{BASE}/v1/trim",
    headers=headers,
    json={"model": "claude-opus-4-6", "content": content, "mode": "safe"},
    timeout=30,
)
trim.raise_for_status()
result = trim.json()
clean_content = result["trimmed_content"]

print(f"Saved {result['savings']['tokens_saved']} tokens ({result['savings']['percent']}%)")

# Pass clean_content to your LLM instead of the original content

JavaScript

JavaScript (fetch)

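// Node.js 18+ (built-in fetch); run as an ES module (.mjs) so top-level await works
import { readFile } from "node:fs/promises";

const BASE = "https://api.tokensieve.com";
const API_KEY = "ctx_beta_...";

const headers = {
  Authorization: `Bearer ${API_KEY}`,
  "Content-Type": "application/json",
};

const content = await readFile("agent_log.txt", "utf8");

const trimRes = await fetch(`${BASE}/v1/trim`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    model: "claude-opus-4-6",
    content,
    mode: "safe",
    content_type: "auto",
  }),
});

if (!trimRes.ok) {
  // Error responses are JSON with a "detail" field
  const err = await trimRes.json().catch(() => ({}));
  throw new Error(`Trim failed (${trimRes.status}): ${err.detail ?? trimRes.statusText}`);
}

const result = await trimRes.json();
const cleanContent = result.trimmed_content;

console.log(`Saved ${result.savings.tokens_saved} tokens (${result.savings.percent}%)`);

// Pass cleanContent to your LLM call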

Recommended workflow

Analyze → Trim → Send trimmed_content to LLM

Analyze (optional): Call /v1/analyze to see what waste is detected and how much it costs before trimming.
Trim: Call /v1/trim with mode: "safe" first. Review trimmed_content and quality_risk in the response.
Integrate: Replace your original prompt content with trimmed_content in your existing LLM API call.
Iterate: If savings are too low and quality is acceptable, try balanced or aggressive modes.
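The Iterate step can be automated: try modes from least to most aggressive and stop once savings reach your target. In this sketch, trim(content, mode) is a hypothetical function you supply that calls POST /v1/trim and returns the parsed JSON. The loop stops escalating at the first result above max_risk. That assumes risk only increases with aggressiveness, as in the Trim modes table.

```python
MODES = ("safe", "balanced", "aggressive")
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def trim_with_escalation(trim, content, target_percent, max_risk="medium"):
    """Return the best acceptable trim result, or None if even safe is too risky."""
    best = None
    for mode in MODES:
        result = trim(content, mode)
        # Unknown risk values rank above "high", so they are never accepted.
        risk = RISK_ORDER.get(result.get("quality_risk"), len(RISK_ORDER))
        if risk > RISK_ORDER[max_risk]:
            break  # more aggressive modes are assumed to be riskier still
        if best is None or result["savings"]["percent"] > best["savings"]["percent"]:
            best = result
        if result["savings"]["percent"] >= target_percent:
            break  # target reached; no need to escalate further
    return best
```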

Need an API key? Create a free beta key with POST /v1/public/api-keys.
