Token Sieve removes duplicate logs, HTML noise, repeated agent traces and other token waste from your prompts — deterministically, without calling external LLMs. Use it before sending context to Claude, GPT or Gemini.
All requests and responses use JSON. Set Content-Type: application/json on
every request with a body.
Base URL: https://api.tokensieve.com
The API exposes two core operations: analyze (inspect waste and cost) and trim (clean content and return savings). A third endpoint, models, lets you browse supported LLMs and live input/output pricing. Analyze and trim require an API key; the models catalog is public.
1. Call GET /v1/models to check whether your LLM is listed and see current input/output pricing (no API key required).
2. Call POST /v1/analyze with your content to see detected waste and estimated costs.
3. Call POST /v1/trim to get cleaned content, then use trimmed_content in your LLM prompt.
curl -X POST https://api.tokensieve.com/v1/trim \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-6",
"mode": "safe",
"content": "your logs, HTML or agent trace here..."
}'
Protected endpoints require a Bearer token in the Authorization header:
Authorization: Bearer YOUR_API_KEY
Beta keys use the prefix ctx_beta_. Store your key securely — it is only
shown once when created.
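A minimal Python sketch of building the headers for protected endpoints. The helper names auth_headers and headers_from_env, and the environment variable TOKENSIEVE_API_KEY, are assumptions made for illustration and are not part of the API. Reading the key from the environment is one way to keep it out of source control.

```python
import os

KEY_PREFIX = "ctx_beta_"  # beta key prefix described above


def auth_headers(api_key: str) -> dict[str, str]:
    """Return headers for a protected endpoint (analyze, trim)."""
    if not api_key:
        raise ValueError("API key is empty")
    if not api_key.startswith(KEY_PREFIX):
        # Not necessarily an error: only beta keys are documented to use this prefix.
        print("warning: key does not start with", KEY_PREFIX)
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def headers_from_env(var: str = "TOKENSIEVE_API_KEY") -> dict[str, str]:
    """Hypothetical convention: read the key from an environment variable."""
    return auth_headers(os.environ.get(var, ""))
```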
| Limit | Value |
|---|---|
| Requests per month | 1,000 |
| Requests per minute (burst) | 30 |
| Max content size per request | 2,000,000 characters |
Monthly limits reset at the start of each calendar month (UTC). Burst limits use a rolling 60-second window.
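To avoid hitting the burst limit, a client can mirror the rolling window locally. The sketch below is one possible approach, not the server's implementation: the class name BurstLimiter and the helper fits_size_limit are hypothetical, and the server's exact window accounting may differ from this approximation.

```python
import time
from collections import deque

MAX_CONTENT_CHARS = 2_000_000  # max content size per request
BURST_LIMIT = 30               # requests per rolling window
WINDOW_SECONDS = 60.0


class BurstLimiter:
    """Client-side approximation of the rolling 60-second burst limit."""

    def __init__(self, limit=BURST_LIMIT, window=WINDOW_SECONDS, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock   # injectable so the logic can be tested without sleeping
        self.sent = deque()  # timestamps of recently sent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request fits in the window."""
        now = self.clock()
        # Drop timestamps that have left the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self.sent.append(self.clock())


def fits_size_limit(content: str) -> bool:
    """True if content is within the documented per-request character limit."""
    return len(content) <= MAX_CONTENT_CHARS
```

Before each request, sleep for wait_time() seconds if it is non-zero, then call record() after sending. Content that fails fits_size_limit would need to be split or shortened before it is submitted.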
Health check. No authentication required.
curl https://api.tokensieve.com/health
{
"status": "ok",
"service": "token-sieve-api"
}
Create a free beta API key. No authentication required.
Public · Rate limited (3 requests per IP per hour)

| Field | Type | Required | Description |
|---|---|---|---|
| email | string | yes | Valid email address |
| use_case | string | no | How you plan to use the API (max 1,000 characters) |
| marketing_opt_in | boolean | no | Opt in to launch, pricing, and product announcement emails (default: false) |
| website | string | no | Honeypot field; leave empty |
curl -X POST https://api.tokensieve.com/v1/public/api-keys \
-H "Content-Type: application/json" \
-d '{
"email": "user@example.com",
"use_case": "Reduce token waste in AI agent logs",
"marketing_opt_in": false,
"website": ""
}'
{
"api_key": "ctx_beta_...",
"email": "user@example.com",
"plan": "free_beta",
"message": "Your free beta API key has been created. Store it safely — it will only be shown once."
}
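The field rules in the table above can be checked on the client before the request is sent, which avoids spending one of the three hourly attempts on a validation error. This is a sketch under assumptions: build_key_request is a hypothetical helper, and the email check is deliberately loose because the server's own validation rules are not documented here.

```python
import json

MAX_USE_CASE_CHARS = 1000  # documented maximum for use_case


def build_key_request(email: str, use_case: str = "", marketing_opt_in: bool = False) -> bytes:
    """Return the JSON body for POST /v1/public/api-keys."""
    if "@" not in email:
        raise ValueError("email does not look valid")
    if len(use_case) > MAX_USE_CASE_CHARS:
        raise ValueError("use_case exceeds 1,000 characters")
    payload = {
        "email": email,
        "marketing_opt_in": marketing_opt_in,
        "website": "",  # honeypot field: always leave empty
    }
    if use_case:  # optional field, omitted when not provided
        payload["use_case"] = use_case
    return json.dumps(payload).encode("utf-8")
```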
List supported LLM models with input and output pricing (USD per 1M tokens). Pricing is refreshed from OpenRouter at most once per hour.
Public · No API key required

| Parameter | Type | Required | Description |
|---|---|---|---|
| q | string | no | Free-text search by model id or display name |
curl "https://api.tokensieve.com/v1/models?q=claude"
{
"cached_at": "2026-07-02T12:00:00Z",
"source": "openrouter",
"models": [
{
"id": "anthropic/claude-opus-4",
"name": "Claude Opus 4",
"context_length": 200000,
"pricing": {
"input_per_1m": 15.0,
"output_per_1m": 75.0,
"currency": "USD"
}
}
]
}
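A small sketch of working with the catalog from Python: one helper builds the search URL with the q parameter properly encoded, the other picks a model out of a parsed response of the shape shown above. The function names models_url and find_model are illustrative assumptions.

```python
from urllib.parse import urlencode

BASE = "https://api.tokensieve.com"


def models_url(query: str = "") -> str:
    """Build the catalog URL, adding ?q=... only when a search term is given."""
    if not query:
        return f"{BASE}/v1/models"
    return f"{BASE}/v1/models?{urlencode({'q': query})}"


def find_model(catalog: dict, model_id: str):
    """Return the catalog entry whose id matches exactly, or None."""
    for entry in catalog.get("models", []):
        if entry.get("id") == model_id:
            return entry
    return None
```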
Look up a single model by id (e.g. anthropic/claude-opus-4) or a short
alias used in analyze/trim (e.g. claude-opus-4-6).
curl https://api.tokensieve.com/v1/models/anthropic/claude-opus-4
{
"cached_at": "2026-07-02T12:00:00Z",
"source": "openrouter",
"model": {
"id": "anthropic/claude-opus-4",
"name": "Claude Opus 4",
"context_length": 200000,
"pricing": {
"input_per_1m": 15.0,
"output_per_1m": 75.0,
"currency": "USD"
}
}
}
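Because prices are quoted per 1M tokens, a rough request cost follows by proportion: tokens multiplied by the price, divided by 1,000,000. The helper below is a sketch; estimate_cost_usd is a hypothetical name, and the API's own cost estimates may apply rounding or other rules not described here.

```python
def estimate_cost_usd(pricing: dict, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD given the per-1M-token prices from a catalog entry."""
    input_cost = input_tokens * pricing["input_per_1m"] / 1_000_000
    output_cost = output_tokens * pricing["output_per_1m"] / 1_000_000
    return input_cost + output_cost


# Pricing object copied from the example response above.
pricing = {"input_per_1m": 15.0, "output_per_1m": 75.0, "currency": "USD"}
cost = estimate_cost_usd(pricing, input_tokens=10_000, output_tokens=1_000)
```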
Analyze content for token waste and estimated LLM costs. Does not modify your content.
Requires API key

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Target LLM model id or alias (e.g. anthropic/claude-opus-4, claude-opus-4-6); see GET /v1/models for supported models and pricing |
| content | string | yes | Text to analyze (logs, HTML, JSON, chat history, etc.) |
| estimated_output_tokens | integer | no | Expected output tokens for cost estimate (default: 1000) |
| content_type | string | no | Content type hint (default: auto). See content types. |
curl -X POST https://api.tokensieve.com/v1/analyze \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-6",
"content": "2024-01-01 INFO Started\n2024-01-01 INFO Started\nActual error details here.",
"estimated_output_tokens": 2000,
"content_type": "auto"
}'
{
"model": "claude-opus-4-6",
"content_type_detected": "logs",
"tokens": {
"input_tokens": 42,
"estimated_output_tokens": 2000,
"total_tokens_estimated": 2042
},
"cost_estimate": {
"input_usd": 0.0002,
"output_usd": 0.05,
"total_usd": 0.0502,
"currency": "USD",
"warning": null
},
"detected_waste": [
{
"type": "duplicate_lines",
"estimated_tokens": 15,
"description": "Repeated lines or near-identical log entries detected."
}
],
"recommendations": [
"Use /v1/trim with mode=safe before sending this content to the LLM."
]
}
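One way to act on an analyze response is to total the detected waste and compare it with the input size. The sketch below assumes the response shape shown above; waste_summary is a hypothetical helper, and summing estimated_tokens treats the waste categories as non-overlapping, which the API does not guarantee.

```python
def waste_summary(analysis: dict) -> dict:
    """Summarize detected waste as a token total and a share of the input."""
    input_tokens = analysis["tokens"]["input_tokens"]
    items = analysis["detected_waste"]
    wasted = sum(item["estimated_tokens"] for item in items)
    percent = round(100 * wasted / input_tokens, 1) if input_tokens else 0.0
    return {
        "wasted_tokens": wasted,
        "waste_percent": percent,
        "types": [item["type"] for item in items],
    }


# Trimmed-down copy of the example response above.
example = {
    "tokens": {"input_tokens": 42},
    "detected_waste": [{"type": "duplicate_lines", "estimated_tokens": 15}],
}
summary = waste_summary(example)
```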
Trim content and return cleaned text with before/after token counts and savings.
Requires API key

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Target LLM model for token counting and cost estimates; use GET /v1/models to verify your model and pricing |
| content | string | yes | Text to trim |
| estimated_output_tokens | integer | no | Expected output tokens (default: 1000) |
| content_type | string | no | Content type hint (default: auto) |
| mode | string | no | Trim aggressiveness: safe, balanced, or aggressive (default: safe) |
curl -X POST https://api.tokensieve.com/v1/trim \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-6",
"content": "2024-01-01 INFO Started\n2024-01-01 INFO Started\n2024-01-01 INFO Started\nActual error details here.",
"mode": "safe",
"content_type": "logs"
}'
{
"model": "claude-opus-4-6",
"content_type_detected": "logs",
"trimmed_content": "2024-01-01 INFO Started\nActual error details here.",
"before": {
"input_tokens": 42,
"estimated_total_cost_usd": 0.0502
},
"after": {
"input_tokens": 18,
"estimated_total_cost_usd": 0.0217
},
"savings": {
"tokens_saved": 24,
"percent": 57.1,
"estimated_usd_saved": 0.0285
},
"actions_taken": [
{
"type": "removed_duplicate_lines",
"tokens_removed_estimate": 24
}
],
"quality_risk": "low",
"notes": [
"Safe mode only removes obvious noise such as duplicate lines, HTML boilerplate and repeated log entries."
]
}
Use trimmed_content as the input to your LLM call instead of the original
content.
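A caller may want to fall back to the original content when the reported quality_risk is higher than it is willing to accept. The following sketch shows one such policy; choose_prompt and the RISK_ORDER ranking are assumptions for illustration, built on the low, medium and high values that appear in this documentation.

```python
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def choose_prompt(original: str, trim_result: dict, max_risk: str = "low") -> str:
    """Use trimmed_content unless the reported quality_risk exceeds max_risk."""
    risk = trim_result.get("quality_risk", "high")  # treat a missing value as high
    if RISK_ORDER.get(risk, 2) > RISK_ORDER[max_risk]:
        return original
    return trim_result["trimmed_content"]
```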
Set content_type to help the API pick the right trimmers, or use
auto to let Token Sieve detect the type from the content.
| Value | Description |
|---|---|
| auto | Detect automatically from content (recommended) |
| text | Plain text with no special structure |
| html | HTML pages, scraped web content |
| json | JSON payloads, API responses |
| logs | Application or server log output |
| chat_history | Multi-turn chat with user/assistant roles |
| agent_trace | AI agent tool calls, steps and observations |
| markdown | Markdown documents and formatted text |
Choose how aggressively Token Sieve removes content. Start with safe and
increase only if you need more savings and can accept higher quality risk.
| Mode | What it does | Quality risk |
|---|---|---|
| safe | Removes duplicate lines, HTML boilerplate, repeated logs and JSON whitespace | low |
| balanced | Everything in safe, plus truncation of large repetitive blocks marked with [TRIMMED: ...] | medium |
| aggressive | Everything in balanced, plus aggressive middle truncation for blocks over 5,000 characters | high |
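The advice to start with safe and escalate only when needed can be expressed as a small policy. In this sketch, trim_with_escalation and its parameters are hypothetical; trim_fn stands in for whatever function you use to call /v1/trim and return the parsed response. Note that each attempt counts as one request against the monthly and burst limits.

```python
MODES = ["safe", "balanced", "aggressive"]  # ordered by increasing quality risk


def trim_with_escalation(trim_fn, content: str, target_percent: float,
                         max_mode: str = "balanced") -> dict:
    """Try modes in order until savings reach target_percent or max_mode is used.

    trim_fn(content, mode) must return a parsed /v1/trim response. It is
    injected so the policy can be exercised without network access.
    """
    result = None
    for mode in MODES[: MODES.index(max_mode) + 1]:
        result = trim_fn(content, mode)
        if result["savings"]["percent"] >= target_percent:
            break  # enough savings; do not take on more quality risk
    return result
```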
All errors return a JSON body with a detail field:
{
"detail": "Missing API key. Provide Authorization: Bearer YOUR_API_KEY"
}
| Status | Cause |
|---|---|
| 400 | Invalid request body or validation error |
| 401 | Missing Authorization header |
| 403 | Invalid or inactive API key |
| 409 | Active beta key already exists for this email |
| 413 | Content exceeds maximum length (default: 2,000,000 characters) |
| 429 | Rate limit, burst limit, or monthly usage limit exceeded |
| 500 | Unexpected server error |
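A client can use the status code to decide whether a retry makes sense and the detail field to report what went wrong. The sketch below is one possible policy, not documented behavior: the helper names are hypothetical, and treating 429 and 500 as retryable is an assumption. A 429 caused by the monthly limit will not clear until the next calendar month, so the number of retries should stay small.

```python
import json

RETRYABLE = {429, 500}  # assumption: these may succeed after a pause


def error_detail(body: bytes) -> str:
    """Extract the detail field from an error body, tolerating non-JSON bodies."""
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: fall back to the raw text.
        return body.decode("utf-8", errors="replace")
    return detail if isinstance(detail, str) else json.dumps(detail)


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... limited to cap."""
    return min(cap, base * (2 ** attempt))


def should_retry(status: int, attempt: int, max_attempts: int = 3) -> bool:
    """Retry only retryable statuses, and only a bounded number of times."""
    return status in RETRYABLE and attempt < max_attempts
```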
Typical integration: read content from your app, trim it, then pass trimmed_content to your LLM.
# 1. Analyze (optional)
curl -X POST https://api.tokensieve.com/v1/analyze \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "claude-opus-4-6", "content": "...", "content_type": "auto"}'
# 2. Trim and use the result
curl -X POST https://api.tokensieve.com/v1/trim \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "claude-opus-4-6", "content": "...", "mode": "safe"}'
import requests
BASE = "https://api.tokensieve.com"
API_KEY = "ctx_beta_..."
headers = {
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json",
}
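# Read the content to clean; the with-block closes the file promptly
with open("agent_log.txt", encoding="utf-8") as f:
    content = f.read()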
# Optional: inspect waste before trimming
analyze = requests.post(
f"{BASE}/v1/analyze",
headers=headers,
json={"model": "claude-opus-4-6", "content": content, "content_type": "auto"},
)
print(analyze.json()["detected_waste"])
# Trim and get cleaned content
trim = requests.post(
f"{BASE}/v1/trim",
headers=headers,
json={"model": "claude-opus-4-6", "content": content, "mode": "safe"},
)
trim.raise_for_status()  # raise on 4xx/5xx instead of failing later on a missing key
result = trim.json()
clean_content = result["trimmed_content"]
print(f"Saved {result['savings']['tokens_saved']} tokens ({result['savings']['percent']}%)")
# Pass clean_content to your LLM instead of the original content
// Node.js 18+ ES module: provides global fetch and top-level await
import fs from "node:fs";
const BASE = "https://api.tokensieve.com";
const API_KEY = "ctx_beta_...";
const headers = {
Authorization: `Bearer ${API_KEY}`,
"Content-Type": "application/json",
};
const content = await fs.promises.readFile("agent_log.txt", "utf8");
const trimRes = await fetch(`${BASE}/v1/trim`, {
method: "POST",
headers,
body: JSON.stringify({
model: "claude-opus-4-6",
content,
mode: "safe",
content_type: "auto",
}),
});
const result = await trimRes.json();
const cleanContent = result.trimmed_content;
console.log(`Saved ${result.savings.tokens_saved} tokens (${result.savings.percent}%)`);
// Pass cleanContent to your LLM call
1. Call /v1/analyze to see what waste is detected and how much it costs before trimming.
2. Call /v1/trim with mode: "safe" first. Review trimmed_content and quality_risk in the response.
3. Use trimmed_content in your existing LLM API call.
4. If you need more savings and can accept higher quality risk, try balanced or aggressive modes.
Need an API key? Create a free beta key with POST /v1/public/api-keys.