Hyperparam Documentation
Hyperparam is a browser-native debugger for agent logs, coding logs, and chatbot histories. It reads millions of rows straight from S3, GCS, or Azure, so you can find the failure modes before your users do, identify the prompts that burn tokens, and ship fixes grounded in what actually happened.
Why Hyperparam?
Built for agent and chat traces
- Open Claude Code transcripts, Codex sessions, ChatGPT exports, Langfuse / LangSmith / Phoenix traces, or any JSONL/Parquet of LLM calls
- Drill into nested conversations, tool calls, and reasoning steps without flattening them first
- Correlate failures across sessions, tools, models, and users at dataset scale
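Agent traces in JSONL are one JSON object per line, with conversations, tool calls, and errors nested inside each row. A minimal sketch of working with one such row (the field names `session_id`, `tool_calls`, and `status` are illustrative assumptions, not Hyperparam's required schema):

```javascript
// Hypothetical JSONL row for an agent trace -- field names are
// illustrative, not a required Hyperparam schema.
const line = JSON.stringify({
  session_id: 'abc123',
  role: 'assistant',
  tool_calls: [
    { name: 'read_file', status: 'ok' },
    { name: 'run_tests', status: 'error', error: 'timeout' },
  ],
})

// One row per JSONL line: parse it and pull out the failed tool calls
// without flattening the nested structure first.
const row = JSON.parse(line)
const failures = (row.tool_calls ?? []).filter(c => c.status === 'error')
console.log(failures.map(c => `${c.name}: ${c.error}`))
// [ 'run_tests: timeout' ]
```

Because the nesting is preserved, a failed `run_tests` call stays attached to the session and turn it happened in, which is what makes cross-session correlation possible.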
Join across all your sources
- Pull traces from local files, S3, GCS, Azure Blob, Hugging Face, and Iceberg tables, and combine them in one workspace
- Join your logs against GitHub repos, issues, and PRs to correlate agent behavior with the code it was running on
- Run SQL across sources to ask questions like "which sessions touched this file?" or "which tool failures happened on this commit?"
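The kind of join a question like "which tool failures happened on this commit?" expresses can be sketched with plain arrays standing in for the real sources (the field names `commit_sha`, `status`, and `pr` are assumptions for illustration):

```javascript
// Two stand-in sources: tool-call rows from agent logs, and commits
// from a GitHub repo. Field names are illustrative assumptions.
const toolCalls = [
  { session_id: 's1', tool: 'run_tests', status: 'error', commit_sha: 'a1b2' },
  { session_id: 's2', tool: 'read_file', status: 'ok', commit_sha: 'c3d4' },
]
const commits = [
  { sha: 'a1b2', pr: 101 },
  { sha: 'c3d4', pr: 102 },
]

// Roughly: SELECT * FROM tool_calls JOIN commits ON sha = commit_sha
//          WHERE status = 'error'
const failuresByPr = toolCalls
  .filter(c => c.status === 'error')
  .map(c => ({ ...c, pr: commits.find(g => g.sha === c.commit_sha)?.pr }))

console.log(failuresByPr)
// one row: the run_tests failure, now tagged with PR 101
```

In Hyperparam the same shape of question is asked in SQL across the connected sources rather than hand-written per query.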
Browser-native performance
- Stream multi-gigabyte logs straight from S3, GCS, Azure Blob, or Hugging Face
- HTTP range requests pull only the bytes needed; credentials never leave the browser
- Lazy computation processes only what you scroll to, so billion-row tables stay responsive
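The range-request idea itself is simple: ask the server for just the byte span you need instead of downloading the whole file. A minimal sketch (the file URL and sizes are placeholders):

```javascript
// Build an HTTP Range header for a byte span. The header is inclusive
// on both ends: "bytes=0-1023" requests exactly 1024 bytes.
function rangeHeader(start, end) {
  return { Range: `bytes=${start}-${end}` }
}

// Example: read the 4 KiB tail of a 2 GiB Parquet file, where the
// footer metadata lives, without touching the other ~2 GiB.
const fileSize = 2 * 1024 ** 3
const headers = rangeHeader(fileSize - 4096, fileSize - 1)
console.log(headers.Range)
// bytes=2147479552-2147483647

// With fetch, a server that supports ranges answers 206 Partial Content:
// await fetch('https://example.com/logs.parquet', { headers })
```

Formats like Parquet put their metadata in a footer, so a few small ranged reads are enough to plan which column chunks to fetch as you scroll.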
AI agent that works with you
- Ask in plain language: "which sessions hit the context limit?", "where did the agent loop?", "which tool calls failed and why?"
- Generate derived columns at scale: failure classifications, quality scores, root-cause categories, suggested prompt fixes
- Build SQL views to filter, join, and project across log sources
- Save repeatable analyses as skills so the same workflow runs on next week's logs
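A derived column is just a per-row function applied at dataset scale. A hand-written sketch of a failure classifier (the categories and matching rules are illustrative assumptions, not a built-in Hyperparam taxonomy; in practice the agent generates this kind of column for you):

```javascript
// Hypothetical derived column: map a raw error message to a
// root-cause category. Rules are illustrative only.
function classifyFailure(errorMessage) {
  const msg = (errorMessage ?? '').toLowerCase()
  if (msg.includes('context') && msg.includes('limit')) return 'context_limit'
  if (msg.includes('timeout')) return 'timeout'
  if (msg.includes('not found')) return 'bad_tool_input'
  return 'other'
}

const rows = [
  { error: 'Request timeout after 30s' },
  { error: 'context window limit exceeded' },
  { error: 'file not found: src/app.ts' },
]
console.log(rows.map(r => classifyFailure(r.error)))
// [ 'timeout', 'context_limit', 'bad_tool_input' ]
```

Once a column like this exists, filtering, grouping, and SQL views over the categories turn one-off debugging into a repeatable analysis.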
Who Uses Hyperparam?
- Agent and coding-tool teams debugging tool failures, wasted calls, and rabbit-holes in production traces
- AI product teams triaging chatbot histories to find user frustration, hallucinations, and quality regressions
- ML and platform engineers turning raw observability logs into actionable fixes for prompts, tools, and routing
- Researchers auditing reasoning traces and model behavior across releases
Getting Started
- Quick start — Load your first log file and run an analysis in under 3 minutes
- Exporting Chat Logs — Pull traces out of Claude Code, ChatGPT, Langfuse, LangSmith, Phoenix, Datadog, and more
- Data Sources — Connect S3, GCS, Azure, or Hugging Face
Use Cases
Each guide walks through a real workflow on agent or chat logs: exploring, surfacing issues, and improving the underlying system.
- How to Debug Wasted Tool Calls in LLM Logs — Separate avoidable tool-call failures from necessary ones, and capture suggested prompt fixes
- Quality Filtering — Score and remove low-quality, sycophantic responses from chat logs
- Classifying Prompt Patterns — Categorize unstructured system prompts to understand what your assistant is actually being asked to do
- Dataset Discovery — Use natural language to find public datasets to benchmark against
- Complete Workflow — End-to-end: extract structured fields, filter, export
- Deep Research — Multi-step AI workflow for comparing model outputs
References
- Glossary — Terms used when debugging agent logs, traces, and tool calls
- FAQ — Common questions about features, limits, and security
- Desktop App — Native app with private cloud access and bring-your-own model keys
Open Source
To build Hyperparam, we created an ecosystem of open-source libraries for efficient data handling in the browser:
- hightable — High-performance React table for large datasets
- hyparquet — Apache Parquet reader for JavaScript and TypeScript
- squirreling — Async streaming SQL engine in pure JavaScript
- hysnappy — Snappy decompressor optimized with WebAssembly
- icebird — Apache Iceberg table reader in JavaScript
- hyllama — Llama.cpp model parser in JavaScript
The Feedback Loop
Understanding what your agent or chatbot is actually doing in production is the first step to making it better. Hyperparam closes the loop: read raw traces, surface the failure modes, and extract the fixes that improve your prompts, tools, and routing. Rapid iteration on real logs is how great AI products get built.
