Autoresearch: AI-driven experimentation loops for continuous optimization
by @gregeisenberg
ABOUT THIS SKILL
Autoresearch is Andrej Karpathy's open-source framework that turns an AI agent into a tireless research intern: it runs thousands of micro-experiments overnight, iteratively improving code, models, or business metrics while you sleep.
TECHNIQUES
KEY PRINCIPLES (15)
State the single metric you want optimized in plain language.
Examples: “make this small AI model smarter,” “cheaper leads,” “higher sales.” The agent needs a clear, measurable target to guide every experiment.
Why: A crisp objective prevents drift and lets the loop decide which variants to keep or discard.
"you give it a goal, the AI agent does a thing, you tell the AI what better means"
Plan → Act → Read → Update is the atomic loop.
The agent plans a change, edits code or settings, runs a short GPU job (~5 min), reads the metric, then updates the plan and repeats.
Why: Rapid iteration compounds small gains into large improvements without human bottlenecks.
"it plans, it acts, it reads results, it updates the plan"
Only the winning configurations survive.
If the metric improves, the config is saved; if not, it is logged and discarded. Over time this evolutionary pressure yields the best variant.
Why: Mimics natural selection to converge on optimal settings without exhaustive grid search.
"it only saves the changes that improve"
An Nvidia GPU (or rented cloud GPU) is mandatory.
Tested on H100; other Nvidia chips work. Apple Silicon or non-GPU machines cannot run the training loop.
Why: CUDA-compatible GPUs provide the parallel speed required for thousands of micro-experiments.
"you need an Nvidia chip to actually run autoresearch"
Package a narrow autoresearch loop as a subscription product.
Examples: Amazon-listing optimizer, email-sequence tuner for realtors, SaaS-pricing optimizer. Users pay monthly for an agent that never sleeps.
Why: Turns technical capability into recurring revenue by solving one painful niche extremely well.
"you package tiny auto research loops, tuned for one painful niche"
Sell volume of experiments as a competitive advantage.
Agency pitch: “We run 100× more tests than other shops for the same fee” using autoresearch loops on landing pages, ads, or email sequences.
Why: Traditional agencies run a handful of A/B tests; an agent swarm can run hundreds, uncovering rare winners.
"we do a hundred times more testing than other shops for the same or lower fee"
Point the research loop at money questions.
Generate always-fresh competitor reports, investor due-diligence memos, or regulatory briefs by having the agent read, summarize, and compare sources nightly.
Why: Decision-makers value timely, structured intelligence more than static PDFs.
"research as a service... constantly updated reports on who's doing what, pricing, features, and gaps"
Add an “optimize” button inside existing SaaS products.
Let users press one button to spin up a mini autoresearch loop that tunes prompts, pricing, or supplier rankings, then upsell to higher tiers.
Why: Delivers immediate, measurable ROI to users and differentiates the product.
"embed an autoresearch style agent so your users can press optimize"
WHAT'S INSIDE
This is a structured knowledge base — not a prompt file. Your AI retrieves principles semantically, understands the reasoning behind each technique, and connects to related skills via a knowledge graph.
Compatible with OpenClaw · Claude · ChatGPT
principles · semantic retrieval · knowledge graph
Free during beta