~ one-day workshop · economists & social scientists

Agentic AI for Researchers

One-day workshop for economists and social scientists. Balanced demos and guided exercises.

9:00 – 16:10 ~25 participants Demos + exercises

Workshop arc

Intelligence is a commodity — understand the stack, unbundle it, pick your tools.
Agents work only when structure underneath them is good — your Day-1 practices are the prerequisite.
Agents introduce new reproducibility risks — pinned models, saved prompts, and careful review are the cure.

Schedule

Act 1 How do you work today? 80 min

9:00 20 min

Exercise Pair workflow mapping

The research question: “Which country has the highest share of non-programming use of Claude?” Pair up. Each person explains to their partner — on paper — how they would answer this question today: create a folder, find and download data, clean it, run analysis, produce a table. No computers. Just describe your workflow step by step.

pencil & paper pair work current workflow

9:20 40 min

Talk Everything is a chat completion

The technical core without the hype. A model is a next-token predictor. Context is what fits in the window — everything outside it is gone. Attention is how the model decides what matters. Tool calling is structured output with a feedback loop. System prompt, user turn, assistant turn: the anatomy of every interaction. Why context filling causes failure — and why the fix is a fresh session, not more prompting.

context window attention tool calls system prompt

10:00 20 min

Discussion What did your partner forget?

Each partner presents the other person’s workflow to the group. What steps were clear? What was implicit — assumed but never said? This is the same problem an agent faces: it only knows what you tell it. The gap between what you know and what you write down is where agents fail.

implicit knowledge communication gap why structure matters

Act 2 The ecosystem — and working with agents 110 min

10:20 20 min

Break Coffee

10:40 25 min

Talk The ecosystem — and the unbundling argument

Four separable layers: model (GPT, Claude, Gemma, Qwen, DeepSeek), harness (OpenCode, Claude Code, Cursor, Zed), MCP (tools the model can call at runtime), skill (plain-text instructions that live in your repo). Intelligence is commoditizing fast — Chinese models match frontier at 1/25th the cost, rankings change weekly. The opinionated prescription: pick one harness you are comfortable in, swap models freely for each task, write skills as readable markdown, keep very good structure in your work. Lock-in is the risk; model loyalty is a mistake.

model harness MCP skill OpenRouter unbundling

11:05 40 min

Demo The escape–enter loop in practice

Live OpenCode session on a real research task. The two-layer model: natural language in, structured output out. Review diffs, not code. When to hit Escape. The “toxic context” failure mode — when to start fresh. The independence principle: never let the same session verify its own work. Permission boundaries — approve reads and edits inside the project, deny destructive shell commands — so the agent can never delete or overwrite your data. CLAUDE.md: what belongs in it (project conventions, data assumptions, niche tools) and what does not (everything else — irrelevant instructions actively hurt).

OpenCode diff review permissions CLAUDE.md context poisoning independence principle

11:45 25 min

Exercise Write a project config

Draft a CLAUDE.md for the workshop project — what tools do we use (Stata, Python), where does the data live, what naming conventions apply, what should the agent never do? Start with 3–5 rules. Share one thing that was hard to write down. The act of writing conventions in plain text for a model forces a precision that is valuable even if you never use an agent.

hands-on CLAUDE.md pencil & paper

Act 3 Build a project from scratch 130 min

12:10 50 min

Break Lunch

13:00 25 min

Talk Reproducibility from Day 1 — the prerequisite

Quick recap of the Vilhuber/Koren principles as the necessary foundation: no hard-coded paths, no overwriting raw data, README from day 1, secrets in environment variables. These are not just journal requirements — they are what makes your project legible to an agent. An agent working in a project with hard-coded paths, no folder structure, and one giant script will fail even with the best model. Day-1 practices are agent-compatible practices. But they handle only the old reproducibility risk. Agents add a new one: the same prompt does not produce the same code twice — models are non-deterministic, and the model you used is deprecated within a year. The cure is to treat the interaction as data — pin the exact model version, save the prompt alongside the output. The prompt is part of your methods section.

no hard-coded paths folder structure README model pinning prompt as artifact

13:25 20 min

Demo Empty folder to working project

Live demo. Start from an empty folder. Use bead to download the Anthropic Economic Index data. Set up the project structure. Use OpenCode to figure out how to run Stata from the command line. Write the first prompts: load the data, explore it, start answering the research question. Show what happens when the agent writes Python instead of Stata — and how your CLAUDE.md prevents it.

OpenCode bead Stata from the shell first prompts CLAUDE.md in action

13:45 85 min

Exercise Answer the research question

Your turn. Start from an empty folder. Use bead to get the Anthropic Economic Index data. Use OpenCode to write Stata scripts that answer the question: “Which country has the highest share of non-programming use of Claude?” Build the project step by step: folder structure, data loading, cleaning, analysis, output. Review every diff before you approve it. When you get stuck, start a fresh session — do not keep prompting into a broken context. Debrief: each participant shares one thing the agent did that surprised them.

hands-on OpenCode Stata bead Anthropic Economic Index

Instructors circulate to help with setup and troubleshooting.

Closing What comes next 40 min

15:10 20 min

Break Coffee

15:30 40 min

Discussion What do you actually do on Monday?

Structured around five choices: (1) what was the first thing you’ll try with OpenCode on your own project, (2) which model for which task, (3) what goes in your CLAUDE.md — the minimum viable project config, (4) what structural change your project needs before an agent can work in it, (5) which failure modes to watch for — hallucination, silent data decisions, scope creep, context degradation, confident domain errors, restricted data leaving your machine. Each participant writes their five answers. Collected answers become the shared takeaway. Close by restating the arc: Day-1 structure was already right. Project config makes it agent-legible. Review makes it auditable. The models will keep getting cheaper; the structure is the durable investment.

next steps model selection CLAUDE.md structural prerequisite failure modes

16:10

End