AI Fluency Field Guide

The better an AI output looks, the more skeptical you should be.

AI outputs can look polished, complete, and convincing long before they’re actually reliable.

This guide is about staying sharp as outputs get faster, cleaner, and easier to trust too soon. It closes with a source-linked quiz on the latest in agents so you can pressure-test yourself.

Anthropic’s AI Fluency project inspired this guide.

What Anthropic found

Longer back-and-forth conversations showed about twice as many visible fluency behaviors as one-shot chats.

Why this matters

Our work often looks credible before the assumptions, evidence trail, and edge cases have been checked.

Workshop rule

Treat the first answer as a draft. Treat the polished answer as a reason to slow down.

The Three Takeaways

Three habits matter most.

01

Keep the conversation going

The best work usually comes from a few rounds of refinement, not one clever prompt.

02

Get more skeptical as outputs improve

The cleaner the memo, chart, or tool looks, the harder you should press on evidence and reasoning.

03

Tell the model how to work with you

Set the role, ask for pushback, and make uncertainty visible early.

The 11 Behaviors

The 11 behaviors Anthropic could actually see.

Anthropic could directly observe 11 of the 24 fluency behaviors in Claude conversations. That makes them a practical checklist for how people work with AI outputs in the real world. The percentages show how often each behavior appeared in the conversations Anthropic studied.

Pattern to watch

When people use AI to make deliverables, they get better at telling it what they want and worse at checking it. That’s the trap.

Field Notes

What this looks like in practice.

Habit 1

Keep the conversation going

If the task matters, the first answer is usually just the start.

Scenario

Board memo first draft

A teammate asks AI for a briefing memo and gets something clean, confident, and almost useful in 20 seconds.

Risk: it sounds sharp but misses the actual decision the memo needs to support.

Fluent follow-up

  • Ask what decision the memo is supposed to help make.
  • Have AI separate what is known, inferred, and still missing.
  • Ask it to identify blind spots before producing the final draft.

Habit 2

Get more skeptical as outputs improve

Polished work is often where teams get least skeptical.

Artifact trap

Research brief with clean citations

AI produces a sleek summary with tidy sourcing, persuasive framing, and just enough confidence to make you skip the hard checks.

What evidence supports each claim? What is inferred vs confirmed? What would break under review?

Fluent move

Interrogate before you circulate

Ask the model to mark unsupported claims, list what it assumed, and flag the riskiest lines for review.

Habit 3

Tell the model how to work with you

Don’t let the model decide for you whether it should be agreeable, brief, or skeptical.

Weak setup

“Write a polished memo from these notes.”

Fluent setup

“Act like a skeptical reviewer. Separate evidence from inference, flag anything unsupported, and ask for missing context before you draft.”
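If you work with a model through an API rather than a chat window, the same setup can be baked into the system prompt instead of retyped each time. A minimal sketch: the request shape follows the common system-plus-messages convention, but the helper name, model string, and prompt text are illustrative, not a specific vendor's API.

```python
# Sketch: encode the "fluent setup" as a reusable system prompt.
# Helper name, model string, and prompt wording are illustrative.

REVIEWER_SYSTEM_PROMPT = (
    "Act like a skeptical reviewer. Separate evidence from inference, "
    "flag anything unsupported, and ask for missing context before you draft."
)

def build_request(notes: str, model: str = "example-model") -> dict:
    """Assemble a system-plus-messages request with the reviewer role baked in."""
    return {
        "model": model,
        "max_tokens": 1024,
        # The "how to work with me" contract lives here, not in the user turn.
        "system": REVIEWER_SYSTEM_PROMPT,
        "messages": [
            {"role": "user", "content": f"Draft a memo from these notes:\n{notes}"}
        ],
    }
```

The point of the split is that the user turn carries only the task, while the behavioral contract (role, pushback, missing-context questions) persists across every request.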

Agent Patterns

The agent conversation has moved from prompt tricks to operating systems.

Emerging agent patterns we’re seeing across the latest tools, papers, and operator notes.

Pattern 01

The harness is becoming the product.

Teams are starting to win on workflow design, infrastructure, and repeatability, not just on whichever model looked smartest in a demo.

See open-agents.dev and Anthropic Managed Agents.

Pattern 05

Orchestration is real, but supervision is still the edge.

Production teams are using subagents, terminal-to-terminal coordination, and long-running infrastructure, but the best patterns are still controlled, observable, and reviewable.

See Measuring Agents in Production, Codex Subagents, and smux.

Field Test

A 20-question quiz on the latest in agents.

This one is intentionally a little sharper. Every answer reveals the correct choice, the explanation, and the source that inspired the question.

20 questions

Short, fast, and a little unforgiving.

Six themes

Production, harnesses, memory, evals, orchestration, and security.

Source-linked

Every answer points back to the article, paper, repo, or thread behind it.


Pocket Checklist

A simple operating rhythm for the team.

Before you ask

  • Name the task, audience, and what a good output should do.
  • Tell the model how you want it to behave.
  • Invite pushback and missing-context questions.

While you work

  • Iterate instead of restarting from scratch.
  • Ask what the model is assuming.
  • Pressure-test the highest-risk paragraphs first.

Before you send

  • Check evidence, boundaries, dates, and material omissions.
  • Separate “well-written” from “well-supported.”
  • Make a human own the final judgment call.

Run It Live

Made to be fast, stable, and easy to share.

This site is intentionally static and lightweight, so it stays fast even with a room full of people using it at once. No logins. No backend. No friction.

Source framing

The guide starts with Anthropic’s fluency framing, but the quiz is built from a live bookmark feed on agents: papers, repos, product launches, practitioner essays, and security notes that keep showing up in the same conversation.