TL;DR
AI coding agents turn prompts into diffs.
But today, developers usually review only the final diff.
That is not enough.
Prompt-to-diff history connects the developer’s instruction to the actual code changes that followed. It helps answer the question every developer eventually asks after an AI agent touches a repo:
Git shows the code history.
Prompt-to-diff history shows the agent work behind the code.
The new review problem: a diff without a story
A normal code review starts with a human intention.
Someone opens a pull request. They write a title. Maybe they add a description. Maybe they link an issue. The reviewer looks at the diff and understands the rough path:
AI coding agents make that path messier.
A developer might start with:
Then the agent:
Reads the test
Searches the repo
Finds a token cache
Changes retry behavior
Runs tests
Edits a helper
Updates a config
Changes the test
Produces a final diff
Now the reviewer sees:
But the diff alone does not answer:
Which prompt caused each change?
Which part was intentional?
Which part was collateral damage?
Which change fixed the issue?
Which change was just the agent cleaning up after itself?
Which step introduced the risky behavior?
Can we roll back only the bad part?
That is the gap prompt-to-diff history is meant to close.
What is prompt-to-diff history?
Prompt-to-diff history is the connection between an instruction given to an AI coding agent and the code changes that resulted from it.
In simple terms:
A good prompt-to-diff system lets you inspect that chain.
Not as a giant chat transcript.
Not as a vague summary.
As structured engineering history.
For example:
That is the difference between “AI changed some files” and “I can understand the work.”
Why Git blame is not enough anymore
Git already has blame.
The official Git documentation describes git blame as annotating each line in a file with information from the revision that last modified it.
That works when the question is:
But AI coding changes create a different question:
A commit may contain several agent sessions.
One agent session may contain several prompts.
One prompt may touch several files.
One final diff may include exploration, cleanup, test fixes, and unrelated changes.
So git blame can point you to a commit, but it does not necessarily point you to the decision that created the code.
In the agent era, blame needs another layer.
That is the core idea.
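One way to picture that extra layer is a lookup that answers the blame question with a prompt instead of a commit. The sketch below is only an illustration, not any tool's real format: the `AgentEdit` type, its fields, and the sample data are all hypothetical. It also assumes line numbers stay stable between edits, which a real tool could not assume.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentEdit:
    """One hypothetical agent edit: a prompt that rewrote a line range."""
    prompt_id: str
    session_id: str
    path: str
    start_line: int  # inclusive, 1-indexed
    end_line: int    # inclusive


def agent_blame(edits: list[AgentEdit], path: str, line: int) -> Optional[AgentEdit]:
    """Return the most recent edit covering path:line, or None.

    `edits` is assumed to be in chronological order, so the last
    matching entry is the one that last wrote the line.
    """
    for edit in reversed(edits):
        if edit.path == path and edit.start_line <= line <= edit.end_line:
            return edit
    return None


# Hypothetical history: two prompts in one session touched the same file.
edits = [
    AgentEdit("prompt-1", "session-a", "auth/cache.py", 10, 40),
    AgentEdit("prompt-2", "session-a", "auth/cache.py", 22, 25),
    AgentEdit("prompt-3", "session-b", "auth/retry.py", 1, 12),
]

hit = agent_blame(edits, "auth/cache.py", 23)
print(hit.prompt_id if hit else "no agent edit recorded")
```

Git blame would attribute line 23 to whichever commit contained both prompts. The lookup attributes it to the second prompt, which is the decision a reviewer actually wants to inspect.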
The prompt becomes part of the engineering record
Before AI agents, prompts were not part of software history.
There was no prompt.
There was an issue, a design doc, a commit message, a PR, a Slack thread, or a human memory.
Now the prompt is often the true starting point of the work.
That makes it an engineering artifact.
A prompt can define:
The task
The scope
The constraints
The files the agent should avoid
Whether the agent may edit code
Whether tests should be changed
What “done” means
What tradeoffs are acceptable
Example:
That prompt creates one kind of work.
This prompt creates another:
Both may lead to code changes.
Only one is likely to produce a reviewable diff.
If the prompt shaped the work, it should not disappear after the agent finishes.
Why prompt-to-diff matters for code review
AI-generated diffs can look deceptively clean.
A reviewer sees a file change and thinks:
But they may not know that the agent originally tried three different approaches, hit a failing test, edited a helper, and then changed the test expectation to make the suite pass.
Prompt-to-diff history lets reviewers ask better questions:
Was this file change directly requested?
Did the agent edit tests after implementation failed?
Did the agent broaden the scope?
Did the agent touch a security-sensitive file?
Did the agent run the right tests?
Did the prompt permit this kind of change?
Did the final diff match the original intent?
That is especially important when the prompt was narrow but the diff is broad.
For example:
That should raise eyebrows immediately.
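A mismatch like that can also be flagged mechanically. The following is a minimal sketch, assuming the scope a prompt permits is recorded as glob patterns next to the list of files the agent changed. The function name, patterns, and paths are hypothetical.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return changed files that match none of the allowed glob patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical narrow prompt: only the auth tests and the token cache are in scope.
allowed = ["tests/auth/*", "src/auth/token_cache.py"]
changed = [
    "tests/auth/test_login.py",
    "src/auth/token_cache.py",
    "src/http/retry.py",
    "config/settings.yaml",
]

for path in out_of_scope(changed, allowed):
    print(f"outside prompt scope: {path}")
```

The check does not say the extra changes are wrong. It only tells the reviewer where the diff went beyond what the prompt asked for.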
Why prompt-to-diff matters for debugging
Prompt-to-diff history is not only for review.
It is also useful when something breaks later.
Imagine a production bug appears three days after an AI-assisted change.
Git can tell you which commit touched the file.
But you still need to understand why.
Prompt-to-diff history can show:
Now you are debugging with context.
Without that history, you are just staring at a diff and trying to reconstruct intent from the ashes.
AI code provenance is becoming a real problem
“Provenance” sounds academic, but developers already feel the pain.
AI coding agents are now producing real pull requests across real repositories. A 2026 paper introducing the AIDev dataset described 932,791 agent-authored pull requests across OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code. Another 2026 study examined how AI coding agents modify code across GitHub pull requests, measuring additions, deletions, commits, and files touched, as well as how closely PR descriptions align with the diffs.
That means this is no longer just a local developer convenience.
Teams will need to know:
Not legally, not philosophically, but operationally.
Who or what produced it?
Which prompt?
Which session?
Which tool?
Which assumptions?
Which verification step?
That is code provenance for the agent era.
Chat history is not prompt-to-diff history
A common objection is:
Maybe. Sometimes.
But chat history has problems.
It is often:
Too long
Too linear
Too noisy
Detached from exact file changes
Lost after compaction
Split across sessions
Hard to search
Hard to map to specific lines
Weak as a review artifact
Prompt-to-diff history should not be a prettier transcript.
It should be a structured map:
That is more useful than scrolling through a chat and guessing.
Commit messages are not enough either
Another objection:
Better commit messages help.
There is even current research exploring richer commit messages for AI coding agents. The 2026 paper “Lore” argues that each commit captures a code diff but often discards the reasoning, constraints, rejected alternatives, and future context behind the decision. It calls this missing reasoning the “Decision Shadow.”
That is exactly the problem.
But commit messages are not always enough, because much of the agent’s work happens before the commit.
The useful history often lives inside the session:
Prompt
Agent plan
Files read
Commands run
Failed attempts
Intermediate changes
Recovery points
Final diff
A great commit message can summarize the final decision.
Prompt-to-diff history captures how the agent got there.
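That in-session history can be kept as an ordered event log rather than a transcript. The sketch below assumes a hypothetical `SessionEvent` type and a simple `tests/` path convention, both invented for illustration. It shows how one review question, whether the agent edited tests after a check failed, becomes a query over the log.

```python
from dataclasses import dataclass
from typing import Literal

EventKind = Literal["prompt", "plan", "read", "command", "edit"]


@dataclass(frozen=True)
class SessionEvent:
    kind: EventKind
    target: str      # prompt text, file path, or command line
    ok: bool = True  # only meaningful for "command" events


def edits_to_tests_after_failure(events: list[SessionEvent]) -> list[str]:
    """Return test files edited after any command failed earlier in the session."""
    seen_failure = False
    flagged: list[str] = []
    for event in events:
        if event.kind == "command" and not event.ok:
            seen_failure = True
        elif event.kind == "edit" and seen_failure and event.target.startswith("tests/"):
            flagged.append(event.target)
    return flagged


# Hypothetical session, in the order the agent acted.
events = [
    SessionEvent("prompt", "Fix the flaky login test"),
    SessionEvent("read", "tests/test_login.py"),
    SessionEvent("edit", "src/auth/retry.py"),
    SessionEvent("command", "pytest tests/test_login.py", ok=False),
    SessionEvent("edit", "src/auth/helpers.py"),
    SessionEvent("edit", "tests/test_login.py"),
    SessionEvent("command", "pytest tests/test_login.py", ok=True),
]

print(edits_to_tests_after_failure(events))
```

A commit message written at the end would likely say the test now passes. The event log shows that the test file itself changed after the failure, which is the part a reviewer should look at first.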
What prompt-to-diff history should include
A serious prompt-to-diff layer should capture at least six things.
1. The prompt
The original instruction should be preserved exactly.
Not paraphrased.
Not reconstructed later.
Exactly what the developer asked.
2. The session
The system should know which agent session produced the work.
This matters when multiple sessions touch the same repo.
3. The files touched
Every changed file should be connected back to the prompt or step that changed it.
4. The diff
The final diff matters, but so do intermediate diffs.
Sometimes the agent introduces a bug, fixes it, then reintroduces it, and only the intermediate diffs show that.
5. The verification
Which tests or commands were run?
Did they pass?
Did the agent skip verification?
6. The rollback point
If a prompt produced bad work, the developer should be able to roll back that prompt’s changes without manually untangling everything.
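The six pieces above can be sketched as one record per prompt. This is a minimal illustration under stated assumptions: the field names are hypothetical, the workspace is modeled as an in-memory mapping of path to file content, and the rollback point is a snapshot of each touched file from before the prompt ran. None of it reflects re_gent's actual storage format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Verification:
    command: str
    passed: bool


@dataclass
class PromptRecord:
    prompt: str               # 1. the exact instruction, never paraphrased
    session_id: str           # 2. which agent session did the work
    files_touched: list[str]  # 3. every file this prompt changed
    diffs: list[str]          # 4. intermediate and final diffs, in order
    verifications: list[Verification] = field(default_factory=list)  # 5.
    # 6. path -> content before the prompt ran (None = file did not exist)
    rollback_snapshot: dict[str, Optional[str]] = field(default_factory=dict)

    def verified(self) -> bool:
        """True only if at least one check ran and every check passed."""
        return bool(self.verifications) and all(v.passed for v in self.verifications)


def roll_back(workspace: dict[str, str], record: PromptRecord) -> dict[str, str]:
    """Return a copy of `workspace` with this prompt's files restored."""
    restored = dict(workspace)
    for path, before in record.rollback_snapshot.items():
        if before is None:
            restored.pop(path, None)  # file was created by this prompt
        else:
            restored[path] = before
    return restored


# Hypothetical example data.
record = PromptRecord(
    prompt="Fix the flaky login test. Do not change test expectations.",
    session_id="session-a",
    files_touched=["src/auth/token_cache.py", "src/auth/helpers.py"],
    diffs=["<diff after first attempt>", "<final diff>"],
    verifications=[Verification("pytest tests/test_login.py", True)],
    rollback_snapshot={
        "src/auth/token_cache.py": "TTL = 300\n",
        "src/auth/helpers.py": None,
    },
)

workspace = {
    "src/auth/token_cache.py": "TTL = 30\n",
    "src/auth/helpers.py": "def refresh(): ...\n",
    "README.md": "docs\n",
}

print(roll_back(workspace, record))
```

The sketch restores only the files that one prompt touched and leaves everything else alone. It ignores the harder case where a later prompt built on top of the same files, which a real tool would have to detect before rolling back.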
What this looks like in practice
A developer should be able to run something like:
And see agent sessions as engineering history.
Then:
And ask:
Then:
And recover from a bad path.
That is the workflow re_gent is building toward.
re_gent is open-source version control for AI coding agent activity. It tracks what your agent did and which prompt wrote each line, and it gives developers a way to inspect agent work locally: https://github.com/regent-vcs/re_gent
Prompt-to-diff is the review layer agents are missing
AI coding agents are powerful because they compress work.
One prompt can become a broad diff.
But compressed work is harder to review.
Prompt-to-diff history decompresses it.
It gives developers the missing context:
That is how AI-generated code becomes inspectable.
Not trusted blindly.
Inspectable.
FAQ
What is prompt-to-diff history?
Prompt-to-diff history connects a developer’s prompt to the actual code changes produced by an AI coding agent. It helps developers trace a diff back to the instruction, session, and agent steps that created it.
Why is prompt-to-diff history useful?
It helps with code review, debugging, rollback, and accountability. Developers can see which prompt caused a change instead of only seeing the final diff.
How is prompt-to-diff different from Git blame?
Git blame shows which commit last changed a line. Prompt-to-diff history shows which prompt, agent session, or agent step produced the change.
Is prompt-to-diff the same as chat history?
No. Chat history is a transcript. Prompt-to-diff history is structured engineering history that maps prompts to files, diffs, verification steps, and rollback points.
What is AI code provenance?
AI code provenance is the ability to understand where AI-generated code came from: which agent created it, which prompt caused it, which session produced it, and what verification happened before it was accepted.
Can Git track prompt-to-diff history?
Git can store commits and diffs, but it does not automatically know which prompt or agent step created a change. A separate agent-aware layer can connect Git history with prompt/session history.
What is agent blame?
Agent blame is like Git blame for AI coding work. It helps identify which prompt or agent session produced a specific line or file change.
Does re_gent replace Git?
No. re_gent is designed as a local-first layer around Git. Git tracks files. re_gent tracks the agent activity that produced those file changes.
Final thought
AI coding agents changed the unit of work.
A prompt can now become a diff.
That means the prompt can no longer disappear when the diff appears.
The next layer of developer tooling needs to remember the path:
Because the future of code review is not just asking:
It is asking:


