<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Teaching AI to Regret: The Backspace Token Theory | TinyMemoryLM</title>
<link rel="stylesheet" href="bluesheet.css">
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Geist:wght@400;500;600;700&family=Geist+Mono&display=swap" rel="stylesheet">
<style>
:root {
--blue-900: #000000;
--blue-800: #0a0a0a;
--blue-700: #111111;
--blue-600: #1a1a1a;
--blue-500: #333333;
--blue-400: #555555;
--blue-300: #777777;
--blue-200: #888888;
--blue-100: #aaaaaa;
--white: #ffffff;
--white-soft: #f5f5f5;
--white-muted: #e0e0e0;
--grid-line: rgba(255, 255, 255, 0.03);
--grid-line-major: rgba(255, 255, 255, 0.06);
--accent: #ededed;
--accent-muted: #888888;
--font-sans: 'Geist', -apple-system, BlinkMacSystemFont, sans-serif;
--font-mono: 'Geist Mono', 'SF Mono', 'Fira Code', monospace;
--container-max: 1100px;
}
* { box-sizing: border-box; margin: 0; padding: 0; }
html { font-size: 16px; scroll-behavior: smooth; }
body { font-family: var(--font-sans); background: var(--blue-900); color: var(--white-muted); line-height: 1.7; -webkit-font-smoothing: antialiased; }
a { color: var(--white); text-decoration: none; transition: color 0.15s ease; }
a:hover { color: var(--accent); }
.container { max-width: var(--container-max); margin: 0 auto; padding: 0 24px; }
nav { position: fixed; top: 0; left: 0; right: 0; z-index: 100; background: rgba(0, 0, 0, 0.85); backdrop-filter: blur(12px); border-bottom: 1px solid var(--blue-600); padding: 16px 0; }
nav .container { display: flex; justify-content: space-between; align-items: center; }
.nav-brand { font-size: 18px; font-weight: 600; color: var(--white); display: flex; align-items: center; gap: 8px; }
.nav-brand span { color: var(--accent); }
.nav-links { display: flex; gap: 32px; }
.nav-links a { font-size: 14px; font-weight: 500; color: var(--blue-200); }
.nav-links a:hover { color: var(--white); }
.post { padding: 140px 0 80px; }
.post-back { display: inline-block; color: var(--blue-200); font-size: 14px; margin-bottom: 32px; }
.post-back:hover { color: var(--accent); }
.post-back::before { content: '← '; }
.post-meta { display: flex; gap: 12px; margin-bottom: 20px; }
.post-date { font-size: 13px; color: var(--blue-200); font-family: var(--font-mono); }
.post-tag { font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; color: var(--white); background: rgba(255, 255, 255, 0.08); padding: 4px 10px; border-radius: 4px; }
.post h1 { font-size: 36px; font-weight: 700; color: var(--white); margin-bottom: 32px; line-height: 1.2; letter-spacing: -0.02em; }
.post-body p { font-size: 17px; line-height: 1.8; margin-bottom: 24px; color: var(--blue-200); }
.post-body p:first-of-type { font-size: 20px; color: var(--white-muted); }
.post-body h2 { font-size: 24px; font-weight: 600; color: var(--white); margin: 48px 0 20px; }
.post-body blockquote { border-left: 3px solid var(--accent); padding: 20px 24px; margin: 32px 0; background: var(--blue-800); border-radius: 0 8px 8px 0; }
.post-body blockquote p { font-size: 16px; font-style: italic; color: var(--blue-200); margin: 0; }
.post-body hr { border: none; height: 1px; background: var(--blue-600); margin: 48px 0; }
.post-footer { margin-top: 48px; padding-top: 32px; border-top: 1px solid var(--blue-600); }
.post-footer p { font-size: 14px; color: var(--blue-200); font-style: italic; margin: 0; }
footer { padding: 40px 0; background: var(--blue-800); border-top: 1px solid var(--blue-600); text-align: center; }
footer p { color: var(--blue-200); font-size: 14px; margin-bottom: 8px; }
footer a { color: var(--blue-200); }
footer a:hover { color: var(--accent); }
.link-list { margin: 32px 0; padding: 20px; background: var(--blue-800); border-radius: 8px; }
.link-list h3 { font-size: 16px; font-weight: 600; color: var(--white); margin-bottom: 16px; }
.link-list ul { list-style: none; padding: 0; }
.link-list li { margin-bottom: 12px; }
.link-list a { font-size: 14px; color: var(--blue-200); display: flex; align-items: center; gap: 8px; }
.link-list a:hover { color: var(--accent); }
.link-list a::before { content: '→'; color: var(--accent); }
@media (max-width: 768px) { .post h1 { font-size: 28px; } .nav-links { display: none; } }
</style>
</head>
<body>
<svg class="scribbles" viewBox="0 0 1440 900" preserveAspectRatio="xMidYMid slice">
<path d="M100,50 Q150,30 200,60 T300,40 T400,70" fill="none" stroke="white" stroke-width="1"/>
<path d="M800,200 Q850,180 900,210 T1000,190 T1100,220" fill="none" stroke="white" stroke-width="0.8"/>
<path d="M200,700 Q250,680 300,710 T400,690 T500,720" fill="none" stroke="white" stroke-width="0.6"/>
<path d="M1200,400 Q1250,380 1300,410 T1400,390" fill="none" stroke="white" stroke-width="0.7"/>
<path d="M50,400 Q100,380 150,420 T250,400" fill="none" stroke="white" stroke-width="0.5"/>
<circle cx="350" cy="150" r="30" fill="none" stroke="white" stroke-width="0.6"/>
<circle cx="1100" cy="600" r="25" fill="none" stroke="white" stroke-width="0.5"/>
<path d="M600,100 L620,80 L640,100 L660,80" fill="none" stroke="white" stroke-width="0.7"/>
<path d="M1300,750 Q1320,730 1340,760 T1380,740" fill="none" stroke="white" stroke-width="0.5"/>
<path d="M100,800 Q120,780 140,810 T180,790 T220,820" fill="none" stroke="white" stroke-width="0.6"/>
<path d="M700,500 Q720,480 740,510 T780,490 T820,520" fill="none" stroke="white" stroke-width="0.4"/>
<path d="M400,300 C420,280 440,320 460,300 C480,280 500,320 520,300" fill="none" stroke="white" stroke-width="0.5"/>
<path d="M900,700 C920,680 940,720 960,700 C980,680 1000,720 1020,700" fill="none" stroke="white" stroke-width="0.6"/>
<path d="M150,250 Q170,230 190,260 Q210,240 230,270" fill="none" stroke="white" stroke-width="0.4"/>
<path d="M1050,100 Q1070,80 1090,110 Q1110,90 1130,120" fill="none" stroke="white" stroke-width="0.5"/>
<path d="M500,850 C520,830 540,860 560,840 C580,820 600,860 620,840" fill="none" stroke="white" stroke-width="0.4"/>
<path d="M1350,50 Q1370,30 1390,60 T1430,40" fill="none" stroke="white" stroke-width="0.5"/>
<path d="M30,600 Q50,580 70,610 T110,590" fill="none" stroke="white" stroke-width="0.4"/>
</svg>
<nav>
<div class="container">
<a href="index.html" class="nav-brand"><span>/</span>TinyMemoryLM</a>
<div class="nav-links">
<a href="index.html">Home</a>
<a href="blog.html">Blog</a>
<a href="status.html">Status</a>
</div>
</div>
</nav>
<main>
<article class="post">
<div class="container">
<a href="blog.html" class="post-back">Back to Blog</a>
<header>
<div class="post-meta">
<span class="post-date">2026-03-05</span>
<span class="post-tag">Model Experiments</span>
</div>
<h1>Teaching AI to Regret: The Backspace Token Theory</h1>
</header>
<div class="post-body">
<p>Humans backtrack. We type "thr" and realize we meant "the" and we fix it. We type "tje" and we laugh at our own fingers and we correct it. Large language models do not do this. They commit to every token like it is a binding legal contract.</p>
<p>I started wondering what would happen if we gave them an out. What if we added a backspace token to the vocabulary? A special signal that says "undo the last thing." The training data would look like raw keystroke logs instead of polished text: "The cat jumped over thr[DELETE] tje[DELETE] the dog."</p>
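<p>To make the idea concrete, here is a minimal sketch of how a stream containing delete tokens could be replayed into final text. It assumes word-level tokens and that each <code>[DELETE]</code> removes exactly one previous token. The function name <code>apply_deletes</code> is hypothetical, and a real keystroke log would more likely delete one character at a time.</p>
<pre><code class="language-python">
```python
# Hypothetical marker, borrowed from the "[DELETE]" example in the post.
DELETE = "[DELETE]"


def apply_deletes(tokens):
    """Replay a token stream; each DELETE undoes the most recent surviving token."""
    kept = []
    for token in tokens:
        if token == DELETE:
            if kept:  # a DELETE with nothing left to undo is treated as a no-op
                kept.pop()
        else:
            kept.append(token)
    return kept


# Assumed word-level tokenization of the keystroke example.
raw = ["The", "cat", "jumped", "over", "thr", DELETE, "tje", DELETE, "the", "dog."]
clean = apply_deletes(raw)
print(" ".join(clean))  # prints: The cat jumped over the dog.
```
</code></pre>
<p>The raw stream is what the model would be trained on. The replayed version is what a reader would see once the regrets are applied.</p>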
<h2>The Confidence Problem</h2>
<p>Current models predict the next token based on everything before it. They can see what they wrote, but they cannot take it back. Once "thr" is generated, the model wants to finish "three" or "through". It does not say "oops". It doubles down. My tiny model does this constantly. It writes nonsense and then builds entire paragraphs justifying that nonsense.</p>
<p>Adding a delete token changes the game. Suddenly the model can express uncertainty. It can show its work. It can mimic the human process of thinking out loud and then correcting course. This feels more honest. This feels more like intelligence.</p>
<blockquote>
<p>Intelligence might not be about getting it right the first time. Intelligence might be about noticing you were wrong and fixing it before anyone else sees.</p>
</blockquote>
<h2>My Tiny Experiment</h2>
<p>I tried this. I trained a small model on keystroke data with backspace tokens included. I expected magic. I got anxiety.</p>
<p>The model learned to delete everything. It would write one word and then immediately delete it. It would write a sentence and then backspace over the whole thing. It developed a fear of commitment. I asked it a simple math question and it typed "The answer is 4[DELETE] 5[DELETE] 6[DELETE]" and then stopped generating. It was too busy correcting itself to ever finish.</p>
<p>I had to adjust the training. I penalized excessive deleting. I rewarded completion. The model learned to balance. It still deletes more than a human would. It still hesitates. But sometimes, when it is about to hallucinate a fish fact during a calculus problem, it pauses. It deletes the word "trout". It writes "integral" instead. Progress.</p>
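<p>One way to picture "penalize excessive deleting, reward completion" is a per-sequence score like the sketch below. This is an illustration, not the training setup used here. The <code>[EOS]</code> marker, the function name <code>shaped_score</code>, the free-delete allowance, and the weights are all assumptions made up for the example.</p>
<pre><code class="language-python">
```python
DELETE = "[DELETE]"
EOS = "[EOS]"  # hypothetical end-of-sequence marker


def shaped_score(tokens, free_deletes=2, delete_penalty=0.5, completion_bonus=1.0):
    """Score one generated sequence (all weights are illustrative assumptions).

    Adds a bonus if the sequence ends with EOS, and subtracts a penalty
    for every DELETE beyond a small free allowance.
    """
    n_deletes = sum(1 for t in tokens if t == DELETE)
    excess = max(0, n_deletes - free_deletes)
    finished = bool(tokens) and tokens[-1] == EOS
    bonus = completion_bonus if finished else 0.0
    return bonus - delete_penalty * excess


# The anxious math answer: three deletes, never finishes.
anxious = ["The", "answer", "is", "4", DELETE, "5", DELETE, "6", DELETE]
# The trout correction: one delete, then a finished answer.
balanced = ["trout", DELETE, "integral", EOS]

print(shaped_score(anxious))   # prints: -0.5
print(shaped_score(balanced))  # prints: 1.0
```
</code></pre>
<p>The free allowance is the design choice that matters. Without it, every delete costs something, and the model may simply learn never to correct itself.</p>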
<h2>The Philosophical Angle</h2>
<p>Current AI hides mistakes. Human intelligence shows the work. We see the crossed-out words in the notebook. We see the draft with changes tracked. That process contains information. It shows where the thinking was hard. It shows where the uncertainty lived.</p>
<p>Maybe we do not want perfect output. Maybe we want honest process. A model that deletes its errors is admitting fallibility. That is dangerous for a company selling certainty. That is wonderful for a person trying to understand how the answer was reached.</p>
<div class="link-list">
<h3>Further Reading - For The Keystroke Obsessed</h3>
<ul>
<li><a href="https://arxiv.org/abs/2305.12345">Keystroke Level Modeling for Language Generation</a></li>
<li><a href="https://distill.pub/2026/uncertainty-tokens">Representing Uncertainty in Token Streams</a></li>
<li><a href="https://tinyml.org/papers/backspace-training">Training Models to Admit Mistakes</a></li>
<li><a href="https://humancomputerinteraction.edu/typing-patterns">Human Typing Patterns and Correction Behavior</a></li>
</ul>
</div>
<h2>Back to Fish</h2>
<p>I am going to go check on my original model. The one without backspace tokens. It is probably writing something confidently wrong about aquatic life. At least it finishes its sentences. At least it does not delete its own existence mid-thought.</p>
<p>There is comfort in simplicity. There is also comfort in knowing that even the smartest systems sometimes need to hit Ctrl+Z. I just wish mine did not do it quite so dramatically.</p>
<hr>
</div>
<footer class="post-footer">
<p>Current status: Training models with delete keys. Watching them erase their own work. Still getting fish facts but now they delete the fish sometimes.</p>
</footer>
</div>
</article>
</main>
<footer>
<div class="container">
<p>Built with curiosity over compute</p>
<p>TinyMemoryLM by AILAY | 2026</p>
</div>
</footer>
</body>
</html>