Context Window & Tokens Explained: Why Chatbots “Forget” (and How to Fix It)


By Sapumal Herath • Owner & Blogger, AI Buzz • Last updated: March 13, 2026 • Difficulty: Beginner

You give an AI assistant a clear instruction… and 10 minutes later it ignores it. Or it “forgets” a decision from earlier in the conversation. Or it confidently contradicts something you already told it.

This isn’t (always) because the model is “bad.” Most of the time, it’s because you hit a basic limit: the context window.

This guide explains tokens and context windows in plain English, why chatbots forget, and the practical patterns that make your results more consistent—without needing to be an engineer.

Note: This article is for educational purposes only. It is not legal, security, or compliance advice. Always follow your organization’s policies when sharing text, screenshots, or documents with AI tools.

🎯 What are tokens? (plain English)

Tokens are the “chunks” of text that AI models process. A token might be:

  • a whole word
  • part of a word
  • punctuation or spaces

Because models operate on tokens, not “words,” token counts affect:

  • cost (for API usage)
  • limits (how much the model can read + write at once)
  • quality (longer inputs can cause the model to miss details)

Quick token cheat sheet (approximate)

  • 1 token ≈ 4 characters (English)
  • 100 tokens ≈ 75 words (English)
  • Non-English text often uses more tokens for the same number of characters.
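The cheat sheet above can be turned into a quick, rough estimator. This is only the ~4-characters-per-token rule of thumb, not a real tokenizer, and actual counts vary by model and tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough English-text token estimate using the ~4 chars/token rule of thumb."""
    return max(1, round(len(text) / 4))

sentence = "Tokens are the chunks of text that AI models process."
print(estimate_tokens(sentence))  # 13 (rough estimate, not an exact count)
```

For exact counts you would use the tokenizer that matches your model; this heuristic is just for quick budgeting.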

🧠 What is a context window? (the “working memory”)

The context window is the total amount of information the model can “see” at one time—your current message, plus some or all of the conversation history, plus any documents/tool results included.

Think of it as the model’s working memory for the current task. It is not the model’s training data. And it is not permanent memory.

Important: The context window includes both:

  • input tokens (your messages + history + retrieved text)
  • output tokens (the model’s reply)

So if you provide a huge prompt, you leave less “room” for the model to respond—and less room for earlier context to remain visible.
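The arithmetic behind that trade-off is simple. A minimal sketch, using made-up numbers (real window sizes depend on the model you use):

```python
def output_room(context_window: int, input_tokens: int) -> int:
    """Tokens left for the model's reply after the prompt is counted."""
    return max(0, context_window - input_tokens)

# A hypothetical 8,000-token window with a 6,500-token prompt
# leaves only 1,500 tokens for the reply.
print(output_room(8_000, 6_500))  # 1500
```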

🧭 At a glance

  • Tokens = how text is counted.
  • Context window = how much the model can use as working memory.
  • Why chatbots “forget” = older or less relevant info gets pushed out, summarized, or ignored.
  • Best fixes = context hygiene: pin requirements, summarize, use RAG for documents, separate tasks, and reset threads when needed.

🧩 The 5 most common reasons chatbots “forget”

When an assistant seems forgetful, it’s usually one (or more) of these:

1) You exceeded the context window (history gets squeezed)

As conversations get longer, the model may not be able to include every previous message. Something has to give.

2) Your instructions are competing (the model picks the wrong one)

If you gave one instruction early (“keep it short”) and later asked for detail (“explain deeply”), the model may drift or average them.

3) The “signal-to-noise ratio” got worse

Long pasted logs, repeated content, or giant transcripts can bury the important detail.

4) You changed tasks mid-thread (topic drift)

If you mix five jobs in one thread—research, writing, editing, planning, and policy—the model’s “working set” becomes messy.

5) The model guessed instead of admitting uncertainty

This is related to hallucinations: when the model can’t clearly see the needed info, it may fill gaps with confident-sounding text unless you force it to say “unclear.”

⚙️ The “context budget” model (simple and practical)

Imagine you have a fixed budget (the context window). You are spending that budget on:

  • Task instructions (what you want)
  • Constraints (style, format, rules)
  • Evidence (facts, docs, excerpts)
  • Conversation history (prior decisions)
  • Output (the answer you want back)

If you spend too much on evidence (copy/paste everything), you lose room for reasoning and output. If you spend too much on output, you lose room for history and evidence.

High-quality prompting is basically budgeting attention.
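One way to make the budget concrete is to write it down as an explicit split. All numbers below are invented for illustration; your real split depends on the model's window and the task:

```python
# Hypothetical allocation of a 16,000-token window.
window = 16_000
budget = {
    "instructions": 500,   # what you want
    "constraints": 300,    # style, format, rules
    "evidence": 6_000,     # facts, docs, excerpts
    "history": 4_000,      # prior decisions
    "output": 5_200,       # room for the answer itself
}

# If this fails, something has to be cut or summarized.
assert sum(budget.values()) <= window
```

Spending more in one category (say, pasting a huge document into "evidence") means cutting another, usually history or output.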

✅ Practical checklist: Make chatbots forget less (copy/paste)

📌 A) Pin your requirements (keep them “sticky”)

  • Put the most important constraints in a short block near the top: audience, tone, format, do/don’t rules.
  • Repeat only the essentials when the thread gets long: “Reminder: keep it under 700 words; include a checklist; avoid hype.”
  • Ask the model to restate your requirements before drafting: “Confirm the rules in bullets, then write.”

🧼 B) Improve “context hygiene”

  • Don’t paste everything. Paste only the relevant excerpt, then link/label where it came from internally.
  • Remove duplicates, boilerplate, signatures, and irrelevant chat history.
  • Prefer structured inputs: bullet points, tables, and labeled sections.

🧾 C) Summarize and continue (when threads get long)

  • Every ~10–20 turns, ask: “Summarize the key decisions, constraints, and open questions in 10 bullets.”
  • Start a fresh thread with that summary as the “source of truth.”
  • This reduces drift and keeps the working memory clean.
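The "summarize every N turns" habit can be sketched as a simple counter. The 15-turn threshold here is a heuristic, not a rule:

```python
def needs_summary(turns: list[str], every: int = 15) -> bool:
    """Flag when a thread is long enough to summarize and restart."""
    return len(turns) > 0 and len(turns) % every == 0

SUMMARY_PROMPT = ("Summarize the key decisions, constraints, "
                  "and open questions in 10 bullets.")

turns = [f"turn {i}" for i in range(1, 16)]
if needs_summary(turns):
    print(SUMMARY_PROMPT)
```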

📚 D) Use RAG for long documents (instead of pasting full docs)

  • If you regularly work with policies, manuals, or knowledge bases, use a retrieval workflow (RAG) so the model can pull only the needed sections.
  • Require citations or section references inside your workflow when possible.
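To see why retrieval beats pasting, here is a toy sketch of the core idea: rank document chunks by keyword overlap with the question, and send only the best matches. Real RAG systems use embeddings and vector search; this only illustrates "pull the needed sections":

```python
import re

def words(text: str) -> set[str]:
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank chunks by keyword overlap with the query."""
    q = words(query)
    return sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)[:top_k]

policy = [
    "Refunds are issued within 30 days of purchase.",
    "Shipping takes 5 business days for domestic orders.",
    "Refund requests must include the order number.",
]
print(retrieve("How do I request a refund?", policy))
```

Only the retrieved chunks go into the prompt, so the context budget is spent on relevant text instead of the whole document.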

🧑‍⚖️ E) Force “Observation vs Inference”

  • Add: “If you can’t find the info in the provided context, say ‘unclear’ and list what’s missing.”
  • This reduces made-up details when the model’s context is incomplete.

🧪 Mini-labs (2 no-code exercises)

Mini-lab 1: The “Pin the spec” pattern

Goal: keep the model consistent across a long thread.

Copy/paste prompt:

  • “Pinned spec (do not forget):
  • Audience: ____
  • Format: ____
  • Must include: ____
  • Must avoid: ____
  • Word count: ____
  • Step 1: Repeat the pinned spec back to me in 5 bullets.
  • Step 2: Produce the output.”

What good looks like: the model restates the rules correctly and stays aligned in the draft.
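If you reuse the pinned-spec pattern often, you can generate it from a small template. The field names below mirror the lab's fill-in-the-blanks and are illustrative, not a standard:

```python
def pinned_spec(audience: str, fmt: str, must_include: str,
                must_avoid: str, word_limit: int) -> str:
    """Build a 'sticky' constraints block to place at the top of a prompt."""
    return (
        "Pinned spec (do not forget):\n"
        f"- Audience: {audience}\n"
        f"- Format: {fmt}\n"
        f"- Must include: {must_include}\n"
        f"- Must avoid: {must_avoid}\n"
        f"- Word count: under {word_limit}\n"
        "Step 1: Repeat the pinned spec back to me in 5 bullets.\n"
        "Step 2: Produce the output."
    )

print(pinned_spec("beginners", "checklist", "examples", "hype", 700))
```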

Mini-lab 2: The “Summarize + restart” reset

Goal: stop drift when a conversation gets long.

Steps:

  1. Ask: “Summarize everything important so far: decisions, constraints, facts, and TODOs. Keep it under 150 tokens.”
  2. Open a new chat and paste that summary as the first message.
  3. Continue the task from the clean summary.

What good looks like: fewer contradictions, better focus, and less “forgetting.”

🚩 Red flags (you need to change your workflow)

  • You paste full documents repeatedly instead of excerpting or retrieving.
  • The assistant starts contradicting earlier constraints (length, tone, format).
  • You’re mixing unrelated tasks in one thread and quality keeps dropping.
  • The assistant stops admitting uncertainty and starts “confident guessing.”

❓ FAQs: Tokens and context windows

Do bigger context windows “solve” forgetting?

They help, but they don’t eliminate the problem. Long context can still hide key details in noise. You still need good structure, summaries, and guardrails.

Is the context window the same as “memory”?

No. Context window = what the model can use right now. “Memory” features (in some products) are separate and should be treated carefully—especially with privacy and governance considerations.

What’s the easiest beginner fix?

Pin the requirements, keep inputs short and structured, and do “summarize + restart” when the thread gets long.


🏁 Conclusion

Chatbots don’t “forget” the way humans forget. They lose access to earlier information when it no longer fits cleanly inside the context window—or when it gets buried under noise.

The fix is practical: pin the spec, keep context clean, summarize and restart, and use retrieval instead of pasting everything. If you do those four things, the same model will suddenly feel much more reliable.
