Technical Deep-Dive

To get AI to think consistently, I had to teach it how to think like a person.

I didn't set out to build a brain. I built what worked — the system that became grāmatr℠. Then I looked at the neuroscience and realized the architecture I'd converged on — fast classification, selective retrieval, modular processing, feedback consolidation — is the same architecture the human brain uses. Not because I copied it. Because the problem demands it.

Every number on this page is verifiable. Every citation is peer-reviewed. How It Works shows the outcomes. This page shows the science.

Built like a brain — by convergence, not by design.

Both the human brain and grāmatr solve the same problem: how to process variable-complexity inputs efficiently under resource constraints. The solutions converged independently.

Fast classification = Amygdala

The amygdala classifies incoming stimuli in milliseconds before conscious reasoning engages. grāmatr's trained classifiers triage every request — effort, intent, context tier — before expensive models are invoked.

LeDoux, 2000 · Pessoa & Adolphs, 2010
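The triage step can be sketched in a few lines. The tier names and keyword rules below are illustrative stand-ins; grāmatr's actual classifiers are trained models, not hand-written rules.

```python
# Minimal sketch of pre-model triage. Keyword rules stand in for trained
# classifier heads; the effort heuristic is a crude illustrative proxy.

def triage(prompt: str) -> dict:
    """Classify a request before any expensive model is invoked."""
    words = prompt.lower().split()
    # Effort: longer, multi-step prompts imply higher tiers (1..7).
    effort = min(7, 1 + len(words) // 20)
    # Intent: first matching keyword wins (stand-in for a trained head).
    intents = {"fix": "code", "refactor": "code", "summarize": "analysis",
               "compare": "research", "write": "creative"}
    intent = next((v for k, v in intents.items() if k in words), "operational")
    return {"effort": effort, "intent": intent}

print(triage("fix the failing test in the auth module"))
# -> {'effort': 1, 'intent': 'code'}
```

The point is the shape: a cheap, deterministic decision made in microseconds, before any large model sees the request.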

Routing = Prefrontal cortex

The PFC doesn't do the thinking — it decides what kind of thinking is needed and routes accordingly. grāmatr's pipeline orchestrates which context, skills, and directives the AI receives.

Miller & Cohen, 2001

Selective retrieval = Hippocampus

The hippocampus doesn't recall everything — it selectively retrieves what's relevant via pattern completion. grāmatr's semantic vector search mirrors this: retrieving only the context needed, not everything stored.

Norman & O'Reilly, 2003
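The retrieval principle can be shown with toy vectors. Real systems use learned embeddings and an approximate-nearest-neighbor index; the store contents and dimensions below are purely illustrative.

```python
import math

# Sketch of selective retrieval: rank stored memories by cosine
# similarity to the query and return only the top k.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

store = {
    "naming convention": [0.9, 0.1, 0.0],
    "deploy runbook":    [0.1, 0.9, 0.1],
    "api style guide":   [0.8, 0.2, 0.1],
}

def retrieve(query_vec, k=2):
    ranked = sorted(store, key=lambda key: cosine(store[key], query_vec), reverse=True)
    return ranked[:k]  # only the k most relevant memories, not everything

print(retrieve([1.0, 0.0, 0.0]))
# -> ['naming convention', 'api style guide']
```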

Progressive learning = Predictive coding

The brain predicts what's coming and only processes the surprise. As grāmatr's classifiers improve, familiar patterns need less processing — the same efficiency principle described by Friston's free-energy framework.

Friston, 2010 · Clark, 2013

Feedback loop = Sleep consolidation

During sleep, the brain replays daily experiences to consolidate specific memories into generalized knowledge. grāmatr's feedback loop does the same — specific interactions become generalized classification intelligence through retraining.

McClelland et al., 1995 · Diekelmann & Born, 2010

Multi-head classifiers = Modular brain

The brain runs specialized modules in parallel — face recognition, language, spatial reasoning — coordinated by hub networks. grāmatr runs parallel classifiers for effort, intent, and context tier, integrating outputs into a unified intelligence packet.

Kanwisher, 2010 · Sporns & Betzel, 2016
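The fan-out-and-merge shape is easy to show. The three head functions below are trivial stand-ins for trained models; only the parallel structure is the point.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of parallel classifier heads merged into one result.
# Each head is an illustrative stand-in for a trained model.

def effort_head(p):  return min(7, 1 + len(p.split()) // 20)
def intent_head(p):  return "code" if "fix" in p else "research"
def tier_head(p):    return "active" if "current" in p else "reserve"

HEADS = {"effort": effort_head, "intent": intent_head, "context_tier": tier_head}

def classify(prompt: str) -> dict:
    # Run every head concurrently, then integrate outputs into one packet.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in HEADS.items()}
        return {name: f.result() for name, f in futures.items()}

print(classify("fix the current login bug"))
# -> {'effort': 1, 'intent': 'code', 'context_tier': 'active'}
```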

These aren't metaphors. They are convergent solutions to the same computational problem. The brain evolved these architectures over 500 million years. grāmatr implements them in software.

How the system gets smaller and faster.

Most AI memory tools work like filing cabinets — store everything, retrieve when asked. grāmatr works like a student. It studies your interactions, builds increasingly accurate models of your intent and preferences, and progressively optimizes its own classifiers to serve you faster with less data.

On day one, every request goes through a general-purpose language model with a full context payload. As interaction data accumulates, grāmatr's classification pipeline improves at the platform level — training LoRA adapters (Low-Rank Adaptation; Hu et al., 2021) that improve routing and classification for all users. These adapters don't replace the base model — they sit on top of it as lightweight, parameter-efficient layers that encode patterns without full model fine-tuning. Personal adapter training — a premium feature — adds an additional layer that encodes individual workflow patterns for even faster, more accurate routing.
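The LoRA idea itself is compact enough to show directly: a frozen weight matrix plus a trainable low-rank update, as in Hu et al. (2021). The dimensions below are arbitrary.

```python
import numpy as np

# Sketch of LoRA: output = W @ x + B @ (A @ x), where W is frozen and
# only the low-rank factors A and B are trained.

d, k, r = 512, 512, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))          # frozen base weights
A = rng.standard_normal((r, k)) * 0.01   # trainable, rank r
B = np.zeros((d, r))                     # trainable, init 0 => no-op at start

x = rng.standard_normal(k)
y = W @ x + B @ (A @ x)                  # adapter sits on top of the base

full = d * k          # parameters in a full fine-tune of W
lora = r * (d + k)    # parameters LoRA actually trains
print(f"full: {full:,}  lora: {lora:,}  ratio: {100 * lora / full:.1f}%")
```

With B initialized to zero, the adapter starts as an exact no-op, which is why it can be layered on a production model safely and trained incrementally.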

Classifier progression timeline

Stage 1 General-purpose Days 1-7

Full context payload: ~40,000 tokens. Classification by the primary LLM. Latency: 3-5 seconds per request. Accuracy: baseline (the model is learning you).

Tokens: ~40,000 Latency: 3-5s

Stage 2 Pattern recognition Weeks 2-4

Local classifier begins handling effort-level and intent routing. Context payload drops as the system internalizes repeated patterns. Latency: 1-3 seconds.

Tokens: Decreasing Latency: 1-3s

Stage 3 Platform-optimized Month 2+

Platform-level LoRA adapters handle classification with high confidence. Context payload: ~1,200 tokens — a surgical intelligence packet. The system doesn't need the encyclopedia anymore because it learned the curriculum. Personal adapters, available as a premium feature, add per-user optimization on top.

Tokens: ~1,200 Latency: <2s

Stage 4 Flywheel acceleration Ongoing

Each interaction generates training signal. Classification confidence increases. Smaller models handle more decisions. Larger models are reserved for genuinely complex requests. The cost per interaction decreases while accuracy increases.

Tokens: Optimizing Cost: Decreasing

The flywheel math: every classified request produces a feedback tuple — the original prompt, the classification decision, and the outcome quality signal. That tuple becomes training data. Better training data produces more accurate classifiers. More accurate classifiers produce better routing. The cycle compounds.
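The tuple can be sketched as a small record type. The field names below are assumptions for illustration, not grāmatr's actual schema.

```python
from dataclasses import dataclass, asdict

# Illustrative shape of the feedback tuple described above.

@dataclass(frozen=True)
class FeedbackTuple:
    prompt: str             # the original request
    decision: dict          # the classification the router made
    outcome_quality: float  # quality signal from the user / downstream checks

def to_training_example(t: FeedbackTuple) -> dict:
    # High-quality outcomes become positive examples for retraining;
    # the 0.8 threshold is an arbitrary illustrative cutoff.
    label = "keep" if t.outcome_quality >= 0.8 else "review"
    return {**asdict(t), "label": label}

t = FeedbackTuple("fix the auth bug", {"effort": 2, "intent": "code"}, 0.93)
print(to_training_example(t)["label"])
# -> keep
```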

grāmatr's classification pipeline has processed over 5,830 routed requests with 1,933 learning corrections from active production use — a single-user dataset that drives the progressive learning cycle. That feedback loop is what collapsed a 40,000-token system prompt to 1,200 tokens.

40,000 → 1,200.

Before the routing engine, the CLAUDE.md file — the system prompt that told Claude how to behave — had grown to over 40,000 tokens (verifiable in the git history). It contained every rule, every preference, every coding convention, every behavioral directive. Every single request carried all of that context, whether relevant or not.

After the patent-pending routing engine: 1,200 tokens. A 97% reduction. And the system performs better.

40,000
tokens before

Brute-force approach — ship everything, hope the model finds what's relevant.

1,200
tokens after

Surgical briefing — only what the current request actually needs.

What's in the 1,200-token intelligence packet

  • Effort level: 1 of 7 tiers, from instant lookup to comprehensive project
  • Intent type: code, research, analysis, creative, operational — routed before the expensive model sees it
  • Skill match: which dynamic skill, if any, handles this request type
  • Memory scope: scoped to relevance — only the knowledge graph entities the request actually needs
  • Behavioral directives: the specific rules and constraints relevant to THIS request
  • Active project state: where the user left off, what phase they're in, what's been decided

The reason performance improved: large language models perform worse with irrelevant context. The 40,000-token prompt was full of useful information that was irrelevant to any given request. The 1,200-token packet contains only relevant information — because the classifier already determined what's relevant.
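Illustratively, the packet maps onto a small structured record. Field names mirror the list above but are assumptions about shape, not the production schema.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative shape of the intelligence packet.

@dataclass
class IntelligencePacket:
    effort_level: int           # 1..7
    intent: str                 # code / research / analysis / creative / operational
    skill: Optional[str]        # matched dynamic skill, if any
    memory_entities: List[str]  # only the relevant knowledge-graph entities
    directives: List[str]       # rules that apply to this specific request
    project_state: str          # where the user left off

    def token_estimate(self) -> int:
        text = " ".join([self.intent, self.skill or "",
                         *self.memory_entities, *self.directives,
                         self.project_state])
        return len(text) // 4  # rough heuristic: ~4 characters per token

packet = IntelligencePacket(
    effort_level=2, intent="code", skill="release-automation",
    memory_entities=["repo:auth-service"],
    directives=["tests required before merge"],
    project_state="phase 2: hardening",
)
print(packet.token_estimate())
```

The contrast with a monolithic system prompt is structural: every field is the output of a classification decision, so nothing irrelevant rides along.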

33.8M
Tokens saved in production use and growing. ~5,800 tokens per request across 5,830 routed requests, plus system prompt reduction on every session start.
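The headline figure follows from the stated components, and checking the arithmetic directly:

```python
# Verifying the savings figures from their own stated components.
requests = 5_830
avg_saved = 5_800          # stated average tokens saved per routed request
total = requests * avg_saved
print(f"{total:,}")        # -> 33,814,000 (the ~33.8M figure)

reduction = (40_000 - 1_200) / 40_000
print(f"{reduction:.0%}")  # -> 97% (the system-prompt reduction)
```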

The numbers (contribution graph on GitHub).

The grāmatr routing engine went live the week of March 22, 2026. The difference in output is measurable in any git client.

607
Commits in the breakthrough week. March 24–31, 2026. 1,203 files touched. 354,489 lines added, a net change of 222,307 lines. Across five simultaneous projects, through feature branches, pull requests, and code review — not direct-to-main.
Source: git log, grāmatr repositories. Contribution graph publicly visible. Detailed logs available on request for due diligence.

816
GitHub contributions, week of November 16, 2025. The highest single week in the entire 52-week dataset — the first step-change, immediately following the first commit on grāmatr (November 13, 2025). The second step-change came four months later.

15
Production releases shipped in the breakthrough week. v2.0.39 through v0.2.39 — handled through a single grāmatr skill that automates version bump through Kubernetes rollout.
Source: git tags + ArgoCD deployment history.

Methodology note. The GitHub contribution calendar is a lower bound for the March 2026 numbers because squash-merged feature branches collapse a branch's commits into one calendar contribution. The git log is authoritative; the calendar is the public proof. Both point at the same thing.

Velocity numbers without test coverage are noise.

The breakthrough week ran on a test suite of 3,761 passing tests across 691 test files at 80%+ coverage, alongside 607 commits and 354,489 lines of new code. The two move together — that's the signal that matters. grāmatr℠'s patent-pending pre-classification routing didn't just make iteration faster; it made the kind of disciplined engineering that produces test-covered, peer-reviewed, release-tagged software possible at iteration speed.

This is the second axis of the breakthrough. Velocity is the first axis, and the one most observers notice. The second axis — engineering discipline holding up under that velocity — is what decides whether the velocity is real or technical debt accruing.

Knowledge graph architecture.

grāmatr's knowledge graph is not a flat key-value store. It's a structured semantic memory with typed entities, weighted observations, and tiered retrieval.

4,469 Entities · 30 Entity types · 34,571 Observations

Entities span identity, project, work, knowledge, intelligence, infrastructure, and audit categories. Each entity carries observations and is wired into the broader graph through typed relations.

Memory scope

Memory is scoped per request based on what the classification step determines is actually relevant. Active context for the work you're doing right now is surfaced immediately. Learned patterns and preferences are pulled in when they apply. Historical context is held in reserve and retrieved only when the request genuinely needs it. The principle is simple: deliver the smallest set of context that fully answers the request, and nothing more.

This is why grāmatr can surface a coding convention you established three months ago when you encounter a similar pattern today — not because it loads everything, but because it understands what's relevant before the model sees the request.
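The scoping principle can be sketched as a tier-aware filter. The tier names follow the paragraph above; the store contents and scoring are illustrative stand-ins.

```python
# Sketch of tiered memory scoping: deliver the smallest set of context
# that answers the request. Relevance scores here are illustrative.

MEMORY = [
    {"tier": "active",     "entity": "current sprint plan",    "relevance": 0.9},
    {"tier": "learned",    "entity": "error-handling pattern", "relevance": 0.7},
    {"tier": "historical", "entity": "2024 migration notes",   "relevance": 0.2},
]

def scope(min_relevance=0.5):
    # Active context always ships; other tiers must clear the relevance bar.
    return [m["entity"] for m in MEMORY
            if m["tier"] == "active" or m["relevance"] >= min_relevance]

print(scope())
# -> ['current sprint plan', 'error-handling pattern']
```

Raising the bar shrinks the payload further; historical context only crosses it when a request genuinely needs it.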

Per-user encryption. Row-level security.

Per-user encryption

Every piece of data in grāmatr's knowledge graph is encrypted and isolated per user. The architecture enforces this at the database level, not the application level — meaning a bug in application code cannot expose one user's data to another.

User interaction data is stored across semantic vector search and structured object storage. Both enforce user-scoped access at the database level.

Row-level security

Row-level security policies enforce user isolation at every query. Every table carries a user identifier. The database itself filters every query to the authenticated user's scope. There is no application-layer "WHERE user_id = ?" that a developer could forget — the database enforces it.

User tier

Your interactions build your intelligence — encrypted at rest, isolated by row-level security at the database level.

Team tier

Team admins explicitly control which patterns and skills are shared. Everything else stays private.

Enterprise tier

Enterprise admins govern what gets incorporated into organizational intelligence. Full authorization required at every level.

No data flows between tiers automatically. Every cross-tier share requires explicit admin authorization.

3,761 tests, independent agents.

Quality in grāmatr is enforced through separation of concerns — the agents that write code are not the agents that test it.

3,761
Passing tests across 691 test files at 80%+ coverage

147
Dedicated flywheel tests verifying the classification-feedback loop

48
User acceptance criteria across 10 real-world scenarios validated through blind UAT

The separation is visible in the git history. Test commits are distinct from feature commits — different agents, different review passes, different validation logic. The independent test engineer agent validates pipeline steps from ingestion through classification through feedback capture.

Open source acknowledgments.

grāmatr's development was informed by the broader open-source AI ecosystem, including two projects by Daniel Miessler:

  • Fabric — MIT License, Copyright (c) 2025. Pattern-based AI workflow orchestration.
  • PAI (Personal AI Infrastructure) — MIT License, Copyright (c) 2025. Personal AI routing architecture.

Discovered via the NetworkChuck YouTube channel in February 2026, these projects validated patterns already emerging in grāmatr's architecture and confirmed the direction of the routing engine design.

References

Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685. arxiv.org

Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314. arxiv.org

Miessler, D. (2025). Fabric: An open-source framework for augmenting humans using AI. MIT License. github.com

Miessler, D. (2025). PAI (Personal AI Infrastructure). MIT License. github.com

Anthropic. (2025). Effective Context Engineering for AI Agents. anthropic.com

Neuroscience

LeDoux, J. E. (2000). Emotion Circuits in the Brain. Annual Review of Neuroscience, 23, 155-184. DOI

Miller, E. K., & Cohen, J. D. (2001). An Integrative Theory of Prefrontal Cortex Function. Annual Review of Neuroscience, 24, 167-202. DOI

Norman, K. A., & O'Reilly, R. C. (2003). Modeling Hippocampal and Neocortical Contributions to Recognition Memory. Psychological Review, 110(4), 611-646. DOI

Friston, K. (2010). The Free-Energy Principle: A Unified Brain Theory? Nature Reviews Neuroscience, 11(2), 127-138. DOI

Clark, A. (2013). Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science. Behavioral and Brain Sciences, 36(3), 181-204. DOI

McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why There Are Complementary Learning Systems. Psychological Review, 102(3), 419-457. DOI

Diekelmann, S., & Born, J. (2010). The Memory Function of Sleep. Nature Reviews Neuroscience, 11(2), 114-126. DOI

Kanwisher, N. (2010). Functional Specificity in the Human Brain. PNAS, 107(25), 11163-11170. DOI

Sporns, O., & Betzel, R. F. (2016). Modular Brain Networks. Annual Review of Psychology, 67, 613-640. DOI

Kahneman, D. (2003). A Perspective on Judgment and Choice. American Psychologist, 58(9), 697-720. DOI

Pessoa, L., & Adolphs, R. (2010). Emotion Processing and the Amygdala. Nature Reviews Neuroscience, 11(11), 773-783. DOI

Other

Stack Overflow. (2025). 2025 Developer Survey — AI Section. survey.stackoverflow.co

Twiss, J. (2026, January 8). AI Coding Degrades: Silent Failures Emerge. IEEE Spectrum. spectrum.ieee.org

Ready to see it in action?

The architecture is production. The numbers are real. Request Early Access