The Science

The architecture converged on the same design the brain uses. Here is why that is not a coincidence.

Fast classification before expensive reasoning. Selective retrieval instead of full context load. Modular parallel processing. Feedback consolidation that compounds. These are not product decisions — they are the architecture that solves the problem at scale. The human brain arrived at the same solution over 500 million years: convergent evidence that the design is correct. grāmatr implements it in software. In production, that architecture delivers precisely the context each request needs — better results for fewer tokens — and compounds with every classification it routes.

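Every number on this page is verifiable. The neuroscience citations are all peer-reviewed; the engineering references also include preprints, open-source projects, and industry sources. How It Works shows the outcomes. This page shows the science.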

Built like a brain — by convergence, not by design.

Both the human brain and grāmatr solve the same problem: how to process variable-complexity inputs efficiently under resource constraints. The solutions converged independently.

Fast classification = Amygdala

The amygdala classifies incoming stimuli in milliseconds before conscious reasoning engages. grāmatr's trained classifiers triage every request — effort, intent, context tier — before expensive models are invoked.
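A minimal sketch of that triage step, assuming a rule-based stand-in for the trained classifiers. The keyword table, the words-per-tier effort heuristic, and the names `Triage`, `triage`, and `handle` are all hypothetical. What it shows is the ordering: a cheap pass labels effort, intent, and context tier, and only then is the expensive model called with that label attached.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Triage:
    effort: int        # 1 (instant lookup) .. 7 (comprehensive project)
    intent: str        # e.g. "code", "research", "operational", "analysis"
    context_tier: str  # "minimal", "standard", or "full"


# Hypothetical keyword table standing in for a trained intent classifier.
INTENT_KEYWORDS = {
    "code": ("function", "bug", "refactor", "test"),
    "research": ("compare", "survey", "sources"),
    "operational": ("deploy", "rollout", "restart"),
}


def triage(prompt: str) -> Triage:
    """Cheap pass that runs before any expensive model is invoked."""
    words = prompt.lower().split()
    intent = "analysis"  # fallback when no keyword matches
    for label, keywords in INTENT_KEYWORDS.items():
        if any(k in words for k in keywords):
            intent = label
            break
    # Crude effort proxy (an assumption): longer prompts tend to describe bigger tasks.
    effort = min(7, 1 + len(words) // 20)
    if effort <= 2:
        tier = "minimal"
    elif effort <= 5:
        tier = "standard"
    else:
        tier = "full"
    return Triage(effort, intent, tier)


def handle(prompt: str, call_model: Callable[[str, Triage], str]) -> str:
    t = triage(prompt)            # fast, no model call
    return call_model(prompt, t)  # the expensive step receives the triage result
```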

LeDoux, 2000 · Pessoa & Adolphs, 2010

Routing = Prefrontal cortex

The prefrontal cortex (PFC) doesn't do the thinking itself; it decides what kind of thinking is needed and routes accordingly. grāmatr's pipeline orchestrates which context, skills, and directives the AI receives.

Miller & Cohen, 2001

Selective retrieval = Hippocampus

The hippocampus doesn't recall everything — it selectively retrieves what's relevant via pattern completion. grāmatr's semantic vector search mirrors this: retrieving only the context needed, not everything stored.
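Selective retrieval can be sketched as top-k similarity search with a relevance floor. This sketch assumes items are already embedded as vectors (the embedding model is out of scope) and uses plain cosine similarity in place of an approximate-nearest-neighbor index. The names `retrieve`, `k`, and `min_score` are illustrative, not grāmatr's API.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Sequence[float], store: List[Tuple[str, Sequence[float]]],
             k: int = 3, min_score: float = 0.5) -> List[str]:
    """Return at most k item ids whose embeddings are similar enough to the query."""
    scored = [(cosine(query, vec), item_id) for item_id, vec in store]
    relevant = [pair for pair in scored if pair[0] >= min_score]  # drop weak matches
    relevant.sort(reverse=True)                                   # best match first
    return [item_id for _, item_id in relevant[:k]]
```

The `min_score` floor is what keeps retrieval selective: without it, top-k returns k items even when none of them is relevant.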

Norman & O'Reilly, 2003

Progressive learning = Predictive coding

The brain predicts what's coming and passes forward mainly the surprise: the prediction error. As grāmatr's classifiers improve, familiar patterns need less processing, the same efficiency principle described by Friston's free-energy framework.

Friston, 2010 · Clark, 2013

Feedback loop = Sleep consolidation

During sleep, the brain replays daily experiences to consolidate specific memories into generalized knowledge. grāmatr's feedback loop does the same — specific interactions become generalized classification intelligence through retraining.

McClelland et al., 1995 · Diekelmann & Born, 2010

Multi-head classifiers = Modular brain

The brain runs specialized modules in parallel — face recognition, language, spatial reasoning — coordinated by hub networks. grāmatr runs parallel classifiers for effort, intent, and context tier, integrating outputs into a unified intelligence packet.
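A sketch of parallel heads merged into one result, with toy functions standing in for trained heads; `effort_head`, `intent_head`, `context_tier_head`, and `classify_all` are hypothetical names. Threads make the independence of the heads visible. A production multi-head classifier would more often share one encoder and compute every head in a single forward pass.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


# Hypothetical stand-ins for independently trained classifier heads.
def effort_head(prompt: str) -> int:
    return min(7, 1 + len(prompt.split()) // 20)


def intent_head(prompt: str) -> str:
    return "code" if "bug" in prompt.lower() else "analysis"


def context_tier_head(prompt: str) -> str:
    return "minimal" if len(prompt) < 200 else "standard"


HEADS: Dict[str, Callable[[str], object]] = {
    "effort": effort_head,
    "intent": intent_head,
    "context_tier": context_tier_head,
}


def classify_all(prompt: str) -> Dict[str, object]:
    """Run every head concurrently, then merge their outputs into one result."""
    with ThreadPoolExecutor(max_workers=len(HEADS)) as pool:
        futures = {name: pool.submit(head, prompt) for name, head in HEADS.items()}
        return {name: future.result() for name, future in futures.items()}
```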

Kanwisher, 2010 · Sporns & Betzel, 2016

These aren't metaphors. They are convergent solutions to the same computational problem. The brain evolved these architectures over 500 million years. grāmatr implements them in software.

How the system gets smaller and faster.

Most AI memory tools work like filing cabinets — store everything, retrieve when asked. grāmatr works like a student. It builds intelligence from interaction patterns without training on the content of your work — progressively optimizing its own classifiers to route requests faster with less context as the system matures.

On day one, every request goes through a general-purpose language model with a full context payload. As interaction data accumulates, grāmatr's classification pipeline improves at the platform level by training Low-Rank Adaptation (LoRA) adapters (Hu et al., 2021) that improve routing and classification for all users. These adapters don't replace the base model. The base weights stay frozen, and each adapter adds a small set of trainable low-rank matrices on top, encoding patterns without full model fine-tuning. Deployment-level adapter training, available as a premium feature, adds a further optimization layer that encodes your organization's workflow patterns for faster, more accurate routing across the deployment.
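For reference, the update LoRA applies, as defined by Hu et al. (2021): the pretrained weight matrix W_0 stays frozen and only the two small matrices B and A are trained. The parameter count uses an illustrative 4096 × 4096 layer with rank r = 8; those dimensions are an example, not grāmatr's configuration.

```latex
\begin{aligned}
h &= W_0 x + \Delta W x = W_0 x + \tfrac{\alpha}{r}\, B A x,
  \quad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k) \\
\text{trainable parameters} &= r(d + k) \quad \text{vs.} \quad dk \ \text{for full fine-tuning} \\
d = k = 4096,\ r = 8: &\quad 8 \cdot 8192 = 65{,}536 \ \text{vs.}\ 4096^2 = 16{,}777{,}216 \ (\approx 0.39\%)
\end{aligned}
```

In the original paper B is initialized to zero, so BA = 0 and a fresh adapter leaves the base model's behavior unchanged until training moves it.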

Classifier progression timeline

Stage 1 · General-purpose · Days 1-7

Full context payload — everything the model might need, loaded up front. Classification by the primary LLM. Latency: 3-5 seconds per request. Accuracy: baseline (the model is learning you).

Tokens: Full load · Latency: 3-5s

Stage 2 · Pattern recognition · Weeks 2-4

Local classifier begins handling effort-level and intent routing. Context payload drops as the system internalizes repeated patterns. Latency: 1-3 seconds.

Tokens: Decreasing · Latency: 1-3s

Stage 3 · Platform-optimized · Month 2+

Platform-level LoRA adapters handle classification with high confidence. Context payload: a surgical intelligence packet — only what the request needs. The system doesn't need the encyclopedia anymore because it learned the curriculum. Deployment-level adapters, available as a premium feature, add per-deployment optimization on top.

Tokens: Surgical · Latency: <2s

Stage 4 · Flywheel acceleration · Ongoing

Each interaction generates training signal. Classification confidence increases. Smaller models handle more decisions. Larger models are reserved for genuinely complex requests. The cost per interaction decreases while accuracy increases.

Tokens: Optimizing · Cost: Decreasing
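Stages 2 through 4 describe a confidence cascade: a small classifier decides when it is confident, and a larger model is consulted only otherwise. A minimal sketch, assuming each classifier returns a `(label, confidence)` pair; the 0.85 threshold and the names `Classifier` and `cascade` are hypothetical.

```python
from typing import Callable, Tuple

# A classifier maps a prompt to (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


def cascade(prompt: str, small: Classifier, large: Classifier,
            threshold: float = 0.85) -> Tuple[str, str]:
    """Let the cheap classifier decide when confident; escalate otherwise.

    Returns (label, which model decided).
    """
    label, confidence = small(prompt)
    if confidence >= threshold:
        return label, "small"
    label, _ = large(prompt)  # reserved for requests the small model is unsure about
    return label, "large"
```

As retraining raises the small classifier's confidence on familiar patterns, a growing share of requests clears the threshold, which is the cost curve Stage 4 describes.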

The flywheel math: every classified request produces a feedback tuple — the original prompt, the classification decision, and the outcome quality signal. That tuple becomes training data. Better training data produces more accurate classifiers. More accurate classifiers produce better routing. The cycle compounds.
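The feedback tuple maps naturally onto a small record type. In this sketch the quality signal is assumed to be a score in [0, 1], and keeping only well-rated decisions as labeled examples for the next retraining run is an illustrative policy; `FeedbackTuple`, `to_training_examples`, and `min_quality` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class FeedbackTuple:
    prompt: str             # the original request
    decision: str           # the classification the request was routed on
    outcome_quality: float  # assumed scale: 0.0 (poor) to 1.0 (good)


def to_training_examples(tuples: Iterable[FeedbackTuple],
                         min_quality: float = 0.7) -> List[Tuple[str, str]]:
    """Turn well-rated routing decisions into (input, label) pairs for retraining."""
    return [(t.prompt, t.decision) for t in tuples if t.outcome_quality >= min_quality]
```

Discarding poorly rated tuples is only one option; they could instead be relabeled and kept as corrective examples.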

grāmatr's classification pipeline has routed 38,000+ classifications and accumulated 56,000+ learnings in active production use; that throughput drives the progressive learning cycle. The resulting feedback loop is what lets the system deliver precisely the context each request needs instead of loading everything.

The right context, not all of it.

Before the routing engine, the system prompt that told the model how to behave had grown unchecked. It contained every rule, every preference, every coding convention, and every behavioral directive, and every request carried all of that context whether it was relevant or not.

After the patent-pending routing engine: a surgical intelligence packet — only what the current request needs. Far fewer tokens, and the system performs better.

All of it
loaded every request

Brute-force approach — ship everything, hope the model finds what's relevant.

Just enough
delivered per request

Surgical briefing — only what the current request actually needs.

What's in the intelligence packet

Effort level: 1 of 7 tiers, from instant lookup to comprehensive project
Intent type: code, research, analysis, creative, or operational, routed before the expensive model sees it
Skill match: which dynamic skill, if any, handles this request type
Memory scope: scoped to relevance, covering only the knowledge graph entities the request actually needs
Behavioral directives: the specific rules and constraints relevant to THIS request
Active project state: where the user left off, what phase they're in, what's been decided
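The six fields above can be modeled as a plain data structure. A sketch, assuming an intent-keyed directive library so that only the rules relevant to the request are attached; `IntelligencePacket`, `build_packet`, `DIRECTIVES`, and the entity-id format are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class IntelligencePacket:
    effort: int                     # 1 (instant lookup) .. 7 (comprehensive project)
    intent: str                     # code, research, analysis, creative, operational
    skill: Optional[str] = None     # matching dynamic skill, if any
    memory_scope: List[str] = field(default_factory=list)  # relevant graph entity ids
    directives: List[str] = field(default_factory=list)    # rules for THIS request only
    project_state: Dict[str, str] = field(default_factory=dict)  # phase, decisions

    def __post_init__(self) -> None:
        if not 1 <= self.effort <= 7:
            raise ValueError("effort must be one of the 7 tiers (1-7)")


# Hypothetical directive library keyed by intent; only the matching subset ships.
DIRECTIVES: Dict[str, List[str]] = {
    "code": ["Follow the repository's lint rules", "Add tests for new behavior"],
    "research": ["Cite primary sources"],
}


def build_packet(effort: int, intent: str, skill: Optional[str] = None,
                 entities: Iterable[str] = (),
                 project_state: Optional[Dict[str, str]] = None) -> IntelligencePacket:
    return IntelligencePacket(
        effort=effort,
        intent=intent,
        skill=skill,
        memory_scope=list(entities),
        directives=list(DIRECTIVES.get(intent, [])),  # copy so packets never share a list
        project_state=dict(project_state or {}),
    )
```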

The reason performance improved: large language models tend to perform worse when the prompt carries irrelevant context. The full-load prompt held plenty of useful information, most of it irrelevant to any given request. The intelligence packet contains only relevant information, because the classifier has already determined what's relevant.

150K vs 1M+
A briefing for a prospective enterprise client was delivered on Sonnet, powered by grāmatr, in roughly 150,000 tokens. Run naively on a frontier model, the same work would burn well over a million tokens and still would not produce the same deliverable. Smaller model, fewer tokens, better outcome.

Velocity, with discipline intact.

Production releases ship through a single grāmatr skill that automates everything from the version bump to the Kubernetes rollout. The point isn't raw speed; it's speed that holds its quality bar.

15
Production releases shipped through automated delivery: v0.2.25 through v0.2.39, each taken from version bump to Kubernetes rollout by a single grāmatr skill.
Source: git tags + ArgoCD deployment history.

Methodology note. Release counts are drawn from git tags and ArgoCD deployment history — the same automated delivery path every rollout runs through. The cadence is a property of the platform's release tooling, not any single operator's output.

Velocity numbers without test coverage are noise.

That release cadence ran on a test suite of 3,761 passing tests across 691 test files at 80%+ coverage. Velocity and coverage move together, and that is the signal that matters. grāmatr℠'s patent-pending pre-classification routing didn't just make iteration faster; it made disciplined engineering possible at iteration speed, producing software that is test-covered, peer-reviewed, and release-tagged.

Velocity is the first axis, and the one most observers notice. The second axis, engineering discipline holding up under that velocity, decides whether the velocity is real or just technical debt accruing.

Knowledge graph architecture.

grāmatr's knowledge graph is not a flat key-value store. It's a structured semantic memory with typed entities, weighted observations, and tiered retrieval.

128,000+
Entities
55
Entity types
34,571
Observations

Entities span identity, project, work, knowledge, intelligence, infrastructure, and audit categories. Entities carry observations and are wired into the broader graph through typed relations.

Memory scope

Memory is scoped per request based on what the classification step determines is actually relevant. Active context for the work you're doing right now is surfaced immediately. Learned patterns and preferences are pulled in when they apply. Historical context is held in reserve and retrieved only when the request genuinely needs it. The principle is simple: deliver the smallest set of context that fully answers the request, and nothing more.
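The tiered policy in that paragraph can be sketched as a function that always includes active context, adds learned patterns only when they match the request, and touches historical context only when the request calls for it. Tag matching stands in for the real relevance check, and every name and entity id here is hypothetical.

```python
from typing import List, Sequence, Set, Tuple

# Each stored entity is (entity_id, tags); tags stand in for a real relevance signal.
Entity = Tuple[str, Sequence[str]]


def scope_memory(request_tags: Set[str], active: List[str], learned: List[Entity],
                 historical: List[Entity], need_history: bool = False) -> List[str]:
    """Return the smallest context set for a request, tier by tier."""
    selected = list(active)  # current work is always surfaced
    selected += [eid for eid, tags in learned if request_tags & set(tags)]
    if need_history:  # historical context stays in reserve unless needed
        selected += [eid for eid, tags in historical if request_tags & set(tags)]
    return selected
```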

This is why grāmatr can surface a coding convention you established three months ago when you encounter a similar pattern today — not because it loads everything, but because it understands what's relevant before the model sees the request.

Encrypted at rest. Row-level security.

Encrypted at rest

All interaction data in grāmatr's knowledge graph is encrypted at rest at the storage layer. User isolation is enforced by row-level security in the database itself, so the isolation check sits below application code. A bug in application logic cannot expose one user's data to another, because the database refuses to return rows the authenticated user does not own.

Interaction data is stored across a semantic vector index and structured object storage. Both enforce user-scoped access at the database layer via row-level security policies.

Row-level security

Row-level security policies enforce user isolation at every query. Every table carries a user identifier. The database itself filters every query to the authenticated user's scope. There is no application-layer "WHERE user_id = ?" that a developer could forget — the database enforces it.

User tier

Your interactions build your intelligence — encrypted at rest, isolated by row-level security at the database level.

Team tier

Team admins explicitly control which patterns and skills are shared. Everything else stays private.

Enterprise tier

Enterprise admins govern what gets incorporated into organizational intelligence. Full authorization required at every level.

No data flows between tiers automatically. Every cross-tier share requires explicit admin authorization.
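The cross-tier rule amounts to deny-by-default authorization. A minimal sketch, assuming each share request records which admin roles approved it; the role names `team_admin` and `enterprise_admin` and the names `ShareRequest` and `may_share` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical mapping: which admin role must approve a share into each tier.
REQUIRED_APPROVER = {"team": "team_admin", "enterprise": "enterprise_admin"}


@dataclass(frozen=True)
class ShareRequest:
    item_id: str
    source_tier: str  # "user", "team", or "enterprise"
    target_tier: str
    approved_by: FrozenSet[str] = frozenset()  # roles that explicitly approved


def may_share(req: ShareRequest) -> bool:
    """Deny any cross-tier share unless the target tier's admin approved it."""
    if req.source_tier == req.target_tier:
        return True  # not a cross-tier share
    required = REQUIRED_APPROVER.get(req.target_tier)
    return required is not None and required in req.approved_by
```

Nothing moves by default: an unrecognized target tier or a missing approval both return `False`.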

3,761 tests, independent agents.

Quality in grāmatr is enforced through separation of concerns — the agents that write code are not the agents that test it.

3,761
Passing tests across 691 test files at 80%+ coverage
147
Dedicated flywheel tests verifying the classification-feedback loop
48
User acceptance criteria across 10 real-world scenarios validated through blind UAT

The separation is visible in the git history. Test commits are distinct from feature commits: different agents, different review passes, different validation logic. The independent test engineer agent validates pipeline steps from ingestion through classification to feedback capture.

Open source acknowledgments.

grāmatr's development was informed by the broader open-source AI ecosystem, including two projects by Daniel Miessler: Fabric, an open-source framework for augmenting humans using AI, and PAI (Personal AI Infrastructure).

Both projects were discovered via the NetworkChuck YouTube channel in February 2026. They validated patterns already emerging in grāmatr's architecture and confirmed the direction of the routing engine design.

References

Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685. arxiv.org

Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized Language Models. arXiv:2305.14314. arxiv.org

Miessler, D. (2025). Fabric: An open-source framework for augmenting humans using AI. MIT License. github.com

Miessler, D. (2025). PAI (Personal AI Infrastructure). MIT License. github.com

Anthropic. (2025). Effective Context Engineering for AI Agents. anthropic.com

Neuroscience

LeDoux, J. E. (2000). Emotion Circuits in the Brain. Annual Review of Neuroscience, 23, 155-184. DOI

Miller, E. K., & Cohen, J. D. (2001). An Integrative Theory of Prefrontal Cortex Function. Annual Review of Neuroscience, 24, 167-202. DOI

Norman, K. A., & O'Reilly, R. C. (2003). Modeling Hippocampal and Neocortical Contributions to Recognition Memory. Psychological Review, 110(4), 611-646. DOI

Friston, K. (2010). The Free-Energy Principle: A Unified Brain Theory? Nature Reviews Neuroscience, 11(2), 127-138. DOI

Clark, A. (2013). Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science. Behavioral and Brain Sciences, 36(3), 181-204. DOI

McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why There Are Complementary Learning Systems. Psychological Review, 102(3), 419-457. DOI

Diekelmann, S., & Born, J. (2010). The Memory Function of Sleep. Nature Reviews Neuroscience, 11(2), 114-126. DOI

Kanwisher, N. (2010). Functional Specificity in the Human Brain. PNAS, 107(25), 11163-11170. DOI

Sporns, O., & Betzel, R. F. (2016). Modular Brain Networks. Annual Review of Psychology, 67, 613-640. DOI

Kahneman, D. (2003). A Perspective on Judgment and Choice. American Psychologist, 58(9), 697-720. DOI

Pessoa, L., & Adolphs, R. (2010). Emotion Processing and the Amygdala. Nature Reviews Neuroscience, 11(11), 773-783. DOI

Other

Stack Overflow. (2025). 2025 Developer Survey — AI Section. survey.stackoverflow.co

Twiss, J. (2026, January 8). AI Coding Degrades: Silent Failures Emerge. IEEE Spectrum. spectrum.ieee.org

Ready to see it in action?

The architecture is in production. The numbers are real. Talk to us.