The Science
The architecture converged on the same design the brain uses. Here is why that is not a coincidence.
Fast classification before expensive reasoning. Selective retrieval instead of full context load. Modular parallel processing. Feedback consolidation that compounds. These are not product decisions — they are the architecture that solves the problem at scale. The human brain arrived at the same solution over 500 million years: convergent evidence that the design is correct. grāmatr implements it in software. In production, that architecture delivers precisely the context each request needs — better results for fewer tokens — and compounds with every classification it routes.
Every number on this page is verifiable. The neuroscience citations are peer-reviewed. How It Works shows the outcomes. This page shows the science.
Built like a brain — by convergence, not by design.
Both the human brain and grāmatr solve the same problem: how to process variable-complexity inputs efficiently under resource constraints. The solutions converged independently.
Fast classification = Amygdala
The amygdala classifies incoming stimuli in milliseconds before conscious reasoning engages. grāmatr's trained classifiers triage every request — effort, intent, context tier — before expensive models are invoked.
LeDoux, 2000 · Pessoa & Adolphs, 2010
Routing = Prefrontal cortex
The PFC doesn't do the thinking — it decides what kind of thinking is needed and routes accordingly. grāmatr's pipeline orchestrates which context, skills, and directives the AI receives.
Miller & Cohen, 2001
Selective retrieval = Hippocampus
The hippocampus doesn't recall everything — it selectively retrieves what's relevant via pattern completion. grāmatr's semantic vector search mirrors this: retrieving only the context needed, not everything stored.
Norman & O'Reilly, 2003
Progressive learning = Predictive coding
The brain predicts what's coming and only processes the surprise. As grāmatr's classifiers improve, familiar patterns need less processing — the same efficiency principle described by Friston's free-energy framework.
Friston, 2010 · Clark, 2013
Feedback loop = Sleep consolidation
During sleep, the brain replays daily experiences to consolidate specific memories into generalized knowledge. grāmatr's feedback loop does the same — specific interactions become generalized classification intelligence through retraining.
McClelland et al., 1995 · Diekelmann & Born, 2010
Multi-head classifiers = Modular brain
The brain runs specialized modules in parallel — face recognition, language, spatial reasoning — coordinated by hub networks. grāmatr runs parallel classifiers for effort, intent, and context tier, integrating outputs into a unified intelligence packet.
Kanwisher, 2010 · Sporns & Betzel, 2016
These aren't metaphors. They are convergent solutions to the same computational problem. The brain evolved these architectures over 500 million years. grāmatr implements them in software.
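The triage step described in this section, with cheap parallel classifiers running before any expensive model, can be sketched roughly as follows. This is an illustration only, not grāmatr's implementation: the three head functions, their keyword rules, and the `Packet` fields are hypothetical stand-ins for trained classifiers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical 'intelligence packet': the merged output of all heads."""
    effort: str
    intent: str
    context_tier: str


# Each head stands in for a trained classifier (assumed; rule-based here).
def classify_effort(prompt: str) -> str:
    return "high" if len(prompt.split()) > 20 else "low"


def classify_intent(prompt: str) -> str:
    return "code" if "function" in prompt.lower() else "question"


def classify_context_tier(prompt: str) -> str:
    return "historical" if "last month" in prompt.lower() else "active"


def triage(prompt: str) -> Packet:
    """Run the cheap heads in parallel before any expensive model is called."""
    heads = (classify_effort, classify_intent, classify_context_tier)
    with ThreadPoolExecutor(max_workers=len(heads)) as pool:
        # pool.map preserves the order of `heads`, so unpacking is safe.
        effort, intent, tier = pool.map(lambda head: head(prompt), heads)
    return Packet(effort=effort, intent=intent, context_tier=tier)


packet = triage("Write a function that parses ISO dates")
```

Only after `triage` returns would a router decide which model and which context the request receives.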
How the system gets smaller and faster.
Most AI memory tools work like filing cabinets — store everything, retrieve when asked. grāmatr works like a student. It builds intelligence from interaction patterns without training on the content of your work — progressively optimizing its own classifiers to route requests faster with less context as the system matures.
On day one, every request goes through a general-purpose language model with a full context payload. As interaction data accumulates, grāmatr's classification pipeline improves at the platform level by training Low-Rank Adaptation (LoRA) adapters (Hu et al., 2021) that improve routing and classification for all users. These adapters don't replace the base model. They sit on top of it as lightweight, parameter-efficient layers that encode patterns without full model fine-tuning. Deployment-level adapter training, available as a premium feature, adds a further optimization layer that encodes your organization's workflow patterns for faster, more accurate routing across the deployment.
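To make "sit on top of the base model" concrete, here is a small pure-Python sketch of the low-rank update from Hu et al. (2021): the frozen weight matrix W is combined with a trainable product B·A scaled by alpha / r. The matrix sizes, the `matmul` helper, and the `lora_weight` name are illustrative assumptions, not values or code from grāmatr.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, b, a, alpha, r):
    """Return W + (alpha / r) * B @ A without modifying the base matrix W."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


d, k, r, alpha = 8, 8, 2, 4  # illustrative sizes only
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen base weights, d x k
a = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # A: r x k, random init
b = [[0.0] * r for _ in range(d)]                               # B: d x r, zero init

adapted = lora_weight(w, b, a, alpha, r)

full_params = d * k           # parameters touched by full fine-tuning
adapter_params = r * (d + k)  # parameters trained by the adapter
```

Because B starts at zero, the adapted weights equal the base weights before any training, which is why an adapter can be attached without changing the model's initial behavior. At these toy sizes the adapter trains 32 parameters instead of 64; for a 4096 × 4096 layer with r = 8 the same formula gives 65,536 instead of 16,777,216.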
Classifier progression timeline
Full context payload — everything the model might need, loaded up front. Classification by the primary LLM. Latency: 3-5 seconds per request. Accuracy: baseline (the model is learning you).
Local classifier begins handling effort-level and intent routing. Context payload drops as the system internalizes repeated patterns. Latency: 1-3 seconds.
Platform-level LoRA adapters handle classification with high confidence. Context payload: a surgical intelligence packet — only what the request needs. The system doesn't need the encyclopedia anymore because it learned the curriculum. Deployment-level adapters, available as a premium feature, add per-deployment optimization on top.
Each interaction generates training signal. Classification confidence increases. Smaller models handle more decisions. Larger models are reserved for genuinely complex requests. The cost per interaction decreases while accuracy increases.
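One way to picture "smaller models handle more decisions" is a confidence threshold: decisions the small classifier is sure about stay local, and the rest escalate. The sketch below assumes a single threshold of 0.85 and two hypothetical model names. Neither comes from grāmatr.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    label: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def choose_model(result: Classification, threshold: float = 0.85) -> str:
    """Keep confident decisions on the small model; escalate the rest."""
    return "small-classifier" if result.confidence >= threshold else "large-llm"


routes = [choose_model(Classification("refactor", c)) for c in (0.97, 0.60, 0.85)]
small_share = routes.count("small-classifier") / len(routes)
```

As classifier confidence rises with more training signal, a larger share of requests clears the threshold, which is the mechanism behind falling cost per interaction.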
The flywheel math: every classified request produces a feedback tuple — the original prompt, the classification decision, and the outcome quality signal. That tuple becomes training data. Better training data produces more accurate classifiers. More accurate classifiers produce better routing. The cycle compounds.
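The feedback tuple can be sketched as a small record type plus a filter that turns well-rated decisions into training pairs. The field names, the 0-to-1 quality scale, and the 0.7 cutoff are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackTuple:
    prompt: str
    classification: str
    outcome_quality: float  # assumed scale: 0.0 (poor) to 1.0 (good)


def to_training_examples(feedback, min_quality=0.7):
    """Keep only well-rated decisions as (prompt, label) training pairs."""
    return [(f.prompt, f.classification)
            for f in feedback if f.outcome_quality >= min_quality]


log = [
    FeedbackTuple("rename this helper", "code", 0.9),
    FeedbackTuple("summarize the thread", "code", 0.2),  # misrouted, low quality
    FeedbackTuple("draft a release note", "writing", 0.8),
]
examples = to_training_examples(log)
```

Filtering on the outcome signal is one simple design choice: poorly rated routings never reinforce the classifier that produced them.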
grāmatr's classification pipeline has routed 38,000+ classifications and compounded 56,000+ learnings from active production use — the throughput that drives the progressive learning cycle. That feedback loop is what lets the system deliver precisely the context each request needs, instead of loading everything.
The right context, not all of it.
Before the routing engine, the system prompt that told the model how to behave had grown without limit. It contained every rule, every preference, every coding convention, every behavioral directive. Every single request carried all of that context, whether relevant or not.
After the patent-pending routing engine: a surgical intelligence packet — only what the current request needs. Far fewer tokens, and the system performs better.
Brute-force approach — ship everything, hope the model finds what's relevant.
Surgical briefing — only what the current request actually needs.
What's in the intelligence packet
The reason performance improved: large language models perform worse with irrelevant context. The full-load prompt was full of useful information that was irrelevant to any given request. The intelligence packet contains only relevant information — because the classifier already determined what's relevant.
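The difference between the full-load prompt and the intelligence packet can be illustrated with a toy directive store filtered by the tags a classifier assigns to the request. The directives, tags, and the word-count proxy for tokens are all hypothetical.

```python
# Hypothetical directive store: every rule the full-load prompt would carry.
DIRECTIVES = {
    "python-style": {"tags": {"code", "python"},
                     "text": "Use type hints and format with black."},
    "commit-format": {"tags": {"git"},
                      "text": "Write commit messages in the imperative mood."},
    "tone": {"tags": {"email"},
             "text": "Keep customer emails brief and friendly."},
    "sql-safety": {"tags": {"code", "sql"},
                   "text": "Never build SQL by string concatenation."},
}


def build_packet(request_tags: set[str]) -> list[str]:
    """Select only the directives whose tags overlap the classified request."""
    return [d["text"] for d in DIRECTIVES.values() if d["tags"] & request_tags]


def word_count(texts: list[str]) -> int:
    """Crude stand-in for a token count."""
    return sum(len(t.split()) for t in texts)


full_load = [d["text"] for d in DIRECTIVES.values()]
packet = build_packet({"python"})
```

Here the packet carries one directive instead of four, and nothing in it is irrelevant to the request.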
Velocity, with discipline intact.
Production releases ship through a single grāmatr skill that automates every step from version bump through Kubernetes rollout. The point isn't raw speed; it's speed that holds its quality bar.
Methodology note. Release counts are drawn from git tags and ArgoCD deployment history — the same automated delivery path every rollout runs through. The cadence is a property of the platform's release tooling, not any single operator's output.
Velocity numbers without test coverage are noise.
That release cadence ran on a test suite of 3,761 passing tests across 691 test files at 80%+ coverage. Velocity and coverage move together — that's the signal that matters. grāmatr℠'s patent-pending pre-classification routing didn't just make iteration faster; it made the kind of disciplined engineering that produces test-covered, peer-reviewed, release-tagged software possible at iteration speed.
Velocity is the first axis, and the one most observers see first. The second axis is engineering discipline holding up under that velocity, and it decides whether the velocity is real or just technical debt accruing.
Knowledge graph architecture.
grāmatr's knowledge graph is not a flat key-value store. It's a structured semantic memory with typed entities, weighted observations, and tiered retrieval.
Entities span identity, project, work, knowledge, intelligence, infrastructure, and audit categories. Each entity carries observations and is wired into the broader graph through typed relations.
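A minimal sketch of such a graph, assuming simple in-memory structures: entities carry a category and weighted observations, and relations are typed edges between entity names. The class names, the `follows_convention` relation type, and the sample data are hypothetical. Only the seven category names come from the text.

```python
from dataclasses import dataclass, field

CATEGORIES = {"identity", "project", "work", "knowledge",
              "intelligence", "infrastructure", "audit"}


@dataclass
class Observation:
    text: str
    weight: float  # assumed: higher means more relevant


@dataclass
class Entity:
    name: str
    category: str
    observations: list[Observation] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


@dataclass(frozen=True)
class Relation:
    source: str
    relation_type: str
    target: str


class KnowledgeGraph:
    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}
        self.relations: list[Relation] = []

    def add(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def relate(self, source: str, relation_type: str, target: str) -> None:
        for name in (source, target):
            if name not in self.entities:
                raise KeyError(name)
        self.relations.append(Relation(source, relation_type, target))

    def neighbors(self, name: str, relation_type: str) -> list[str]:
        """Follow only edges of the requested type from the named entity."""
        return [r.target for r in self.relations
                if r.source == name and r.relation_type == relation_type]


graph = KnowledgeGraph()
graph.add(Entity("billing-service", "project"))
graph.add(Entity("snake-case-helpers", "knowledge",
                 [Observation("Helpers use snake_case", weight=0.9)]))
graph.relate("billing-service", "follows_convention", "snake-case-helpers")
```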
Memory scope
Memory is scoped per request based on what the classification step determines is actually relevant. Active context for the work you're doing right now is surfaced immediately. Learned patterns and preferences are pulled in when they apply. Historical context is held in reserve and retrieved only when the request genuinely needs it. The principle is simple: deliver the smallest set of context that fully answers the request, and nothing more.
This is why grāmatr can surface a coding convention you established three months ago when you encounter a similar pattern today — not because it loads everything, but because it understands what's relevant before the model sees the request.
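The three scopes can be sketched as a filter over stored items: active context is always surfaced, learned patterns are included when they are similar enough to the request, and historical context additionally requires the classifier to flag that history is needed. The toy three-dimensional vectors, the 0.7 threshold, and the `needs_history` flag are assumptions. A real system would use embeddings from a model and a vector index.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# (tier, text, toy embedding); all entries are hypothetical.
MEMORY = [
    ("active", "Current branch: fix-date-parser", [1.0, 0.0, 0.0]),
    ("learned", "Prefers snake_case for helpers", [0.9, 0.1, 0.0]),
    ("learned", "Prefers British spelling in docs", [0.0, 1.0, 0.0]),
    ("historical", "Date-parsing convention agreed three months ago", [0.8, 0.0, 0.6]),
]


def scope_memory(query_vec, needs_history, threshold=0.7):
    """Return the smallest set of memory items this request should see."""
    selected = []
    for tier, text, vec in MEMORY:
        relevant = cosine(query_vec, vec) >= threshold
        if tier == "active":
            selected.append(text)
        elif tier == "learned" and relevant:
            selected.append(text)
        elif tier == "historical" and needs_history and relevant:
            selected.append(text)
    return selected


routine = scope_memory([1.0, 0.0, 0.0], needs_history=False)
with_history = scope_memory([1.0, 0.0, 0.0], needs_history=True)
```

In this toy run, the three-month-old convention appears only when the request is both similar to it and flagged as needing history.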
Encrypted at rest. Row-level security.
Encrypted at rest
All interaction data in grāmatr's knowledge graph is encrypted at rest at the storage layer. User isolation is enforced by row-level security at the database itself — the integrity check sits below application code. A bug in application logic cannot expose one user's data to another, because the database refuses to return rows the authenticated user does not own.
Interaction data is stored across a semantic vector index and structured object storage. Both enforce user-scoped access at the database layer via row-level security policies.
Row-level security
Row-level security policies enforce user isolation at every query. Every table carries a user identifier. The database itself filters every query to the authenticated user's scope. There is no application-layer "WHERE user_id = ?" that a developer could forget — the database enforces it.
Your interactions build your intelligence — encrypted at rest, isolated by row-level security at the database level.
Team admins explicitly control which patterns and skills are shared. Everything else stays private.
Enterprise admins govern what gets incorporated into organizational intelligence. Full authorization required at every level.
No data flows between tiers automatically. Every cross-tier share requires explicit admin authorization.
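The "no automatic flow between tiers" rule amounts to a default-deny check: a pattern can move to the team or enterprise tier only if an admin has recorded an explicit approval for that exact pattern and tier. The `ShareLedger` class and its method names below are hypothetical.

```python
SHARED_TIERS = ("team", "enterprise")


class ShareLedger:
    """Records explicit admin approvals; anything not recorded stays private."""

    def __init__(self) -> None:
        self._approvals: set[tuple[str, str, str]] = set()

    def approve(self, pattern_id: str, target_tier: str, admin_id: str) -> None:
        if target_tier not in SHARED_TIERS:
            raise ValueError(f"unknown tier: {target_tier}")
        self._approvals.add((pattern_id, target_tier, admin_id))

    def can_share(self, pattern_id: str, target_tier: str) -> bool:
        """Default deny: True only if some admin approved this pattern for this tier."""
        return any(p == pattern_id and t == target_tier
                   for p, t, _ in self._approvals)


ledger = ShareLedger()
ledger.approve("commit-style", "team", admin_id="admin-1")
```

Approval for one tier does not imply approval for the next, so a pattern shared with a team still needs a separate decision before it reaches the enterprise tier.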
3,761 tests, independent agents.
Quality in grāmatr is enforced through separation of concerns — the agents that write code are not the agents that test it.
The separation is visible in the git history. Test commits are distinct from feature commits — different agents, different review passes, different validation logic. The independent test engineer agent validates pipeline steps from ingestion through classification through feedback capture.
Open source acknowledgments.
grāmatr's development was informed by the broader open-source AI ecosystem, including two projects by Daniel Miessler:
- Fabric — MIT License, Copyright (c) 2025. Pattern-based AI workflow orchestration.
- PAI (Personal AI Infrastructure) — MIT License, Copyright (c) 2025. Personal AI routing architecture.
Both were discovered via the Network Chuck YouTube channel in February 2026. They validated patterns already emerging in grāmatr's architecture and confirmed the direction of the routing engine design.
References
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685. arxiv.org
Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized Language Models. arXiv:2305.14314. arxiv.org
Miessler, D. (2025). Fabric: An open-source framework for augmenting humans using AI. MIT License. github.com
Miessler, D. (2025). PAI (Personal AI Infrastructure). MIT License. github.com
Anthropic. (2025). Effective Context Engineering for AI Agents. anthropic.com
Neuroscience
LeDoux, J. E. (2000). Emotion Circuits in the Brain. Annual Review of Neuroscience, 23, 155-184.
Miller, E. K., & Cohen, J. D. (2001). An Integrative Theory of Prefrontal Cortex Function. Annual Review of Neuroscience, 24, 167-202.
Norman, K. A., & O'Reilly, R. C. (2003). Modeling Hippocampal and Neocortical Contributions to Recognition Memory. Psychological Review, 110(4), 611-646.
Friston, K. (2010). The Free-Energy Principle: A Unified Brain Theory? Nature Reviews Neuroscience, 11(2), 127-138.
Clark, A. (2013). Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science. Behavioral and Brain Sciences, 36(3), 181-204.
McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why There Are Complementary Learning Systems. Psychological Review, 102(3), 419-457.
Diekelmann, S., & Born, J. (2010). The Memory Function of Sleep. Nature Reviews Neuroscience, 11(2), 114-126.
Kanwisher, N. (2010). Functional Specificity in the Human Brain. PNAS, 107(25), 11163-11170.
Sporns, O., & Betzel, R. F. (2016). Modular Brain Networks. Annual Review of Psychology, 67, 613-640.
Kahneman, D. (2003). A Perspective on Judgment and Choice. American Psychologist, 58(9), 697-720.
Pessoa, L., & Adolphs, R. (2010). Emotion Processing and the Amygdala. Nature Reviews Neuroscience, 11(11), 773-783.
Other
Stack Overflow. (2025). 2025 Developer Survey — AI Section. survey.stackoverflow.co
Twiss, J. (2026, January 8). AI Coding Degrades: Silent Failures Emerge. IEEE Spectrum. spectrum.ieee.org