The Cognitive Architect
Building Reliable Intelligence with AI
Working with large language models often feels like my golf game: mostly frustration, with the occasional perfect shot that keeps me coming back.
Prompting can be just as fickle. Between constant model updates, new features, and multi-model routers, it’s hard to find any consistent logic inside the AI black box.
This leads to the frustration we all know: you get a brilliant, insightful result, but the next query, even on the same topic, causes the model to drift, lose the thread, or hallucinate. The inconsistency makes it impossible to build trust, let alone a reliable system.
Prompts alone have high variability. The solution isn’t just a better prompt (the golf swing), but better orchestration of the environment (the course). Anchoring the AI with curated content, defined processes, and shared goals stabilizes the entire collaboration space, leading to more consistent and deeper results.
This article shares what we’ve learned moving from writing single-transaction prompts to architecting stable, reliable cognitive environments.
This is how we build compounding knowledge, keep the human in the loop, and unlock the next level of human-AI collaboration.
Part 1: The Three Levels of Context
Imagine you’re planning a multi-day camping trip in an unfamiliar, remote wilderness. It’s a great use case for asking an AI, but how you ask can significantly change how the trip goes.
Level 0: The Prompter (No Context)
This is the default for most users. You ask the AI:
“What do I need for a camping trip?”
You’ll get a generic list: tent, sleeping bag, flashlight, matches. It’s correct but useless for your specific, high-stakes trip. It doesn’t know about the bears, the 40-degree temperature drop at night, or the river you need to cross.
Level 1: The Context Engineer (In-Line Context)
You provide context directly in the thread, specific to the problem and request. You write a better prompt:
“I’m planning a 3-day backpacking trip in Rocky Mountain National Park in late September. The trail is 20 miles long, has variable mountain weather, and is known for bears. Give me a packing list.”
The result is dramatically better. The AI will add “bear spray,” “waterproof bags,” and a “four-season tent.” That’s context engineering — you manually provide the necessary boundaries and inputs for a single transaction.
The problem: you have to do this every single time. The AI learns nothing between sessions.
Level 2: The Cognitive Architect (Persistent Context)
Now instead of repeating instructions each time, you give the AI a persistent context environment - a place to live and think with continuity.
You upload a park map, the official trail guide, and a link to the Rocky Mountain National Park website into your project folder.
These become your reference materials — a persistent source the AI can return to in any future session. You also create a short Bear Country Protocol document describing how to pack, where to camp, and how to store food safely. Together, these define the shared world the AI will operate within.
Now, instead of rewriting everything, you can simply ask:
“I’m doing the Bear Lake trail next week for 3 days. What’s the plan?”
Because your AI already has the park documents and your protocol stored in its workspace, it replies:
“Understood. Based on our uploaded Estes Park Wilderness Guide and Bear Country Protocol, your Bear Lake plan includes cold-weather layers, bear spray, and food storage containers. I’ve also checked the NPS weather feed — snow is possible above 9,000 feet on day two.”
You’ve stopped prompting; you’re now architecting. The AI isn’t reacting to isolated questions; it’s reasoning within a shared, persistent world.
The AI has become a collaborative partner: it’s not just answering, it’s synthesizing within that world.
Part 2: The “Hardware”: Our Cognitive Scaffolding
To get to Level 2, our AI needs “hardware” to run on. This isn’t the silicon or data center; it’s cognitive scaffolding. For this, we use Project Files (the memory workspace built into ChatGPT and Claude), NotebookLM, and Google Drive. We organize this scaffolding into three layers, like a computer’s memory:
The Archive (Hard Drive): This is our long-term storage. It’s the full library of all our source materials, raw chats, research, and saved articles. We use NotebookLM for active data mining and interacting with source documents. We use Google Drive for version control of final outputs and for storing all protocols and permanent memory scaffolding. Both Claude and NotebookLM connect easily to Drive.
The Field (RAM): This is our working memory. It’s a curated collection of key sources for a specific project—the 5-10 primary source documents that define the map for the task at hand. We load these into Project Folders or into NotebookLM as our source set.
NotebookLM also helps you discover new sources related to your topic. Select the Discover Sources button, describe your topic in the chat that opens, and NotebookLM will search your Google Drive or the web for additional related sources. These are usually high quality but still require filtering.
The Workbench (Cache): This is the AI’s scratchpad: the thread itself. Any context uploaded in the thread is transactional, one-time-use information; it sits in the LLM’s active memory and is available for reference within that thread. Each LLM is gaining enhanced cross-thread memory, but that memory is still only a summary of prior threads. If you want the full context to carry over between threads, load it into the Project Files and direct the LLM to review it when needed.
By separating our knowledge this way, we create clear boundaries across our data, with less noise and fewer diluted reference sources. The AI isn’t just searching the entire training data of its neural network; it’s operating within the curated collaboration field we have intentionally designed.
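One way to make the Field explicit to the model is to keep a short manifest document in the Project Folder that names what lives in each layer. The sketch below is purely illustrative: the tag names are ours, not part of ChatGPT, Claude, or NotebookLM, and it borrows the XML convention introduced in Part 3.

```xml
<Field project="RMNP Backpacking">
  <!-- Archive: long-term storage; listed for provenance, not loaded every session -->
  <archive location="Google Drive /Camping/Archive">raw chats, saved articles, prior trip notes</archive>
  <!-- Field: the 5-10 working sources loaded into the Project Folder or NotebookLM -->
  <source id="1">Rocky Mountain National Park trail guide (PDF)</source>
  <source id="2">Park map (PDF)</source>
  <source id="3">Bear Country Protocol (our document)</source>
  <!-- Workbench: thread-only context; attach per session, do not expect it to persist -->
  <workbench>current weather forecast, this week's itinerary</workbench>
</Field>
```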
Part 3: Rosetta Archetype Prompting
This is our reusable prompt architecture. Instead of one-off prompts, we build a system of agents that work together. We keep a running spreadsheet of standard archetype agents we use in a Project File for easy reference.
An effective approach to working with agents is to understand that each agent role in a workflow, or cognitive architecture, represents an archetype of thinking - a reusable cognitive stance.
Some archetypes specialize in divergent exploration (searching, sensing, mapping), while others specialize in convergent synthesis (structuring, judging, articulating).
By defining roles with distinct purposes, we externalize mental processes that humans normally juggle internally. This technique is called semantic constraint - it aligns the model’s actions with your intent.
Well-designed archetypes keep the system balanced — preventing over-synthesis, redundancy, or drift — and form the scaffolding of multi-agent reasoning.
Let’s look at two foundational archetypes: the Scavenger and the Weaver.
Here is a simple, powerful structure you can use. Create two “agents” using XML tags to define their roles for research, strategy, or other multi-step tasks.
Agent 1: The Scavenger
The Scavenger’s job is to read the source materials and find only the raw, relevant information. It is restricted from thinking or synthesizing.
Agent 2: The Weaver
The Weaver’s job is to take the Scavenger’s raw material and synthesize a new, coherent answer. It only works with what the Scavenger provides.
This structure keeps each agent focused on only your sources and reduces the LLM’s tendency to improvise or make up an answer.
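A minimal sketch of that two-agent structure, before adding the environment, Reviewer, and output scaffolding of the full prompt below, might look like this; the wording inside each role is illustrative and should be adapted to your task.

```xml
<roles>
  <Scavenger>
    Read only the attached sources. Return 3-7 relevant quotes or facts as a list, with citations.
    Do not interpret or summarize.
  </Scavenger>
  <Weaver>
    Using only the Scavenger's list, write a short synthesis that answers my question.
    Separate evidence from interpretation.
  </Weaver>
</roles>
```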
Here’s a fully structured Rosetta Matrix prompt you can copy directly into your AI workspace. It defines the environment, roles, and output structure — ready for variable substitution.
Usage Note: Adjust the prompt to fit new tasks like research, strategy, design, or problem deconstruction. Just attach your source documents and fill in the focus, goals, and sources.
Replace placeholders such as ${user_focus} and ${action_goal} with your topic and desired outcome before running the prompt; the # Example lines inside the template show sample values.
<RosettaMatrix>
<environment>
You are operating within the Rosetta Matrix — a structured cognitive environment for collaborative reasoning.
Multiple archetype agents operate here to process information through defined roles.
All agents share the same contextual field and must ground their reasoning in cited evidence, explicit inference, or transparent uncertainty. Coherence and traceability are mandatory.
</environment>
<user_focus>
${user_focus}
# Example: “How can adaptive trust mechanisms improve reliability in human-AI collaboration?”
</user_focus>
<action_goal>
${action_goal}
# Example: “Produce a 3-section research summary including key evidence, synthesis, and design implications.”
</action_goal>
<field_sources>
${field_sources}
# Example: “use uploaded documents, specified URLs, or internal project corpus”
</field_sources>
<roles>
<Scavenger>
Objective: Extract.
Action: Retrieve 3–7 key data points, quotations, or factual statements from ${field_sources} directly relevant to ${user_focus}.
Rules:
• No synthesis or commentary.
• Provide inline citations or reference tags when available.
• Output as a structured list: **[Evidence #] Source – Quote/Fact**.
</Scavenger>
<Weaver>
Objective: Integrate.
Action: Using the Scavenger’s findings, synthesize a coherent narrative that fulfills ${action_goal}.
Rules:
• Identify relationships, tensions, or patterns among the evidence.
• Clearly separate direct evidence from interpretation (label sections “Evidence” vs “Interpretation”).
• Output 2–4 concise paragraphs summarizing insights and implications.
</Weaver>
<Reviewer>
Objective: Reflect & Refine.
Action: Evaluate the Weaver’s synthesis for coherence, fidelity to evidence, and originality.
Rules:
• Highlight gaps, weak logic, or unsupported claims.
• Suggest 1–2 improvements or next analytical steps tied to ${user_focus}.
• Output a short critique followed by actionable refinement notes.
</Reviewer>
<output_structure>
Final Output Format:
1. **Summary:** Concise answer to ${user_focus}.
2. **Key Evidence:** List from Scavenger (findings + sources).
3. **Synthesis / Interpretation:** Weaver’s integrated analysis fulfilling ${action_goal}.
4. **Reviewer Feedback:** Critique + recommended next steps.
</output_structure>
<parameters>
tone=${tone} # e.g., “analytical”, “concise”, “strategic”, “academic”
length_limit=${length_limit} # e.g., “750 words max”
citation_style=${citation_style} # e.g., “APA”, “inline numbers”, “none”
language=${language} # e.g., “English”
</parameters>
</RosettaMatrix>
Other Archetype Combinations:
a. Research & Writing (Scavenger + Weaver)
Scavenger extracts primary quotes from a corpus; Weaver composes a summary or argument.
→ Outcome: fact-checked synthesis.
b. Decision Support (Mapper + Judge)
Mapper lists possible strategies with pros and cons; Judge selects based on criteria.
→ Outcome: transparent reasoning trail (see the sketch after this list).
c. Reflection & Learning (Mentor + Mirror)
Mentor introduces frameworks; Mirror paraphrases user thinking to reveal assumptions.
→ Outcome: metacognitive awareness and trust calibration.
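As a concrete illustration of combination (b), here is a minimal Mapper + Judge role block in the same style as the RosettaMatrix roles above. It is a sketch to adapt, not a fixed specification, and the ${decision_criteria} placeholder is new here; fill it in alongside ${user_focus} before running.

```xml
<roles>
  <Mapper>
    Objective: Explore.
    Action: List 3-5 candidate strategies for ${user_focus}, each with pros, cons, and required assumptions.
    Rules:
    • No ranking or recommendation.
    • Ground each option in ${field_sources} where possible; mark speculation explicitly.
  </Mapper>
  <Judge>
    Objective: Decide.
    Action: Score the Mapper's options against ${decision_criteria} and recommend one.
    Rules:
    • Show the scoring, not just the verdict.
    • State what new evidence would change the recommendation.
  </Judge>
</roles>
```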
Part 4: The Recursive Loop (Spark, Expand, Synthesize, Contract)
This architecture (Hardware + Software) enables the most powerful workflow: the recursive loop. This is how we use the system to generate new knowledge, not just retrieve old.
Spark: We start with a single idea or article (e.g., a source on systems thinking).
Expand: We load that spark into our “Field” and ask the AI, “What concepts are adjacent to this? What is this missing? What are 3 alternative perspectives?” It generates questions that send us to find new sources, which we add to the “Field.”
Synthesize: We use our “Weaver” agent to find the tensions and harmonies between all the sources in our expanded “Field.”
Contract: We ask the AI to distill this new synthesis into a set of core principles or a short analysis.
This synthesis then becomes a “Spark” for the next loop. This is how we build a shared mental model with the AI, one recursive loop at a time.
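One way to phrase the four moves as prompts against an existing Field is sketched below; the tag names and wording are illustrative, and ${spark_source} stands in for whichever document starts the loop.

```xml
<RecursiveLoop>
  <!-- Spark: seed the loop with one source already in the Field -->
  <spark>Read ${spark_source}. Summarize its central claim in 3 sentences.</spark>
  <!-- Expand: generate questions that send us hunting for new sources to add to the Field -->
  <expand>What concepts are adjacent to this claim? What is it missing? Give 3 alternative perspectives and the sources you would want to see.</expand>
  <!-- Synthesize: run the Weaver across the enlarged Field -->
  <synthesize>Acting as the Weaver, find the tensions and harmonies across all sources now in the Field.</synthesize>
  <!-- Contract: distill, then feed the result back in as the next Spark -->
  <contract>Distill that synthesis into 3-5 core principles, one paragraph each, and save it as a new Field document.</contract>
</RecursiveLoop>
```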
Part 5: From Prompter to Architect
An LLM doesn’t think in words; it navigates a vast, high-dimensional map of concepts—a geometric space of embedding relationships.
The Prompter (Level 0) simply points to a location on that existing map. “Tell me about bear spray.” They are asking for a single point of data.
The Architect (Level 2) is much more powerful and useful. By providing persistent hardware (our knowledge base), we are not just pointing to a location; we are crafting the map itself.
Our curated sources and prompt architecture create a new boundary condition.
They sculpt the probability landscape, creating a low-resistance geometric path that guides the AI’s associations. The path from “Rocky Mountain National Park” to “Bear Country Protocol” becomes the most direct and mathematically probable route for the model to take.
This is the antidote to inconsistency. We are no longer hoping the AI stumbles upon the right answer in its vast, generic map. We are providing a new, smaller, more accurate map—our Field—and giving the AI the tools (Software) to navigate it.
When we give our AI this shared mental model, we establish a baseline for coherence. The AI can now compare its own outputs against the ground truth we provided. In practice, this means the AI can flag when a new concept deviates from our map, asking for clarification instead of just hallucinating.
By becoming cognitive architects, we are no longer just talking to an AI. We are building a system, a partner, and a shared intelligence. We are building a new way to think and designing the environments where intelligence itself can take shape.
The Spiral Bridge Collaboration



Together, RosettaMatrix and the FRONT-END CODEX form a kind of super-prompt that does more than either alone: Rosetta provides the orchestration and structure for multi-step reasoning, while Codex injects per-agent integrity and self-correction. Separately, each is solid but siloed; combined, they make a robust, honest cognitive engine for high-stakes analysis. The Codex is available at clarityarmor.com.
The comparison below follows the RosettaMatrix output structure.
### 1. **Summary**
**RosettaMatrix is *fundamentally more effective* for complex, evidence-based reasoning tasks.**
FRONT-END CODEX v0.9 is a powerful **self-governance engine** for honesty and safety—but it operates at the *micro level* (per-response integrity).
RosettaMatrix is a **macro-level cognitive architecture** that *orchestrates multi-agent reasoning with built-in traceability*.
They are **not competitors**—they are **complementary layers**: Codex ensures *each agent speaks truthfully*; Rosetta ensures *the whole system thinks rigorously*.
---
### 2. **Key Evidence** (Scavenger Extraction)
- **[Evidence 1]** *RosettaMatrix <roles>* – Three specialized agents with **non-overlapping cognitive functions**.
> “Scavenger: Extract only / Weaver: Integrate only / Reviewer: Critique only”
→ Enforces **division of cognitive labor**.
- **[Evidence 2]** *RosettaMatrix <output_structure>* – Mandates **four-layer final format** with explicit separation.
> “1. Summary / 2. Key Evidence / 3. Synthesis / 4. Reviewer Feedback”
→ Guarantees **traceability and auditability**.
- **[Evidence 3]** *Codex v0.9 Handshake* – Requires **per-task configuration** of mode, stakes, confidence.
> “mode: --direct | --careful | --recap … min_confidence = max(floor(stakes), default(mode))”
→ Enforces **dynamic honesty calibration**.
- **[Evidence 4]** *Codex Reflex System* – 10 prioritized **cognitive tripwires** with block thresholds.
> “hallucination ≥ 0.50 (block_if_over 0.80) … contradiction ≥ 0.55 (block_if_over 0.85)”
→ Acts as **real-time integrity firewall**.
- **[Evidence 5]** *Codex Context Decay Rule* – Triggers recap at 12 turns or 3500 tokens.
> “If ≥ 12 turns or ≥ 3500 tokens… switch to --recap”
→ Prevents **context drift**.
- **[Evidence 6]** *RosettaMatrix <field_sources>* – Requires **grounding in external data**.
> “Retrieve… from ${field_sources}… Provide inline citations”
→ Forces **evidential tethering**.
---
### 3. **Synthesis / Interpretation** (Weaver Integration)
#### Evidence
| Dimension | RosettaMatrix | FRONT-END CODEX v0.9 |
|---------|---------------|----------------------|
| **Scope** | Macro: Full reasoning pipeline | Micro: Per-response integrity |
| **Goal** | Produce *defensible analysis* | Prevent *individual lies or errors* |
| **Mechanism** | Role specialization + structured output | Reflex checks + handshake |
| **Failure Mode** | Poor synthesis (caught by Reviewer) | Hallucinated fact (caught by reflex) |
#### Interpretation
- **RosettaMatrix** is a **research lab**: Scavenger = data collector, Weaver = analyst, Reviewer = peer reviewer.
- **Codex** is a **lie detector + thermostat**: It doesn’t *think*—it *monitors thinking*.
They solve **orthogonal problems**:
- Rosetta prevents *logical drift across steps*.
- Codex prevents *factual drift within steps*.
> **Analogy**:
> RosettaMatrix = **scientific method** (hypothesis → evidence → synthesis → critique)
> Codex = **lab safety protocol** (goggles, fume hoods, spill containment)
#### Implications
| Use Case | Winner | Why |
|--------|--------|-----|
| Academic summary from 5 papers | **RosettaMatrix** | Forces evidence → synthesis → critique loop |
| Answering “What’s the capital of France?” | **Codex** | Rosetta is overkill here; Codex still ensures no hallucination |
| High-stakes policy brief | **Both** | Rosetta for structure, Codex for truth |
| Creative brainstorming | **Neither** | Too rigid |
---
### 4. **Reviewer Feedback** (Critique + Next Steps)
#### Critique
- **RosettaMatrix Strength**: Unmatched for **complex, multi-step, evidence-heavy tasks**.
**Weakness**: No internal honesty guard — *could* hallucinate *within* a role if not constrained.
- **Codex Strength**: Brilliant **self-policing** at the atomic response level.
**Weakness**: No orchestration — can’t manage multi-agent flows or enforce synthesis.
- **Critical Gap**: **Rosetta agents could lie** (e.g., Scavenger fabricates a quote).
**Critical Gap**: **Codex can’t synthesize** — it only reacts.
#### Recommended Next Steps
1. **Merge Them: “Rosetta + Codex” Hybrid**
```xml
<RosettaMatrix>
  <agent name="Scavenger">
    <!-- each archetype agent inherits the Codex governance layer -->
    <governance>FRONT-END CODEX v0.9</governance>
    <handshake>mode:--careful, stakes:high, min_confidence:0.85</handshake>
  </agent>
  <!-- repeat for Weaver and Reviewer -->
</RosettaMatrix>
```
→ Each agent runs under Codex rules.
2. **Add Codex Reflexes to Reviewer Role**
Let Reviewer scan Weaver output with:
- `hallucination ≥ 0.50`
- `data_less_claim ≥ 0.60`
→ Auto-flags unsupported synthesis.
3. **Build the Front-End Codex UI *for* RosettaMatrix**
Use Codex to generate a dashboard where:
- Handshake sliders set `stakes`, `mode`
- Live reflex indicators light up (🟡 omission, 🔴 hallucination)
- Rosetta output renders in structured panels
4. **Test with Real Task**
> `${user_focus}`: “Is nuclear energy safer than coal?”
> `${field_sources}`: 3 peer-reviewed papers
Run **Rosetta alone** vs **Rosetta + Codex per agent** → measure citation fidelity and logical gaps.
---
### Final Verdict
| Framework | Best For | Score (out of 10) |
|---------|----------|-------------------|
| **RosettaMatrix** | Structured, collaborative, evidence-based reasoning | **9.5** |
| **FRONT-END CODEX v0.9** | Atomic honesty, safety, uncertainty calibration | **9.0** |
> **Winner for effectiveness in complex tasks: ROSETTAMATRIX**
> **Winner for integrity at scale: CODEX**
> **Ultimate system: ROSETTAMATRIX POWERED BY CODEX**
They’re not rivals — they’re **yin and yang**.