Building a Hypergraph for AI-Generated Code

When you're thinking about code generation at scale, the question that keeps coming up is, "where does it live?"

If an AI coding agent generates a function, you write it to a file. The next function goes in a different file. You organize with directories, follow naming conventions, manage versions. All of that scaffolding made sense when humans were the ones reading and maintaining the code, but it falls apart fast when the primary consumer is an LLM doing the next synthesis pass.

An LLM doesn't think in modules. It can't query efficiently by convention across disparate codebases. "Give me all authentication functions" only works if it knows they follow some naming pattern and probably live in an auth/ directory; otherwise it has to guess or ask for help. It can't just ask semantically: "give me any function that takes a User and returns a Bool."

The Simple Idea

What if code lived in a structure built for semantic queries?

A hypergraph.

Nodes are terms. Edges are dependencies. You don't ask "where is this code?" Instead, you ask "what code has type User -> Bool?" and get an instant answer. You don't maintain versions, you just hash the content. Same code, same hash. Deduplication is automatic, no tooling needed.
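
To make that concrete, here is a minimal sketch of content-addressed term storage in Haskell. It's illustrative, not the system's actual code: Term, serialize, and store are made-up names, and the hashable package's hash stands in for a real content hash such as SHA-256.

    import qualified Data.Map.Strict as Map
    import Data.Hashable (hash)

    -- A generic term language: variables, lambdas, application.
    data Term = Var String | Lam String Term | App Term Term
      deriving (Show)

    -- Canonical serialization: same term, same string, same hash.
    -- (A real system would also normalize bound-variable names before hashing.)
    serialize :: Term -> String
    serialize (Var x)   = x
    serialize (Lam x b) = "(\\" ++ x ++ ". " ++ serialize b ++ ")"
    serialize (App f x) = "(" ++ serialize f ++ " " ++ serialize x ++ ")"

    type Hash  = Int
    type Store = Map.Map Hash Term

    -- Storing is idempotent: identical content always lands at the same key,
    -- so deduplication and "versioning" fall out of the addressing scheme.
    store :: Term -> Store -> (Hash, Store)
    store t s = let h = hash (serialize t) in (h, Map.insert h t s)

    main :: IO ()
    main = do
      let constFn  = Lam "x" (Lam "y" (Var "x"))
          (h1, s1) = store constFn Map.empty
          (h2, s2) = store constFn s1      -- storing the same term again is a no-op
      print (h1 == h2, Map.size s2)        -- (True, 1)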

This isn't new, conceptually. Git uses content addressing; so does IPFS. Even package managers use similar ideas internally. But most programming languages and systems still treat the filesystem as primary, which makes sense for human navigation and is terrible for machine queries.

The implication is: If you're building for AI-to-AI code transfer, everything inverts.

Why SKI Combinators and Type-Driven Synthesis

There are practical constraints that make some choices obvious.

An LLM has a finite context window. 256k tokens today, who knows what tomorrow, but still finite. If it's generating code, every token spent on syntax is wasted. (x) => (y) => x in JavaScript. lambda x: lambda y: x in Python. |x| |y| x in Rust. All expressing the same function, all burning tokens on ceremony.

SKI combinators say it in a single token: K. The three combinators S, K, and I are Turing-complete, minimal, and dense, and most importantly, reduction is pure: no side effects, no I/O to track, no mutable state, just structural transformation. That means you can compute reduce(term) once, hash the result, and cache it forever. The whole hypergraph becomes a memoization table.
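
Here is a rough sketch of what that looks like: pure SKI reduction plus a memo table, as standalone Haskell. Term, step, reduce, and reduceMemo are illustrative names, not the system's actual API.

    import qualified Data.Map.Strict as Map

    data Term = S | K | I | App Term Term
      deriving (Eq, Ord, Show)

    -- One step of normal-order reduction, if a redex exists.
    step :: Term -> Maybe Term
    step (App I x)                 = Just x
    step (App (App K x) _)         = Just x
    step (App (App (App S f) g) x) = Just (App (App f x) (App g x))
    step (App f x) = case step f of
      Just f' -> Just (App f' x)
      Nothing -> App f <$> step x
    step _ = Nothing

    -- Reduce to normal form (assumes the term terminates).
    reduce :: Term -> Term
    reduce t = maybe t reduce (step t)

    -- The hypergraph as a memoization table: look up first, reduce once, keep forever.
    type Memo = Map.Map Term Term

    reduceMemo :: Term -> Memo -> (Term, Memo)
    reduceMemo t memo = case Map.lookup t memo of
      Just nf -> (nf, memo)
      Nothing -> let nf = reduce t in (nf, Map.insert t nf memo)

    main :: IO ()
    main = do
      let skkX       = App (App (App S K) K) (App K I)   -- (S K K) applied to (K I)
          (nf, memo) = reduceMemo skkX Map.empty
      print nf               -- App K I: S K K behaves as the identity
      print (Map.size memo)  -- 1

Because reduction is pure, the cached normal form never goes stale; the second time the same term shows up anywhere in the graph, reducing it is just a lookup.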

So the synthesis problem becomes: given a goal type, find or compose components to satisfy it. But you can't explore blindly; the search space explodes. The key insight seems to be that you can use unification to prune the hypergraph. If you want to compose function f : a -> b with function g : c -> d, you check whether b unifies with c. If it doesn't, you skip that branch. This cuts the search space from exponential down to tractable. Type-checking isn't overhead; it's a search optimization.
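
Here is a minimal sketch of that check, with a toy type language and Robinson-style unification. The names (Type, unify, composable) are illustrative; the real system's representation will differ.

    import qualified Data.Map.Strict as Map
    import Data.Maybe (isJust)

    data Type = TVar String | TCon String | TArr Type Type
      deriving (Eq, Show)

    type Subst = Map.Map String Type

    apply :: Subst -> Type -> Type
    apply s t@(TVar v) = Map.findWithDefault t v s
    apply _ t@(TCon _) = t
    apply s (TArr a b) = TArr (apply s a) (apply s b)

    occurs :: String -> Type -> Bool
    occurs v (TVar v')  = v == v'
    occurs _ (TCon _)   = False
    occurs v (TArr a b) = occurs v a || occurs v b

    bind :: String -> Type -> Maybe Subst
    bind v t
      | t == TVar v = Just Map.empty
      | occurs v t  = Nothing                    -- occurs check
      | otherwise   = Just (Map.singleton v t)

    -- Robinson-style unification: a substitution if the types can match, Nothing if not.
    unify :: Type -> Type -> Maybe Subst
    unify (TVar v) t = bind v t
    unify t (TVar v) = bind v t
    unify (TCon a) (TCon b)
      | a == b    = Just Map.empty
      | otherwise = Nothing
    unify (TArr a1 b1) (TArr a2 b2) = do
      s1 <- unify a1 a2
      s2 <- unify (apply s1 b1) (apply s1 b2)
      Just (Map.union (Map.map (apply s2) s1) s2)
    unify _ _ = Nothing

    -- The pruning test: only consider composing g after f if f's result unifies with g's argument.
    composable :: Type -> Type -> Bool
    composable (TArr _ b) (TArr c _) = isJust (unify b c)
    composable _ _                   = False

    main :: IO ()
    main = do
      let f = TArr (TCon "User") (TCon "Bool")   -- f : User -> Bool
          g = TArr (TVar "a")    (TVar "a")      -- g : a -> a
          h = TArr (TCon "Int")  (TCon "Int")    -- h : Int -> Int
      print (composable f g)   -- True:  Bool unifies with a, so explore this branch
      print (composable f h)   -- False: Bool never unifies with Int, so skip it

The same unification is what later answers polymorphic queries like "give me all functions of type a -> a."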

What Exists

I've been building on this idea recently, and I don't know about you, but I often need to see something do something to understand it better; so I built enough of it to see it do a thing.

The system I've built has:

  • Type inference. A Hindley-Milner style type-checker that handles polymorphic types. Lets you ask the system "what is the type of this term?" and get back something like ∀a. a → a for the identity function.
  • Unification. Robinson's algorithm is how polymorphic queries work. You ask "give me all functions of type a -> a" and the system unifies that query against all stored types, returning both exact matches and polymorphic implementations that unify with it.
  • A query engine that searches the hypergraph by type. It can do exact matching, matching against inferred types, and predicate-based filtering.
  • A synthesis engine that iteratively deepens: at depth 1 it looks for direct matches, and if there are none, it searches at depth 2 (composing two components), then depth 3, and so on, until it finds something or hits the limit. It uses unification to prune, only trying compositions where the types actually line up (see the sketch after this list).
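
The sketch mentioned in the last bullet, kept deliberately small: iterative deepening to depth 2 over a toy component library. To stay short it matches monomorphic types by equality; at the marked pruning points the real engine uses unification instead, and every name here is illustrative.

    import Data.Maybe (listToMaybe)

    data Type = TCon String | TArr Type Type
      deriving (Eq, Show)

    -- A component is just a name paired with its type.
    type Component = (String, Type)

    -- Depth 1: components whose type matches the goal directly.
    direct :: Type -> [Component] -> [String]
    direct goal lib = [n | (n, t) <- lib, t == goal]

    -- Depth 2: compose g after f when f : a -> b and g : b -> c give the goal a -> c.
    compose2 :: Type -> [Component] -> [String]
    compose2 (TArr a c) lib =
      [ gn ++ " . " ++ fn
      | (fn, TArr a' b)  <- lib, a' == a             -- prune on the argument type
      , (gn, TArr b' c') <- lib, b' == b, c' == c ]  -- prune on the connecting type
    compose2 _ _ = []

    -- Iterative deepening, sketched only to depth 2: return the first candidate found.
    synthesize :: Int -> Type -> [Component] -> Maybe String
    synthesize limit goal lib =
      listToMaybe (concat (take limit [direct goal lib, compose2 goal lib]))

    main :: IO ()
    main = do
      let user = TCon "User"; str = TCon "String"; bool = TCon "Bool"
          lib  = [ ("email",    TArr user str)
                 , ("nonEmpty", TArr str bool) ]
      -- No direct User -> Bool component exists, so depth 2 finds a composition.
      print (synthesize 2 (TArr user bool) lib)   -- Just "nonEmpty . email"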

The Missing Pieces

Refinement types are the obvious next step. Right now you can ask for "Int -> Int" and the system finds it. But some functions only work with non-zero integers. Some only work with non-empty lists. Without refinements, you can't express these constraints. Queries become less useful because you can't rule out functions that would fail at runtime.
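
None of this exists yet, but as a sketch of the idea: a refinement could ride along with the base type, and a query made with only an arbitrary Int in hand would filter out components that demand more. Every constructor and name below is hypothetical.

    -- Hypothetical refinement-aware types; nothing here is implemented yet.
    data Refinement = AnyValue | NonZero  deriving (Eq, Show)
    data RType = RInt Refinement | RArr RType RType  deriving (Eq, Show)

    -- A requirement is met when it is no stricter than what the caller can guarantee.
    satisfies :: Refinement -> Refinement -> Bool
    satisfies _    AnyValue = True
    satisfies have want     = have == want

    -- Would this component accept an argument carrying the given guarantee?
    accepts :: RType -> Refinement -> Bool
    accepts (RArr (RInt want) _) have = satisfies have want
    accepts _ _                       = False

    main :: IO ()
    main = do
      let increment = RArr (RInt AnyValue) (RInt AnyValue)  -- fine for any Int
          divideTen = RArr (RInt NonZero)  (RInt AnyValue)  -- e.g. 10 `div` x, needs x /= 0
      -- Query "Int -> Int" while only able to guarantee an arbitrary Int:
      print (accepts increment AnyValue)   -- True:  kept in the results
      print (accepts divideTen AnyValue)   -- False: would fail at runtime, filtered out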

After that, I need a way to propagate the constraints. Right now synthesis tries combinations and prunes based on unification. With constraints pushed through the search, you can prune much earlier... before even attempting composition.

Then distributed storage, because in an AI-to-AI world multiple coding agents work in parallel on the same codebase, stored in the hypergraph, and they might be on separate machines, all querying and synthesizing against the same structure.

But these are all things that can be built on the foundation that exists so far.

Why This Changes Things

Right now code generation is asymmetric. The model generates, humans verify. Humans decide correctness, humans integrate.

With this system, that asymmetry collapses. The model queries the hypergraph, finds components matching the type it needs, and composes them; the type system catches mistakes instantly, so composition is mechanically sound; the result gets stored. That result is immediately queryable: no compilation step, no testing framework, just types and terms.

At scale, when you have millions of synthesized functions, you have massive deduplication. The hypergraph doesn't bloat, it becomes a compressed library of all meaningful functions in a domain.

Different teams, different organizations, all querying and contributing to the same hypergraph. Code stops being a filesystem problem, it becomes a graph problem.

The long-term endpoint is worth stating clearly: an environment where AI agents can write, test, and deploy code autonomously, without requiring humans in the loop. Not because humans are unnecessary, but because the type system makes it mechanically sound. If the types are proofs, composition is verified, and synthesis is deterministic.

This is early work, but the direction is clear, and the foundation is good enough to build on.