The Log is the Database

foldb has one source of truth: an append-only, Raft-replicated log. Everything else (the storage layer, the indexes, the materialized query results) is just fold(log): a pure function applied to the sequence of committed entries, in order.
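To make fold(log) concrete, here is a minimal sketch in Python. The entry shape (seq, key, value) and the names apply_entry and fold are assumptions made up for illustration; they are not foldb's actual types.

```python
from functools import reduce

# Hypothetical entry shape: (seq, key, value). A value of None means "delete".
def apply_entry(state, entry):
    """Pure step function: returns a new state and never mutates the old one."""
    _seq, key, value = entry
    new_state = dict(state)
    if value is None:
        new_state.pop(key, None)
    else:
        new_state[key] = value
    return new_state

def fold(log):
    """Derived state is a left fold of apply_entry over the committed log."""
    return reduce(apply_entry, log, {})

log = [(1, "a", 10), (2, "b", 20), (3, "a", None), (4, "b", 25)]
state = fold(log)
```

Because apply_entry is pure, folding the same log twice, or on two different machines, gives the same state. That is the property the rest of the design leans on.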

A transaction commits when its entry is durable in the log on a quorum of replicas. Not when it's been applied to the storage layer. Not when indexes are updated. The append is the commit. Execution is a deterministic consequence of that commit, not the commit itself.
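The commit rule is small enough to write down. This is a sketch of the majority check only, not of Raft itself; quorum_size and is_committed are hypothetical names.

```python
def quorum_size(replica_count):
    """Smallest strict majority of a replica set."""
    return replica_count // 2 + 1

def is_committed(durable_acks, replica_count):
    """Committed means durable on a quorum. Whether any replica has
    applied the entry to storage or indexes is not part of the test."""
    return durable_acks >= quorum_size(replica_count)
```

Note what is missing from the signature: there is no "applied" flag. Application state has no vote in whether a transaction committed.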

The log is partitioned across 64 (tunable) Raft groups. A sequencer, itself Raft-replicated, assigns global sequence numbers across all partitions: monotonically increasing, gap-free, cluster-wide. One total order, everywhere. If you're wondering whether two nodes could ever assign the same sequence number to different transactions, the answer is no; that's what the sequencer exists to prevent. It's also the first bottleneck you're likely to run into.
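A single-process stand-in shows the shape of the idea. The Sequencer class and partition_for are hypothetical, and routing keys by CRC32 is an assumption chosen only because it is stable across runs; the real sequencer is Raft-replicated and this one is not.

```python
import zlib

NUM_PARTITIONS = 64  # the default mentioned above; tunable

class Sequencer:
    """Hands out one gap-free, monotonically increasing sequence."""
    def __init__(self):
        self._next = 1

    def assign(self):
        seq = self._next
        self._next += 1
        return seq

def partition_for(key):
    # Assumption: hash routing by key. Not necessarily how foldb routes.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

sequencer = Sequencer()
entries = [(sequencer.assign(), partition_for(k), k) for k in ["alice", "bob", "carol"]]
seqs = [seq for seq, _partition, _key in entries]
```

Every transaction passes through assign(), whichever partition it lands on. That is why the sequence is total, and also why this is the place throughput gets squeezed first.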

The fold executor reads log entries in sequence order and applies them to an LSM tree. One thread per data partition, pinned to a core. No locks. No MVCC inside a partition. There's only one order of events, the log, so there's no concurrency to coordinate. When a transaction touches two partitions, the executors exchange values (the rows each side needs from the other's state at seq - 1) and both apply their slice independently. No locking, no voting. The logic is deterministic and both sides saw the same inputs, so the results are consistent by construction.
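Here is a toy version of the two-partition case: a transfer between rows owned by different executors. Everything in it (Executor, transfer_logic, the account rows) is invented for illustration, and the "exchange" is just a dict merge standing in for a network round trip.

```python
def transfer_logic(inputs, params):
    """Deterministic transaction body: same inputs give the same writes everywhere."""
    src, dst, amount = params["src"], params["dst"], params["amount"]
    if inputs[src] < amount:
        return {}  # deterministic abort: both sides reach the same decision
    return {src: inputs[src] - amount, dst: inputs[dst] + amount}

class Executor:
    """Owns a slice of the rows; stands in for one pinned partition thread."""
    def __init__(self, owned_rows):
        self.rows = dict(owned_rows)

    def local_reads(self, keys):
        # The values this side contributes, taken from its state at seq - 1.
        return {k: self.rows[k] for k in keys if k in self.rows}

    def apply_slice(self, writes):
        # Apply only the writes that touch rows this partition owns.
        for key, value in writes.items():
            if key in self.rows:
                self.rows[key] = value

p0 = Executor({"alice": 100})
p1 = Executor({"bob": 5})
params = {"src": "alice", "dst": "bob", "amount": 30}
keys = [params["src"], params["dst"]]

# Exchange: each side ends up holding the same full set of inputs.
inputs = {**p0.local_reads(keys), **p1.local_reads(keys)}

# Each side runs the same logic on its own, then applies its own slice.
writes_p0 = transfer_logic(inputs, params)
writes_p1 = transfer_logic(inputs, params)
p0.apply_slice(writes_p0)
p1.apply_slice(writes_p1)
```

Neither side asks the other whether it is ready to commit. The entry is already committed in the log; both executors are only computing what that commit means.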

Non-determinism (things like NOW(), RANDOM(), UUID()) is resolved at the gateway before an intent touches the log. By the time an entry is written, every value is concrete. Every node that replays that entry sees the same thing. The gateway is where impurity stops; inside the fold, impurity doesn't compile. The executor's import whitelist is checked at build time. Wall clock reads are a compile error on the execution path.
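A rough sketch of what "resolved at the gateway" could look like. The function resolve_intent and the convention of spelling placeholders as strings are assumptions for illustration; the real gateway works on parsed SQL, not on a dict of strings.

```python
import random
import time
import uuid

def resolve_intent(params, clock=time.time, rng=random.random, new_uuid=uuid.uuid4):
    """Replace non-deterministic placeholders with concrete values.

    This runs once, at the gateway. The result is what goes into the log,
    so every replica that replays the entry sees identical values.
    """
    resolved = {}
    for name, value in params.items():
        if value == "NOW()":
            resolved[name] = clock()
        elif value == "RANDOM()":
            resolved[name] = rng()
        elif value == "UUID()":
            resolved[name] = str(new_uuid())
        else:
            resolved[name] = value
    return resolved

entry = resolve_intent({"id": "UUID()", "created_at": "NOW()", "name": "widget"})
```

The impure sources are passed in as arguments so that the function is pure apart from them. Python can't turn a wall clock read into a compile error the way the executor's build does, so that part is left to the imagination.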

SQL queries are registered ahead of time and referenced by a BLAKE3 hash of their AST. The hash is what lives in the log. The query is fixed, the params are fixed, non-determinism is preresolved, and storage state at seq - 1 is fixed. Two nodes with identical log state produce identical results. If they don't, one of them has a bug.
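The registration step might look something like this. Two loud assumptions: the AST is a toy dict serialized as canonical JSON, and the hash is BLAKE2b from Python's standard library standing in for BLAKE3, which the standard library does not ship. The register function and the log entry shape are hypothetical.

```python
import hashlib
import json

registry = {}  # digest -> AST, populated ahead of time

def register(ast):
    """Hash a canonical serialization of the AST and store it under that hash."""
    canonical = json.dumps(ast, sort_keys=True, separators=(",", ":"))
    # Stand-in: BLAKE2b instead of BLAKE3, to stay inside the stdlib.
    digest = hashlib.blake2b(canonical.encode("utf-8"), digest_size=32).hexdigest()
    registry[digest] = ast
    return digest

ast = {
    "op": "update",
    "table": "accounts",
    "set": {"balance": "$1"},
    "where": {"id": "$2"},
}
query_id = register(ast)

# What lands in the log: the hash and concrete params, never SQL text.
log_entry = {"seq": 42, "query": query_id, "params": [70, "alice"]}
```

Hashing a canonical form of the AST rather than the SQL text means differences in key order or whitespace don't produce a different query identity.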

Reads can be issued at any past sequence number. State is versioned by seq in the LSM; historical reads aren't a special mode, they're a property of how the storage layer works. The log can be truncated once a snapshot is durable in object storage and all live readers and CDC subscribers are past that point. Recovery from scratch is: download the snapshot manifest, pull the SSTables, replay forward from the snapshot's seq. That's it.
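Reads at a past sequence number fall out of keeping every version keyed by seq. This is an in-memory sketch, not an LSM tree; VersionedStore is a hypothetical name.

```python
import bisect

class VersionedStore:
    """Keeps every version of every key, ordered by sequence number."""
    def __init__(self):
        self._versions = {}  # key -> (seqs, values), parallel lists in seq order

    def put(self, seq, key, value):
        seqs, values = self._versions.setdefault(key, ([], []))
        # The executor applies entries in seq order, so appending keeps seqs sorted.
        seqs.append(seq)
        values.append(value)

    def get(self, key, at_seq):
        """Return the newest value written at or before at_seq, or None."""
        seqs, values = self._versions.get(key, ([], []))
        i = bisect.bisect_right(seqs, at_seq)
        return values[i - 1] if i else None

store = VersionedStore()
store.put(1, "a", 10)
store.put(4, "a", 11)
store.put(7, "a", 12)
```

A read "now" and a read "at seq 5" go through the same get(); the only difference is the number passed in. That is what it means for historical reads not to be a special mode.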

Schema changes are transactions. DDL goes through the log like everything else. There's no metadata side channel. If the schema isn't in the log, it doesn't exist.

There's one isolation level: strict serializable. The isolation level clause is a parse error. The error message will tell you why.

I'll post more about foldb in the future, including where you can get the code. For now, have a good time.

EDIT: The code for foldb has been uploaded to Codeberg at https://codeberg.org/canoozie/foldb, and there is a copy on GitHub that syncs a couple of times a day.
