The Address Space Manager: A Deep Dive
I've been knee-deep in building a distributed storage system and wanted to talk about a component that doesn't get much love in the literature: The address space manager. It's one of those pieces that sits quietly in the middle of everything, just doing its job so you never wonder about it, until it goes wrong.
The Problem We're Solving
Imagine you have a distributed storage system with multiple physical storage devices; could be SSDs, could be CXL-attached persistent memory, could be anything. Your application wants to read and write bytes at logical addresses like it's one big contiguous chunk of storage. Simple, right?
Except, it's not simple at all. Those bytes need to live on some physical medium. And what if you add a new device to the cluster, or need to remove one for maintenance? The application shouldn't care; it just wants to read 1000 bytes from address 0xdce8000 and get the right data back.
That's what an address space manager does: It maintains the mapping between what applications think they're talking to and where the data actually lives.
If you understand virtual memory, this should feel familiar. Your CPU's MMU translates virtual addresses to physical page frames. We're doing something similar, just at a different layer of the stack.
The Core Abstraction: Segments
The fundamental unit here is a segment. A segment is a contiguous range of logical addresses that maps to a contiguous range on a specific physical device. Think of it like a page table entry, but for storage.
class Segment:
    def __init__(self, logical_start, logical_end, device_id, phys_offset):
        self.logical_start = logical_start
        self.logical_end = logical_end
        self.device_id = device_id
        self.physical_offset = phys_offset
        self.flags = SegmentFlags()

A few things to note here. First off, it's Python because it's common and short enough for a blog post; you probably don't want to wade through a lot of Zig or Rust here. Next, the logical_end is exclusive, which makes the math cleaner when calculating sizes and checking containment. This is the half-open interval pattern that you see everywhere in systems programming literature. If you've ever dealt with off-by-one errors in range calculations, you know why this convention exists.
The flags track operational state:
class SegmentFlags:
    def __init__(self):
        self.active = True
        self.migrating = False
        self.read_only = False

You may wonder why not just use an enum here, with a single state instead of separate migrating and read_only flags. But because migration is a process with multiple phases, the states aren't mutually exclusive: the source remains active for reads while you copy data to the target, and only after the migration completes do you flip the flags and deactivate the source. More on this later.
Segment Operations
Before I get into resolution, here are a few helper methods that will serve us well as we go. They handle common operations that every segment needs, and they sit in the hot path.
    # More methods on Segment:
    def size(self):
        return self.logical_end - self.logical_start

    def contains(self, offset):
        return self.logical_start <= offset < self.logical_end

    def overlaps(self, start, end):
        return start < self.logical_end and end > self.logical_start

The overlaps helper is worth pausing on. With half-open intervals, two ranges overlap if and only if each starts before the other ends. This is one of those patterns that looks obvious once you see it, but getting it wrong causes subtle bugs that only manifest with specific offset combinations.
As an aside, I've seen systems where someone wrote start <= self.logical_end instead of start < self.logical_end, and it worked fine until there were two segments that were exactly adjacent. Then suddenly reads at the boundary would match both segments. If you think debugging is fun, you've probably never run into this sort of case.
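To make that concrete, here's a minimal sketch of the failure mode with two exactly adjacent segments; the offsets are made up for illustration:

# Two adjacent segments: [0, 4096) on device 0, [4096, 8192) on device 1.
a = Segment(0, 4096, device_id=0, phys_offset=0)
b = Segment(4096, 8192, device_id=1, phys_offset=0)

query_start, query_end = 4096, 4608   # a read that starts exactly at the boundary

# Correct half-open check: only b matches.
assert not a.overlaps(query_start, query_end)
assert b.overlaps(query_start, query_end)

# Buggy variant with <=: a *also* matches, because 4096 <= a.logical_end.
def overlaps_buggy(seg, start, end):
    return start <= seg.logical_end and end > seg.logical_start

assert overlaps_buggy(a, query_start, query_end)   # false positive at the boundary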
The Resolution Algorithm
The most common operation in an address space manager is the actual resolution algorithm. It happens on every read and every write. It might look something like this:
def resolve(self, offset, length):
    """Translate logical address range to physical locations.

    Returns a list of PhysicalRange objects."""
    if offset >= self.logical_capacity:
        raise OutOfBoundsError()
    end_offset = offset + length
    if end_offset > self.logical_capacity:
        raise RangeExceedsCapacityError()

    ranges = []
    current_offset = offset
    for segment in self.segments:
        if not segment.flags.active:
            continue
        if segment.overlaps(offset, end_offset):
            range_start = max(current_offset, segment.logical_start)
            range_end = min(end_offset, segment.logical_end)
            if range_start >= range_end:
                continue
            segment_offset = range_start - segment.logical_start
            range_length = range_end - range_start
            ranges.append(PhysicalRange(
                device_id=segment.device_id,
                offset=segment.physical_offset + segment_offset,
                length=range_length
            ))
            current_offset = range_end
            if current_offset >= end_offset:
                break
    if not ranges:
        raise NoSegmentFoundError()
    return ranges

Notice that this resolution function can return multiple physical ranges. If a read spans two segments (say, crossing from one device to another), you get back two ranges that you need to read from separately. The caller has to handle this and take whatever action makes sense for it.
You do need to be careful about:
- Ranges that span segment boundaries
- Segments that aren't active (migrations in progress, etc.)
- Gaps in the logical address space (sparse allocation)
The linear scan through segments might make those of you who are computer science students, or early on in your career, wince. I can hear you now: "O(n) per lookup? Really?" First off, shut up; secondly, fair point. In practice, you can use a more sophisticated data structure here, say an interval tree... but for systems with a reasonable number of segments (hundreds, not millions), n is small, and so who gives a shit? The linear scan is easier to reason about.
Tirade over, let's continue.
Concurrency, Getting the Locks Right
Here's where real systems get interesting... As you can imagine, in a distributed storage system, reads are happening all the time, many concurrently, and writes to the segment table only happen when you're adding/removing devices or migrating data. For simplicity, let's use a lock.
class AddressSpaceManager:
    def __init__(self):
        self.lock = RWLock()
        self.segments = []
        self.devices = []
        self.logical_capacity = 0

    def resolve(self, offset, length):
        with self.lock.read():
            ...  # resolution logic from above

    def allocate_segment(self, device_id, size):
        with self.lock.write():
            ...  # allocation logic

Why a reader-writer lock specifically? Because the access pattern leans this way. Resolution happens constantly, potentially thousands of times a second. Segment allocation? Maybe a few times a day. If you use a regular mutex, every resolution blocks every other resolution, which wastes time and IOPS. With an RWLock, readers can run concurrently, and writers acquire exclusive access when they need it.
So resolution only blocks when the segment table is being updated.
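Python's standard library doesn't ship a reader-writer lock, so if you want to run the snippets above you'll need something like this minimal, writer-unfair sketch (a real system would reach for a battle-tested implementation instead):

import threading
from contextlib import contextmanager

class RWLock:
    """Minimal reader-writer lock: many concurrent readers, exclusive writers."""
    def __init__(self):
        self._readers = 0
        self._readers_lock = threading.Lock()   # protects the reader count
        self._writer_lock = threading.Lock()    # held while any reader or a writer is active

    @contextmanager
    def read(self):
        with self._readers_lock:
            self._readers += 1
            if self._readers == 1:
                self._writer_lock.acquire()      # first reader blocks writers
        try:
            yield
        finally:
            with self._readers_lock:
                self._readers -= 1
                if self._readers == 0:
                    self._writer_lock.release()  # last reader lets writers in

    @contextmanager
    def write(self):
        with self._writer_lock:
            yield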
But there is still something hiding in the weeds that we need to know about: When you call resolve(), you get back a list of PhysicalRange objects. But what happens if someone starts a migration right after you release the read lock? You're now holding references to physical locations that might be getting copied somewhere else!
The truth is, not our problem. It's the caller's responsibility to complete the I/O operation before the migration completes. In practice, this works because:
- Migrations are rare and take time to copy gigabytes of data
- Individual I/O operations are fast, typically in the microsecond to millisecond range
- The migration process waits for in-flight I/O to drain before finalizing
If you ever find yourself building something like this, you could add a reference counting scheme to tackle in-flight operations. Or you might decide the race condition is acceptable given your access patterns. There are no right answers, because it's a tradeoff, and engineering is all about tradeoffs, after all.
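If you do go the reference-counting route, a minimal sketch might look like the following; the names (InflightTracker, drain) are mine for illustration, and a real version would want timeouts and per-segment granularity:

import threading

class InflightTracker:
    """Counts in-flight I/O so a migration can wait for it to drain."""
    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def begin_io(self):
        with self._cond:
            self._count += 1

    def end_io(self):
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def drain(self):
        # Called by the migration path before finalizing.
        with self._cond:
            while self._count > 0:
                self._cond.wait()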
Device Management
Before we can allocate segments, we need to know about the devices in our system. The address space manager maintains a registry of devices with their capacities:
def add_device(self, device_id, capacity, numa_node=-1):
    with self.lock.write():
        for device in self.devices:
            if device.device_id == device_id:
                raise DeviceAlreadyExistsError()
        self.devices.append(DeviceInfo(
            device_id=device_id,
            capacity=capacity,
            numa_node=numa_node
        ))

The numa_node parameter hints at something important in real systems: not all devices are an equal distance from all CPUs. A NUMA-aware allocator would try to place data close to the threads that access it most frequently, but that's a whole other can of worms to explore... one day.
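Just to gesture at what that could look like, here's a hypothetical helper that prefers devices on a caller-supplied NUMA node and falls back to anything with room; it's a sketch, not how any particular allocator actually decides:

def pick_device(self, size, preferred_numa_node=None):
    # Assumes the caller already holds the appropriate lock.
    # Prefer a device on the caller's NUMA node, else any device with room.
    candidates = [d for d in self.devices
                  if self.calculate_device_used(d.device_id) + size <= d.capacity]
    if preferred_numa_node is not None:
        local = [d for d in candidates if d.numa_node == preferred_numa_node]
        if local:
            candidates = local
    if not candidates:
        raise InsufficientCapacityError()
    # Least-used first, as a crude form of load balancing.
    return min(candidates, key=lambda d: self.calculate_device_used(d.device_id))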
Removing a device is a bit more involved. You can't just delete it from the list; there might be segments living on it:
def remove_device(self, device_id):
    with self.lock.write():
        for segment in self.segments:
            if segment.device_id == device_id and segment.flags.active:
                raise DeviceHasActiveSegmentsError()
        self.devices = [d for d in self.devices if d.device_id != device_id]

This is where the migration logic comes in. If you want to decommission a device, you first migrate all its segments to other devices, then remove it. The DeviceHasActiveSegmentsError acts as a safety interlock, preventing accidental data loss.
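Tying that together, a decommission flow might look roughly like this sketch, using the migration calls from the next section; copy_segment_data is a hypothetical stand-in for the chunked data copy, and target selection is deliberately naive:

def decommission_device(asm, device_id, target_device_id):
    # Migrate every active segment off the device, then remove it.
    while True:
        victims = [s for s in asm.segments
                   if s.device_id == device_id and s.flags.active]
        if not victims:
            break
        for segment in victims:
            handle = asm.begin_migration(segment.logical_start, target_device_id)
            copy_segment_data(asm, handle)   # hypothetical: chunked copy, outside the lock
            asm.complete_migration(handle)
    asm.remove_device(device_id)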
Migration, It Must Be Done
Adding capacity is easy. You register a new device, allocate a segment on it, and the new logical address space is immediately available. Boom, done.
Removing capacity? Well, that's where it gets tricky.
You can't just yank a device out; there could be data on it! You need to move that data first, and in case it's not clear, that's what a migration is. In our case it has multiple states it must go through, so let's dig in.
- Migration begins. The source is marked active=true, migrating=true, read_only=true; the target is created with active=false, migrating=true, read_only=false.
- Data is then copied from source to target. During this time, reads still go to the source, no new writes land on the source, and bytes get copied in chunks.
- Migration completes, and we atomically flip the flags: source becomes active=false, migrating=false; target becomes active=true, migrating=false.
The key thing to remember in this migration, is that source and target segments cover the same logical address range. During migration, they coexist in the segment table. The source is active (for reads) but read-only. The target is inactive (not participating in resolution) but being filled with data.
Let's see some code:
def begin_migration(self, offset, target_dev):
    with self.lock.write():
        source_idx = self.find_segment_containing(offset)
        if source_idx is None:
            raise SegmentNotFoundError()
        source = self.segments[source_idx]
        if source.flags.migrating:
            raise MigrationInProgressError()

        target_capacity = self.get_device_capacity(target_dev)
        target_used = self.calculate_device_used(target_dev)
        if target_used + source.size() > target_capacity:
            raise InsufficientCapacityError()

        source.flags.migrating = True
        source.flags.read_only = True

        target = Segment(
            logical_start=source.logical_start,
            logical_end=source.logical_end,
            device_id=target_dev,
            phys_offset=target_used,
        )
        target.flags.active = False
        target.flags.migrating = True
        self.segments.append(target)

        return MigrationHandle(source_idx, target_dev, len(self.segments) - 1)

The actual data copy happens outside the lock, orchestrated by whoever holds the MigrationHandle. This is important! We don't want to stop the world for address resolution while we copy gigabytes of data from host to host. Blocking all I/O is not acceptable.
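For completeness, the copy loop driven by the handle might look something like this sketch (it's the copy_segment_data stand-in from the decommission example earlier); read_physical and write_physical are hypothetical device I/O helpers, and the chunk size is arbitrary:

CHUNK_SIZE = 4 * 1024 * 1024  # copy in 4 MiB chunks, outside the segment-table lock

def copy_segment_data(asm, handle):
    source = asm.segments[handle.source_segment_index]
    target = asm.segments[handle.target_segment_index]
    copied = 0
    total = source.size()
    while copied < total:
        n = min(CHUNK_SIZE, total - copied)
        # Hypothetical device I/O helpers; your storage layer supplies the real ones.
        data = read_physical(source.device_id, source.physical_offset + copied, n)
        write_physical(target.device_id, target.physical_offset + copied, n, data)
        copied += n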
Once the copy is complete:
def complete_migration(self, handle):
    with self.lock.write():
        source = self.segments[handle.source_segment_index]
        target = self.segments[handle.target_segment_index]
        target.flags.active = True
        target.flags.migrating = False
        source.flags.active = False
        source.flags.migrating = False

After this, any new resolve() calls will find the target segment instead of the source. The source is now a zombie; it can be garbage collected once you're sure no in-flight operations are still using it.
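Garbage collecting those zombies can be as simple as the following sketch, assuming you have something like the in-flight tracker from earlier to tell you when it's safe:

def reap_zombie_segments(self):
    # Call this only after in-flight I/O has drained (e.g. inflight.drain()).
    # Note: this compacts the list, so any outstanding MigrationHandle indices go stale.
    with self.lock.write():
        self.segments = [s for s in self.segments
                         if s.flags.active or s.flags.migrating]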
House Cleaning: Things I Glossed Over
While this post presents a functional address space manager, it's not suited for production. Here are some things I glossed over:
- Persistence: If the system crashes, you need to recover the segment table. In the real implementation I've built in Zig, segment metadata changes get written to a log before taking effect. On recovery, you replay the log and reconstruct the segment table. (A minimal sketch of the idea follows this list.)
- O(log n) lookups: The linear scan works fine for small segment counts, but if you're managing thousands of segments, you want an interval tree or similar data structure. A red-black tree where each node stores the maximum endpoint in its subtree gives you O(log n) point queries and O(log n + k) range queries, where k is the number of overlapping segments.
- NUMA awareness: When you have multiple memory devices, some are closer to certain CPUs than others. A smart allocator would try to place segments on devices that are local to the threads accessing them. This gets particularly important with CXL-attached memory where remote accesses might add 100+ nanoseconds of latency, best case.
- Fragmentation: After enough migrations, you might end up with a Swiss cheese of allocated and free space on each device. The segment model as presented assumes dense packing on each device, but in reality you need free space tracking and potentially compaction within devices.
- Sparse address spaces: The current design allocates segments contiguously at the end of the logical address space. A more sophisticated system might support gaps, allocating at arbitrary offsets and tracking free ranges. This is useful if you want to reserve address space for future expansion, or if you're exposing the logical addresses to applications that expect specific layouts.
- Observability: For load balancing and capacity planning, you'll want to know how full each device is and how hot each segment is. The implementation has hooks for this, but the actual metrics collection and decision-making is a whole separate subsystem with its own complexity.
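On the persistence point, here's a minimal sketch of what logging segment-table changes might look like in the Python model; the record format and function names are illustrative, not the Zig implementation's actual format:

import json

def log_segment_add(log_file, segment):
    # Append the mutation before applying it to the in-memory table.
    record = {
        "op": "add_segment",
        "logical_start": segment.logical_start,
        "logical_end": segment.logical_end,
        "device_id": segment.device_id,
        "physical_offset": segment.physical_offset,
    }
    log_file.write(json.dumps(record) + "\n")
    log_file.flush()  # a real write-ahead log would also fsync here

def replay_log(asm, log_path):
    # On recovery, rebuild the segment table from the log.
    # A real recovery would also restore the device registry and logical_capacity.
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)
            if record["op"] == "add_segment":
                asm.segments.append(Segment(
                    record["logical_start"],
                    record["logical_end"],
                    record["device_id"],
                    record["physical_offset"],
                ))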
Wrapping Up
The address space manager is one of the many types of software that align with this truism: The hard part isn't making it work, it's making it work correctly. In our case today, that means concurrent reads, failures, and operational changes.
Next time, I might dig into the erasure coding layer that sits beneath this, and that's where things get really fun.