• Spec Mirror: Just as a reminder, I have a full mirror of the Matrix specification available at https://telodendria.io/spec.matrix.org. I very recently updated it to v1.7, and you can also download a tarball of the spec for offline viewing if you want: spec-v1.7.tar.gz.
  • GitHub Mirror: The GitHub mirror is now up to date. It was a few weeks behind because my script broke due to some changes in my .ssh/config. I fixed those so now we're back up and running.
  • Where Code? You may have noticed that I haven't been writing a lot of code lately. Instead, I've been living in the Matrix specification, trying to wrap my mind around it almost every free minute I have. The fruit of my labor is the rest of this newsletter, as I've set out to document my thought process so I don't forget it when I actually go to write the code.

Matrix Homeserver Implementation Part 1: Protocol Overview

In this newsletter, I want to sketch out my implementation of the Matrix specification that I plan on using, both to clarify my own thoughts, and because I am struggling to find resources explaining how Matrix homeservers are actually implemented. It's difficult to read the source code for the existing implementations because they are so large, so their code bases require such a deep understanding in order to even understand the concepts of how they work. Instead of writing all of the code first and just dumping it onto my CVS server, I feel like I can do a better service to the Matrix community by actually explaining more of the concepts than what the specification does so that the implementation becomes a lot easier to deduce for oneself. For me, writing code is the easy part once I understand a topic. So, without further ado, let's get into the technical details of the Matrix specification.

A Room in Matrix terminology is simply a directed acyclic graph (DAG) of Events. An Event is a—mostly—self-contained unit of data which contains a few pieces of information required by the homeserver, but otherwise has few restrictions on what can be contained in it. The meaning of an event is derived entirely by the client; the server uses a few pieces to maintain the relationships between and authentication chain of the events in a room, but otherwise does not care what an event contains.

Events are related to each other both chronologically, and by how they authorize each other to exist in a room. In a sense, a Room is actually two DAGs created out of the same set of events; one in which the edges of the DAG express a chronological relationship, where events are connected by an edge to the events that came before them, and the other where the edges express a state relationship, where events are connected by an edge to the events that authorize them to exist in the graph.

A Matrix homeserver traverses each of these DAGs primarily to perform the following two tasks:

  • State Resolution
  • Linearization

State Resolution is the process by which the homeserver traverses a room's DAG to determine what the correct State of the room is at any given point in time, namely, at any given Event in the room. As we will see in a moment, this is actually a rather complicated process due to the distributed and decentralized nature of the protocol.

Linearization is the process by which the homeserver traverses a room's DAG and organizes events into a stream of events, one after another, for the client to process. While I have yet to find the details of this process in the Matrix specification, my current understanding is that this is just a topological sort, which is also used in state resolution.

The Matrix specification defines State as "a map of (event_type, state_key) to event_id." In other words, State is a collection of key-value pairs where the key is the event type and a special state key, and the value is an event ID. State is used to describe a room, including giving it a name, topic, and aliases, as well as recording what users are a part of the room. It is also used to authorize events, because the State describes who has permissions to send what types of events.

A room's State is updated when a State Event is sent to the room. Each state event must have a type and a state key, which is typically a blank string, indicating that it applies to the entire room, but in some specific cases is the ID of a user to indicate that the state only applies to that user, such as when a user joins or leaves a room. When the state event is sent, it updates the room state at that point in time to make its type and state key point to itself. The state event describes the changes in state to be applied. This is how Matrix stores all the information about a room.

State resolution algorithms are necessary because Matrix is a decentralized protocol where every homeserver that is participating in a room has its own copy of the room. The room does not belong to any one homeserver, rather all homeservers contribute to it equally. This can become problematic if the homeservers are unable to communicate with each other quickly enough, because it is likely that clients will still be sending messages to the server, and the server will still apply them to the room. This means that the room's state can potentially be updated independently on each homeserver participating in the room, so when the homeservers are able to sync up, they need some way to arrive at a consistent state.

All of this information is easily derived from the Matrix specification itself. But what is less clear is how to go about actually implementing all this in code. Matrix is a very complex protocol with a lot of algorithms and data structures. These are explained in the specification with a practical degree of precision, but what is not explained is how to actually store the data and when to run the algorithms on it. Both of these are implementation details left to the developers of homeservers, as they should be.

What follows is by no means the only way to implement the specification, I'm sure, but it is the way I have gone about it in my mind as a homeserver developer, though I must admit that I have looked at very little code while thinking through this, and I have yet to write any of the code that implements this process. Nevertheless, I hope I can offer some clarification for those that want to understand how Matrix homeservers actually do the things that the specification says they need to do.

The most important thing to figure out is what a homeserver actually needs to store in the database apart from room state. As far as I have deduced, this is essentially just a list of events at the bottom—or top, depending on how you want to look at it—of each room's event DAG. I tend to want to call these the leaves, although the DAG is not exactly a tree so I am not sure if the term really applies, but either way, the homeserver needs to be able to keep track of all the tips of the forks in the room. Ideally there will only be one tip most of the time, but it's possible that there may be multiple when federation occurs. The homeserver records these events not only for traversal purposes, as required for state resolution and linearization, but also to know what the previous events are when a client sends an event. In order to be able to effectively attach client events to the DAG properly, we must have a list of the most recent events.

The algorithm for maintaining this list is simple; when an event arrives at the homeserver from another homeserver, it iterates over all of the events it specifies to be its previous events, and removes them from the list if they are present. It then adds the incoming event to the list. This works because it ensures that no "leaves" are lost, and no leaves that are no longer leaves remain in the list. Perhaps a visual representation is necessary to effectively demonstrate this concept.

Consider a Room with events E1, E2, and E3 that looks like this:

E1
|
E2
|
E3

Now consider that event E4 arrives and specifies E2 as its previous event because the homeserver that E4 originated on was not aware of E3 at the time. The DAG now looks like this:

E1
|
E2---+
|    |
E3   E4

When E4 arrives, E2 is not in the list of most recent events, so nothing is removed from it. E4 is then added to the list, because it is now a leaf. This covers the situation in which a fork is created. Now, consider that event E5 arrives at the server and specifies E3 as its previous event. The DAG now looks like this:

E1
|
E2---+
|    |
E3   E4
|
E5

Since E5 specifies E3 as its previous event, E3 is removed from the list and E5 is added. E4 is still in the list. Now lets look at what happens when an event E6 arrives that resolves the fork by specifying both E4 and E5 as its previous events. The DAG now looks like this:

E1
|
E2---+
|    |
E3   E4
|    |
E5   |
|    |
E6---+

Since both E4 and E5 were specified as previous events, they are both removed from the room's stored list and then E6 is added. The fork has been resolved.

When a client sends an event to the homeserver, it specifies no previous events. The homeserver must determine what they are for itself. Since the homeserver keeps a list, it simply copies this list to the client event. If there is more than one previous event, the client's event resolves the forks as shown in the previous diagram.

Homeservers may also find it desirable to cache the state of the room at each event so that it does not have to continuously re-run the state resolution algorithm whenever the state is needed. As far as I know, all homeservers do this, but it is not strictly required by the specification, and it would be advantageous if the homeserver implementation could cope with previous states missing from the cache by simply recalculating them as necessary.

Other than the DAG's current leaves and the state at each event, a Matrix homeserver has to store nothing else for a room. Everything that clients and other servers will ask for can be derived from the resolution of the state at each of the current leaves, or the state at a particular event. This even includes the room version, which is derived simply by opening up the event at the m.room.create key in the room state and checking the version there.

Now that we know what we are storing, we can move on to how we actually ingress events. The Matrix specification is unclear as to what is supposed to happen when events are received at the API endpoints. There are two ways an event can arrive. It can arrive via the federation API from another homeserver, or it can arrive from a client. Both are handled slightly differently.

When a homeserver receives an event from another homeserver, the event already has a list of previous events, so the homeserver should perform the following tasks in order when an event arrives:

  1. Calculate the content hash as described in the specification's appendices. If the room for which the event has arrived is version 3 or newer, this hash is used to generate the event ID. In all room versions, the hash should be verified against the hash provided in the event itself.
  2. Perform a size check as described in the client-server API portion of the specification and reject the event if it is too large.
  3. Execute the state resolution algorithm on the event to get the state of the room at the time it was sent.
  4. Perform an auth check by verifying that the events that the incoming event says authorize it actually do. Reject the event if the auth check fails.
  5. Update the room's list of previous events as described above.
  6. Store the event itself.
  7. Optionally, cache the resolved state for quick reference in the future.

When a homeserver receives an event from a client, the event is not fully formed and ready to be added to the room, so the procedure is a little different:

  1. Copy the room's previous events list to the event.
  2. Execute the state resolution algorithm on the event to get the state of the room at the time the event was sent.
  3. Use the resolved state to generate the event's list of authorization events. This involves checking all state events in the resolved state that would—but don't necessarily—authorize the event to be in the room. This includes power level events and room membership events.
  4. Compute the content hash and add it to the event.
  5. Perform the size check and reject the event if it is too large.
  6. Perform the same auth check as you would for events incoming through the federation API.
  7. Update the room's list of previous events as described above.
  8. Store the event itself.
  9. Optionally, cache the resolved state.

As you can see, there are a few differences in the order that things must happen in depending on how the event arrives. Note that the size check is performed so late in the process because the size limit stated in the specification depends on all parts of the event being present, so we have to generate those parts first.

The process for ingressing events should now be clearer, although the state resolution algorithm is still a black box. It is important to note that the algorithm is not static; it depends entirely on what version of room we are dealing with. The Matrix specification defines many room versions each with their own nuances, and room version 2 introduces an entirely new state resolution algorithm. There are currently two state resolution algorithms that have to be implemented by a homeserver. It is not enough only to implement the latest algorithm because many rooms exist in the Matrix community that still use the old algorithm.

The state resolution algorithms are thoroughly documented in the Matrix specification for each room version, however these algorithms can be a little cryptic at first glance. I hope to explain the state resolution algorithms in terms of practical implementation in future newsletters, so stay tuned for those. I will detail how to derive the inputs for the state resolution algorithm, what the output means, and then how to actually go about implementing the algorithms.

That's All!

I hope you enjoy these types of posts. They're really more for me than anyone else, because it's helpful to me to write all my thoughts down, but I certainly hope they are beneficial to others as well. If I missed something, feel free to let me know in #telodendria-general:bancino.net!

As always, if you like my work and you want to support it, consider sending a few dollars to me using the links at https://telodendria.io.

Previous Post Next Post