Writing Software with Formal Methods: An Introduction to TLA+

February 4, 2026 · Steve Ma

Author’s Note

Two weeks ago, I was discussing a complex module with one of my colleagues at work, and he brought up the idea of using formal methods to write software—specifically, TLA+^[1]. I had some exposure to this methodology during my time at UW (I think it was in CSE 311 or 331), and it struck me as an interesting idea. I spent the weekend learning and researching TLA+, including modeling some of our existing modules. I found TLA+ to be quite helpful, so I decided to write a blog post on the subject.

This post is really just a collection of my notes and thoughts as I learn—it’s not meant to be authoritative or overly serious, but rather a way for me to think out loud and share what I’ve found interesting so far.

Background

TLA+ is a mathematics-based language for high-level modeling of digital systems, created by Leslie Lamport ^[2] (also known for his work on Paxos ^[3] and LaTeX ^[4]). It is primarily used to describe and reason about the behavior of concurrent and distributed systems.

So why is TLA+ needed? In my opinion (and this was something my college advisor, Mark Friedman ^[5], often emphasized), one of the fundamental challenges in writing highly concurrent software is that the human brain simply isn’t built for it—we evolved to think in a largely single-threaded way. As a result, when designing concurrent systems, it becomes difficult to both (1) model the processing flow (the “happy path”) and (2) anticipate and reason about edge cases (the “unhappy paths”). TLA+ addresses this by giving you a mathematical way to think about your software: you describe the system in a precise, mathematics-based language and then use tooling to exhaustively check that model. In my experience, learning the syntax and mechanics of TLA+ is fairly straightforward; the real challenge lies in learning how to think about your system at a higher level of abstraction.

Basics of TLA+ ^[6]

I won’t copy and paste a bunch of TLA+ syntax here—I’ve linked a TLA+ learning guide in the footer if you’d like to follow along. I also tend to learn best by seeing concrete examples. I’m a Go programmer, so I’m particularly obsessed with the idea of channels/buffers. Let’s use TLA+ to model the simplest possible one: a single-slot buffer.

As I’ve mentioned before, writing the model in TLA+ is usually the easy part; thinking about the system in the abstract is what’s hard. So let’s start by reasoning about what a one-slot buffer actually is.

What can it hold? A one-slot buffer is either empty or contains a value (for simplicity, let’s say it holds a natural number).
What operations does it need? A producer can put a value into the buffer when it’s empty, and a consumer can take a value from the buffer when it’s full.

That’s really all there is to it. With that mental model in place, we can now define it formally.

In general, a TLA+ model looks like this:

---- MODULE XXX ----
EXTENDS Naturals

VARIABLES x

Init ==
    x = 0

Next ==
    x' = x + 1

Spec ==
    Init /\ [][Next]_x
====

Every TLA+ model starts with a module definition. In our case, it will be OneSlotBuffer.
Following the module definition is the EXTENDS keyword, which you can think of as importing standard libraries in a programming language.
Next, you declare the variables that make up the state of your system.
After that, you describe the system’s behavior: its initial state, the transitions it can take (the “next” states), and finally the full specification that ties everything together.

Okay, let’s get back to our example and break the model down into pieces. What variables are needed for a OneSlotBuffer? Naturally, when you model a one-slot buffer, the only state you really care about is whether that slot is empty or full. In our case, that state is represented by a single variable: the buffer itself. Let’s call it buf.

---- MODULE OneSlotBuffer ----
EXTENDS Naturals

VARIABLES buf

Next, let’s think about the initial state of the buffer. When we create a new buffer, it’s most intuitive to start with it empty. So we define:

Init ==
    buf = None

Now, let’s consider the operations our buffer should support. As discussed earlier, we need two operations: put and get. Since we only have a single slot, put is only allowed when the buffer is empty. Likewise, get is only valid when the buffer is not empty. We can express the put operation in TLA+ as follows:

(* Put(v): allowed only when buffer is empty *)
Put(v) ==
    /\ buf = None
    /\ buf' = v

In TLA+, an “operation” is really just a description of a state transition. You should not think of this as an assignment in the imperative sense. Instead, we describe both:

the precondition: buf = None, and
the postcondition: buf' = v.

We can read this definition as: any state transition corresponding to Put(v) must satisfy—

The buffer is currently empty, and
In the next state, the buffer holds v.

Similarly, we define the get operation:

Get ==
    /\ buf # None
    /\ buf' = None

This reads as: any state transition corresponding to Get must satisfy—

The buffer is not empty, and
In the next state, the buffer becomes empty again.

After defining the operations, we can now specify the system’s overall behavior. A complete TLA+ specification is usually composed of an initial state and a next-state relation. We already have the initial state, so all that remains is to define what a valid “next step” looks like.

Given a one-slot buffer, what transitions are possible? At any step, we either put a value into the buffer or take a value out of it. We can express this in TLA+ as follows:

(* Next-state relation: either Put some v, or Get something)
Next ==
    \/ \E v \in Nat : Put(v)
    \/ Get

Let’s break this down if you’re not familiar with the notation:

\/ means “OR” (just as /\ means “AND” above).
\E v \in Nat : Put(v) means “there exists a value v in Nat such that Put(v) holds.”

So the definition reads: a next-state transition is legal if either—

There exists some natural number v for which the step satisfies Put(v), or
The step satisfies Get.

With that in place, we can define the full specification:

Spec ==
    Init /\ [][Next]_buf

There are two important details here:

[Next]_buf is shorthand for Next \/ (buf' = buf), meaning that either a valid Next transition occurs or the buffer remains unchanged.
The [] operator means “always.”

Read together, this says: the system starts in the Init state, and from that point on, every step of the system either follows the Next transition rules or leaves buf unchanged. In other words, all possible behaviors of the system are constrained by the model we just defined.

Validation

A natural next step after defining the specification is to validate it. In TLA+, there are two main kinds of properties you can ask the model checker to verify. The first is an invariant: a property that must hold in every reachable state of your model. Invariants are typically used to express safety guarantees—things that should never go wrong.

The second category is temporal properties, which describe how the system behaves over time—how states relate to one another across a sequence of steps. These are used to express liveness and progress conditions, such as “something good eventually happens.” Temporal properties are more advanced and outside the scope of this article, but they are one of the most powerful aspects of TLA+, and I encourage you to read more about them.

Let’s write an invariant for the one-slot buffer we’ve defined. What is something that should never happen, regardless of which state the buffer is in? A straightforward property to check is that a Get operation should never be possible when the buffer is empty. We can express that in TLA+ as follows:

NoBadGet ==
    (buf = None) => ~(Get)

This reads as: if the buffer is empty in the current state, then a Get transition is not allowed in the next state. In other words, whenever buf = None, the system must not be able to take a Get step. By asking the model checker to verify this invariant, we ensure that there is no reachable state in which the model permits a consumer to read from an empty buffer.

The full specification can be found below.

---- MODULE OneSlotBuffer ----
EXTENDS Naturals

CONSTANTS None
VARIABLES buf

(* Initial state: buffer is empty *)
Init ==
    buf = None

(* Put(v): allowed only when buffer is empty *)
Put(v) ==
    /\ buf = None
    /\ buf' = v

(* Get: allowed only when buffer is full *)
Get ==
    /\ buf # None
    /\ buf' = None

(* Next-state relation: either Put some v, or Get *)
Next ==
    \/ \E v \in Nat : Put(v)
    \/ Get

(* Full system behavior: start in Init, then always take Next steps *)
Spec ==
    Init /\ [][Next]_buf

(*
Safety invariant: you can only Get when not empty.
*)
NoBadGet ==
    (buf = None) => ~(Get)

====

TLC

In the previous section, I referred to a “model checker”—in the TLA+ world, that tool is called TLC. TLC is a model checker for TLA+ specifications that validates system behavior by exhaustively exploring all possible states and checking them against your invariants and temporal properties.

The official TLA+ Toolbox IDE includes built-in support for running TLC directly on a model ^[7]. However, as a VSCode extension developer myself, I strongly recommend using the TLA+ Language Support extension ^[8] for VSCode. It integrates nicely into an existing workflow and makes experimenting with TLA+ feel much more lightweight and approachable.

To keep this article focused, I’ll stop here and leave the deeper details of TLC configuration and usage for a future write-up.

Where I’ve gotten stuck with TLA+

TLA+ has been useful for me, but I’ve also found it surprisingly easy to get stuck—not on syntax, but on the “translation layers” around the spec. Two difficulties I keep running into:

1) Deciding what to abstract (and what not to)

The hardest part is deciding what to model and what to leave out. If I abstract too much, I can “prove” something that isn’t actually true for the real system. If I abstract too little, the spec gets unwieldy and the state space blows up.

2) Mapping a spec to an implementation

A spec can be mathematically proven correct and still be hard to implement. For example, Paxos is proven correct, but it still took me three months to implement.

Footnotes

You can find the official homepage of TLA+ here: https://lamport.azurewebsites.net/tla/tla.html. ↩
Dr. Lamport recently retired from Microsoft Research. Check out his personal website: https://www.lamport.org/. ↩
There are several papers on Paxos, the original being: Lamport, L. (1998). The Part-Time Parliament. ACM Transactions on Computer Systems, 16(2), 133–169. ↩
See https://www.latex-project.org/ for more details. ↩
Mark Friedman is the founder and CTO of Demand Technology Software; a great engineer with a strong focus on performance. ↩
I find this website to be useful for learning the basics of TLA+: https://learntla.com/index.html. ↩
See https://lamport.azurewebsites.net/tla/toolbox.html for more details. ↩
See https://github.com/tlaplus/vscode-tlaplus for more details. ↩