A super mini Excel backend

excel

I don’t know why exactly, but the other day I got curious about how Excel’s backend might work at a high level.
So, I decided to mess around and code a tiny version of it, and it actually got pretty interesting pretty quickly.

Here’s a short write-up to give you a peek behind the curtain.

The Core Data Unit - Cells

Every cell can hold:

a number (10, 3.14)
a boolean (true, false)
a string ("hello")
a or an expression (like =A1 + B2 * 2)

Each cell looks like this:

type Cell struct {
    Raw string // what the user typed
    Val any // the actual evaluated result
}

When we set a cell, we store the raw input, but if it starts with =, that means it's an expression that we'll need to parse and evaluate later

Lexical Analysis (aka Tokenization)

First things first, we need to break the input into pieces. This is called lexical analysis, or tokenization

Input: =A1 + 3 * b2

Tokenizer output:

[A1] [+] [3] [*] [(] [B2] [-] [1] [)]

Each token has a type (identifier, number, operator, parenthesis) That's lexical analysis, basically chopping the string into tokens

Parsing - Building the AST out of Tokens

Now we'll parse those tokens into an abstract syntax tree (AST) that respects operator precedence

For example:

=A1 + 3 * (B2 - 1)

The AST would look like this:

Notice how * binds tigher than +, and parentheses change the evaluation order. That's operator precedence

The parser in the code does this in multiple layers:

- parseExpression() handles `+` and `-`
- parseMulDiv() handles `*` and `/`
- parsePrimary() handles literals, parenthees, or cell references

That way A1 + 3 * 2 becomes A1 + (3 * 2)

Evaluating

Once we have a tree, evaluating is simple recursion:

if it's a number -> just return it
if it's a cell reference -> look up that cell's current value
if it's an operator -> evaluate left and right, then combine

But... what happens when one cell depends on another?

Cell	Formula	Value
A1	`10`	10
A2	`5`	5
A3	`=A1 + A2 * 2`	20

When you change A1 to 20, A3 should now become 30. How do we know that A3 depends on A3?

Dependency Tracking

We track two graphs:

- `deps[cell]` -> all the cells that this **this cell depends on**
- `revDeps[cell]` -> all the cells that **depend on this cell**

When we parse a formula like for cell A3: =A1 + A2 * 2, we extract its dependencies (A1, A2) and record them

For example, for A3 = A1 + A2 * 3 we store:

deps[A3] = {A1, A2}
revDeps[A1] = {A3}
revDeps[A2] = {A3}

Then if A1 changes, we know we must re-evaluate A3 (and any cells depending on A3)

Recalculation - Topological Sort

The thing is, we can't just recalculate arbitrarily all those cells. We must process cells from the least dependent to the most dependent

For example

- *A3* depends on both *A2* and *A1*
- *A2* depends on *A1*

Dependency graph:

`A1 -> A2 -> A3`

If A1 changes, we must evaluate

`A1 -> A2 -> A3`

And we achieve this by running a topological sort (classic graph algorithm)

- Nodes: `A1, A2, A3`
- Edges: `A1 -> A2`, `A2 -> A3`

Topological sort returns:

`[A1, A2, A3]`

That's the order in which we evaluate cells safely

Eager Recalculation

I chose to use an eager update strategy, for simplicity, when a cell changes:

We first mark it as dirty
Collect all dependents (via revDeps)
Run topological sort
Re-evaluate everything in order

In Conlusion

We tokenize to make sense of raw formulas
We parse tokens into a structure that respects operator precedence
We evaluate by looking up cell values recursively
We track dependencies between cells
We recalculate efficiently using topological sorting

GitHub