Arbor Design System
Methodology

Why token review gates can't be symmetric

Design tokens flow both ways between Figma and code. The review gates that protect them look like mirror images but solve different problems — and treating them as one shape is how you lose primitives in production.

A design token is one of the smallest objects in a design system — a name bound to a value. color-green-700 resolves to #43E660. spacing-2 resolves to 8px. That smallness is misleading. A primitive token sits at the bottom of every component, in every product, on every screen. A wrong value ripples outward at the speed of CSS.

So tokens carry more political weight than their size suggests. Two questions matter more than the actual hex codes:

  1. Who is allowed to change a token?
  2. What happens when a change is wrong?

This article is about Arbor’s answers to both, and the path that led to them.


Before: no review in either direction

The first version of Arbor’s token sync was deliberately simple. A push from Figma fired a webhook, the relay Lambda dispatched a GitHub Action, a PR opened, a designer or engineer eyeballed the diff in GitHub, and it merged. A merge in code triggered the inverse: a push to Figma. Neither direction had a required review step, a dashboard, or an audit trail.

The simplicity collapsed within the first month. Two failure modes did the damage:

  • A Figma collection rename silently halved the token count. The sync wrote out a tree that didn’t match what any consumer expected. Nothing alerted. Frontyard apps started rendering with missing colors before anyone noticed the PR.
  • A code-side experiment shipped to Figma and rewrote a primitive every designer was using. Code believed it owned the token. Figma believed it owned the token. The last writer won.

The fix wasn’t to harden either pipeline in isolation. It was to add a review gate to each direction — and to treat the two gates as architecturally different problems.


After: directional gates, not symmetric ones

The new system has two independent gates that look superficially symmetric but solve different problems.

FIGMA → CODE (visual-diff review):
  Figma change → Supernova (visual review + approval)
  → Pipeline webhook → Lambda → sync-figma-tokens.yml → PR
  → Engineer reviews → merge → npm publish

CODE → FIGMA (manual approval):
  Code merge to main → dry-run diff + Slack notification
  → Designer reviews in dashboard → Approve Push button
  → Lambda /dispatch → push-tokens-to-figma.yml → Figma updated

ICONS (unchanged):
  Figma change → Lambda webhook → sync-figma-icons.yml → PR

FALLBACK (if Supernova is down):
  Set SUPERNOVA_ENABLED=false → Lambda reverts to direct dispatch

The Figma → code gate is a visual problem. A designer needs to see what changed in pixels, not in JSON. Supernova sits between the publish event and the GitHub dispatch, rendering a diff that a designer can read.
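
The routing decision the relay has to make, including the SUPERNOVA_ENABLED fallback, can be sketched as below. This is a minimal illustration, not Arbor's Lambda code: the function name route_token_event, the "figma" and "supernova" source labels, and the rule that a raw Figma token event is held while Supernova is enabled are all assumptions. Only the SUPERNOVA_ENABLED flag and the sync-figma-tokens.yml workflow name come from the flows above.

```python
import os
from typing import Mapping, Optional

TOKEN_WORKFLOW = "sync-figma-tokens.yml"  # workflow name taken from the flow above


def route_token_event(source: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the workflow to dispatch, or None if the event should wait.

    `source` is a hypothetical label: "figma" for a raw Figma publish webhook,
    "supernova" for the pipeline webhook fired after visual approval.
    """
    env = os.environ if env is None else env
    # Assumption: anything other than an explicit "false" means enabled.
    enabled = env.get("SUPERNOVA_ENABLED", "true").strip().lower() != "false"

    if source == "supernova":
        # The change has already been reviewed visually; dispatch the sync.
        return TOKEN_WORKFLOW
    if source == "figma":
        # Fallback path: with Supernova disabled, dispatch directly.
        # Otherwise hold the event until Supernova's approval webhook arrives.
        return None if enabled else TOKEN_WORKFLOW
    raise ValueError(f"unknown event source: {source!r}")
```

With the flag unset or true, a raw Figma event returns None and only the Supernova webhook dispatches; with SUPERNOVA_ENABLED=false, the raw event dispatches directly.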

The code → Figma gate is an intent problem. The engineer who merged the code change already had review on the JSON. What matters at the Figma gate is “did the designer who owns this token agree to this push?” The dashboard’s Approve Push button is that affirmation — explicit, attributable, and required.
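
One way to make an approval "explicit, attributable, and required" is to bind it to the exact dry-run diff the designer saw. The sketch below assumes that design; the Approval record, diff_fingerprint, and may_push_to_figma are hypothetical names, and the source does not say how the dashboard actually stores approvals.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def diff_fingerprint(diff_text: str) -> str:
    """Stable fingerprint of a dry-run diff (hypothetical helper)."""
    return hashlib.sha256(diff_text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    approver: str     # who pressed Approve Push
    fingerprint: str  # fingerprint of the diff they were shown


def may_push_to_figma(current_fingerprint: str, approval: Optional[Approval]) -> bool:
    # Required: no approval record means no push.
    if approval is None:
        return False
    # Attributable: an approval without a named approver does not count.
    if not approval.approver.strip():
        return False
    # Explicit: the approval must be for this diff, not an earlier one.
    return approval.fingerprint == current_fingerprint
```

Tying the approval to a fingerprint means a later merge that changes the diff invalidates the earlier click, so the designer is always agreeing to the push that will actually run.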

Token authority model

Treating the two gates as different problems lets us split ownership cleanly:

Tier | Owner | Sync direction
-----|-------|---------------
Primitives | Figma (designer) | Figma → Code: automatic. Code → Figma: requires approval
Semantic | Figma (designer) | Same as primitives
Component | Code (engineer) | Code → Figma: automatic. Figma → Code: manual only
LOCAL_ONLY | Code only | No Figma sync (protected by sync-config.js)

The asymmetry maps to where the editorial work actually happens. Designers compose primitive and semantic tokens in Figma; engineers compose component tokens in code. Sync direction follows authorship.
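
The authority table reduces to a lookup keyed on tier and direction. The sketch below encodes the rows as written; the tier and direction strings, the Policy enum, and sync_policy are illustrative names, not the contents of sync-config.js.

```python
from enum import Enum


class Policy(Enum):
    AUTOMATIC = "automatic"
    REQUIRES_APPROVAL = "requires approval"
    MANUAL_ONLY = "manual only"
    NO_SYNC = "no sync"


# Keys are (tier, direction); values mirror the authority table.
SYNC_POLICY = {
    ("primitive", "figma_to_code"): Policy.AUTOMATIC,
    ("primitive", "code_to_figma"): Policy.REQUIRES_APPROVAL,
    ("semantic", "figma_to_code"): Policy.AUTOMATIC,          # same as primitives
    ("semantic", "code_to_figma"): Policy.REQUIRES_APPROVAL,  # same as primitives
    ("component", "code_to_figma"): Policy.AUTOMATIC,
    ("component", "figma_to_code"): Policy.MANUAL_ONLY,
    ("local_only", "figma_to_code"): Policy.NO_SYNC,
    ("local_only", "code_to_figma"): Policy.NO_SYNC,
}


def sync_policy(tier: str, direction: str) -> Policy:
    """Look up how a change to `tier` may travel in `direction`."""
    try:
        return SYNC_POLICY[(tier, direction)]
    except KeyError:
        raise ValueError(f"no policy for tier={tier!r}, direction={direction!r}") from None
```

Failing loudly on an unknown tier or direction is a deliberate choice in this sketch: a silent default in either direction would reintroduce the last-writer-wins problem.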


When sync fails: three latencies, on purpose

A sync pipeline that’s silently broken is worse than one that’s loud. Arbor exposes failure at three different surfaces, each tuned to a different audience and a different latency budget.

Surface | Who sees it first | Latency | What it tells you
--------|-------------------|---------|------------------
Dashboard sync strip | Anyone visiting arbor.linktr.ee/changelog/ | ~5 min on tracked-label PRs; up to 6h otherwise | Red / yellow / green / unknown + link to the failing run
Slackbot sync status DM | Anyone who DMs the bot | Live (or 60s cache) | Same data + recent failures list + Figma marker state
#arbor CloudWatch alarm post | Everyone in #arbor | Up to 1h (alarm evaluation) | Automatic post when SyncFailed > 0

The three surfaces are intentionally redundant. The dashboard answers “is something broken right now?” for someone who’s already looking. The DM answers “why is my Figma publish stuck?” for the designer who just pushed and is waiting. The CloudWatch alarm answers “did something break that I didn’t notice?” for the team as a whole. Each is the cheapest tool for its question.

Five named failure modes anchor the diagnosis:

  1. validate:tokens failure: sync wrote token JSON, but references don’t resolve. Surfaces as a draft PR with the validator report. First responder: engineer.
  2. Schema-validation failure: sync produced output that doesn’t match the JSON schema for a primitive domain. The workflow log has the named diff. First responder: designer.
  3. Token-count guard breach: the token count changed by more than 10% in either direction. A PR comment lists the before and after counts. First responders: designer and engineer together, since the breach is cheap to resolve jointly.
  4. SyncFailed CloudWatch alarm: infrastructure-level failure (dispatch_error, secrets_fetch_error, etc.). Surfaces as an #arbor post. First responder: engineer.
  5. Silent sync: the webhook never reached the Lambda, so nothing visible is produced. First responder: engineer.
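
Two of these checks are simple enough to sketch. The code below assumes a flat name-to-value token map and a curly-brace alias syntax such as {color-green-700}; both are assumptions for illustration, as are the function names and the button-bg and link-fg tokens. It is not the validate:tokens script itself.

```python
import re
from typing import Dict, List

# Assumed alias syntax: a value that is exactly "{other-token-name}".
ALIAS = re.compile(r"^\{([^{}]+)\}$")


def unresolved_references(tokens: Dict[str, str]) -> List[str]:
    """Names of tokens whose alias points at a token that does not exist."""
    broken = []
    for name, value in tokens.items():
        match = ALIAS.match(value)
        if match and match.group(1) not in tokens:
            broken.append(name)
    return sorted(broken)


def count_guard_breached(before: int, after: int, threshold: float = 0.10) -> bool:
    """True when the token count moved by more than `threshold` in either direction."""
    if before == 0:
        # No baseline to compare against; treat any new tokens as a breach.
        return after != 0
    return abs(after - before) / before > threshold


tokens = {
    "color-green-700": "#43E660",
    "button-bg": "{color-green-700}",  # hypothetical alias that resolves
    "link-fg": "{color-blue-600}",     # hypothetical alias with a missing target
}
print(unresolved_references(tokens))   # ['link-fg']
print(count_guard_breached(200, 100))  # True: the count halved
```

The halved count in the last line is the collection-rename failure from the first month; a guard of this shape would have flagged it at the PR rather than in production.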

The failure-mode taxonomy isn’t a runbook. It’s a map of who pings whom, codified so that the first response to a red dashboard doesn’t require a synchronous huddle. The load-bearing part of the diagnostic surface is the naming: each mode has a unique title, a unique log signature, and a unique escalation path.
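
Codifying the taxonomy can be as small as a lookup table. The sketch below restates the five modes as data; the dictionary keys, the FailureMode record, and who_to_ping are hypothetical names, and the responder assignments are copied from the list above.

```python
from typing import Dict, NamedTuple, Tuple


class FailureMode(NamedTuple):
    title: str
    surfaces_as: str
    first_responders: Tuple[str, ...]


# Keys are illustrative identifiers, not names used by Arbor's tooling.
FAILURE_MODES: Dict[str, FailureMode] = {
    "validate_tokens": FailureMode(
        "validate:tokens failure", "draft PR with validator report", ("engineer",)),
    "schema_validation": FailureMode(
        "Schema-validation failure", "named diff in the workflow log", ("designer",)),
    "token_count_guard": FailureMode(
        "Token-count guard breach", "PR comment with before/after counts",
        ("designer", "engineer")),
    "sync_failed_alarm": FailureMode(
        "SyncFailed CloudWatch alarm", "#arbor post", ("engineer",)),
    "silent_sync": FailureMode(
        "Silent sync", "nothing visible", ("engineer",)),
}


def who_to_ping(mode_key: str) -> Tuple[str, ...]:
    """First responders for a named failure mode."""
    return FAILURE_MODES[mode_key].first_responders
```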


The dashboard data pipeline is a design choice

The governance dashboard at arbor.linktr.ee/changelog/ looks like it queries live infrastructure. It doesn’t. It reads pre-built static JSON files generated by GitHub Actions and shipped to GitHub Pages alongside the dashboard bundle.

The choice was deliberate:

  • No auth surface. Live queries against GitHub, Figma, and AWS would each need a token. Static JSON has none.
  • No partial failures. A live dashboard that can render with two of four data sources missing is one that lies to operators about the state of the system. Static JSON either builds completely or doesn’t deploy.
  • Refresh on event, not on poll. Six extraction scripts run in parallel on every relevant pull_request event (~5 min latency), on every push to main that touches tokens, and every 6 hours via cron as a safety net. The dashboard is never more than 6 hours stale, and usually less than 5 minutes.
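
The "builds completely or doesn’t deploy" rule can be sketched as an all-or-nothing snapshot build. This is an illustration under assumptions: build_snapshot and the extractor callables are hypothetical, and the real pipeline runs as GitHub Actions steps rather than as a single Python process.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional


def build_snapshot(
    extractors: Dict[str, Callable[[], dict]]
) -> Optional[Dict[str, dict]]:
    """Run every extractor in parallel; return data only if all succeed.

    Returning None signals "do not deploy", so the previously published
    snapshot stays in place instead of a partial one replacing it.
    """
    with ThreadPoolExecutor(max_workers=len(extractors) or 1) as pool:
        futures = {name: pool.submit(fn) for name, fn in extractors.items()}
    # Leaving the `with` block waits for all submitted work to finish.

    snapshot: Dict[str, dict] = {}
    for name, future in futures.items():
        try:
            snapshot[name] = future.result()
        except Exception:
            # One failed source invalidates the whole build.
            return None
    return snapshot
```

The caller deploys only when the result is not None, which is what keeps a dashboard with two of four sources missing from ever reaching operators.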

Trade-off accepted: the dashboard can’t show the current moment, because there’s always some latency between an event and the next build. The <SyncHealthStrip /> makes that latency visible inline (“refreshed about 4m ago”), and a stale-data banner appears past 2 hours so an operator can’t mistake old data for current state.
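
The freshness label and the stale-data banner both derive from one timestamp. The sketch below assumes the snapshot carries a build time; the freshness function and its label wording beyond the "refreshed about 4m ago" example are assumptions, and the real <SyncHealthStrip /> is a front-end component rather than Python.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

STALE_AFTER = timedelta(hours=2)  # banner threshold stated in the text


def freshness(built_at: datetime, now: datetime) -> Tuple[str, bool]:
    """Return (inline label, whether to show the stale-data banner)."""
    # Clamp negative ages caused by clock skew between builder and viewer.
    age = max(now - built_at, timedelta(0))
    minutes = int(age.total_seconds() // 60)

    if minutes < 1:
        label = "refreshed just now"
    elif minutes < 60:
        label = f"refreshed about {minutes}m ago"
    else:
        label = f"refreshed about {minutes // 60}h ago"

    return label, age > STALE_AFTER


built = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(freshness(built, built + timedelta(minutes=4, seconds=20)))
# ('refreshed about 4m ago', False)
```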


What this means in Arbor

Three threads run through Arbor’s token governance work, and they recur in every other governance decision the project has made.

Review gates are directional. Symmetric pipelines without symmetric problems are over-engineered. Treat each direction as its own design space, ask what kind of review actually catches errors at that boundary, and build only that.

Failure visibility is layered, not centralized. A single status page is a single point of attention. Three surfaces with overlapping data, each cheap for its specific audience, cost more to build but absorb more of the team’s actual diagnostic load.

Static data is a feature. Live data is a liability when the audience is “operators who need a true answer.” Trade freshness for completeness whenever the underlying events are minute-scale and the dashboard’s role is decision support rather than monitoring.

The token governance system is the canonical example of these patterns in Arbor — every later piece of governance infrastructure (the sync-status DM, the Slack alarm split, the Roadmap Radar weekly post) is a variation on the same three threads.


Source: This article condenses and reframes docs/architecture/token-governance.md, which remains the canonical operational reference. Quick-start runbook and rollback procedures live there.