Elsa was the design system for Disney ABC Television's product organization. Five themes, seven applications, three years of maintenance. I built and maintained the component library and token infrastructure that the consuming product teams used to ship features faster with a consistent design language.
Specifics on what shipped on top of Elsa stay confidential — most of the consuming applications were under NDA. The system itself, and what it taught me, is the part worth walking through.
What it is
A React component library backed by a Sass-based token layer, theme infrastructure, and a Figma source of truth. Five distinct themes for five distinct surfaces: same components, different brand identities. Theming was a first-class concern, not bolted on after the fact.
The dual-role model
Elsa was built in a single role that held both UX/UI design and engineering. That worked for two reasons. Design decisions got stress-tested against implementation reality in real time — a proposal that "felt right in Figma" but couldn't be expressed in tokens got rejected before it hit production. Engineering decisions got stress-tested against design intent — an API change that broke visual semantics got caught before it shipped to consuming teams.
Nothing got lost in handoff because there wasn't one.
Theme architecture
The token layer was the most consequential decision. Five themes meant token names had to be semantic (color-action-primary), not literal (color-blue-500). When ABC's brand updated, the theme file changed; the components didn't. When a consuming team needed a new theme variant, they wrote a token file, not a fork.
That single decision is what let Elsa carry five themes. Many design systems struggle with their second theme because their tokens are named for the values, not the intent.
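To make the semantic-versus-literal distinction concrete, here is a minimal sketch. It is not Elsa's code: the real token layer was Sass-based, and this version swaps in TypeScript objects and CSS custom properties purely for illustration. The theme names, token values, and the toCss helper are all assumptions.

```typescript
// Hypothetical token names and values, for illustration only.
type SemanticToken =
  | "color-action-primary"
  | "color-text-default"
  | "color-surface-default";

// A theme must supply a value for every semantic token.
type Theme = Record<SemanticToken, string>;

// Two made-up themes: same token names, different values.
const themeA: Theme = {
  "color-action-primary": "#0057b8",
  "color-text-default": "#1a1a1a",
  "color-surface-default": "#ffffff",
};

const themeB: Theme = {
  "color-action-primary": "#e4002b",
  "color-text-default": "#222222",
  "color-surface-default": "#fff8e1",
};

// Emit a theme as CSS custom properties scoped to a selector.
function toCss(selector: string, theme: Theme): string {
  const lines = (Object.keys(theme) as SemanticToken[]).map(
    (name) => `  --${name}: ${theme[name]};`
  );
  return `${selector} {\n${lines.join("\n")}\n}`;
}

// Components reference intent only, so a theme change never touches them.
const buttonStyle = {
  background: "var(--color-action-primary)",
  color: "var(--color-surface-default)",
};

console.log(toCss('[data-theme="a"]', themeA));
console.log(toCss('[data-theme="b"]', themeB));
console.log(buttonStyle);
```

Adding a third theme in this sketch means writing one more object of the same type. The compiler rejects a theme that omits a token, which is roughly the guarantee a new token file had to meet.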
Testing as a gate
React Testing Library tests shipped with every component. Behavior, accessibility, edge cases — covered before the component was considered shippable. Consuming teams trusted the library because the regressions they could imagine had already been tested for.
Why this matters for AI systems
The handoff problem in design is the same handoff problem in AI. A prompt lives in one system. An eval harness lives in another. A production trace lives in a third. Nothing shares a schema, so debugging a regression means reconciling three different mental models.
Elsa was about collapsing the design-to-code loop to zero. The equivalent for AI is collapsing the prompt-to-eval-to-production loop — same tokens, same evals, same observability, from the moment a change is proposed to the moment it ships. The discipline transfers directly.
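One way to picture a shared schema across the three stages is sketched below. Every name here (ChangeRef, LoopRecord, recordsForChange, the sample records) is hypothetical and not taken from any real tool. The only point is that a prompt, an eval result, and a production trace can carry the same identifying fields, so a regression can be followed through all three without translation.

```typescript
// All names below are hypothetical; this is a sketch, not a real tool's schema.
interface ChangeRef {
  changeId: string; // the proposed change that produced this record
  promptVersion: string; // the prompt revision the record refers to
}

interface PromptRecord extends ChangeRef {
  kind: "prompt";
  template: string;
}

interface EvalRecord extends ChangeRef {
  kind: "eval";
  suite: string;
  passed: number;
  total: number;
}

interface TraceRecord extends ChangeRef {
  kind: "trace";
  latencyMs: number;
  ok: boolean;
}

type LoopRecord = PromptRecord | EvalRecord | TraceRecord;

// Collect everything known about one change, across all three stages.
function recordsForChange(records: LoopRecord[], changeId: string): LoopRecord[] {
  return records.filter((r) => r.changeId === changeId);
}

// One-line summary per record, driven by the shared "kind" discriminator.
function describe(r: LoopRecord): string {
  switch (r.kind) {
    case "prompt":
      return `prompt ${r.promptVersion}`;
    case "eval":
      return `eval ${r.suite}: ${r.passed}/${r.total}`;
    case "trace":
      return `trace ${r.ok ? "ok" : "failed"} in ${r.latencyMs}ms`;
  }
}

// Made-up sample data.
const records: LoopRecord[] = [
  { kind: "prompt", changeId: "chg-1", promptVersion: "v2", template: "Summarize: {{text}}" },
  { kind: "eval", changeId: "chg-1", promptVersion: "v2", suite: "smoke", passed: 9, total: 10 },
  { kind: "trace", changeId: "chg-1", promptVersion: "v2", latencyMs: 420, ok: true },
  { kind: "trace", changeId: "chg-0", promptVersion: "v1", latencyMs: 390, ok: true },
];

for (const r of recordsForChange(records, "chg-1")) {
  console.log(describe(r));
}
```

Because every record carries changeId and promptVersion, a single filter answers "what do we know about this change?" without reconciling separate systems.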