Design System Ops – Claude Code Skills for Design Systems Practitioners

The work nobody built AI for

There are great AI tools for the designer who uses a design system. Generate a component, write a story, suggest a layout. That work is well-served.

But what about the team running it?

The token audits. The deprecation plans. The stakeholder briefs, the contribution workflows, the drift you fight quietly before it becomes someone else’s emergency. The governance documentation nobody reads until something breaks. The onboarding that happens informally because there’s no time to do it properly. The budget defence you prepare at 11pm before a quarterly review.

That work is hard, it’s largely invisible, and it compounds badly when it doesn’t get done. It also never had proper AI tooling – until now.

Design System Ops is built specifically for practitioners. Not designers who use a system. The people responsible for one.

What it is

Design System Ops is a Claude Code skill pack. It teaches Claude how to do design systems work the way a staff-level practitioner would: with structured processes, expert frameworks, and output calibrated to the actual complexity of what you're dealing with.

When you ask Claude to audit your tokens, it does not give you generic advice. It reads your actual token files, identifies tier leakage, flags naming violations, and produces a prioritised findings table with remediation guidance.

That is the difference. Not a smarter prompt. A different kind of output entirely.

The skills

Triage

Start here.

triage

Scans your system’s size, stack, token maturity, and documentation state. Classifies it into one of four states – new, growing, established, or legacy – and produces a prioritised run plan: which 3–5 skills to run first, which to skip, and which to return to later. The answer to “I just installed this, where do I start?”

Audit

Understand what you actually have.

token-audit

Reads your token files – JSON, SCSS, CSS custom properties, Tailwind config, whatever you’ve got – and produces a structured report: naming violations, missing semantic tiers, wrong-tier references, DTCG alignment gaps, and architectural debt. Not a list of suggestions. A prioritised finding set with severity, location, and recommended fix.
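To make "wrong-tier references" concrete, here is a minimal Python sketch. It is not the skill's implementation. It assumes a three-tier system, a flat mapping of CSS custom properties to values, and a hypothetical name-list rule (`tier_of`) for deciding which tier a token belongs to. The `--helix-*` names and values are illustrative.

```python
import re

# Hypothetical tier membership; a real audit would infer this from your token files.
PRIMITIVES = {"--helix-gray-200", "--helix-blue-600"}
SEMANTIC = {"--helix-border-default", "--helix-action-primary"}

VAR_REF = re.compile(r"var\((--[\w-]+)\)")


def tier_of(name: str) -> str:
    """Classify a token name; anything not listed is treated as a component token."""
    if name in PRIMITIVES:
        return "primitive"
    if name in SEMANTIC:
        return "semantic"
    return "component"


def wrong_tier_references(tokens: dict[str, str]) -> list[dict[str, str]]:
    """Flag component tokens whose value points straight at a primitive."""
    findings = []
    for name, value in tokens.items():
        if tier_of(name) != "component":
            continue
        match = VAR_REF.search(value)
        if match and tier_of(match.group(1)) == "primitive":
            findings.append({"token": name, "references": match.group(1)})
    return findings


tokens = {
    "--helix-gray-200": "#E5E7EB",
    "--helix-border-default": "var(--helix-gray-200)",
    "--helix-card-border": "var(--helix-gray-200)",          # skips the semantic tier
    "--helix-dialog-border": "var(--helix-border-default)",  # routes through it
}
findings = wrong_tier_references(tokens)
```

Only the card border token is flagged here; the dialog token passes because it routes through the semantic tier.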

component-audit

Inventories your component library with usage signals, duplication analysis, and coverage gaps. Counts what you have, flags what’s redundant, identifies what’s missing, and scores AI-readiness across the library. Produces a component-by-component assessment, not a summary paragraph.

system-health

Scores your design system across 7 dimensions – tokens, components, documentation, adoption, governance, AI readiness, and platform maturity – calibrated to your system’s maturity level. A Level 2 system isn’t scored against Level 4 expectations.

drift-detection

Finds where consuming teams have diverged from the system and classifies why. Five drift types: intentional divergence, version lag, accidental drift, misunderstanding, and system gap. Each finding includes the team, the component, the divergence, and the classification – so you know whether to fix the system, fix the team, or accept the difference.

naming-audit

Audits naming conventions across components, tokens, and patterns. Checks for consistency, semantic clarity, and convention violations. Catches the kind of naming drift that compounds silently – btn vs button, color-primary vs brand-primary, modal vs dialog – and produces a unified recommendation.

Govern

Run the system as infrastructure.

contribution-workflow

Produces a complete 6-stage contribution process: proposal, design, build, documentation, community review, and release. Includes templates for each stage, capacity calibration based on team size, and clear criteria for what gets accepted vs. what gets sent back. Not a philosophy document. An operational workflow.

deprecation-process

Produces a full deprecation plan: blast radius analysis showing every consuming team and component affected, migration path with code examples, communication timeline, and sunset date recommendation. Tells you who will break, how badly, and what to give them instead.

decision-record

Captures architectural decisions with the context that produced them. Problem statement, options considered, trade-offs evaluated, decision made, and consequences accepted. The kind of record that prevents the next team from re-litigating the same decision in six months because nobody wrote down why.

change-communication

Produces the full communication package for a design system change: release notes, migration guide, and team announcement – all scaled to impact level. A minor token rename gets a changelog entry. A breaking component API change gets a migration guide with before/after code examples and a timeline.

Document

Make the system legible to humans and machines.

ai-component-description

Generates a six-section component description optimised for Figma’s MCP server and LLM consumption. Not a README – a machine-readable specification that tells AI tooling what the component does, when to use it, what props it accepts, and how it composes. Includes diagnostic mode: if the component lacks sufficient metadata to document well, produces a remediation brief instead of thin documentation.

pattern-documentation

Documents recurring UI patterns – not individual components, but the composed behaviours they form. State coverage, accessibility considerations, composition rules, and related patterns. Includes diagnostic mode for patterns that aren’t mature enough to document yet.

token-documentation

Documents token intent, not just token values. Organised by tier – primitive, semantic, component – with theming contracts, usage context, and do/don’t examples. Includes diagnostic mode for token systems that are too incomplete or inconsistent to document meaningfully.

usage-guidelines

Writes component usage documentation: when to use it, when not to, edge cases, and anti-patterns. Anti-patterns derived from actual misuse evidence in the codebase, not hypothetical scenarios. Includes diagnostic mode for components that lack sufficient usage data to produce meaningful guidelines.

Validate

Verify quality before it ships.

design-to-code-check

Compares your design specification against its code implementation across 5 dimensions: visual fidelity, layout structure, responsive behaviour, interaction states, and token usage. Classifies every discrepancy as an implementation error, spec gap, system inconsistency, or accepted divergence. Dual scoring separates how well the check ran from how well the code matches the design.

accessibility-per-component

Audits a component across five dimensions: keyboard navigation, screen reader announcement, colour contrast, focus management, and ARIA implementation. Maps each finding to specific WCAG 2.1 AA criteria with severity and remediation. Dual scoring separates how thorough the audit was from how accessible the component actually is.

token-compliance

Scans your codebase for hardcoded values that should be tokens, wrong-tier token references, and inconsistent token application. Every finding includes the file, line, current value, and recommended token. Dual scoring separates how thoroughly the scan ran from how compliant the codebase actually is.
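A finding with file, line, current value, and recommended token could be produced by something like the sketch below. It is a simplified assumption about how such a scan might work: it only looks for six-digit hex colours, and `TOKEN_FOR_VALUE` is a hypothetical lookup that a real scan would build from your token files.

```python
import re

HEX_COLOUR = re.compile(r"#[0-9a-fA-F]{6}\b")

# Hypothetical value-to-token lookup, keyed by lowercase hex value.
TOKEN_FOR_VALUE = {"#dcfce7": "var(--helix-feedback-success-subtle)"}


def scan_css(path: str, source: str) -> list[dict]:
    """Report every hardcoded hex colour with its location and a suggested token."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in HEX_COLOUR.finditer(line):
            value = match.group(0)
            findings.append({
                "file": path,
                "line": line_no,
                "value": value,
                # None means no known token carries this value yet.
                "recommended": TOKEN_FOR_VALUE.get(value.lower()),
            })
    return findings


css = """.toast--success {
  background: #DCFCE7;
  color: var(--helix-text-default);
  border-color: #123456;
}"""
report = scan_css("toast.css", css)
```

A hardcoded value with no matching token (the `#123456` border here) is still reported, with `recommended` left empty. That usually points to a gap in the token set, not a mistake in the component.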

Communicate

Move people and decisions.

adoption-report

Separates coverage from adoption – having the system available is not the same as teams actually using it. Flags at-risk teams, identifies adoption blockers, and tracks trend direction. For first-run reports, includes a baseline establishment protocol: metrics infrastructure audit, tracking recommendations, baseline snapshot, and target setting.
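One way to picture the coverage/adoption split is sketched below. The definitions are assumptions for illustration, not the skill's own metrics: coverage is the share of teams with the system installed, adoption is the share of a team's component usages that come from the system, and `AT_RISK_THRESHOLD` is a hypothetical cut-off.

```python
from dataclasses import dataclass


@dataclass
class TeamUsage:
    team: str
    has_system_installed: bool
    system_component_uses: int   # usages of design system components
    local_component_uses: int    # usages of team-built equivalents


AT_RISK_THRESHOLD = 0.5  # hypothetical: below this, a covered team counts as at risk


def coverage_rate(teams: list[TeamUsage]) -> float:
    """Share of teams that have the system available at all."""
    return sum(t.has_system_installed for t in teams) / len(teams)


def adoption_rate(team: TeamUsage) -> float:
    """Share of a team's component usages that actually come from the system."""
    total = team.system_component_uses + team.local_component_uses
    return team.system_component_uses / total if total else 0.0


teams = [
    TeamUsage("checkout", True, 80, 20),
    TeamUsage("search", True, 10, 90),
    TeamUsage("admin", False, 0, 50),
]
coverage = coverage_rate(teams)
at_risk = [
    t.team for t in teams
    if t.has_system_installed and adoption_rate(t) < AT_RISK_THRESHOLD
]
```

In this made-up data, two of three teams are covered, but the search team is covered without really adopting, which is exactly the case a coverage-only number hides.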

stakeholder-brief

Produces a one-page brief in business language. Takes design system health data – token debt, drift metrics, adoption numbers – and translates it into language leadership understands: risk, cost, velocity impact, and recommended investment. No jargon. No component names. Business consequences.

system-pitch

Builds the investment case for a design system. Cost estimation, ROI framing, efficiency projections, and pre-built responses to the objections you’ll actually hear: “can’t we just use a component library?”, “our teams are too different”, “we tried this before.” Not a slide deck. The argument underneath one.

designer-onboarding

Produces a two-week onboarding guide for a designer joining a team that uses the design system. Essential reading list, quick reference card, first-week tasks, and common mistakes to avoid. Calibrated to the system’s actual complexity – a 6-component system gets a different onboarding than a 60-component system.

Proof

What the output actually looks like

All real outputs from real codebases.

Agents

Workflows that chain skills into end-to-end pipelines.

/component-to-release

Runs a component through the full pre-release validation pipeline: design-to-code alignment, accessibility audit, and token compliance – in sequence, with gates between each stage. If a stage fails critically, the pipeline stops. Produces a consolidated release readiness report with dual scores at every gate.
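The gate behaviour can be sketched as a loop that stops at the first critical failure. This is an illustrative model only: `StageResult`, its two score fields, and the stand-in stage functions are hypothetical, and the numbers are invented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StageResult:
    name: str
    check_score: int    # hypothetical 0-100: how well the check itself ran
    result_score: int   # hypothetical 0-100: how well the component did
    critical: bool      # True if the stage found a blocking failure


def run_pipeline(stages: list[Callable[[], StageResult]]) -> dict:
    """Run stages in order; stop at the first critical failure."""
    completed = []
    for stage in stages:
        result = stage()
        completed.append(result)
        if result.critical:
            return {"ready": False, "stopped_at": result.name, "results": completed}
    return {"ready": True, "stopped_at": None, "results": completed}


# Stand-in stages returning canned results for the example.
def design_to_code() -> StageResult:
    return StageResult("design-to-code-check", 95, 91, False)


def accessibility() -> StageResult:
    return StageResult("accessibility-per-component", 90, 40, True)


def token_compliance() -> StageResult:
    return StageResult("token-compliance", 92, 88, False)


report = run_pipeline([design_to_code, accessibility, token_compliance])
```

Because the accessibility stage fails critically, the token compliance stage never runs and the report records where the pipeline stopped.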

/full-diagnostic

Comprehensive health sweep chaining 5 audit skills with cross-skill pattern synthesis. Runs token-audit, component-audit, naming-audit, drift-detection, and system-health, then synthesises findings across all five into a unified diagnostic with themes, priorities, and recommended next steps.

/governance-review

Quarterly governance package combining adoption-report, drift-detection, and stakeholder-brief into a single review cycle. Produces both the internal assessment (what’s happening) and the external brief (what leadership needs to know).

/migration

Plans and executes a token migration end-to-end. Chains token-audit and naming-audit to diagnose scope, produces a transformation table for every affected token, generates codemods with test and rollback commands, then builds a four-phase rollout strategy – non-breaking additions, consumer migration, deprecation, removal – with entry criteria, exit criteria, and rollback checkpoints for each phase. Finishes by chaining change-communication to produce the full notification package for every audience. Covers format migrations, naming overhauls, tool upgrades, and composite transformations.
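As a rough picture of what a transformation table and a rename codemod with rollback might look like, here is a minimal Python sketch. It is an assumption about the general shape, not the generated codemod itself: `RENAMES` is a hypothetical table, and the rollback only works when the mapping is one-to-one.

```python
import re

# Hypothetical transformation table: old token name -> new token name.
RENAMES = {
    "--helix-space-4px": "--helix-space-1",
    "--helix-space-8px": "--helix-space-2",
    "--helix-space-16px": "--helix-space-4",
}


def apply_renames(source: str, renames: dict[str, str]) -> str:
    """Rename tokens in one pass so a new name is never renamed a second time."""
    names = sorted(renames, key=len, reverse=True)
    # Lookarounds stop a match inside a longer token name.
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(re.escape(n) for n in names) + r")(?![\w-])"
    )
    return pattern.sub(lambda m: renames[m.group(1)], source)


def invert(renames: dict[str, str]) -> dict[str, str]:
    """Build the rollback table; only valid for one-to-one mappings."""
    return {new: old for old, new in renames.items()}


css = "padding: var(--helix-space-8px) var(--helix-space-16px);"
migrated = apply_renames(css, RENAMES)
rolled_back = apply_renames(migrated, invert(RENAMES))
```

Doing the substitution in a single regex pass matters: applying renames one at a time can rewrite a token twice when one new name is also an old name elsewhere in the table.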

Who this is for

Design System Ops is built for practitioners. The people responsible for a design system, not just the people who use one.

A good fit if you are:

✓ A design systems lead or senior design engineer maintaining a production system
✓ Someone who needs to produce governance documentation, stakeholder reports, or deprecation plans regularly
✓ A team of one managing a system that has more consumers than contributors
✓ Someone migrating a token architecture who needs both the plan and the codemods
✓ Someone building AI-native design system infrastructure who needs structured context files for agents

Not a good fit if you are:

× Looking for a component generator or UI builder
× New to design systems and looking for an introduction to the concepts
× Working in a team where Claude Code is not part of the toolchain

Not sure?

Is it right for you?

Answer 6 questions. Get a personalised skill map in 60 seconds.

FAQ

How do I install it?

Clone the repo or download the zip from GitHub. Copy the skills/, commands/, and knowledge-notes/ folders into your Claude Code skills directory at ~/.claude/skills/. That is it. Skills are available in every Claude Code session from then on. Full setup options – including git clone, the Claude Code CLI installer, and the one-click plugin file – are in the repo’s install guide.

Does it work without Figma?

Yes. All 21 skills work without a Figma connection. Connecting a Figma MCP server unlocks additional capabilities for skills that work with your Figma library, but it is entirely optional.

Which stacks does it support?

Any stack Claude Code can read. It has been tested against React, Vue, Twig/Fractal, Emotion, Tailwind, Style Dictionary v4, plain SCSS, and TypeScript token objects.

Can I use it in the Claude desktop app?

Yes. Design System Ops ships as a .plugin file that installs directly in the Claude desktop app’s Cowork mode. Download design-system-ops.plugin from the repo’s installable/ folder and open it – the skills, commands, and knowledge notes install automatically. You can also use it in Claude Code on the command line. Same skills, same output, different interface.

Is it free?

Yes. Design System Ops is free and open source under the MIT licence. No paywall, no gated features, no email required. The full repo is on GitHub.

Do I need to use the slash commands?

No. Skills activate automatically when you describe a task in natural language. The slash commands are a convenience layer, not a requirement.

Sample output: token audit findings

| ID | Severity | Category | Finding |
| --- | --- | --- | --- |
| TA-01 | 🔴 Critical | Value | 9 component tokens reference primitives directly, bypassing the semantic tier. `--helix-card-border` resolves to `var(--helix-gray-200)` instead of routing through `var(--helix-border-default)`. Same pattern in toast, dialog, and tooltip border tokens. This means theme switching breaks these components – the primitive value does not change between themes, but the semantic token does. |
| TA-02 | 🔴 Critical | Value | Feedback colour tokens are disconnected from the feedback system. `--helix-toast-success-bg` hardcodes `#DCFCE7` instead of referencing `var(--helix-feedback-success-subtle)`. The semantic layer already defines these roles correctly – the component tokens just do not use them. 4 toast variants and 3 badge variants affected. |
| TA-03 | 🟠 High | Naming | Semantic colour tokens mix intent-based and appearance-based names. `--helix-action-primary` (intent) coexists with `--helix-blue-surface` (appearance). The blue-surface token should be `--helix-surface-accent` or `--helix-surface-brand` – describing what it is for, not what colour it is. 8 tokens use colour-based names at the semantic tier. |
| TA-04 | 🟠 High | Coverage | No semantic tokens exist for interactive states. Hover, focus, active, and disabled colours are defined at the component tier but skip the semantic tier entirely. `--helix-button-hover-bg` resolves directly to a primitive (`--helix-blue-600`). This forces every new component to reinvent state colours instead of referencing a shared semantic set. |
| TA-05 | 🟠 High | Naming | Spacing tokens use `px` suffix in names. `--helix-space-4px`, `--helix-space-8px`, `--helix-space-16px` – the suffix duplicates what the value already expresses. If the base unit ever changes (e.g., moving to rem), the names become lies. Standard practice: `--helix-space-1`, `--helix-space-2`, `--helix-space-4` (multiplier-based) or `--helix-space-xs`, `--helix-space-sm`, `--helix-space-md` (T-shirt). |
| TA-06 | 🟡 Medium | Value | 3 alias chains are 4 hops deep. `--helix-dialog-overlay-bg` → `--helix-surface-overlay` → `--helix-neutral-overlay` → `--helix-black` → `#000000`. Each hop adds fragility and makes debugging harder. 3 hops is the practical maximum – compress to 3 by removing the redundant `--helix-neutral-overlay` layer. |
| TA-07 | 🟡 Medium | Coverage | Typography semantic tokens exist for size and weight but not for line-height or letter-spacing. Components set `line-height: 1.5` and `letter-spacing: -0.01em` as raw values. These should be tokenised at the semantic tier (`--helix-leading-normal`, `--helix-tracking-tight`) so typography can be adjusted systematically. |
| TA-08 | 🟡 Medium | Naming | Component token prefix is inconsistent. Button tokens use `--helix-button-`, card tokens use `--helix-card-`, but toast tokens use `--helix-cmp-toast-` and dialog uses `--helix-ui-dialog-`. Pick one convention and migrate. The most common pattern (used by 7 of 10 component token files) is `--helix-{component}-*`. |
| TA-09 | ⚪ Low | Coverage | No token exists for focus ring offset. Components use `outline-offset: 2px` as a raw value in 14 places. Create `--helix-focus-ring-offset` as a primitive and reference it from a semantic `--helix-focus-offset` token. |
| TA-10 | ⚪ Low | Naming | Z-index tokens use raw values as names. `--helix-z-100`, `--helix-z-500`, `--helix-z-1000` – workable but brittle if a new layer needs to be inserted between existing values. Consider named layers: `--helix-z-dropdown`, `--helix-z-modal`, `--helix-z-toast`. |
| TA-11 | ⚪ Low | Value | 2 orphaned primitive tokens are defined but never referenced. `--helix-purple-50` and `--helix-purple-100` exist in the colour primitive file but are not referenced by any semantic or component token. If these colours are unused, remove them to keep the primitive tier clean. |
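Finding TA-06 counts hops along an alias chain. A minimal sketch of that count is below, assuming a flat mapping of custom properties where each alias is a single `var(...)` reference. `MAX_HOPS` mirrors the 3-hop maximum stated in the finding; the function name and data layout are hypothetical.

```python
import re

VAR_REF = re.compile(r"var\((--[\w-]+)\)")
MAX_HOPS = 3  # the practical maximum named in finding TA-06


def resolve_chain(name: str, tokens: dict[str, str]) -> list[str]:
    """Follow var() aliases from a token down to its raw value."""
    chain = [name]
    seen = {name}
    value = tokens[name].strip()
    while True:
        match = VAR_REF.fullmatch(value)
        if not match:
            chain.append(value)  # a raw value ends the chain
            return chain
        ref = match.group(1)
        if ref in seen:
            raise ValueError(f"circular alias at {ref}")
        chain.append(ref)
        seen.add(ref)
        value = tokens[ref].strip()  # raises KeyError if the alias is undefined


tokens = {
    "--helix-dialog-overlay-bg": "var(--helix-surface-overlay)",
    "--helix-surface-overlay": "var(--helix-neutral-overlay)",
    "--helix-neutral-overlay": "var(--helix-black)",
    "--helix-black": "#000000",
}
chain = resolve_chain("--helix-dialog-overlay-bg", tokens)
hops = len(chain) - 1          # arrows between entries in the chain
too_deep = hops > MAX_HOPS
```

With the chain from TA-06 this gives 4 hops, one over the limit. Removing the `--helix-neutral-overlay` entry and pointing `--helix-surface-overlay` at `--helix-black` brings it down to 3.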

Sample output: Token audit

Token audit report sections: Tier structure, Findings, DTCG 2025.10 compatibility assessment, Remediation priority, Scope.

Sample output: AI component description

Figma component description sections: PURPOSE, PROPS, ANTI-PATTERNS, COMPOSITION, ACCESSIBILITY, EXAMPLES. Self-test result.
