Seattle Councilmatic: Boosting Civic Engagement In The Emerald City

May 20

Last summer I was between jobs. Given how tough the job market was in 2025, I knew I needed to learn new skills and find a way to showcase them. I also knew I wanted to build something of substance. Not a tutorial follow-along, not a generic resume-shaped side project: something I'd actually use, that would also be useful to people I share a city with.

I live in Seattle. I follow local politics. My time spent producing Regional Burning Man events instilled in me the values of communal effort, participation, and civic responsibility. I recently joined the board of my neighborhood community council, where I also serve as co-chair of the Events Committee. I knew WHO I wanted to build for, and the WHY was already clear; I just needed to figure out WHAT I was building.

My first idea was to start with a "who represents me?" lookup tool built on Google's Civic Information API. That plan proved short-lived as Google had announced it was sunsetting the Representatives API on April 30, 2025. The third-party replacements I looked at were either expensive, half-built, or both. While I was digging through alternatives, I stumbled into Chicago Councilmatic, a civic tech site by DataMade that lets Chicago residents track legislation, committee meetings, and how their council members vote. It was open source. It was well-maintained. It was clearly built by people who'd done the actual work of understanding the problem. They'd built versions for a few other cities, but not Seattle.

That was the project; I had found my WHAT. Fork Councilmatic, wire it up to Seattle's data sources, incorporate AI where it made sense, ship it. I started in early August 2025, got the foundations laid, and then took a holiday break that quietly turned into a several-month pause. The project came back to life a few weeks ago, after a conversation with my neighborhood community council president about how hard it is to navigate the Seattle Municipal Code lit a spark I couldn't put down. ADHD hyperfocus and Claude Code as a pair-programming partner carried it the rest of the way. I'm pleased to say that seattlecouncilmatic.org went live this month.

This post covers what it does, the principles that shaped it, the engineering behind it, and a few interesting problems I worked through along the way.

What It Does

Seattle is one of the most transparent cities in the country when it comes to local government data. seattle.gov, Seattle Legistar, and the Seattle Channel video archive collectively publish enormous amounts of information about what our council is doing: every bill, every vote, every meeting, every word of the municipal code. The underlying infrastructure is genuinely good. Seattle Councilmatic stands entirely on the shoulders of the city's data publishing work, and the site exists because that data is available in the first place.

What Councilmatic adds is a layer of synthesis and navigation designed for someone who isn't already a policy professional:

A clear profile for every council member, with photo, contact info, committee assignments, office staff, and a plain-language summary of their tenure and the topics they tend to legislate on.
Plain-language bill summaries with impact analysis and a list of key changes, each linked to the specific Seattle Municipal Code sections the bill modifies.
A unified voting record per council member, in one sortable, filterable table, instead of reconstructing it from individual meeting records.
Meeting summaries for Full Council, Council Briefings, and the nine standing committees. A 2 to 4 paragraph overview of what happened, plus per-agenda-item summaries with timestamped jumps back into the Seattle Channel video if you want to verify anything yourself.
A browsable, summarized municipal code. All 7,400+ sections of the SMC, each with its own plain-language summary, tables rendered inline, the full official text always one click away.

That last one took a lot of time to import and parse successfully, and is foundational to understanding active legislation. Being able to follow "this ordinance changes section 23.47A.004" through to "here's what 23.47A.004 actually says, in plain English" is the difference between a bill summary that informs you and one that just makes you feel informed.

The site is currently read-only. No accounts. No comments. No tracking. No newsletter pop-ups. If it's useful to you, you don't need to sign up for it to be useful. If it isn't useful, you don't have to unsubscribe from anything.

What Shaped the Design

A few principles worth naming, because they aren't accidents:

Open source from day one. The code is on GitHub under the MIT license. Civic tools shouldn't be black boxes. Anyone should be able to audit how a "plain-language summary" was generated, or fork the site and run their own version. The whole project also stands on the shoulders of Councilmatic, Open Civic Data, and Pupa, each of which exists because someone before me made the same call.

No surveillance. No analytics that profile users, no tracking pixels, no ads. There is genuinely nothing to monetize here, and I want to keep it that way. If the day comes that I need server donations to keep it running, I'll ask plainly.

Accessible by default. I spent a fair amount of time on this part. If you're building public-interest software, you're building it for a public that includes screen reader users, keyboard navigators, and people whose contrast sensitivity is different from yours. More on the audit work below.

Deferred when uncertain. A few features I deliberately haven't shipped yet, email digests being the big one. The infrastructure to send transactional email to opted-in users responsibly (deliverability, bounce handling, unsubscribe compliance, abuse policy) is a real body of work, and the security surface around user PII is real. I'd rather defer a useful feature than ship a careless version of it.

Code

The full project is on GitHub, MIT licensed. The WORK_LOG.md in the repo is essentially the long-form version of this post, with every decision and postmortem I kept along the way. Issues, PRs, "your link is broken" emails, and "I'd like to fork this for Tacoma" conversations all welcome.

The Stack

Django backend, PostgreSQL + PostGIS, Vite-built React SPA. Caddy for TLS and reverse proxy. Gunicorn for the Django process. A scheduler container that runs a cron stack against the same database. The entire production deployment is one Hetzner CPX21 (3 vCPU, 4 GB RAM) running docker-compose.prod.yml, about $10/month of compute.

Data ingestion uses pupa and Open Civic Data, the same scraping framework upstream Councilmatic is built on. The LLM synthesis layer uses the Anthropic Messages API: Sonnet 4.6, via the Batch API with cached system prompts.

Architecture

The system has three layers: scrape, synthesize, serve.

Scrape:
  Legistar API + seattle.gov + Seattle Channel
    -> pupa scrapers (nightly cron)
    -> PostgreSQL

Synthesize:
  PostgreSQL
    -> LLM pipeline (Batch API + cached prompts)
    -> Anthropic API (Sonnet 4.6 + Batch)
    -> back to PostgreSQL

Serve:
  PostgreSQL
    -> Django + PostGIS
    -> Vite-built React SPA
    -> Caddy + TLS

Key design decisions:

Postgres full-text search, not Elasticsearch or Solr. The original Councilmatic uses Solr. Seattle's entire corpus (bills, sections, meetings, transcripts) fits in a few gigabytes. tsvector + GIN indexes are sufficient and one fewer moving part to operate. I'd reach for a separate search service if I were running multiple cities on one deployment or if FTS quality stopped being good enough; for one city's data, it's overkill.
Single-VPS production deploy. Caddy, gunicorn, PostgreSQL, and the scheduler all run from docker-compose.prod.yml on one Hetzner CPX21. For a civic site with tens of DAU, anything heavier is over-engineered. The whole stack restarts in under a minute. Backups are a pg_dump cron on the host with seven-day rotation.
Django serves the React SPA. The Vite build output is COPY'd into the Python container at build time, and a Django react_app view returns frontend/dist/index.html for any unmatched path. No separate frontend container, no CDN to manage, no CORS dance. Static assets ship with the same image as the application code, so a deploy is atomic across both.
pupa + Open Civic Data for ingestion. Standard tooling in the civic-tech space. Keeping the data model OCD-compatible means existing tooling (DataMade's councilmatic_core, third-party importers, downstream consumers) works out of the box.
Anthropic Batch API for every LLM call. 50% discount versus realtime; latency is acceptable for nightly synthesis; the failure mode (some requests error, others succeed) is the right shape for civic data. We retry individual rows rather than re-running the world.

Project Structure

seattle-councilmatic/

├── Dockerfile # Multi-stage: node frontend build + python app

├── docker-compose.prod.yml # Caddy + app + postgres + scheduler

├── Caddyfile # TLS + reverse proxy + security headers

├── frontend/ # Vite-built React SPA

├── seattle/ # pupa scrapers (bills, events, people, votes)

├── seattle_app/

│ ├── management/commands # extract_bill_text, summarize_legislation,

│ │ # tag_bill_issue_areas, extract_event_transcripts,

│ │ # summarize_events, summarize_smc_sections,

│ │ # summarize_reps, extract_smc_tables, ...

│ ├── services/ # claude_service, bill_text_extractor,

│ │ # event_chunker, parse_smc_pdf

│ └── models.py # BillText, LegislationSummary, EventTranscript,

│ # EventSummary, RepBio, RepSummary,

│ # MunicipalCodeSection, ...

├── reps/ # Rep-facing API + services

└── scripts/ # update_seattle.sh, poll_llm_batches.sh,

# update_reps.sh, backup-db.sh

The separation between seattle/ (scrapers) and seattle_app/ (application layer) is intentional. pupa's scrape loop is its own world: scrapers yield OCD objects to pupa's importer, which writes them to the database. The application layer then enriches that data with LLM-generated content that lives in its own models (BillText, LegislationSummary, EventTranscript, etc.). Keeping the two domains separate means a scraper bug doesn't put the summary pipeline at risk, and a prompt change doesn't require a rescrape.

How the LLM Pipeline Works (The Short Version)

The synthesis layer is the most interesting engineering on the project. Every night at 2 AM Pacific, after the scrapers run, a pipeline submits seven kinds of work to Anthropic:

Bill text extraction. Downloads attachments, runs PDFs and .docx files through extractors, concatenates with [STAFF SUMMARY] / [SIGNED CANONICAL TEXT] markers so the model can tell staff framing apart from canonical legal text.
Event transcript extraction. Pulls Seattle Channel SRT captions for every televised meeting.
Bill issue-area tagging. 1 to 3 tags per bill from a controlled vocabulary of 20 topics.
Bill summaries. summary / impact_analysis / key_changes[] against a structured JSON schema.
Meeting summaries. A meeting overview plus per-agenda-item summaries with chapter timestamps, in a single LLM call.
Rep summaries. 2 to 3 paragraphs synthesizing tenure, committees, sponsorship by issue area, and voting record.
Municipal code section summaries. Run once across all 7,400 sections; refreshed when sections change.

Each command follows a two-phase Batch pattern: first invocation submits the batch and saves the batch ID to state; the next invocation polls, processes results into the database, and clears state. A separate 3 AM cron picks up everything submitted the night before, which gives the 2 AM submission step a clean retry path if it fails transiently.

The system prompt for each call is marked cache_control: ephemeral, so it's cached once per batch and amortized across hundreds or thousands of requests in that batch's window. Combined with the Batch API discount, this gets the steady-state cost down to about $0.10 to $0.30 per day, roughly $3 to $9 per month. The initial backfill (387 bills + 7,400 SMC sections + 94 meetings + 9 rep summaries) ran around $80 total.

A Few Interesting Problems

1. Seattle Channel's auto-captions are deeply, beautifully wrong

The meeting summarizer reads SRT captions from Seattle Channel. Those captions are auto-generated, all-caps, and phonetic. A sample line:

>> GOOD AFTERNOON. THANK YOU FOR COMING TO THE MAY 5, 2026 MEETING. COUNCILMEMBER WRINGE IS HERE, COUNCILMEMBER CAROLINE IS HERE, COUNCILMEMBER BOB IS HERE...

"WRINGE" is Rinck. "CAROLINE" is Carolyn (Hollingsworth). The captions also don't identify speakers; they mark turn changes with >> and nothing else.

The fix has two parts. First, every summary call gets the current council roster (name + seat) passed in as structured context. Second, the system prompt explicitly instructs the model to cross-reference garbled names against the roster and silently correct them:

You will see speaker names that may be garbled by auto-captioning
(e.g. "CAROLINE" for "Carolyn", "WRINGE" for "Rinck"). Cross-reference
against the provided council roster and use the correct spellings
in your summary. Do not call attention to the corrections.

Spot-checked across 18 meetings: every councilmember named by the summaries is spelled correctly, and the vote tallies ("5-3 with Rivera, Saka, and Strauss dissenting") match the actual outcomes. The captions are wrong; the summaries are right.

I considered switching to Whisper for re-transcription. The math didn't work. Whisper would have been about 5x the API cost ($10 backfill vs $2 for SRT extraction), required FFmpeg plus 25 MB-chunk plumbing for OpenAI's file-size limit, and introduced a new vendor. The roster-context mitigation handles 95% of the value at 20% of the cost. I do capture each meeting's MP4 URL in the EventTranscript model, so a Whisper re-transcription path is available if it ever becomes necessary.

2. Composing third-party URLs yourself is a subtle footgun

The transcript extractor's first version composed Seattle Channel URLs from each meeting's videoid:

url = f"{_SC_HOST}/FullCouncil?videoid={videoid}&Mode2=Video"

This works for Full Council meetings. For committee meetings, the real path is /mayor-and-council/city-council/<year>-<committee-slug>. When you hit /FullCouncil?videoid=<X> with a non-Full-Council videoid, Seattle Channel returns 200 OK with the latest Full Council page, silently ignoring the query string.

My first committee run wrote 76 transcripts that all pointed to the same Full Council SRT. I caught it because every persisted transcript came back with the identical character count (79,301).

The fix was to stop composing the URL ourselves. Legistar already embeds the full Seattle Channel URL for each meeting, so the extractor pulls it whole:

# Before: compose from videoid (silently broken on committees)

url = f"{_SC_HOST}/FullCouncil?videoid={videoid}&Mode2=Video"

# After: extract the embedded SC URL from Legistar's MeetingDetail

sc_url = _extract_sc_url(meeting_detail)

Lesson worth keeping: when a third-party site has a "default" route that doesn't 404 on bad parameters, "I'll just compose the URL myself" is a subtle footgun. If the upstream system already gives you the canonical URL, use it.

3. The model overreached on "Budget & Taxes"

I needed bills tagged by issue area so each rep's profile could surface "what topics they sponsor on." First-pass tagging asked Claude to pick 1 to 3 tags from a 20-item vocabulary, no constraints on which ones were "easy" defaults. Result: 51% of bills got Budget & Taxes as a secondary tag, because every contract authorization, fee adjustment, and routine appropriation involves money.

For per-rep aggregation, that meant every councilmember looked like a Budget & Taxes specialist. Councilmember Strauss, chair of Finance, had 78% of his bills tagged Budget & Taxes, which made his profile read as "tax policy expert" when the reality is "the Finance chair signs off on routine fiscal authorizations."

I tightened the prompt to reserve Budget & Taxes for bills that are substantively about budget, taxation, or fiscal policy, not bills where money is incidentally a mechanism:

Reserve "Budget & Taxes" for bills that are substantively about budget allocation, taxation, levies, ballot measures funding city programs, or fiscal policy. Do NOT apply it to bills that merely involve money as a procedural mechanism (contract authorizations, fee adjustments, routine appropriations). Most contract and appropriation bills should be tagged by their substantive topic instead: Utilities, Transportation, Housing, etc.

Re-tagging with --force dropped Budget & Taxes from 51% to 40% of bills. Strauss's profile started reading correctly: "this heavy concentration reflects his role chairing the Finance committee, which routinely processes fiscal authorizations." Two takeaways: vocabulary design for a city-specific corpus should start from the actual committee structure (not generic municipal-policy categories), and prompts for classification tasks benefit from explicit negative guidance about which features look discriminative but aren't.

4. Recovering §23.48.235, Upper-Level Setbacks

The Seattle Municipal Code parser couldn't detect SMC §23.48.235 as a section at all. The number lives only in the page's running header, and the title appears below a figure caption rather than next to the section number. The section's content leaked into 23.48.230's full_text as a 1.8 KB tail.

The recovery path was specific enough to bother documenting:

Find the bleed-marker for 23.48.235 in 23.48.230's full_text.
Slice 23.48.230 at that line.
Create a new MunicipalCodeSection row for 23.48.235 with title='Upper-Level Setbacks', source_pdf_page=3015.
Strip the running headers and broken caption/title fragments that pdfplumber's column-major capture left in the bleed-over.
Splice Map A in at the top of the new section (clean caption found); fall back to appending Exhibits A/B/C before the citation.

Less elegant than I'd like: a one-off rescue rather than a parser fix. But this is the shape of work on a 4,000-page PDF that the city publishes as a single download. Most of the time the parser is right; sometimes the layout is weird enough that surgery is cheaper than a general fix. The parser would have to handle (a) section numbers in running headers, (b) titles separated from numbers by intervening figures, (c) figure captions appearing where titles usually go. Every one of those is a regression risk on the 95% case for ambiguous benefit on the 5%.

5. Vision API for permission tables

Title 23 of the SMC is the zoning code. Many of its sections are organized around big permission tables: "in zone X, you can do Y, conditional on Z." Sections like 23.47A.004 have a master Table A that's the entire substance of the section. pdfplumber's word-level reader, which splits each page at mid_x = page.width / 2, scrambles those tables when they span both columns: cell values randomly assigned to left/right based on column-relative position, producing strings like X X X CCU CCU / P P P P P with no row labels attached.

I tried table-aware extraction with page.find_tables() first. That worked for clean tables but the LUC permission tables have merged cells, multi-line headers, and footnote-bearing rows that pdfplumber's table-finder didn't reliably segment. Then I tried Claude's Vision API: render each page to a PNG, send it to Sonnet with a structured-output prompt asking for markdown tables, splice the result back into the section's full_text. That handled the LUC tables cleanly, including the footnote KEY blocks they reference.

There was one cleanup pass needed: the parser's broken text-dump of the same tables stayed in the section's full_text above the new markdown version. The first cleanup heuristic walked backward through the head text removing lines that matched cell substrings, which worked on simple cases but missed footnote fragments interleaved with the dump. The fix was a structural-marker chop using ^Table [A-Z](?:[-.]\d{1,2})? for$ or ^Footnotes to Table [A-Z] as anchors, verified zero false positives across all 7,429 sections.

Running It

Production runs on a single Hetzner CPX21. Deploy is a docker compose -f docker-compose.prod.yml up -d away on a fresh server, given an .env and a DNS A-record. The Dockerfile is multi-stage: a node:20-alpine frontend-build stage runs npm ci && npm run build, and the Python app stage COPY --from=frontend-builds the dist directory so the image always contains a real frontend/dist/. Caddy handles auto Let's Encrypt for both seattlecouncilmatic.org and www.seattlecouncilmatic.org (apex 308-redirects to www).

Three crons run inside the scheduler container:

Daily 2 AM Pacific. Scrape Legistar, sync to Councilmatic models, extract bill text, extract event transcripts, submit Batch jobs for issue-area tagging and summarization.
Daily 3 AM Pacific. Poll all in-flight Batch jobs; process results into the database.
Weekly Sunday 2:30 AM. Re-scrape rep bios, submit the rep summary batch. Council memberships change rarely (every few years per seat), so daily would be wasteful.

Backups are a host-cron pg_dump to ./backups/ with seven-day rotation. Offsite to a Hetzner Storage Box is a documented follow-up. The full DEPLOY.md runbook covers first-time setup, day-2 ops (deploys, logs, restarts, manual scrapes, restores, secret rotation, TLS renewal), and the POSTGRES_PASSWORD / DATABASE_URL coupling that bites if you change one without the other.

The Accessibility Audit

After the initial UI was usable, I ran a full audit pass with axe DevTools plus Firefox's Accessibility Inspector, on six flagship pages: Home, the legislation index, a bill detail, the municipal code browser, the council member index, and a council member detail page.

The findings doc surfaced issues across labels, focus management, contrast, heading hierarchy, landmarks, live regions, document titles, and Leaflet map keyboard navigation. They were resolved across a sequence of focused PRs, and the accessibility conventions are now documented in AUDIT_FINDINGS.md and applied on every new UI surface.

The whole point of a civic-information site is to lower the activation energy of participation; high activation energy is precisely what we're fighting.

A Note on Claude Code

I'd be hiding the ball if I didn't mention this part. The bulk of the production codebase landed in the last few weeks, in a stretch of ADHD hyperfocus where I used Claude Code heavily as a pair-programming partner. A few things I'd say about it as an honest practitioner report:

What it was good at. Project-wide refactors where I needed dozens of files to change consistently. Implementing a feature where the design and constraints were already clear in my head and I needed it typed up correctly. Writing the tests I would have skipped if I were typing by hand. Drafting the DEPLOY.md, WORK_LOG.md, and audit findings docs from the context of the code that already existed. Diagnosing failures where the right answer required holding 5 files of context in working memory simultaneously.

What it wasn't a substitute for. Knowing what the system should do. Architectural calls (single VPS vs. managed services, Postgres FTS vs. Solr, Batch API vs. realtime, controlled vocabulary vs. free-form tags). Recognizing when a clever-looking diff is actually wrong. Catching the 76-transcripts-all-Full-Council bug, which a tool would have happily kept making.

The right mental model, for me, is that Claude Code is exceptional at executing on a clear engineering vision and terrible at substituting for one. The bottleneck is no longer typing speed or recall; it's judgment, taste, and knowing what good looks like for this specific problem. That's a fine bottleneck to be working against. It's the one I actually enjoy.

I'll write more about the workflow specifically at some point. For now, the takeaway is: if you're building solo on a side project and you're using one of these tools well, the gap between "stalled side project on the shelf" and "shipped open-source production system" is much smaller than it used to be. The work I did in the last few weeks would have taken a small team a quarter ten years ago. Force multiplier, indeed.

What's Next

The backlog is currently empty, but new features and bug fixes will be listed in GitHub issues. Here are some of the potential future enhancements I’m currently dreaming up:

Email digests. Probably the highest-value missing feature: weekly summary of new legislation, upcoming meetings, and your reps' activity. Deferred because the security and operational surface (SPF/DKIM/DMARC, bounce handling, unsubscribe compliance, abuse policy) is meaningful and I didn't want to rush it. This is most likely next.
An interactive AI chatbot. A natural-language interface that can query the database and do deeper reasoning to answer questions the pre-generated LLM summaries might not cover. Still in early design/daydreaming phase.
Historical coverage. The scrapers run on a 548-day rolling window. Bills introduced before that window get logged in vote events but aren't always linkable. Backfilling the pre-window corpus is a finite project I just haven't done yet. Weighing effort to value ratio.
Better SMC table extraction. Vision-API extraction handled the hardest LUC tables, but the harder layout cases (mid-section sandwiches, multi-page table series) still need work. If Seattle posts a new PDF with an updated municipal code, I’ll have to do the parsing dance all over again.
A versioned public API. The internal Django API is read-only and reasonably stable; documenting it behind a versioned endpoint would let other civic projects build on top.
Multi-city, eventually. The architecture is Seattle-specific in the scrapers but city-generic in the data model. Forking for another city, or possibly even King County, would be a real but bounded body of work: Legistar (or equivalent) API conventions, OCD division IDs, and committee structures all vary. If there's interest, I'd love to help brainstorm or bootstrap.

I'm sure I'll receive plenty of feedback and ideas as more people use the site.

The Civic Point

I built this because I believe a more informed and engaged public is how local government gets better. Not faster, not louder, but actually better. The people who weigh in on local policy today are doing real work, often important work, but they're a relatively narrow slice of the population: people whose jobs or vocations have given them the time and tooling to keep up with a city council that meets at 2pm on Tuesday afternoons. The more accessible we can make local politics, the wider the circle of people who can participate in an informed way, the more say we all have in how our city is run.

That circle widens when civic information becomes easier to navigate. A bill summary that doesn't require a law degree. A council member profile that shows what they actually work on. A meeting recap you can read in five minutes instead of a video you'd have to schedule ninety minutes to watch. Each of those is a small thing on its own. Together they lower the activation energy of "what is my city doing?" enough that more people, with more varied lived experience, can actually answer it.

If that's something that resonates with you, give it a try. If you're in Seattle, look up your district. Read a bill summary. Skim a committee meeting. Decide for yourself whether your representatives are doing what you elected them to do.

Then tell your neighbors about it.

Jimmie Lundie https://JimmieWiFi.com