Prerequisites
Make sure you’ve completed Before You Begin, including the two reading assignments: the MinimumCD manifesto and the Greenfield CD guide. This chapter builds directly on that material.
Did you actually read them? Let’s find out with the following quiz.
If any of those quiz answers surprised you, go back and re-read the manifesto and greenfield guide before continuing.
Design Decisions
You’re about to build a full-stack web application in Rust. Before you write a single line of code, you’re going to make every major design decision and understand why each one matters.
This isn’t busywork or a box-checking exercise. The design decisions we make up front are the load-bearing walls of everything that follows. Get them right, and subsequent work flows naturally. Get them wrong, and you’ll spend all of your time fighting your own architecture.
You read the greenfield CD guide in your first reading assignment. Its core message: understand why before building what. Most software projects make their hardest-to-reverse decisions in their first week, when they understand the least. We’re going to be deliberate instead.
By the end of this chapter, you’ll know:
- Why we build the delivery pipeline before the application
- Why a YAML schema file is the single source of truth for every data type in the system
- Why the application serves two interfaces from a single binary
- Why the UI is built from composable, themed components from day one
- Why Rust, and why now
No code yet. Just decisions. Pipeline First is where we start to build.
Why Pipeline-First?
If you're new to CD, the next chapter will ask you to do something counterintuitive: you won't start by running `cargo leptos new`. You'll write a GitHub Actions workflow and Terraform configurations for Linode, and deploy a health-check endpoint to production. The application comes later. The pipeline comes first.
This is the core MinimumCD greenfield principle: the delivery pipeline is feature zero. It’s not something you “set up later when there’s enough code to deploy.” The pipeline shapes the code, not the other way around.
Why? Because building CD into a project from the start costs almost nothing. Retrofitting it later can take months. Every team that says “we’ll add CI/CD after the prototype” ends up with a prototype that’s allergic to automation. The test suite assumes a specific directory structure. The deployment requires SSH and a checklist. The database migrations need to be run by hand.
As the greenfield guide puts it: “Every one of these is trivial to add to an empty project and expensive to retrofit into a mature codebase.” We take that literally.
Feature Zero Validations
The pipeline enforces quality gates from commit one. Ours maps to the Rust toolchain like this:
Formatting: rustfmt. Automatic code formatting, enforced by the pipeline. Not a suggestion, a gate. If your code isn’t formatted, it doesn’t merge. This eliminates an entire category of code review friction and keeps the codebase consistent as it grows.
Linting: clippy. Rust’s linter catches common mistakes, unidiomatic patterns, and potential bugs. Like rustfmt, it’s a pipeline gate, not an optional tool.
Type checking: the Rust compiler. This is a selling point specific to our stack. Rust’s compiler is already stricter than most languages’ entire linting toolchains. If your code compiles, you’ve eliminated null pointer exceptions, data races, use-after-free bugs, and a host of other errors that other languages catch (if at all) at runtime. The pipeline’s type-checking gate is the Rust compiler itself, and it’s doing more work than you’d get from adding three or four tools to a Python or JavaScript project.
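To make that concrete, here is a tiny illustration (the helper function is invented, not from the book's codebase): in Rust, absence is encoded in the type system, so a forgotten "null check" is a compile error rather than a runtime crash.

```rust
// Hypothetical helper: look up a tag value by key.
// The return type admits absence explicitly; callers must handle None,
// so the null-pointer-exception category never reaches the test suite.
fn find_tag(tags: &[(&str, &str)], key: &str) -> Option<String> {
    tags.iter()
        .find(|&&(k, _)| k == key)
        .map(|&(_, v)| v.to_string())
}
```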
Test framework: cargo-nextest. Faster than cargo test, with parallel execution and structured output. Configured from commit one so the test infrastructure is ready before we write our first test.
Security scanning: cargo-audit. Dependency vulnerability scanning. When a CVE is published against a crate in your dependency tree, the pipeline catches it before you deploy. Add it on day one and it costs nothing. Add it on day one hundred and you’re triaging a backlog of vulnerable transitive dependencies that have been in production for months.
Supply chain policy: cargo-deny. License compliance, crate source vetting, and duplicate dependency detection. This is the difference between “we use open source” and “we know exactly what open source we use, where it comes from, and what licenses we’re agreeing to.”
Supply chain vetting: cargo-vet. cargo-deny checks policy; cargo-vet checks provenance. It imports trusted audit sets from organizations like Mozilla, Google, and ISRG, then verifies that every dependency in your tree has been audited by someone you trust. When you add a new crate, cargo-vet tells you whether it’s been reviewed and by whom. New, unvetted dependencies require explicit exemption.
Mutation testing: cargo-mutants. Your tests prove the code works, but do they actually catch bugs? cargo-mutants systematically modifies your code, replacing `+` with `-`, deleting function bodies, changing return values, and making other "mutations" while checking whether your tests notice. A surviving mutant is a line of code you can break without any test failing, revealing a gap in your safety net. The pipeline runs `cargo mutants --in-diff` on every push, mutation-testing only the changed code. A full mutation sweep runs nightly to catch accumulated gaps.
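A hypothetical example (the function and checks are invented for illustration) of what a surviving mutant looks like in practice:

```rust
// A function cargo-mutants might attack, e.g. by replacing `-` with `+`
// or by replacing the whole body with `0`.
fn net_support(supporting: i32, refuting: i32) -> i32 {
    supporting - refuting
}

// A weak assertion: it passes for the real body AND for the `0` mutant,
// so that mutant survives, revealing the gap in the safety net.
fn weak_check() -> bool {
    net_support(3, 3) == 0
}

// A precise assertion kills both mutants: 5 - 2 == 3, while the `+`
// mutant yields 7 and the `0` mutant yields 0.
fn strong_check() -> bool {
    net_support(5, 2) == 3
}
```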
Automated dependency updates: Dependabot. Configured at project creation. When a dependency publishes a security fix, you get a PR within hours. No human has to remember to check.
GitHub security features. Code scanning (SAST) and secret scanning enabled on the repository. These are free, built into GitHub, and catch categories of mistakes that careful coding alone won’t prevent: accidentally committed API keys, known vulnerability patterns in code, insecure dependency configurations.
Local Hooks Mirror CI
Every check the pipeline runs, you can run locally before you push. The tool that makes this practical is prek, a Rust-native pre-commit hook runner. You define your checks once in .pre-commit-config.yaml, and the same file drives both local hooks (prek run) and the CI pipeline step.
Why bother? Because a 5-second local check is better than a 10-minute CI failure. If rustfmt or clippy catches something before you push, the pipeline stays green. “All feature work stops when the pipeline is red” is a lot easier to follow when the pipeline is rarely red. The hooks aren’t a gate that replaces CI. CI is still the authority. The hooks are a fast feedback loop that keeps you from wasting CI time on problems you could have caught at your desk.
prek reads the same .pre-commit-config.yaml format used by the Python pre-commit framework (the industry standard), but it’s written in Rust, installs with cargo install prek, and runs hooks significantly faster. No Python runtime required.
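A sketch of what such a config might look like, assuming local `cargo` commands as hooks (the hook ids, names, and entries here are illustrative, not the book's final configuration):

```yaml
# .pre-commit-config.yaml -- one file drives both `prek run` and CI.
repos:
  - repo: local
    hooks:
      - id: fmt
        name: rustfmt
        entry: cargo fmt --all -- --check
        language: system
        types: [rust]
        pass_filenames: false
      - id: clippy
        name: clippy
        entry: cargo clippy --all-targets -- -D warnings
        language: system
        types: [rust]
        pass_filenames: false
```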
Shift-Left Security: Two Layers
Security in this project works at two levels. Locally, while you’re writing code, an AI-assisted security review (a Claude Code skill) scans your changes for application-logic vulnerabilities: injection patterns, missing authentication checks, XSS risks, hardcoded secrets. You’re in the loop. You see every finding and decide how to respond.
In the pipeline, deterministic tools do the same work independently: cargo-audit, cargo-deny, Dependabot, GitHub code scanning. These don’t depend on an API call. They run the same way every time. They’re auditable.
The two layers complement each other. The local review catches things static tools miss (like a raw SQL string that looks parameterized but isn’t). The pipeline catches things you might overlook during development (like a transitive dependency with a known CVE). Neither layer alone is sufficient.
The Testing Pyramid
Every test in Trunk to Theory is Rust code. Behavioral specifications are test functions: a test named user_can_pose_research_question() is both the spec and the verification. When the agent reads it, it knows what to implement. When the pipeline runs it, it knows whether the implementation is correct.
The pyramid has nine layers, each serving a different purpose and running at a different speed:
| Layer | Tool | What it tests |
|---|---|---|
| Doc tests | cargo test --doc | API examples in documentation compile and run. If the API changes, the docs break. Spec drift is impossible. |
| Unit tests | cargo-nextest | Service layer logic, validation rules, schema-generated types, pure functions. Fast, no external dependencies. |
| Component tests | dokime | Leptos component rendering, signal reactivity, event handling. No browser required. |
| Integration tests | cargo-nextest + SQLx fixtures | Service layer against a real PostgreSQL database. Catches query bugs, migration issues, transaction edge cases. |
| Contract tests | cargo-nextest | REST API responses conform to the OpenAPI spec. Catches contract drift between the API and its documentation. |
| Security E2E (DAST) | playwright-rust | Probes the running application for injection, XSS, and auth bypass vulnerabilities. Runs against staging. |
| E2E tests | playwright-rust | Full user flows in a real browser. Multi-step interactions, page navigation, data persistence across reloads. |
| Visual regression | theoria + playwright-rust | Component screenshots diffed against baselines. Catches rendering regressions without a dedicated visual testing tool. |
| Mutation testing | cargo-mutants | Tests catch real bugs, not just exercise code paths. Mutants that survive reveal untested behavior. Runs incrementally (--in-diff) on every push; full sweep nightly. |
The fast layers (doc tests, unit tests, component tests) run in milliseconds and give immediate feedback during development. The slow layers (integration, contract, security, E2E, visual regression) run in seconds and are pipeline gates. Mutation testing sits alongside the pipeline: incremental on every push, full sweep nightly. Together they cover the full stack from individual functions to deployed user flows.
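As a flavor of the fastest layer, here is a hypothetical doc test (the function and crate path are invented for illustration). The example inside the doc comment is compiled and executed by `cargo test --doc`, which is why documentation cannot drift from the API:

```rust
/// Advances a question's status along the open -> investigating ->
/// resolved lifecycle.
///
/// ```
/// assert_eq!(trunk_to_theory::next_status("open"), Some("investigating"));
/// ```
pub fn next_status(current: &str) -> Option<&'static str> {
    match current {
        "open" => Some("investigating"),
        "investigating" => Some("resolved"),
        _ => None, // "resolved" is terminal; unknown statuses have no successor
    }
}
```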
The ACD Workflow
Read these two pages only (as always, don’t follow any other links on the site):
- localhost:1313/docs/agentic-cd/ — the ACD overview page
- localhost:1313/docs/agentic-cd/specification/first-class-artifacts/ — the full artifact definitions
Come back here when you’re done.
Did you read both pages? Let’s check.
Agentic Continuous Delivery, the framework you just read about, is the workflow that structures this entire book. ACD extends Continuous Delivery with eight constraints and a set of delivery artifacts that anchor human-agent collaboration. The ACD workflow defines eleven stages:
- Intent Description. A human drafts a problem statement and hypothesis. An agent finds ambiguity and suggests edge cases.
- User-Facing Behavior. The human defines and approves BDD scenarios. The agent generates scenario drafts and finds gaps.
- Feature Description. The human sets constraints and architectural boundaries. The agent suggests architectural considerations and integration points.
- Acceptance Criteria. The human defines thresholds and evaluation design. The agent drafts non-functional criteria and checks cross-artifact consistency.
- Specification Validation. Before implementation begins, the agent reviews all four specification artifacts for conflicts, gaps, and ambiguity. The human gates entry to implementation.
- Test Generation. The agent generates test code from the BDD scenarios, feature description, and acceptance criteria.
- Test Validation. The human reviews the generated tests. Over time, expert validation agents progressively replace human review.
- Implementation. The agent generates production code within a small-batch session per scenario.
- Pipeline Verification. The pipeline runs all tests. All scenarios implemented so far must pass.
- Code Review. The human reviews the implementation. Over time, expert validation agents progressively replace human review.
- Deployment. The pipeline deploys through the same path as every other change.
The key ACD principles governing every chapter:
- Explicit, human-owned intent exists for every change. The human defines the why. The agent helps figure out the how.
- Intent and architecture are represented as delivery artifacts. These artifacts live in the repo under version control and are machine-readable; they are not in the developer's head or in a separate wiki.
- Consistency between intent, tests, implementation, and architecture is enforced. The pipeline verifies this. It’s not a matter of discipline.
- While the pipeline is red, agents may only generate changes restoring pipeline health. No new features until the build is green.
You’ll execute the full ACD workflow for the first time in Completing the Slice, when we build the “pose a research question” feature end-to-end. But the pipeline and constraints are in place from Pipeline First.
Why Schema-Driven?
In a typical Rust web application, you maintain the data model in three places:
- Rust structs, with `serde::Serialize`, `serde::Deserialize`, `sqlx::FromRow`, and `utoipa::ToSchema` derives stacked on top.
- SQL migrations, `CREATE TABLE` statements that must agree with the Rust types.
- JSON Schema or OpenAPI definitions, the API contract for external consumers.
These three representations describe the same thing. When you change one, you must change the others. If you forget, nothing catches the inconsistency until a test fails (if you’re lucky) or a production user hits a 500 error (if you’re not).
The typical Rust approach is to maintain separate struct layers (API DTOs, domain models, database row types) connected by From/Into trait implementations. This works. It’s idiomatic. And it depends entirely on developer discipline to stay consistent.
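A minimal, hypothetical two-layer version of that pattern (struct and field names invented):

```rust
// API DTO: the shape the REST layer serializes.
struct QuestionDto {
    id: String,
    text: String,
}

// Domain model: the shape the service layer reasons about.
struct Question {
    id: String,
    text: String,
}

// The glue. Nothing but developer discipline keeps the two structs'
// fields in sync; rename a field on one side and this impl is the
// only place that notices.
impl From<QuestionDto> for Question {
    fn from(dto: QuestionDto) -> Self {
        Question { id: dto.id, text: dto.text }
    }
}
```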
Other ecosystems have partially solved this. Prisma gives TypeScript developers a single .prisma schema that generates types, a query client, and versioned migrations. AWS’s Smithy generates Rust types and API specs from a single .smithy model, but it has no database awareness. SeaORM 2.0 lets you sync Rust entity structs to the database at runtime, but it doesn’t produce versioned migration files or OpenAPI schemas. No existing tool bridges all three layers for a Rust web application.
The Schema as an Architecture Artifact
The ACD framework requires that architecture be represented as versioned delivery artifacts. For data modeling, this means the schema should be the source of truth, not the Rust code, not the SQL, not the OpenAPI spec.
Trunk to Theory’s domain model has two homes. The scientific ontology — the classes, relationships, and constraints that define what a Question, Evidence, Hypothesis, Experiment, and Result are — lives in the scimantic-ontology repo as a versioned LinkML schema. The application data model (users, sessions, app configuration) lives in the main repo. Both are YAML-based, and both flow through the same tool.
Here’s what a core entity looks like in LinkML:
```yaml
classes:
  Question:
    attributes:
      id:
        range: uri
        identifier: true
      text:
        range: string
        required: true
      status:
        range: QuestionStatus
        required: true
      domain:
        range: string

enums:
  QuestionStatus:
    permissible_values:
      open: {}
      investigating: {}
      resolved: {}
```
From these schemas, panschema generates:
- Rust structs with the appropriate `serde`, `sqlx`, and `utoipa` derives.
- SQL DDL for database migrations (PostgreSQL tables for app state).
- JSON Schema for API contract validation.
- SHACL shapes for validating RDF data in the knowledge graph.
When the schema changes, the pipeline regenerates all downstream artifacts and verifies consistency. If the generated Rust types don’t match the SQL migrations, the build breaks. If the SHACL shapes don’t match the ontology, the build breaks. If the JSON Schema doesn’t match the API response types, the build breaks. Inconsistency is caught at build time, not in production.
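To give a feel for the output, here is roughly the shape of Rust type panschema could generate from the `Question` class above. This is a sketch under assumptions: the real generated code also carries the `serde`, `sqlx`, and `utoipa` derives, which are omitted here to keep the example dependency-free.

```rust
// Sketch of generated types; derive set and exact names are assumptions.
#[derive(Debug, PartialEq)]
pub enum QuestionStatus {
    Open,
    Investigating,
    Resolved,
}

#[derive(Debug, PartialEq)]
pub struct Question {
    /// range: uri, identifier: true
    pub id: String,
    /// range: string, required: true
    pub text: String,
    /// range: QuestionStatus, required: true
    pub status: QuestionStatus,
    /// range: string; not required in the schema, hence Option
    pub domain: Option<String>,
}
```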
Why Not Just Hand-Write Everything?
You could. Many excellent Rust applications do. But consider the trajectory of this book’s philosophy:
- SQLx already does this for queries: it checks column names and types at compile time. Nobody hand-verifies `SELECT` results against struct fields.
- utoipa does the same for API contracts: your OpenAPI spec is generated from Rust types, not maintained in a separate file.
- panschema extends this to the data model itself. The Rust types, SQL DDL, SHACL shapes, and JSON Schema are all generated from one LinkML definition. The pipeline verifies consistency across all of them.
The theme is the same at every layer: catch inconsistencies at build time, not in production. Schema-driven development is a natural extension of the same philosophy that makes SQLx and utoipa compelling. It’s Rust’s build-time verification story applied to the data model itself.
Why This Architecture?
Trunk to Theory needs to serve two kinds of clients:
- A web frontend for desktop and mobile browsers, server-rendered HTML with client-side interactivity.
- A REST API for external consumers — CLI tools, Jupyter notebook integrations, data pipeline scripts, and other research tooling that interacts with the knowledge graph programmatically.
This dual requirement drives the architecture.
The Problem with a Single Interface
Leptos, the full-stack Rust framework we’re using, provides #[server] functions. These let you write code that runs on the server but call it transparently from the client. They’re elegant for the web frontend: you write a function that queries the database, and Leptos handles serializing the result across the client/server boundary.
But Leptos server functions are not REST endpoints. They use Leptos’s own serialization protocol. A Python script running a SPARQL query can’t call them. Neither can any HTTP client that isn’t a Leptos frontend.
We need standard, OpenAPI-documented REST endpoints alongside the Leptos frontend. And both need to agree on data shapes, validation rules, and business logic.
The Single-Binary, Dual-Interface Design
Leptos runs on top of Axum; they share the same server process, Tokio runtime, and router. This means we can serve both interfaces from a single binary:
┌─────────────────────────────────────────────────┐
│ Single Axum Server │
│ │
│ ┌────────────────────┐ ┌───────────────────┐ │
│ │ Leptos Routes │ │ REST API Routes │ │
│ │ (SSR + WASM) │ │ (utoipa OpenAPI) │ │
│ │ server functions │ │ /api/v1/* │ │
│ └─────────┬──────────┘ └─────────┬─────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌───────────────────────────────────────────┐ │
│ │ Shared Service Layer │ │
│ │ (domain logic, validation, auth) │ │
│ └──────────┬────────────────────┬───────────┘ │
│ ▼ ▼ │
│ ┌────────────────────┐ ┌───────────────────┐ │
│ │ SQLx + PostgreSQL │ │ Oxigraph │ │
│ │ (app state) │ │ (knowledge graph) │ │
│ └────────────────────┘ └───────────────────┘ │
└─────────────────────────────────────────────────┘
The key is the shared service layer. All domain logic (validation, authorization, business rules) lives in one place. Both the Leptos server functions and the REST API handlers call into the same service layer. Neither interface implements business logic directly.
Below the service layer, two databases serve different purposes. PostgreSQL stores application state: users, sessions, configuration, and any relational data that doesn’t belong in a graph. Oxigraph, a Rust-native RDF triple store with full SPARQL 1.1 support, stores the knowledge graph: questions, evidence, hypotheses, experiments, results, and all the relationships between them. The service layer abstracts this split. A request to “show me all evidence linked to this question” queries Oxigraph via SPARQL. A request to “update the current user’s profile” queries PostgreSQL via SQLx. The caller doesn’t know or care which database answered.
This design buys us four things:
- Domain logic is tested once. You don’t write separate tests for the web frontend’s “pose a question” logic and the API’s “pose a question” logic. They call the same function.
- Both interfaces are guaranteed to agree on data shapes. The Rust type system enforces this: if the service layer returns a `Question`, both interfaces get the same `Question`. No drift.
- The REST API gets OpenAPI documentation generated at compile time from the same Rust types the Leptos frontend uses. The API contract is enforced by the compiler.
- The dual database is invisible to consumers. Whether data lives in PostgreSQL or Oxigraph is a service-layer concern. The API returns the same JSON regardless of which store answered the query.
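The shared-service-layer idea can be sketched in plain Rust (all names here are hypothetical; the real service layer arrives in later chapters):

```rust
// Abstraction over the knowledge graph store (Oxigraph in production).
trait KnowledgeGraph {
    fn evidence_for(&self, question_id: &str) -> Vec<String>;
}

// Stand-in implementation for illustration; the real one runs SPARQL.
struct InMemoryGraph;

impl KnowledgeGraph for InMemoryGraph {
    fn evidence_for(&self, question_id: &str) -> Vec<String> {
        vec![format!("evidence-linked-to-{question_id}")]
    }
}

struct QuestionService<G: KnowledgeGraph> {
    graph: G,
}

impl<G: KnowledgeGraph> QuestionService<G> {
    // The single home for this business rule. Both the Leptos server
    // function and the REST handler would delegate here, so the logic
    // is written once and tested once.
    fn linked_evidence(&self, question_id: &str) -> Vec<String> {
        self.graph.evidence_for(question_id)
    }
}
```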
Technology Choices as CD Constraints
Every technology choice in this book maps back to a Continuous Delivery constraint:
| Choice | CD Constraint |
|---|---|
| PostgreSQL over SQLite | MinimumCD requires production-like environments from day one. SQLite in dev, Postgres in prod is the kind of divergence that hides bugs. |
| Oxigraph (Rust-native RDF store) | The knowledge graph runs in-process, embeds via RocksDB, and supports full SPARQL 1.1. No Java/Python triple store to manage separately. Same binary, same deployment, same pipeline. |
| SQLx (compile-time checked queries) | The pipeline catches database contract violations before deployment. A typo in a column name breaks the build, not the user’s session. |
| Terraform + Linode | Everything-as-code. Infrastructure lives in the same repo and flows through the same pipeline. No snowflake servers, no manual provisioning. |
| utoipa (compile-time OpenAPI) | The API contract is generated from Rust types, not maintained separately. Contract drift is impossible. |
| playwright-rust | E2E testing in Rust. The testing tool is written in the same language as the application. The entire testing story (unit, integration, E2E) is Rust. |
| cargo-mutants | Tests must demonstrate they catch regressions. Code coverage measures what runs; mutation testing measures what’s verified. Incremental on every push; full sweep on a schedule. |
| panschema + LinkML | The data model is a versioned architecture artifact. The pipeline generates and verifies all downstream representations: Rust types, SQL DDL, SHACL shapes, JSON Schema. |
| scimantic-ontology (separate repo) | The domain ontology is a versioned artifact with its own release cycle. The app depends on a pinned version. Ontology changes flow through the pipeline like any other dependency update. |
| cargo-deny | Supply chain policy as code. License compliance, crate source vetting, and duplicate detection are pipeline gates, not afterthoughts. |
| cargo-vet | Supply chain vetting with trusted audit imports. Every dependency verified by someone you trust (Mozilla, Google, ISRG). New, unvetted dependencies require explicit exemption. |
| Dependabot + GitHub Security | Automated dependency updates and static analysis. Security scanning that doesn’t depend on developer memory. |
| prek (pre-commit hooks) | Local hooks mirror CI checks. Catch formatting, lint, and security issues in seconds before pushing, keeping the pipeline green. Rust-native, reads the industry-standard .pre-commit-config.yaml. |
| Podman + compose.yaml | The local PostgreSQL runs in a container matching the production version. No environment divergence. |
| VS Code Devcontainer | One-click setup gives every reader the same environment. No “works on my machine” debugging. |
| Tailwind CSS v4 | Utility-first CSS with a Rust-native standalone CLI. Component styles are Leptos components composing Tailwind utilities. No third-party CSS framework, no Node.js dependency. The Trunk to Theory theme is a versioned artifact. |
| theoria | A component catalog where you browse, configure, and document every UI component in isolation. Ensures components are reusable and well-documented as the project grows. |
| dokime | Fast component-level testing without a full browser. Verifies rendering and signal reactivity for every component theoria catalogs. Catches regressions without the overhead of E2E tests. |
| tracing | Structured logging from day one. The pipeline verifies code is correct at build time; tracing shows what’s happening at runtime. You can’t do canary deployments without observability to compare. |
| thiserror | Domain-specific error types with context. Errors carry enough information to diagnose problems from logs without reproducing them locally. |
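As a taste of the error-handling row, here is a hypothetical domain error (names invented). With `thiserror`, the `Display` impl below would be generated from `#[derive(Error)]` and `#[error("...")]` attributes; the hand-written version shows roughly what that expands to.

```rust
use std::fmt;

// Hypothetical domain error carrying enough context to diagnose
// a failure from logs alone.
#[derive(Debug)]
enum QuestionError {
    NotFound { id: String },
    EmptyText,
}

// With thiserror this impl is derived; it is spelled out here so the
// example compiles without the crate.
impl fmt::Display for QuestionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QuestionError::NotFound { id } => write!(f, "question not found: {id}"),
            QuestionError::EmptyText => write!(f, "question text must not be empty"),
        }
    }
}

impl std::error::Error for QuestionError {}
```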
Why Component-Driven?
Trunk to Theory has several distinct UI surfaces: question boards where researchers pose and track questions, evidence timelines that link literature and observations to questions, hypothesis trees that visualize how evidence supports or refutes competing explanations, and experiment trackers that follow methodology from design through results. Without a deliberate approach to UI, every chapter would reinvent button styles, form layouts, and spacing. The result would be a codebase where no two pages look the same.
We build the UI from composable Leptos components from the start. A Button component introduced in The Web Frontend is the same Button used in Results & Analysis. A Card component that displays a research question also displays a piece of evidence or an experiment summary. The components are small, reusable, and tested in isolation.
Tailwind CSS v4 provides the styling foundation. Its standalone CLI is written in Rust (using Lightning CSS), so it requires no Node.js runtime. Tailwind scans your Leptos view! macros for class names and generates only the CSS you actually use. We define a Trunk to Theory theme (color palette, typography scale, spacing tokens) in Tailwind’s configuration, then compose those utilities inside Leptos components. A Button component isn’t a CSS class name; it’s a Rust function that encapsulates a specific combination of Tailwind utilities, accepts typed props (variant, size, disabled), and renders consistently everywhere it’s used.
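The "a Button is a function over typed props, not a CSS class name" idea can be sketched without Leptos (this is not real Leptos code; the real component wraps this kind of logic in a `view!` macro, and the props and utility classes here are invented):

```rust
// Hypothetical typed props for a Button component.
enum Variant { Primary, Secondary }
enum Size { Small, Medium }

// The component encapsulates one specific combination of Tailwind
// utilities per prop combination, so every call site renders the same.
fn button_classes(variant: Variant, size: Size) -> String {
    let color = match variant {
        Variant::Primary => "bg-indigo-600 text-white",
        Variant::Secondary => "bg-slate-200 text-slate-900",
    };
    let sizing = match size {
        Size::Small => "px-2 py-1 text-sm",
        Size::Medium => "px-4 py-2 text-base",
    };
    format!("rounded {color} {sizing}")
}
```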
This approach keeps the entire build chain Rust-native and eliminates a third-party CSS framework from the dependency tree. The component styles are yours, defined in your codebase, tested by your pipeline.
For component development, we use theoria (Greek: theoria, “a journey to witness a spectacle”), a Rust-native component catalog. It provides a dedicated route where you can browse every component, configure its props, and see the rendered output in isolation. When you build a Card component that will be used on both the question board and the evidence timeline, theoria lets you develop and refine it independently before it touches either page. You’ll see it introduced in The Web Frontend and used throughout the book.
For component testing, we use dokime (Greek: dokime, “proof by fire”), a Rust-native component testing framework. It verifies rendering, signal reactivity, and event handling for every component theoria catalogs, without booting a full browser. A Button with three variants and two sizes has six prop combinations; dokime tests them all in milliseconds. Together with standard #[test] functions for domain logic and playwright-rust for full E2E flows, dokime covers the middle of the testing pyramid that would otherwise require slow browser-based tests.
theoria and dokime are being built alongside this book. If you’re reading an early draft, their repos may be empty or incomplete (or non-existent!). They’ll be ready by The Web Frontend, where we introduce them.
Why Rust, and Why Now?
Two years ago, recommending Rust for a full-stack web application would have required caveats. The frameworks were young. The ecosystem had gaps. You could build a backend, sure, but “full-stack” meant reaching for JavaScript or TypeScript on the frontend.
That’s no longer true:
- Leptos provides the full-stack story: server-side rendering, WebAssembly hydration, and `#[server]` functions that bridge the client/server boundary. It’s pre-1.0 and the API is still settling, but the core model is stable and the community is active.
- Axum is the de facto standard backend framework, built by the Tokio team. Its extractor pattern takes getting used to, but once you learn it, routing and request handling are concise. It’s the foundation everything else runs on.
- SQLx provides compile-time checked database queries. Your SQL is verified against the real database schema during `cargo build`, not at runtime. A mistyped column name or a type mismatch between your struct and the table stops the build before you can deploy it.
- Oxigraph provides an embeddable, Rust-native RDF triple store with full SPARQL 1.1 support. It runs in-process, stores data via RocksDB, and handles Turtle, N-Triples, and JSON-LD serialization. No Java runtime, no separate server process. The knowledge graph is part of the binary.
- utoipa generates OpenAPI specifications at compile time from your Rust types. The API documentation is always correct because it’s generated from the same code that serves the API.
- Cross-browser E2E testing is possible from Rust now, via playwright-rust. Chromium, Firefox, WebKit, driven from `#[test]` functions. No JavaScript test runner required.
- panschema generates Rust types, SQL, SHACL shapes, and JSON Schema from a single LinkML model. It’s early-stage software; we’ll be extending it as we go, and you’ll see that process firsthand.
But the ecosystem is only half the story. The other half is what Rust’s type system and compiler do for your delivery pipeline.
Rust’s toolchain isn’t simpler than other stacks. You still need a formatter, a linter, a test runner, security scanners, and mutation testing. What’s different is the foundation those tools build on. Null pointer exceptions, data races, type coercion surprises, missing error handling: Rust’s compiler rejects all of these before your code reaches the test suite. Your linter isn’t hunting for null checks you forgot. Your tests aren’t catching type mismatches that slipped through. The compiler has already eliminated those categories, so every tool in the pipeline is working on a stronger base. The result is a more robust pipeline with faster feedback: when a test fails, it’s testing your logic, not catching a bug the compiler should have caught.
And then there’s the convergence that makes this book timely: AI coding assistants are making Rust accessible to more developers, and test-driven development is experiencing a renaissance as the optimal way to work with AI agents. The ACD workflow, where humans define intent and agents generate implementation under pipeline supervision, plays to Rust’s strengths. A strict compiler gives the agent a tighter feedback loop. Code that doesn’t satisfy the types is rejected instantly at build time, not discovered later in review.
What You’re Going to Build
Trunk to Theory is a scientific knowledge management platform. We chose this domain because:
- It’s distinctive. There is no existing “build a scientific knowledge graph in Rust” book. The domain exercises capabilities (ontologies, RDF, SPARQL, dual databases, graph traversal) that a generic CRUD app never touches. If you finish this book, you will have built something no tutorial has covered before.
- It deeply exercises the stack. A knowledge graph demands complex relationships, graph queries, ontology validation, and dual-database coordination. Every layer of the architecture earns its keep. The schema-driven approach isn’t a nice-to-have; it’s essential when your data model is an ontology.
- It decomposes into natural vertical slices. Pose a question. Link evidence. Form a hypothesis. Design an experiment. Record results. Each entity in the scientific workflow is independently deployable, exactly what CD demands.
- The author is a scientist. This isn’t a contrived teaching example. It’s a tool the author will use in their own research. That alignment between dogfooding and career means the domain gets the depth it deserves.
By the end of this book, Trunk to Theory will support:
- Questions. Pose research questions, tag them with scientific domains, and track their status from open through investigating to resolved. Questions are the starting point of every inquiry in the knowledge graph.
- Evidence. Link literature references, datasets, and observations to questions. Evidence accumulates over time, forming the empirical foundation that supports or challenges your thinking. Each piece of evidence carries provenance: where it came from, when it was added, and which questions it addresses.
- Hypotheses. Form testable hypotheses from accumulated evidence. A hypothesis connects to the evidence that motivated it and the questions it aims to answer. The knowledge graph captures these relationships, letting you trace any hypothesis back to its evidentiary roots.
- Experiments. Design experiments to test hypotheses. Track methodology, parameters, and expected outcomes. An experiment is linked to the hypothesis it tests, making the full chain from question to experimental design navigable.
- Results. Record experimental outcomes and link them back to hypotheses. Results either support or refute hypotheses, and the knowledge graph captures this verdict. Over time, the graph becomes a navigable history of your research: which questions led to which hypotheses, which experiments tested them, and what the data showed.
- Knowledge graph. Oxigraph stores the scientific entities and all relationships between them as RDF triples, queryable via SPARQL. PostgreSQL stores application state: users, sessions, and configuration. The dual-database architecture is invisible to the user; the service layer handles routing queries to the right store.
- User authentication. Session-based auth, authorization logic for who can see and edit what.
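The "invisible to the user" dual-database split can be sketched as a routing decision inside the service layer. This is a hypothetical illustration, assuming made-up trait and type names (`GraphStore`, `RelationalStore`, `Request`), not the book's actual API:

```rust
// Hypothetical sketch: callers see one service; the service decides
// which store answers each request.
trait GraphStore {
    fn sparql(&self, query: &str) -> Vec<String>;
}

trait RelationalStore {
    fn sql(&self, query: &str) -> Vec<String>;
}

enum Request {
    // Scientific entities and their relationships live in the RDF store.
    Relationships(String),
    // Users, sessions, and configuration live in PostgreSQL.
    AppState(String),
}

struct Service<G: GraphStore, R: RelationalStore> {
    graph: G,
    relational: R,
}

impl<G: GraphStore, R: RelationalStore> Service<G, R> {
    fn handle(&self, req: Request) -> Vec<String> {
        match req {
            Request::Relationships(q) => self.graph.sparql(&q),
            Request::AppState(q) => self.relational.sql(&q),
        }
    }
}

// In-memory stubs, just to demonstrate the routing.
struct StubGraph;
impl GraphStore for StubGraph {
    fn sparql(&self, _q: &str) -> Vec<String> {
        vec!["question:q1".into()]
    }
}

struct StubRelational;
impl RelationalStore for StubRelational {
    fn sql(&self, _q: &str) -> Vec<String> {
        vec!["user:ada".into()]
    }
}

fn main() {
    let service = Service { graph: StubGraph, relational: StubRelational };
    let qs = service.handle(Request::Relationships("SELECT ?q WHERE { }".into()));
    assert_eq!(qs, vec!["question:q1".to_string()]);
    let users = service.handle(Request::AppState("SELECT * FROM users".into()));
    assert_eq!(users, vec!["user:ada".to_string()]);
}
```

The point of the shape, whatever the real API looks like: routing lives in one place, so neither the UI nor the REST handlers need to know two databases exist.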
The first vertical slice, “a user can pose a research question,” takes six chapters. That’s where we build the full stack and complete the first ACD workflow cycle. Evidence, hypotheses, experiments, results, and the knowledge graph queries that tie them together come in later chapters, building on that foundation. Each subsequent feature lands faster because the scaffolding (in the codebase and in your understanding) is already in place.
The Scientific Workflow as a Chapter Map
The chapters in Part II follow the scientific workflow itself:
| Chapter | Entity | What the user can do |
|---|---|---|
| Evidence | Evidence | Link literature, data, and observations to questions |
| User Authentication | Users | Log in, own their research, control access |
| Hypotheses | Hypothesis | Form testable hypotheses from accumulated evidence |
| Experiments | Experiment | Design experiments to test hypotheses, track methodology |
| Results & Analysis | Result | Record outcomes, link back to hypotheses, traverse the full graph |
Each chapter adds the next entity, the next set of relationships in the knowledge graph, and the next set of SPARQL queries. By the end, the user can navigate from any question to the experiments that tested it and the results those experiments produced.
Some teams document decisions like the ones in this chapter using Architecture Decision Records (ADRs). These are short markdown files with a Status, Context, Decision, and Consequences section, stored in the repo under docs/adr/. It’s a widely adopted practice (ThoughtWorks has recommended it since 2016) and the format is useful: six months later, nobody remembers why SQLite was rejected, and the ADR captures that. The risk is that ADRs rot. Teams write them enthusiastically for a month, then stop, and stale ADRs mislead more than they help. In this book, the chapters themselves serve as our decision record. Every technology choice has its rationale right here in the narrative. If you adopt ADRs in your own projects, keep them short, update them when decisions change, and delete them when they’re no longer relevant.
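For readers who haven't seen one, here is what the four-section format looks like in practice. This is an invented example (the decision and filename are illustrative, not from this book's repo):

```markdown
# ADR-0003: Use PostgreSQL for application state

## Status
Accepted

## Context
We need a durable store for users, sessions, and configuration.
SQLite was considered but rejected: we need concurrent writers
and a managed backup story in production.

## Decision
Run PostgreSQL in all environments, provisioned via Terraform,
with the same major version in the devcontainer and production.

## Consequences
Local development requires a running container. In exchange,
dev/prod parity eliminates a class of "works on my machine" bugs.
```

The whole file fits on one screen; that brevity is what keeps ADRs maintainable when teams do stick with them.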
The Greenfield Checklist
Here’s what we’re going to build, adapted from the greenfield checklist to our specific stack. This serves as both a preview and a roadmap. The chapter annotation on each item tells you when we’ll get there. At the end of every chapter, we’ll revisit this list and check off what we completed.
Development Environment
- VS Code devcontainer provides a one-click setup with full Rust toolchain (Ch 2)
- Podman (or Docker) is the container runtime (Ch 2)
- PostgreSQL runs in a container via `compose.yaml`, matching the production version (Ch 2)
- Oxigraph runs embedded in the application binary (no separate container needed) (Ch 2)
Pipeline Basics
- GitHub Actions CI pipeline runs on every push to trunk (Ch 2)
- `cargo leptos build` compiles, tests, and packages with a single command (Ch 2)
- All work integrates to trunk at least daily (Ch 2)
- Deployment to staging is automated via Terraform + GitHub Actions (Ch 2)
- LinkML schema is versioned in the repo; panschema generates types, migrations, and SHACL shapes in CI (Ch 2)
- scimantic-ontology is a pinned dependency; ontology updates flow through the pipeline (Ch 2)
- Structured logging with `tracing` from the first handler (Ch 2)
- Pre-commit hooks mirror CI checks via `prek` (Ch 2)
- First unit test exists and passes (Ch 3)
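To ground the "schema as single source of truth" item: a LinkML schema is plain YAML describing entities, their fields, and their allowed values, which generators then turn into Rust types, migrations, and SHACL shapes. Below is a hedged, minimal sketch of what an entry for the Question entity might look like; the names and structure are illustrative, and the book's actual schema will differ:

```yaml
# Illustrative LinkML fragment -- not the book's real schema.
id: https://example.org/trunk-to-theory
name: trunk_to_theory
default_range: string

classes:
  Question:
    description: A research question posed by a user.
    attributes:
      title:
        required: true
      status:
        range: QuestionStatus
        required: true

enums:
  QuestionStatus:
    permissible_values:
      open:
      investigating:
      resolved:
```

Because this one file drives codegen in CI, adding a field or a status value is a schema edit, not four parallel edits across the Rust types, the database, and the ontology.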
Security
- `cargo-audit` scans dependencies for known CVEs on every build (Ch 2)
- `cargo-deny` enforces supply chain policy (licenses, sources, duplicates) (Ch 2)
- `cargo-vet` vets dependencies against trusted audit sets (Ch 2)
- Scheduled weekly security workflow runs cargo-audit, cargo-deny, and cargo-vet (Ch 2)
- Release artifacts include SLSA provenance attestation (Ch 2)
- Dependabot is configured for automated dependency update PRs (Ch 2)
- GitHub code scanning (SAST) and secret scanning are enabled (Ch 2)
- Secrets are managed via environment variables and GitHub Secrets, never committed (Ch 2)
- HTTPS/TLS is configured on all deployed environments (Ch 2)
- Local `/security-review` skill and pre-commit hooks are set up for the application repo (Ch 2)
- Security-focused E2E tests (DAST) run against staging after each deployment (Ch 5)
- Secrets rotation is automated for database and API credentials (Ch 12)
Quality Gates
- Pipeline deploys to a production-like staging environment on Linode (Ch 2)
- Rollback is tested and works (Ch 2)
- Application configuration is externalized (environment variables, not baked into the binary) (Ch 2)
- Artifacts are immutable (single binary built once, deployed to staging and production) (Ch 2)
- Doc tests verify all public API examples (Ch 3)
- Unit tests cover service layer and domain logic (cargo-nextest) (Ch 3)
- Integration tests run against real PostgreSQL (SQLx test fixtures) (Ch 3)
- Mutation testing verifies the test suite catches regressions (cargo-mutants `--in-diff` on every push; full sweep nightly) (Ch 3)
- External dependencies use test doubles in the deterministic test suite (Ch 3)
- Domain-specific error types with context (not raw strings) (Ch 3)
- Component tests verify Leptos rendering and reactivity (dokime) (Ch 4)
- Accessibility: WCAG 2.1 AA compliance, semantic HTML, ARIA, keyboard navigation (Ch 4)
- E2E tests verify full user flows in a real browser (playwright-rust) (Ch 5)
- Security E2E tests (DAST) probe staging for vulnerabilities (playwright-rust) (Ch 5)
- Contract tests verify REST API conforms to OpenAPI spec (Ch 6)
- CORS policy is configured for the REST API (Ch 6)
- Input validation enforced at the API boundary (Ch 6)
- API endpoints are paginated with cursor-based pagination (Ch 6)
- HTTP caching headers on read endpoints (Ch 6)
- SPARQL queries against Oxigraph are tested with known graph fixtures (Ch 7)
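One checklist item worth unpacking is cursor-based pagination, since it differs from the offset pagination most tutorials reach for. Instead of `?page=3`, the client sends the last id it saw and the server returns the next batch plus a new cursor. Here is an illustrative sketch over an in-memory sorted slice (the function shape is an assumption for illustration, not the book's API):

```rust
// Cursor-based pagination sketch. `items` is assumed sorted by id,
// which is the cursor field.
fn page(items: &[u32], cursor: Option<u32>, limit: usize) -> (Vec<u32>, Option<u32>) {
    // Resume strictly after the last id the client saw.
    let start = match cursor {
        Some(last_seen) => items
            .iter()
            .position(|&id| id > last_seen)
            .unwrap_or(items.len()),
        None => 0,
    };
    let batch: Vec<u32> = items[start..].iter().take(limit).copied().collect();
    // Only hand back a cursor when more items remain.
    let next_cursor = if start + batch.len() < items.len() {
        batch.last().copied()
    } else {
        None
    };
    (batch, next_cursor)
}

fn main() {
    let ids = [1, 2, 3, 4, 5];
    assert_eq!(page(&ids, None, 2), (vec![1, 2], Some(2)));
    assert_eq!(page(&ids, Some(2), 2), (vec![3, 4], Some(4)));
    assert_eq!(page(&ids, Some(4), 2), (vec![5], None));
}
```

The payoff over offsets: results stay stable when rows are inserted or deleted between requests, and the database never scans past skipped rows.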
Production Readiness
- Pipeline deploys to production on Linode (Ch 2)
- Every commit that passes the pipeline is a deployment candidate (Ch 2)
- Deployment is a routine, low-risk event (Ch 5)
- Performance benchmarks run in CI (criterion); regressions block the pipeline (Ch 11)
- Database backups are automated via Terraform (Ch 2)
- CSP headers and rate limiting are configured (Ch 12)
- Database restore process is tested against real data (Ch 12)
- Oxigraph backup and restore is automated alongside PostgreSQL (Ch 12)
- Feature flags decouple deployment from release (Ch 13)
- Load testing establishes baseline capacity and failure modes (Ch 14)
- Observability: tracing spans and metrics support canary comparison (Ch 14)
- DORA metrics are tracked (Ch 14)
Every checkbox is something we will build, in order, through the course of this book. At the end of each chapter, we’ll come back to this list and check off the items we completed. By the final chapter, they’ll all be done.
Next up: we build the pipeline.