SPD's Brinner and Learn Series – Libraries Don’t Run Themselves

Why Are Some Repos Hard to Use?

The path from “repo” to “execution” is usually unclear
“What am I supposed to run?”
“Which files are ‘active’, and which are historical debris?”
“Do I change code, config, or data to do the next run?”

The Core Problem

Reusable logic is mixed with one-off glue
Execution depends on tribal knowledge
Inputs are implicit and not declared
Reproducing a run becomes a scavenger hunt

One Example Repo, Three Different Kinds of Thing

repo/
  src/project/model.py
  scripts/run_model.py
  model-run --config configs/base.yaml

src/project/model.py is library code: reusable logic
scripts/run_model.py is a script: often one runnable task
model-run ... is the application interface (in this case a CLI)

What Is a Library?

Code that is meant to be imported and reused
Often: functions, classes, and modules with precise behavior
Should never actually “do anything” just because it is imported
Unit tests should focus here, because this is where stable behavior lives

def run_model(records, factors, threshold):
    ...
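A minimal sketch of what that library function might contain. The talk only gives the signature, so the scoring rule, the record keys (`account_id`, `balance`, `sector`), and the shape of `factors` are assumptions made for illustration.

```python
def run_model(records, factors, threshold):
    """Return (account_id, score) pairs whose score is at least `threshold`.

    Placeholder logic (an assumption, not the real model):
    score = balance * factor for the record's sector, 0.0 if the sector is unknown.
    """
    flagged = []
    for record in records:
        score = record["balance"] * factors.get(record["sector"], 0.0)
        if score >= threshold:
            flagged.append((record["account_id"], score))
    return flagged
```

Importing this module defines `run_model` and nothing else: no files are read, no output is written, and a unit test can call the function directly with small in-memory records.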

What Is a Script?

An “executable” wrapper around one task
Often: parses inputs, calls library code, writes outputs
Should stay thin so the logic is still reusable

def main():
    config = load_config(...)
    records = load_inputs(config)
    results = run_model(records, ...)
    write_outputs(results, config.output_dir)
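The outline above could be filled in roughly like this. It is a sketch under assumptions: the config is read as JSON to stay inside the standard library (the talk's configs are YAML), the keys `input_path`, `threshold`, and `output_dir` are hypothetical, and `run_model` here is a stand-in for the real library import.

```python
import argparse
import csv
import json
from pathlib import Path


def run_model(records, threshold):
    """Stand-in for the library call: keep rows at or above the threshold."""
    return [row for row in records if float(row["balance"]) >= threshold]


def main(argv=None):
    # Parse inputs: the only thing the caller chooses is which config to use.
    parser = argparse.ArgumentParser(description="Run the model for one config.")
    parser.add_argument("--config", required=True, type=Path)
    args = parser.parse_args(argv)

    config = json.loads(args.config.read_text())

    # Load input data from the path declared in the config.
    with open(config["input_path"], newline="") as handle:
        records = list(csv.DictReader(handle))

    # Call library code; no modeling logic lives in the script itself.
    results = run_model(records, config["threshold"])

    # Write outputs where the config says they belong.
    output_dir = Path(config["output_dir"])
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "results.json").write_text(json.dumps(results, indent=2))
    return 0
```

A real script file would typically end with an `if __name__ == "__main__":` guard that calls `main()`, so the same module can be both imported in tests and run from the command line.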

What Is an Application?

A supported way for a person or system to use the workflow
Often: a CLI, API, scheduled job, pipeline step, or web application
An application still needs an entry point
The entry point is how that application actually gets invoked

What Actually Runs?

python -m package.module
a main() function
a CLI command like model-run --config configs/base.yaml
an API endpoint or scheduled job
a notebook cell, which works, but usually weakly

The Common “Modeling” Pipeline

flowchart LR
    A[Entry point] --> B[Load config]
    B --> C[Load input data]
    C --> D[Call model library]
    D --> E[Write outputs]

The library only provides logic
The entry point actually assembles the run
The config and input data determine the specific execution

Model Code, Configuration, Data, and Outputs Are Distinct

flowchart TB
    code[Code<br/>Reusable logic]
    config[Config<br/>Run instructions]
    data[Input data<br/>Domain facts]
    output[Outputs<br/>Results of the run]

    code --> run((Run))
    config --> run
    data --> run
    run --> output

Code dictates how the software behaves
Configuration specifies how the run should execute
Data provides the domain facts for that run
Outputs are what the run produced

Two “Input” Artifacts, Different Intuition

Config file

release_year: 2026
portfolio_path: s3://team-data/portfolios/q1.parquet
threshold: 0.85
output_dir: outputs/2026-q1

Model run configuration
I/O data paths
etc.
Mostly instructions? Probably config.

Input data file

account_id, balance, sector, region
1042, 1500000, cement, NA
1043, 230000, steel, EU

Domain input, facts or records
The “thing” being modeled
Mostly records? Probably data.
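One way to make the distinction concrete in code is to give each artifact its own loader. The sketch below assumes the config has already been parsed into a mapping (for example by a YAML parser) and that the data arrives as CSV text; `RunConfig` and `read_records` are hypothetical names, and the fields mirror the two examples above.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunConfig:
    """Run instructions: a small, fixed set of declared settings."""
    release_year: int
    portfolio_path: str
    threshold: float
    output_dir: str

    @classmethod
    def from_mapping(cls, mapping):
        # Fail loudly on missing or misspelled keys instead of guessing.
        expected = {f.name for f in fields(cls)}
        missing = expected - set(mapping)
        unknown = set(mapping) - expected
        if missing or unknown:
            raise ValueError(
                f"missing keys: {sorted(missing)}, unknown keys: {sorted(unknown)}"
            )
        return cls(**mapping)


def read_records(text):
    """Domain facts: any number of rows, each with the same columns."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [
        {**row, "account_id": int(row["account_id"]), "balance": float(row["balance"])}
        for row in reader
    ]
```

The config has a fixed shape and is validated key by key; the data has a fixed schema but an open-ended number of rows. That difference is usually a quick way to tell the two apart.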

What Good Execution Feels Like

One clear supported way to run an important workflow
Inputs are explicit, not hidden in code
A teammate can follow the docs and succeed
A result can be traced back to code, config, and data
In short: boring to explain, boring to rerun, and boring to hand off
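Tracing a result back to code, config, and data can be as simple as writing a small manifest next to the outputs. This is a sketch, not a prescribed format: `write_manifest`, the `manifest.json` filename, and the idea of passing in `code_version` (for instance a commit hash supplied by the caller) are all assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large inputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(output_dir, config_path, data_path, code_version):
    """Record which code, config, and data produced the files in output_dir."""
    manifest = {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "code_version": code_version,  # supplied by the caller, e.g. a commit hash
        "config": {"path": str(config_path), "sha256": sha256_of(config_path)},
        "data": {"path": str(data_path), "sha256": sha256_of(data_path)},
    }
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = output_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path
```

If two runs disagree, comparing their manifests shows immediately whether the code, the config, or the data changed.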

Common Anti-Patterns, and the Fix They Point To

If you see this…	A good next fix is…
`sys.path` hacks	make imports real and package the code cleanly
hard-coded local paths	move the path into config
`script_2024.py`, `script_2025.py`	drive behavior from inputs instead of duplicated files
notebooks as source of truth	try to move reusable logic into `src/`
`main.py` just because `uv` added it	document and support one real entry point
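As a small illustration of the second and third rows, a year-specific script with a hard-coded path can usually collapse into one function driven by declared inputs. The function names and the `results_<year>.csv` naming scheme below are hypothetical.

```python
from pathlib import Path


def build_output_path(output_dir, release_year):
    """Derive the output file from inputs instead of a hard-coded local path."""
    return Path(output_dir) / f"results_{release_year}.csv"


def plan_runs(configs):
    """One code path for every year; only the declared inputs differ."""
    return [build_output_path(c["output_dir"], c["release_year"]) for c in configs]
```

Adding next year's run then means adding a config entry, not copying `script_2025.py` to `script_2026.py`.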

A “Healthy” Default Repo Shape

repo/
  src/project/
    model.py
    io.py
    validation.py
    app.py
  scripts/
    run_model.py
  configs/
    base.yaml
  notebooks/
    exploratory_analysis.ipynb
  tests/
    test_model.py
  docs/
    running.md

src/ for reusable code
scripts/ for task-focused orchestration
notebooks/ for exploration, not the source of truth
configs/ for declared runs
docs/ for the humans (and some robots)

Tiny Demo Project

The live demo for this talk lives here:

live_demos/2026-03-16_libraries_dont_run_themselves

mini_house/
  src/mini_house/tools.py
  src/mini_house/app.py
  scripts/frame_wall.py
  pyproject.toml

library: a tiny reusable “tool”
script: one part of the work
application: a micro CLI representing the supported interface
same pattern, deliberately stripped down

Three Improvements You Can Make This Week

Move one chunk of reusable logic out of a notebook or script and into src/
Create one real runnable entry point for an important workflow
Replace one hard-coded path or year with a config value
Make it painfully obvious what someone else should run

Bonus: Product Thinking Helps!

Decide who needs to use the thing
Decide how they are supposed to use it
The clearer the user, the clearer the entry point
A repo gets healthier when the supported workflow is designed on purpose
Another developer importing functions needs stable library code
An analyst running a scenario needs one obvious command plus editable config
Is this for another developer to import?
Is this for a teammate to run from the CLI?
What is the smallest successful experience for that user?

Closing Thought

Libraries can’t just run themselves. Good repos make that obvious.

Discussion

What repo in your world would benefit most from this cleanup?
Which anti-pattern shows up most often?
What would make one of your current projects easier to hand off?