Libraries Don’t Run Themselves

Jackson Hoffart

Why Are Some Repos Hard to Use?

  • The path from “repo” to “execution” is usually unclear
  • “What am I supposed to run?”
  • “Which files are “active”, and which are historical debris?”
  • “Do I change code, config, or data to do the next run?”

The Core Problem

  • Reusable logic is mixed with one-off glue
  • Execution depends on tribal knowledge
  • Inputs are implicit and not declared
  • Reproducing a run becomes a scavenger hunt

One Example Repo, Three Different Kinds of Thing

repo/
  src/project/model.py
  scripts/run_model.py
  model-run --config configs/base.yaml
  • src/project/model.py is library code: reusable logic
  • scripts/run_model.py is a script: often one runnable task
  • model-run ... is the application interface (in this case a CLI)

What Is a Library?

  • Code that is meant to be imported and reused
  • Often: Functions, classes, and modules with precise behavior
  • Should never actually “do anything” just because it is imported
  • Unit tests should focus here; this is where stable behavior lives
def run_model(records, factors, threshold):
    ...
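
A minimal sketch of a module that does nothing on import. The body of `run_model` is a hypothetical stand-in, since the slide elides the real logic: it assumes each record is a dict with a `sector` key, that `factors` maps a sector to a number, and that the function keeps records whose factor meets the threshold.

```python
"""Hypothetical src/project/model.py: importing it only defines names."""


def run_model(records, factors, threshold):
    """Return the records whose sector factor is at or above `threshold`.

    Stand-in logic for illustration; the talk does not show the real model.
    """
    flagged = []
    for record in records:
        # Assumed record shape: a dict with at least a "sector" key.
        factor = factors.get(record["sector"], 0.0)
        if factor >= threshold:
            flagged.append({**record, "factor": factor})
    return flagged


# Nothing else at module level: no file reads, prints, or model runs.
# Importing this module therefore has no side effects.
```

Because the function takes everything it needs as arguments, a unit test can call it with a few in-memory records and no files on disk.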

What Is a Script?

  • An “executable” wrapper around one task
  • Often: Parses inputs, calls library code, writes outputs
  • Should stay thin so the logic is still reusable
def main():
    config = load_config(...)
    records = load_inputs(config)
    results = run_model(records, ...)
    write_outputs(results, config.output_dir)
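
The outline above can be fleshed out into something runnable. This is a sketch under assumptions, not the talk's actual script: it reads config from JSON and records from CSV to stay within the standard library (the slides use YAML), the helper bodies are invented to match the names in the outline, and `run_model` is an inline stand-in for what would normally be imported from `src/`.

```python
import csv
import json
from pathlib import Path


def run_model(records, threshold):
    # Stand-in for the library call; in a real repo this is imported from src/.
    return [r for r in records if float(r["balance"]) >= threshold]


def load_config(config_path):
    # Config is plain data: paths and parameters for this run.
    return json.loads(Path(config_path).read_text(encoding="utf-8"))


def load_inputs(config):
    # Assumes the config points at a local CSV file with a header row.
    with open(config["portfolio_path"], newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def write_outputs(results, output_dir):
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "results.json"
    out_path.write_text(json.dumps(results, indent=2), encoding="utf-8")
    return out_path


def main(config_path):
    # The script only wires the steps together; no modeling logic lives here.
    config = load_config(config_path)
    records = load_inputs(config)
    results = run_model(records, config["threshold"])
    return write_outputs(results, config["output_dir"])
```

Ending the file with the usual `if __name__ == "__main__":` guard that calls `main(...)` would make it runnable as `python scripts/run_model.py`, while still leaving every function importable.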

What Is an Application?

  • A supported way for a person or system to use the workflow
  • Often: a CLI, API, scheduled job, pipeline step, or web application
  • An application still needs an entry point
  • The entry point is how that application actually gets invoked

What Actually Runs?

  • python -m package.module
  • a main() function
  • a CLI command like model-run --config configs/base.yaml
  • an API endpoint or scheduled job
  • a notebook cell, which works, but usually weakly
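
As one illustration of the CLI option, here is a minimal `argparse` entry point. Only the `model-run --config ...` shape comes from the slide; the module location (for example `src/project/app.py`), the function names, and the placeholder behavior of printing the config path are assumptions made for the sketch.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="model-run",
        description="Run the model for one declared configuration.",
    )
    parser.add_argument("--config", required=True, help="Path to the run config file.")
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:], so the same function serves
    # the installed command and direct calls from tests.
    args = build_parser().parse_args(argv)
    print(f"Would run with config: {args.config}")
    return 0  # Exit status for the calling shell.
```

With a `[project.scripts]` entry in `pyproject.toml` such as `model-run = "project.app:main"`, installing the package exposes `main` as the `model-run` command. Adding `if __name__ == "__main__": raise SystemExit(main())` at the bottom of the module would also support `python -m project.app`.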

The Common “Modeling” Pipeline

flowchart LR
    A[Entry point] --> B[Load config]
    B --> C[Load input data]
    C --> D[Call model library]
    D --> E[Write outputs]

  • The library only provides logic
  • The entry point actually assembles the run
  • The config and input data determine the specific execution

Model Code, Configuration, Data, and Outputs Are Distinct

flowchart TB
    code[Code<br/>Reusable logic]
    config[Config<br/>Run instructions]
    data[Input data<br/>Domain facts]
    output[Outputs<br/>Results of the run]

    code --> run((Run))
    config --> run
    data --> run
    run --> output

  • Code dictates how the software behaves
  • Configuration specifies how the run should execute
  • Data provides the domain facts for that run
  • Outputs are what the run produced

Two “Input” Artifacts, Different Intuition

Config file

release_year: 2026
portfolio_path: s3://team-data/portfolios/q1.parquet
threshold: 0.85
output_dir: outputs/2026-q1
  • Model run configuration
  • I/O data paths
  • etc.
  • Mostly instructions? Probably config.

Input data file

account_id, balance, sector, region
1042, 1500000, cement, NA
1043, 230000, steel, EU
  • Domain input, facts or records
  • The “thing” being modeled
  • Mostly records? Probably data.
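
One way to keep the two artifacts apart in code is to give each its own type. The sketch below takes its field names from the two example files; the class names `RunConfig` and `Account` and both parsing helpers are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Instructions for one run (mirrors the example config file)."""
    release_year: int
    portfolio_path: str
    threshold: float
    output_dir: str


@dataclass(frozen=True)
class Account:
    """One domain record (mirrors a row of the example data file)."""
    account_id: int
    balance: float
    sector: str
    region: str


def parse_config(raw):
    # `raw` is a dict as produced by a YAML or JSON loader.
    return RunConfig(
        release_year=int(raw["release_year"]),
        portfolio_path=str(raw["portfolio_path"]),
        threshold=float(raw["threshold"]),
        output_dir=str(raw["output_dir"]),
    )


def parse_accounts(csv_text):
    # skipinitialspace handles the "a, b, c" spacing used in the example file.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return [
        Account(
            account_id=int(row["account_id"]),
            balance=float(row["balance"]),
            sector=row["sector"],
            region=row["region"],
        )
        for row in reader
    ]
```

A missing key or a non-numeric threshold then fails at load time, before any modeling code runs.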

What Good Execution Feels Like

  • One clear supported way to run an important workflow
  • Inputs are explicit, not hidden in code
  • A teammate can follow the docs and succeed
  • A result can be traced back to code, config, and data
  • Boring to explain
  • Boring to rerun
  • Boring to hand off
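
Traceability can be made concrete by writing a small manifest next to each run's outputs. This is a sketch of one possible approach, not something shown in the talk: the manifest layout, the function names, and the choice of SHA-256 file hashes are assumptions, and the code version is passed in by the caller (for example a git commit hash) rather than looked up here.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    # Hash file contents so a later reader can confirm which file was used.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(output_dir, code_version, config_path, data_path):
    # Record the three things a result depends on: code, config, and data.
    manifest = {
        "code_version": code_version,
        "config": {"path": str(config_path), "sha256": sha256_of(config_path)},
        "data": {"path": str(data_path), "sha256": sha256_of(data_path)},
    }
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = out_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest_path
```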

Common Anti-Patterns, and the Fix They Point To

If you see this…                   A good next fix is…
---------------------------------  ------------------------------------------------------
sys.path hacks                     make imports real and package the code cleanly
hard-coded local paths             move the path into config
script_2024.py, script_2025.py     drive behavior from inputs instead of duplicated files
notebooks as source of truth       try to move reusable logic into src/
main.py just because uv added it   document and support one real entry point
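
To illustrate the fix for the `script_2024.py`, `script_2025.py` anti-pattern, a year that used to live in a file name can become an ordinary argument. The function name, directory layout, and file names below are hypothetical.

```python
from pathlib import Path


def release_paths(release_year, data_root="data", output_root="outputs"):
    """Derive per-release paths from the year instead of copying the script."""
    year = int(release_year)  # Accepts 2026 or "2026", e.g. from a config file.
    return {
        "input": Path(data_root) / str(year) / "portfolio.csv",
        "output_dir": Path(output_root) / str(year),
    }
```

Next year's run then changes one config value, not a copy of the script.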

A “Healthy” Default Repo Shape

repo/
  src/project/
    model.py
    io.py
    validation.py
    app.py
  scripts/
    run_model.py
  configs/
    base.yaml
  notebooks/
    exploratory_analysis.ipynb
  tests/
    test_model.py
  docs/
    running.md
  • src/ for reusable code
  • scripts/ for task-focused orchestration
  • notebooks/ for exploration, not the source of truth
  • configs/ for declared runs
  • docs/ for the humans (and some robots)

Tiny Demo Project

The live demo for this talk lives here:

live_demos/2026-03-16_libraries_dont_run_themselves

mini_house/
  src/mini_house/tools.py
  src/mini_house/app.py
  scripts/frame_wall.py
  pyproject.toml
  • library: a tiny reusable “tool”
  • script: one part of the work
  • application: a micro CLI representing the supported interface
  • same pattern, deliberately stripped down

Four Improvements You Can Make This Week

  1. Move one chunk of reusable logic out of a notebook or script and into src/
  2. Create one real runnable entry point for an important workflow
  3. Replace one hard-coded path or year with a config value
  4. Make it painfully obvious what someone else should run

Bonus: Product Thinking Helps!

  • Decide who needs to use the thing
  • Decide how they are supposed to use it
  • The clearer the user, the clearer the entry point
  • A repo gets healthier when the supported workflow is designed on purpose
  • Another developer importing functions needs stable library code
  • An analyst running a scenario needs one obvious command plus editable config
  • Is this for another developer to import?
  • Is this for a teammate to run from the CLI?
  • What is the smallest successful experience for that user?

Closing Thought

Libraries can’t just run themselves. Good repos make that obvious.

Discussion

  • What repo in your world would benefit most from this cleanup?
  • Which anti-pattern shows up most often?
  • What would make one of your current projects easier to hand off?