Why this case study matters
Apache Airflow is the canonical Python-ecosystem expression of the
alint problem: a sprawling, hand-written shell-and-Python pipeline
validating structural invariants nobody can enumerate from a single
file. Airflow carries 109 pre-commit hooks across 14 repo blocks, of
which ~80 are repo: local shell-outs to Python scripts under
scripts/ci/prek/ (124 scripts in that directory, plus extras under
breeze and in_container).
For every Python infrastructure team — and especially the
data-engineering / Airflow user community — this is what their
structural-validation surface looks like in the wild. alint replaces
~30 of those 109 hooks with one declarative file plus the bundled
compliance/apache-2@v1 ruleset. That gives a single place to look for
“what does Airflow consider a valid provider package?” instead of
chasing eight pre-commit IDs into eight Python files.
This is the strongest piece of evidence in the corpus for the Python-ecosystem launch positioning.
Headline catch
The most alint-shaped surface in the entire airflow tree is
providers/: 101 provider distributions, each obeying the exact
pattern alint was built to express.
A for_each_file: providers/**/provider.yaml rule with a nested require:
block of file_exists / dir_exists checks, plus a couple of
yaml_path_matches rules, covers what Airflow today enforces with the
1085-line scripts/in_container/run_provider_yaml_files_check.py — a
script that has to spin up a Docker container, import every provider
package via Python, and walk ProvidersManager. Cold runtime: 30-60
seconds. alint’s equivalent: under 1 second on the same checkout.
The ~50× speedup on this subset is the headline perf number.
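A minimal sketch of what that rule could look like. The primitive names (for_each_file, require, file_exists, dir_exists, yaml_path_matches) come from these notes; the exact schema, nesting, and the package-name check are assumptions, not alint’s confirmed syntax:

```yaml
# Hypothetical alint rule — key names from this doc's primitives;
# the schema shape and the yaml_path_matches example are guesses.
- for_each_file: "providers/**/provider.yaml"
  require:
    # Each match's parent directory must carry the standard siblings.
    - file_exists: ["README.rst", "LICENSE", "NOTICE"]
    - dir_exists: ["src", "tests", "docs"]
    # Illustrative content check against the matched YAML file itself.
    - yaml_path_matches:
        path: ".package-name"
        pattern: "^apache-airflow-providers-"
```

One declarative block per invariant, versioned next to the code it governs — that is the contrast with the imperative Docker-plus-ProvidersManager check it would replace.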
Where alint earns its keep here
- 101 provider directories all conforming to the same shape — a
  manifest-style “every directory with provider.yaml also has
  README.rst, LICENSE, NOTICE, src/, tests/, docs/” convention that
  today is enforced by nothing structural. alint’s for_each_file is
  exactly the right answer.
- Multiple pyproject.toml distributions — root + 4 core + 101
  providers + N shared. Every one is treated as a “distribution” by
  Airflow’s release machinery and must satisfy the same
  gitignore/license/notice requirements. Exactly what
  for_each_file: "**/pyproject.toml" was built for.
- ~30 hooks map cleanly to existing alint rules; another ~10 shell out
  via command: for tools Airflow already uses (ruff, yamllint,
  codespell, shellcheck, hadolint, markdownlint, zizmor, bandit).
  Net: ~40 of 109 hooks in one declarative file.
Future story angles
- Pair with Kubernetes for the cross-language pitch: “Airflow’s
  for_each_file: providers/**/provider.yaml with nested file-existence
  requirements is the Python expression of Kubernetes’
  staging/src/k8s.io/* meta-files check. Same shape, same primitive,
  two ecosystems.”
- Primary example on alint.org/examples/python and in HN/Reddit launch
  posts targeting Python infrastructure audiences (data engineering /
  Airflow user community in particular).
- Demand-saturation evidence — Airflow is one of 10 demand sources for
  cross_file_value_equals (v0.10 ship-target), 4 for import_gate
  (v0.10 ship-target), 7 for ordered_block (v0.10 ship-target). The
  single most-load-bearing case study in P2a for pushing v0.10
  candidates over the line.