Why this case study matters
pytorch is a multi-language ML mega-monorepo (~80k files; C++/CUDA
core, Python frontend, Bazel + setup.py + CMake build, generated _C
stubs, JIT/FX graph machinery, distributed/cuda/xpu/mps/rocm backend
matrix). Its structural-validation surface is dominated by one
artefact: .lintrunner.toml — a 1876-line TOML manifest declaring
57 distinct linter “adapters” orchestrated by lintrunner,
pytorch’s bespoke per-file lint runner (Rust binary that spawns
Python adapter scripts).
lintrunner exists because, at the time pytorch needed it, no existing
tool handled its orchestration needs (multi-language scopes,
init-then-lint two-phase adapters, S3-vendored binary fetch,
partial-file lint via @PATHSFILE fanout). Anyone running a custom
orchestrator (pytorch’s lintrunner; tensorflow’s
buildifier/yapf-driven CI; bazel’s own buildifier) is in the
same situation.
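
For readers who haven't opened the manifest: a representative
`[[linter]]` entry has roughly this shape. The keys follow lintrunner's
documented config schema, but the specific flag values below are
illustrative, not copied verbatim from pytorch's file:

```toml
# Illustrative [[linter]] entry in the .lintrunner.toml shape
# (keys per lintrunner's schema; values are not pytorch's exact ones).
[[linter]]
code = 'TABS'                      # adapter name, selectable via --take/--skip
include_patterns = ['**']          # per-file scope, glob-matched
exclude_patterns = ['**/*.svg', '**/Makefile']
command = [
    'python3',
    'tools/linter/adapters/grep_linter.py',
    '--pattern=\t',                # the structural check itself is a regex
    '--linter-name=TABS',
    '--error-name=saw some tabs',
    '--',
    '@{{PATHSFILE}}',              # lintrunner fans changed paths out via a file
]
# Two-phase adapters additionally declare an init_command here
# (e.g. a pip install or S3 binary fetch step).
```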
Headline catch
alint isn’t trying to replace lintrunner — but the structural
subset of .lintrunner.toml is exactly what alint specialises in.
Of the 57 lintrunner adapters:
- ~28 are pure structural/regex (24 `grep_linter.py` shims + EXEC +
  NEWLINE + SPACES + TABS) — every one expressible as a 5-10 line alint
  rule today (see the sketch after this list)
- ~21 are command shellouts to mature external tools (clang-format,
  clang-tidy, ruff, codespell, pyfmt, mypy, pyrefly, actionlint,
  shellcheck, cmake) — alint shells out the same way via `command:` rules
- ~8 are AST-aware/semantic/git-diff-aware (test_has_main libcst,
  scoped_library AST, set_linter, gb_registry, header_only,
  import_linter, pyproject_linter, no_workflows_on_fork, stable_shim_*)
  — these stay on lintrunner
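
To make the first bucket concrete, here is what one of those 5-10 line
rules might look like. alint's actual rule grammar isn't reproduced in
this piece, so every field name below is an assumption, chosen to mirror
the scope/severity/rationale triple described under "Inventory
legibility":

```toml
# Hypothetical alint rule (field names assumed, not alint's published
# grammar): the TABS grep shim above as a single declarative stanza.
[[rule]]
id = 'no-tabs'
scope = { include = ['**'], exclude = ['**/*.svg', '**/Makefile'] }
severity = 'error'
pattern = '\t'                     # same regex the grep_linter.py shim wraps
rationale = 'hard tabs are banned outside Makefiles'
```

The point isn't the exact syntax; it's that the scope, the check, and
the why fit in one visual unit instead of a TOML stanza plus a Python
adapter script.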
So ~49 of 57 adapters (≈86%) are within alint’s grammar today; the
AST-aware tail (~8/57, ≈14%) stays on lintrunner. alint is what you
would have built instead of lintrunner if you’d realised at the time
that you only needed 86% of its expressivity.
Where alint earns its keep here
- Inventory legibility — A new pytorch contributor staring at the
  structural-validation surface today has to read a 1876-line
  .lintrunner.toml, 30 Python adapter files in `tools/linter/adapters/`,
  the `.editorconfig`, `.clang-format` (3.4 KB), `.clang-tidy` (3 KB),
  `.cmakelintrc`, `pyrefly.toml`, `mypy.ini` (×2), and `pytest.ini` to
  understand what rules apply where. The alint config is one file,
  declarative, with each rule’s scope, severity, and rationale visible
  in 5-10 lines.
- Fail-fast latency — alint has zero adapter-spawn cost: it walks the
  tree once and runs every rule in parallel against the in-memory file
  bytes. lintrunner spawns one Python process per code per file batch.
  For the 28 structural-only adapters, alint should be 10-100× faster.
- Strongest cross-cutting v0.10 demand signal — pytorch’s WORKFLOWSYNC
  adapter (cross-workflow `sync-tag` consistency across 144 workflow
  files) is the cleanest example of the `cross_file_value_equals`
  pattern, the highest-priority new primitive across P2a + P2b.
  pytorch’s `torch/header_only_apis.txt` registry is the cleanest
  example of `registry_paths_resolve` (a flat text file lists symbols,
  each of which must appear in a .cpp test file). pytorch is also one of
  the four sources confirming `import_gate` (k8s + airflow + golang/go +
  pytorch). A speculative sketch of the two new primitives follows this
  list.
- The orchestration-replacement narrative — for repos that already have
  a custom orchestrator, alint is the “don’t build your own orchestrator
  next time” pitch: adopt alint for the structural floor, keep the AST
  tail on whatever AST-aware tool you needed in the first place.
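
Neither primitive has shipped (they are v0.10 demand signals), so the
following is a speculative sketch of how the two pytorch patterns might
be declared; every key name here is an assumption:

```toml
# Speculative v0.10 primitives -- nothing below exists yet; names assumed.

# WORKFLOWSYNC: workflows sharing a sync-tag must stay consistent.
[[rule]]
id = 'workflow-sync-tags'
type = 'cross_file_value_equals'
scope = { include = ['.github/workflows/*.yml'] }
key = 'sync-tag'                   # the tag whose carriers must agree
severity = 'error'
rationale = 'files sharing a sync-tag must not drift apart'

# header_only_apis.txt: every listed symbol must appear in a .cpp test.
[[rule]]
id = 'header-only-registry'
type = 'registry_paths_resolve'
registry = 'torch/header_only_apis.txt'  # flat list, one symbol per line
must_appear_in = ['test/**/*.cpp']
severity = 'error'
rationale = 'header-only APIs need a compiled test exercising them'
```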
Future story angles
- Custom-orchestrator launch tile — pair pytorch with bazel’s
  `buildifier` and tensorflow’s CI for the “structural subset of YOUR
  custom orchestration layer” story.
- `cross_file_value_equals` v0.10 launch post — pytorch WORKFLOWSYNC is
  the canonical example of the most-demanded primitive in the entire
  inventory.
- `registry_paths_resolve` v0.10 launch post — pytorch
  `header_only_apis.txt` is the cleanest pattern instance.
- CLAUDE.md early-adopter narrative — pytorch was an early adopter of
  agent-friendly documentation; the .ci/docker/content-hash trap flagged
  in CLAUDE.md is a foot-gun pattern worth its own follow-up post.