Why this case study matters
pytorch is a multi-language ML mega-monorepo (~80k+ files; C++/CUDA
core, Python frontend, Bazel + setup.py + CMake build, generated _C
stubs, JIT/FX graph machinery, distributed/cuda/xpu/mps/rocm backend
matrix). Its structural-validation surface is dominated by one
artefact: .lintrunner.toml, a 1876-line TOML manifest declaring
57 distinct linter “adapters” orchestrated by lintrunner,
pytorch’s bespoke per-file lint runner (Rust binary that spawns
Python adapter scripts).
lintrunner exists because, when pytorch needed it, no existing
tool handled its orchestration needs (multi-language scopes,
init-then-lint two-phase adapters, S3-vendored binary fetch,
partial-file lint via @PATHSFILE fanout). Anyone running a custom
orchestrator (pytorch’s lintrunner; tensorflow’s
buildifier/yapf-driven CI; bazel’s own buildifier) is in the
same situation.
Headline catch
alint isn’t trying to replace lintrunner, but the structural
subset of .lintrunner.toml is exactly what alint specialises in.
Of the 57 lintrunner adapters:
- ~28 are pure structural/regex (24 grep_linter.py shims + EXEC +
NEWLINE + SPACES + TABS); every one is expressible as a 5-10 line
alint rule today
- ~21 are command shellouts to mature external tools (clang-format,
clang-tidy, ruff, codespell, pyfmt, mypy, pyrefly, actionlint,
shellcheck, cmake); alint shells out the same way via command: rules
- ~8 are AST-aware/semantic/git-diff-aware (test_has_main libcst,
scoped_library AST, set_linter, gb_registry, header_only,
import_linter, pyproject_linter, no_workflows_on_fork,
stable_shim_*); these stay on lintrunner
So ~49 of 57 adapters (≈86 %) are within alint’s grammar today;
the AST-aware tail (~8/57 ≈ 14 %) stays on lintrunner. alint is
what you would have adopted instead of building lintrunner if alint
had existed and you’d realised you only needed 86 % of lintrunner’s
expressivity.
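To make "pure structural/regex" concrete, here is a minimal Python sketch of the kind of check those ~28 adapters perform. It is not alint's or lintrunner's implementation: the Finding type, the rule signature, and the regex patterns are assumptions chosen for illustration, and only the rule names TABS, SPACES, and NEWLINE come from the list above.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Finding:
    rule: str
    path: str
    line: int  # 1-based; 0 means the finding applies to the whole file


# Hypothetical rule signature: (path, file bytes) -> findings.
Rule = Callable[[str, bytes], Iterable[Finding]]


def regex_rule(name: str, pattern: bytes) -> Rule:
    """Build a grep-style rule that flags every line matching `pattern`."""
    compiled = re.compile(pattern)

    def check(path: str, data: bytes) -> Iterable[Finding]:
        for lineno, line in enumerate(data.splitlines(), start=1):
            if compiled.search(line):
                yield Finding(name, path, lineno)

    return check


def newline_rule(path: str, data: bytes) -> Iterable[Finding]:
    """Flag non-empty files that do not end with a newline."""
    if data and not data.endswith(b"\n"):
        yield Finding("NEWLINE", path, 0)


# Illustrative patterns only; pytorch's real adapters may differ.
RULES: List[Rule] = [
    regex_rule("TABS", rb"\t"),
    regex_rule("SPACES", rb" +$"),  # trailing spaces
    newline_rule,
]


def lint_bytes(path: str, data: bytes, rules: List[Rule] = RULES) -> List[Finding]:
    """Run every rule against one in-memory buffer, read only once."""
    return [finding for rule in rules for finding in rule(path, data)]
```

Calling `lint_bytes("a.py", b"x = 1  \n\ty = 2")` reports a SPACES finding on line 1, a TABS finding on line 2, and a file-level NEWLINE finding. Each rule is a few lines of pattern plus scope, which is the sense in which such adapters reduce to short declarative rules.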
Where alint earns its keep here
- Inventory legibility: A new pytorch contributor staring at
the structural-validation surface today has to read a 1876-line
.lintrunner.toml, 30 Python adapter files in tools/linter/adapters/,
the .editorconfig, .clang-format (3.4 KB), .clang-tidy (3 KB),
.cmakelintrc, pyrefly.toml, mypy.ini (×2), and pytest.ini to
understand what rules apply where. The alint config is one file,
declarative, with each rule’s scope, severity, and rationale
visible in 5-10 lines.
- Fail-fast latency: alint has zero adapter-spawn cost; it walks
the tree once and runs every rule in parallel against the in-memory
file bytes. lintrunner spawns one Python process per linter code per
file batch. For the ~28 structural-only adapters, alint should be
10-100× faster.
- Strongest cross-cutting v0.10 demand signal: pytorch’s
WORKFLOWSYNC adapter (cross-workflow sync-tag consistency across
144 workflow files) is the cleanest example of the
cross_file_value_equals pattern, the highest-priority new primitive
across P2a + P2b. pytorch’s torch/header_only_apis.txt registry is
the cleanest example of registry_paths_resolve (a flat text file
lists symbols, each must appear in a .cpp test file). pytorch is
also one of the four sources confirming import_gate (k8s + airflow
+ golang/go + pytorch).
- The orchestration-replacement narrative: for repos that already
have a custom orchestrator, alint is the “don’t build your own
orchestrator next time” pitch: adopt alint for the structural
floor, keep the AST tail on whatever AST-aware tool you needed in
the first place.
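The two cross-file patterns named above can be sketched in a few lines of Python. This is an illustration of the idea only, not alint's grammar and not the logic of pytorch's WORKFLOWSYNC adapter: the function names borrow the primitive names, but the signatures, the regex-based extraction, and the substring test are all assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List


def cross_file_value_equals(files: Dict[str, str], pattern: str) -> Dict[str, List[str]]:
    """Group files by the first capture group of `pattern`.

    `files` maps path -> text. More than one key in the result means the
    value has drifted between files.
    """
    rx = re.compile(pattern, re.MULTILINE)
    groups: Dict[str, List[str]] = defaultdict(list)
    for path, text in files.items():
        match = rx.search(text)
        if match:
            groups[match.group(1)].append(path)
    return dict(groups)


def registry_paths_resolve(registry_text: str, haystacks: Dict[str, str]) -> List[str]:
    """Return registry entries that appear in none of the haystack files.

    Assumes one entry per line, blank lines ignored, and a plain substring
    match; a real check would likely need stricter matching.
    """
    entries = [line.strip() for line in registry_text.splitlines() if line.strip()]
    return [
        entry
        for entry in entries
        if not any(entry in text for text in haystacks.values())
    ]
```

As a usage sketch with an invented `runner-version` key: given three workflow texts where two say `runner-version: 3` and one says `runner-version: 4`, `cross_file_value_equals(files, r"^runner-version:\s*(\S+)$")` returns two groups, which is the drift signal. Likewise, `registry_paths_resolve("foo\n\nbar\n", {"t.cpp": "use(foo);"})` returns `["bar"]`, the registry entry with no matching test.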
Future story angles
- Custom-orchestrator launch tile: pair pytorch with bazel’s
buildifier and tensorflow’s CI for the “structural subset of YOUR
custom orchestration layer” story.
- cross_file_value_equals v0.10 launch post: pytorch WORKFLOWSYNC
is the canonical example of the most-demanded primitive in the
entire inventory.
- registry_paths_resolve v0.10 launch post: pytorch
header_only_apis.txt is the cleanest pattern instance.
- CLAUDE.md early-adopter narrative: pytorch was an early
adopter of agent-friendly documentation; the .ci/docker/
content-hash trap flagged in CLAUDE.md is a foot-gun pattern worth
its own follow-up post.