Why this case study matters
apache/arrow is the flagship “language-agnostic polyglot monorepo”
pitch for alint — a single tree historically hosting columnar-format
reference implementations across a dozen languages, glued together by
the cross-language format/ schema spec (FlatBuffers + Protobuf) and
the dev/archery/ integration-test harness.
apache/arrow is the canonical multi-language project on GitHub (~14k stars, widely deployed, the basis for every modern columnar data library: Pandas 2.0, Polars, DuckDB, Snowflake’s external table format). Naming it as a target gives alint instant credibility with the data-engineering audience.
A single declarative config replaces the cross-language structural conventions that no per-language linter sees — clang-format only sees C++, rubocop only sees Ruby, flake8 only sees Python, lintr only sees R. The invariants alint enforces (per-language subdir README, per-Ruby-gem layout, per-GLib-sublibrary meson.build, format-spec file integrity, Apache governance triad, per-language tool-config presence) are exactly the layer alint owns and nothing else does.
Headline catch
6 languages in one tree, 21 lint hooks across 14 tool repos, and 0 tools that see the cross-language conventions — alint is the layer that does.
The pitch lands harder when paired with the per-Ruby-gem finding: 8
gem subdirs × 6 required files each = 48 file-existence assertions
wrapped in a single for_each_file rule. No Ruby tool checks the
layout because no Ruby tool sees the layout from above.
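As a rough illustration of what a per-directory layout check of this shape amounts to, here is a minimal Python sketch. It is not alint's implementation or config syntax. The six entries in `REQUIRED` are hypothetical placeholders, since the case study states the count but not the file names, and `missing_gem_files` is a name invented for this example.

```python
from pathlib import Path

# Hypothetical required-file list: the case study says six files per gem
# but does not name them. "{gem}" is replaced by the gem directory name.
REQUIRED = (
    "README.md",
    "Gemfile",
    "Rakefile",
    "LICENSE.txt",
    "NOTICE.txt",
    "{gem}.gemspec",
)


def missing_gem_files(ruby_root, required=REQUIRED):
    """Return 'gem/file' strings for every required file that is absent.

    Each immediate subdirectory of ruby_root is treated as one gem, so
    N gem directories and M required files give N * M existence checks.
    """
    root = Path(ruby_root)
    missing = []
    for gem_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for template in required:
            name = template.format(gem=gem_dir.name)
            if not (gem_dir / name).is_file():
                missing.append(f"{gem_dir.name}/{name}")
    return missing
```

With 8 gem directories and 6 templates, the nested loop performs the 48 existence checks counted above, driven by one declaration of the required list.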
The Apache header rule’s 16 findings against the live tree are ALL
legitimate (LLVM-licensed third-party files + auto-generated
bindings, all listed in dev/release/rat_exclude_files.txt) —
confirming both that the rule fires correctly AND that the v0.10
ship-target registry_paths_resolve rule kind would close the loop.
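One plausible reading of a "registry paths resolve" check is that every pattern listed in a registry file must still match at least one file in the tree. The Python sketch below illustrates that reading only. It assumes one glob pattern per line and treats blank lines and `#` lines as skippable, which is an assumption about the file format and not something the case study states. The function names are hypothetical and do not describe alint's actual rule.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def list_repo_files(root):
    """Return every file under root as a POSIX-style relative path."""
    root = Path(root)
    return [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]


def unresolved_patterns(registry_lines, repo_files):
    """Return registry patterns that match no file in repo_files.

    Assumption: one glob pattern per line; blank lines and lines
    starting with '#' are ignored.
    """
    patterns = [
        line.strip()
        for line in registry_lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return [
        pattern
        for pattern in patterns
        if not any(fnmatchcase(path, pattern) for path in repo_files)
    ]
```

A call such as `unresolved_patterns(Path("dev/release/rat_exclude_files.txt").read_text().splitlines(), list_repo_files("."))` would then list exclude entries that no longer point at anything. Note that `fnmatchcase` lets `*` match across `/`, which may differ from how Apache RAT interprets the same patterns.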
Where alint earns its keep here
- The cross-language schema spec under format/: a single directory of FlatBuffers + Protobuf files that EVERY language implementation must conform to. No per-language linter sees this; it’s the cross-language contract. alint covers it as a single multi-path file_exists rule.
- The Apache compliance bundle (compliance/apache-2@v1) is alint’s tightest fit anywhere in the case-study catalogue. Applying it to apache/arrow surfaces the longer-vs-shorter header form distinction, which the configured override resolves cleanly.
- The Apache release tooling (dev/release/, .pre-commit-config.yaml’s rat hook, verify-release-candidate.sh) is unique among OSS repos: the ASF community ships ~250+ projects following this exact discipline, and NONE of them have a structural linter that enforces the dance. This case study is a ready-made template for every Apache project.
- dev/release/rat_exclude_files.txt = 102 path patterns consumed by Apache RAT. arrow is one of 8 demand sources for the v0.10 ship-target registry_paths_resolve primitive, the highest-leverage gap in P2a.
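To make the header-form distinction concrete, the sketch below classifies the top of a file as carrying a long or a short Apache header. The case study does not define the two forms, so this assumes the long form is the ASF source header ("Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements ...") and the short form is the boilerplate notice from the Apache License 2.0 appendix. `header_form` and the comment-marker list are illustrative only, not how the compliance/apache-2@v1 bundle works.

```python
import re

# Opening phrases of the two assumed header forms, whitespace-normalised.
LONG_MARK = (
    "Licensed to the Apache Software Foundation (ASF) under one "
    "or more contributor license agreements"
)
SHORT_MARK = 'Licensed under the Apache License, Version 2.0 (the "License")'

# Leading comment markers to strip from each line (a non-exhaustive guess).
COMMENT_PREFIX = re.compile(r"^\s*(#|//|/\*+|\*+/?|--|<!--|;+)\s?")


def header_form(text, max_lines=30):
    """Classify the first max_lines of text as 'long', 'short', or 'none'."""
    lines = text.splitlines()[:max_lines]
    stripped = [COMMENT_PREFIX.sub("", line) for line in lines]
    # Collapse line breaks and runs of spaces so wrapped headers still match.
    flat = " ".join(" ".join(stripped).split())
    if LONG_MARK in flat:
        return "long"
    if SHORT_MARK in flat:
        return "short"
    return "none"
```

Checking the long marker first matters because the long header also mentions the license by name further down. A rule that accepts only one form would report files carrying the other, which is the kind of finding an override would then have to resolve.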
Future story angles
- Launch tile: position as the polyglot-monorepo flagship alongside kubernetes, airflow, microsoft/typescript, next.js. Tagline: “apache/arrow has 6 languages in one tree, 21 lint hooks across 14 tool repos, and 0 tools that see the cross-language conventions; alint is the layer that does.”
- Apache TLP narrative: arrow + spark + airflow give us 3 Apache TLPs with a converging structural-discipline pattern (LICENSE / NOTICE / KEYS / RAT exclude registry / dev/create-release/ / two-tier license discipline / .asf.yaml / per-language lint-script orchestration). This crystallises the proposed apache/governance@v1 bundled ruleset as a v0.10 ship-target. The ~12-rule bundle would land in every Apache TLP extends: list.
- Polyglot pitch: pair with the per-Ruby-gem and per-GLib-sublibrary findings to demonstrate “alint sees the polyglot tree from above in a way no per-language linter can”.
- cross_language_implementation_complete demand-saturation: arrow stress-tests this primitive harder than any other repo in P2a; one of 5 saturated sources (with TF, protobuf, angular, flutter).