Why this case study matters
apache/spark is the flagship “Maven-multi-module polyglot Apache TLP” story for the launch — the canonical big-data engine (~38k stars, used at every FAANG, every major bank, every cloud provider). Naming it as a target gives alint instant credibility with the JVM + data-engineering audience.
Spark is the canonical 4-language Apache TLP polyglot monorepo —
Scala (the core engine) + Java (the ASM-level extension points) +
Python (PySpark, packaged as three distinct PyPI distributions) +
R (SparkR, packaged for CRAN). Glued together by a Maven multi-module
build (49 pom.xml files, 48 declared <module> entries) and a
72-workflow GitHub Actions matrix.
Where apache/arrow has a parity-mandate shape (every type
implemented in every language), apache/spark has a
per-language-MODULE mandate shape: each language tier sits at a
different layer of the Spark architecture, with its own Maven module
set, its own packaging conventions, its own published-artifact
namespace, and its own canonical lint script under dev/lint-<lang>.
Headline catch
arrow + spark + airflow give us 3 Apache TLPs with a converging
structural-discipline pattern — promoting the proposed
apache/governance@v1 bundled ruleset from “v0.10+ idea” to “v0.10
ship-target”. 9 of 12 governance artefacts converge across all 3 TLPs
examined (with the 3 binary-distribution artefacts gated on whether
the TLP ships binary).
That’s a strong enough signal to ship apache/governance@v1 in v0.10
alongside the existing compliance/apache-2@v1. Adopters can then
drop down to a 3-line extends: list:

    extends:
      - alint://bundled/oss-baseline@v1
      - alint://bundled/compliance/apache-2@v1
      - alint://bundled/apache/governance@v1  # ← v0.10 ship-target

…and pick up the entire Apache-discipline check set without restating the 6 governance rules in every TLP-specific config.
Where alint earns its keep here
- No per-language linter sees the cross-language MODULE shape:
scalastyle only sees Scala, checkstyle only sees Java, ruff/black/mypy
only see Python, lintr only sees R. The invariants alint enforces
(Maven multi-module integrity, per-language tool-config presence,
the 3-PySpark-packaging-variants uniformity, R/pkg/DESCRIPTION +
NAMESPACE + cran-comments.md set, two-tier license discipline,
.asf.yaml ASF-infra config, RAT exclude registry presence) are
exactly the layer alint owns.
- apache/governance@v1 confirmed v0.10 ship-target: spark is the
headline driver, with arrow + airflow as confirming sources.
- The Maven multi-module shape (49 pom.xml files, 48 declared
<module> entries) gives alint a structural primitive that pnpm +
yarn + cargo workspaces also need but currently express via
hard-coded for_each_dir lists. Adding xml_path_* (now v0.10
ship-target via dotnet/runtime’s scale, alongside spark’s 49
pom.xml) and registry_paths_resolve lands the workspace-registry
consistency story in one shape across all build systems.
- 3 PySpark packaging variants × 2 required files each = 6
file-existence assertions wrapped in a single for_each_dir rule.
No Python tool checks the layout because no Python tool sees the
layout from above.
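
As an illustration of the two structural checks described above, here is a minimal Python sketch. It is not alint's implementation or rule syntax: the function names (declared_modules, unresolved_modules, missing_files) are hypothetical, and it assumes every <module> entry names a directory that should contain its own pom.xml. The first pair of functions mirrors the registry-resolution idea (each declared module resolves to a real pom.xml), and the last mirrors a for_each_dir-style file-existence check.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Standard Maven POM XML namespace; some POMs omit it, so both forms are matched.
POM_NS = "{http://maven.apache.org/POM/4.0.0}"


def declared_modules(pom: Path) -> list[str]:
    """Return the text of every <module> element in one pom.xml (profiles included)."""
    root = ET.parse(pom).getroot()
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag in (f"{POM_NS}module", "module") and el.text
    ]


def unresolved_modules(repo_root: Path) -> list[str]:
    """List declared modules whose directory has no pom.xml.

    Assumption: each <module> value is a directory relative to the declaring POM.
    """
    missing = []
    for pom in sorted(repo_root.rglob("pom.xml")):
        for name in declared_modules(pom):
            if not (pom.parent / name / "pom.xml").is_file():
                declaring = pom.relative_to(repo_root).as_posix()
                missing.append(f"{declaring} -> {name}")
    return missing


def missing_files(repo_root: Path, dirs: list[str], required: list[str]) -> list[str]:
    """for_each_dir-style check: every listed directory must hold every required file."""
    return [
        f"{d}/{name}"
        for d in dirs
        for name in required
        if not (repo_root / d / name).is_file()
    ]
```

With three packaging directories and two required file names, missing_files performs the 3 × 2 = 6 existence assertions in one call; the directory and file names are passed in by the caller rather than assumed here. An empty list from either check means the layout is consistent.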
Future story angles
- Launch tile: position alongside kubernetes, airflow, microsoft/typescript, next.js, arrow with the angle “apache/spark has 4 languages, 49 pom.xml files, 3 PySpark packaging variants, 72 GHA workflows, and 6 per-language lint scripts under dev/. alint is the layer that sees the structural shape that no per-language linter does, and the Apache-governance bundle ships in v0.10.”
- Apache TLP narrative: pair with arrow + airflow for the “3
TLPs converge → apache/governance@v1” story. Position as “drop one
line in your extends: and inherit the Apache TLP discipline that the
ASF Incubator graduation process requires”.
- JVM + data-engineering narrative: Spark adoption brings immediate credibility with JVM and data-engineering audiences who may not yet have seen alint.
- Demand-saturation evidence for v0.10 candidates: spark is a
source for registry_paths_resolve (8 sources total), xml_path_*
(2 sources, promoted by dotnet/runtime scale), generated_file_fresh
(6 sources), and cross_language_registry_consistency (refinement of
the v0.11+ family).