apache/spark

4 languages, 49 Maven modules, 3 PySpark packaging variants, 72 GHA workflows, and 6 per-language lint scripts — alint sees the structural shape no per-language linter does.

Narrative: Polyglot wins
Rules: 110
Last revalidated
Engineering reference: README on GitHub · .alint.yml

Why this case study matters

apache/spark is the flagship “Maven multi-module polyglot Apache TLP” case study for the launch: the canonical big-data engine (~38k stars), widely deployed across large technology companies, banks, and cloud providers. Naming it as a target gives alint immediate credibility with the JVM and data-engineering audience.

Spark is a 4-language Apache TLP polyglot monorepo: Scala (the core engine), Java (the ASM-level extension points), Python (PySpark, packaged as three distinct PyPI distributions), and R (SparkR, packaged for CRAN). The four tiers are glued together by a Maven multi-module build (49 pom.xml files, 48 declared <module> entries) and a 72-workflow GitHub Actions matrix.
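To make the "49 pom.xml files, 48 declared <module> entries" shape concrete, here is a minimal Python sketch of how such counts could be taken from a local checkout. It is an illustration under assumptions, not the method used for this case study: the function name maven_shape is hypothetical, it counts every pom.xml under the root (including any in test resources or build output), and it counts <module> elements wherever they appear, including inside profiles. The published figures may have been derived differently.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Standard namespace used by Maven POM files.
POM_NS = "{http://maven.apache.org/POM/4.0.0}"


def maven_shape(repo_root):
    """Return (number of pom.xml files, number of <module> entries they declare).

    Hypothetical helper for illustration only.
    """
    poms = sorted(Path(repo_root).rglob("pom.xml"))
    module_entries = 0
    for pom in poms:
        root = ET.parse(pom).getroot()
        # Count every <module> element, whether top-level or inside a <profile>.
        module_entries += sum(1 for _ in root.iter(f"{POM_NS}module"))
    return len(poms), module_entries


if __name__ == "__main__":
    # "." is a placeholder; point this at a local apache/spark checkout.
    pom_count, module_count = maven_shape(".")
    print(f"{pom_count} pom.xml files, {module_count} declared <module> entries")
```

A gap between the two numbers is the kind of structural signal a repo-level linter can watch: a pom.xml that no parent declares, or a declared module with no pom.xml behind it.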

Where apache/arrow has a parity-mandate shape (every type implemented in every language), apache/spark has a per-language-MODULE mandate shape: each language tier sits at a different layer of the Spark architecture, with its own Maven module set, its own packaging conventions, its own published-artifact namespace, and its own canonical lint script under dev/lint-<lang>.
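The dev/lint-<lang> convention is a cross-language structural invariant, so it can be checked mechanically. The sketch below is a hypothetical Python helper, not alint's rule syntax. The language suffixes it checks (scala, java, python, r) are an assumption drawn from the four tiers named above; the source gives only the dev/lint-<lang> pattern, so adjust the list to the scripts the repository actually ships.

```python
from pathlib import Path

# Assumed suffixes for the dev/lint-<lang> pattern; not confirmed by the source.
LANGS = ("scala", "java", "python", "r")


def missing_lint_scripts(repo_root, langs=LANGS):
    """Return the languages that have no dev/lint-<lang> script in the checkout."""
    dev = Path(repo_root) / "dev"
    return [lang for lang in langs if not (dev / f"lint-{lang}").is_file()]


if __name__ == "__main__":
    # "." is a placeholder; point this at a local apache/spark checkout.
    missing = missing_lint_scripts(".")
    if missing:
        print("missing lint scripts for:", ", ".join(missing))
    else:
        print("every language tier has its dev/lint-<lang> script")
```

Each per-language linter only sees its own files; a check like this one looks at whether the set of lint entry points is complete across tiers.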

Headline catch

arrow + spark + airflow give us 3 Apache TLPs with a converging structural-discipline pattern — promoting the proposed apache/governance@v1 bundled ruleset from “v0.10+ idea” to “v0.10 ship-target”. 9 of 12 governance artefacts converge across all 3 TLPs examined (with the 3 binary-distribution artefacts gated on whether the TLP ships binary).

That’s a strong enough signal to ship apache/governance@v1 in v0.10 alongside the existing compliance/apache-2@v1. Adopters can then drop down to a 3-line extends: block:

extends:
- alint://bundled/oss-baseline@v1
- alint://bundled/compliance/apache-2@v1
- alint://bundled/apache/governance@v1 # ← v0.10 ship-target

…and pick up the entire Apache-discipline check set without restating the 6 governance rules in every TLP-specific config.

Where alint earns its keep here

Future story angles

The factual engineering writeup (tooling inventory, mapping table, gap catalogue, validation status footer) lives in the public alint repo at github.com/asamarts/alint/tree/main/examples/apache-spark/README.md.