Migrating custom bash validation scripts to alint
Many repos accumulate a hack/verify-*.sh (or
scripts/check-*.py, or tools/lint-*.js)
directory over time. The patterns are usually structural, the scripts
are usually copy-paste, and a new contributor staring at 30+ files in
hack/ has to read all of them to understand the
structural contract. Migrating to alint is not a 100% replacement; it
consolidates the declarative subset. alint takes the structural
checks; AST-aware, runtime, generator, and cross-API-version checks
stay where they are.
The evidence: three case studies
kubernetes/kubernetes
maintains 50 hack/verify-*.sh scripts
that gate every PR. After inventory: 17 move to alint
(12 map directly to rule kinds like file_header,
file_max_size, for_each_dir,
pair, filename_regex; 5 more wrap existing
tools (shellcheck, gofmt,
golangci-lint) via the command rule kind).
7 stay because they need primitives alint deliberately doesn't ship
(language-aware import gates, cross-API-version diffs, AST-aware
metric-name regex). 18 stay as out-of-scope by design (codegen drift,
OpenAPI generation, Go module-graph analysis, runtime binary symbol
checks). 6+2 are duplicates / pre-existing. Net: a 50-script
directory becomes 33 scripts + one .alint.yml.
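The tally can be checked mechanically. A minimal sketch that uses only the counts quoted above; the variable names are illustrative and not part of the case study:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Counts quoted in the kubernetes inventory above.
moved=17                  # move to alint
blocked=7                 # need primitives alint doesn't ship
out_of_scope=18           # out of scope by design
duplicates=$((6 + 2))     # duplicates / pre-existing

total=$((moved + blocked + out_of_scope + duplicates))
remaining=$((total - moved))

echo "total=$total remaining=$remaining"   # prints: total=50 remaining=33
```

Note that the 33 remaining scripts still include the 8 duplicate / pre-existing entries.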
cpython tells the same story
in a different shape: 56 distinct structural-validation surfaces (a
122-target Makefile.pre.in, 35 pre-commit hooks, 9 ruff
configs, 7 Tools/build/*.py checkers,
.gitattributes, .editorconfig, 25 GH
workflows). 38% map to alint; the rest are AST-aware, codegen, or
binary-symbol checks.
apache/airflow tells it from
the pre-commit-driven Python ecosystem: 109 pre-commit hooks, ~40%
map. The most alint-shaped surface (the 101-provider package layout
contract) lives today in a 1085-line Python script that imports every
provider; alint expresses the same contract in 25 lines of
for_each_file + nested require:.
Pattern catalogue
Common bash patterns and their alint equivalents. Every rule kind named here exists today; verify your version supports the listed options (rule catalogue).
| Common bash pattern | alint rule | Notes |
|---|---|---|
| grep -rnE 'TODO\|XXX\|FIXME' src/ | file_content_forbidden | Direct map. Pair with git_blame_age for "TODOs older than 6 months". |
| find . -name '*.tmp' -or -name '*.swp' | file_absent | Direct map. agent-hygiene@v1 bundled ruleset already covers *.bak, *.orig, *~, *.swp. |
| find . -name '.DS_Store' -delete | file_absent + fix: file_remove | Auto-fix supported. hygiene/no-tracked-artifacts@v1 ships this. |
| for d in packages/*/; do test -f "$d/README.md" \|\| exit 1; done | for_each_dir + file_exists | Direct map. monorepo@v1 ships this for packages/*, crates/*, apps/*, services/*. |
| for d in crates/*/; do grep -q 'license =' "$d/Cargo.toml" \|\| exit 1; done | for_each_dir + toml_path_matches | Direct map. RFC 9535 JSONPath against TOML. |
| for f in $(find . -name '*.sh'); do [ -x "$f" ] \|\| exit 1; done | shebang_has_executable / executable_has_shebang | Direct map. Pair both for the bidirectional invariant. |
| grep -rE '[ \t]+$' --include='*.rs' src/ | no_trailing_whitespace | Direct map. Auto-fixable via file_trim_trailing_whitespace. |
| find . -name '*.md' \| xargs file \| grep 'CRLF' | line_endings (target: lf) | Direct map. Auto-fixable via file_normalize_line_endings. |
| git diff --check (in CI) | no_merge_conflict_markers | Direct map. Already in oss-baseline@v1. |
| for f in test/parallel/*.js; do [[ "${f##*/}" =~ ^test-.*\.js$ ]] \|\| exit 1; done | filename_regex | Direct map. (Actual nodejs/node convention; silent test discovery on misnamed files.) |
| jq -e '.license == "MIT"' package.json | json_path_equals | Direct map. Use *_matches for regex instead of equality. |
| python -c "import json; json.load(open('config.json'))" | json_schema_passes | Parse-error becomes a single violation per file rather than a 1-line pass/fail. |
| for f in .github/workflows/*.yml; do yq '.permissions.contents' "$f"; done \| grep -qv read | yaml_path_equals (or gha-workflows-have-permissions) | Direct map. ci/github-actions@v1 bundled ruleset already encodes this. |
| grep -E '^pin: [a-f0-9]{40}' .github/workflows/*.yml | yaml_path_matches (with SHA-pin regex) | Direct map. Bundled gha-workflows-pin-actions. |
| head -1 *.rs \| grep -q 'SPDX-License-Identifier' | file_header | Direct map. Add a fix block (file_prepend) to auto-insert. |
| find . -type f -size +5M (no file > 5 MB) | file_max_size | Direct map. The kubernetes config uses this with paths.exclude for vendor + testdata. |
| find packages/ -name package.json -exec ... | dir_contains / for_each_dir | Direct map. dir_contains is sugar for the common shape. |
| find . -name README.md \| awk -F/ '{print $NF}' \| sort \| uniq -d | unique_by (key: "{stem}") | Direct map. |
Patterns with no clean alint mapping
These are the kinds of script you keep. alint doesn't do them today:
some are out of scope by design, and others are on the v0.10+
candidate list.
| Common bash pattern | Why alint can't / won't |
|---|---|
| kubectl version --client \| grep '^Client Version: v1.2' | Runtime probe. alint is a static checker. Use command: if you want it inside the same gate, but be aware it's a static-tool wrapping a runtime probe. |
| Cross-API-version diffs (kubernetes' verify-types-aliases.sh) | Requires parsed Go code + cross-version semantic comparison. AST territory; out of scope. |
| kustomize build ... \| diff - (codegen freshness) | Running the generator + diffing the output is on the v0.10+ candidate list (generated_file_fresh); not in v0.9. Keep your script. |
| grep -E 'import "github.com/foo/bar"' --include='*.go' some/path/ | This is the import_gate rule kind on the v0.10+ list. Today: keep your script, or wrap it in command:. |
| Most git log / git blame analysis (rebase-graph checks, contributor-stat gates) | alint has narrow git hygiene support (git_no_denied_paths, git_commit_message, git_blame_age); broader history-aware checks are out of scope. |
| Vendor-tree readonly enforcement (kubernetes' verify-readonly-packages.sh) | Requires hashing each vendored entry against a pinned manifest. pair_hash is on the v0.10+ list; today file_hash covers single files only. |
Side-by-side: a real script
Every package has a README
The for d in packages/*/; do …; done shape is one of the
most common in OSS monorepos. The bash version:
#!/usr/bin/env bash
set -euo pipefail
fail=0
for d in packages/*/; do
  if [ ! -f "$d/README.md" ]; then
    echo "Missing: $d/README.md" >&2
    fail=1
  fi
done
exit $fail
The alint equivalent (seven lines of YAML) enforces the same
invariant, and also handles ignore filtering, parallel evaluation,
JSON / SARIF / GitHub-annotation output formats, and incremental
runs with --changed:
- id: every-package-has-readme
  kind: for_each_dir
  select: "packages/*"
  require:
    - kind: file_exists
      paths: "{path}/README.md"
  level: error
If your monorepo lives under packages/ AND
apps/, the monorepo@v1 bundled ruleset
already covers this with one extends: line.
Every Cargo.toml declares a license
Slightly more complex. The bash version typically has two subtle
bugs: it doesn't recognise license-file = (the alternate
form), and it passes on a commented-out license line, which grep
matches but a TOML parser ignores. The alint version is TOML-aware:
- id: every-crate-declares-license
  kind: for_each_dir
  select: "crates/*"
  when_iter: 'iter.has_file("Cargo.toml")'
  require:
    - kind: toml_path_matches
      paths: "{path}/Cargo.toml"
      path: "$.package.license"
      matches: '^[A-Za-z0-9.+\-]+( OR [A-Za-z0-9.+\-]+)*$'
  level: error
when_iter: filters out non-package directories under
crates/ without any extra grep. The TOML path query
handles the parsing; the regex pins the license to a SPDX-shaped
value. The replacement isn't shorter; it's stricter.
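The comment false positive in the bash version is easy to reproduce. A minimal sketch, assuming a throwaway directory and a hypothetical helper named has_license_line; the grep pattern is the one from the catalogue row above:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: the naive text check used by the bash version.
has_license_line() {
  grep -q 'license =' "$1"
}

demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT

# A manifest whose only "license" line is commented out.
cat > "$demo_dir/Cargo.toml" <<'EOF'
[package]
name = "demo"
version = "0.1.0"
# license = "MIT"
EOF

if has_license_line "$demo_dir/Cargo.toml"; then
  echo "grep check passes, although no license key is declared"
fi
```

A TOML parser sees no license key in this manifest, so a path query for $.package.license has nothing to match.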
Composability for the long tail
For checks that don't map (an in-house tool, an AST-aware Python
script, a runtime probe), command: keeps them inside
the same gate:
- id: shellcheck-all-scripts
  kind: command
  paths: "**/*.sh"
  command: ["shellcheck", "-x", "{path}"]
  timeout: 30
  level: error
- id: keep-our-ast-checker
  kind: command
  paths: "src/**/*.py"
  command: ["scripts/check-templated-fields.py", "{path}"]
  level: warning
Now alint check runs both the declarative rules and the
surviving scripts; CI calls alint check once instead of
running bash hack/verify-all.sh.
The command: trust gate
A few notes on the survivor-wrapping pattern:
- Trust gate. command: rules are only allowed in the user's own
top-level config. A kind: command rule introduced via extends:
(local file, HTTPS URL, or alint://bundled/) is rejected at load
time. Adopting a published ruleset never grants arbitrary process
execution.
- --changed interaction. command is a per-file rule, so under
alint check --changed it spawns only for files in the diff. Your
expensive shell-out is automatically incremental in CI.
- Environment threaded into the child. ALINT_PATH, ALINT_ROOT,
ALINT_RULE_ID, ALINT_LEVEL, plus any top-level vars: you define.
Existing scripts usually need no changes; most read $1 or
$ALINT_PATH directly.
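A survivor script that should work both standalone and under a command: rule might resolve its target file as follows. This is a minimal sketch: ALINT_PATH is the variable listed above, while the function names target_path and require_target and the example path are placeholders for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# First positional argument wins; otherwise use the path alint exports.
target_path() {
  printf '%s\n' "${1:-${ALINT_PATH:-}}"
}

# Fail loudly instead of silently checking nothing.
require_target() {
  local target
  target="$(target_path "${1:-}")"
  if [ -z "$target" ]; then
    echo "no file argument given and ALINT_PATH is unset" >&2
    return 2
  fi
  printf '%s\n' "$target"
}

# Both invocation styles resolve to the same file.
require_target "scripts/build.sh"               # prints scripts/build.sh
ALINT_PATH="scripts/build.sh" require_target    # prints scripts/build.sh
```

Preferring the explicit argument keeps the script usable from a terminal, and the non-zero return on an empty target avoids a check that passes because it looked at nothing.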
Step-by-step migration
A real-world plan, scoped to roughly one afternoon for a 30-script
directory.
- Inventory and categorise. For each script, ask: is
this structural (file existence, content patterns,
manifest fields, filename grammar)? → maps to alint.
AST-aware? → keep, optionally wrap in command:.
Runtime (binary version, network probe)? → keep,
optionally wrap. Generator (update-*,
regen-*)? → not alint's job; keep.
Cross-API-version diff or module-graph analysis?
→ out of scope; keep. The kubernetes case study tags each of its 50
scripts; copy that template.
- Write .alint.yml with bundled rulesets first.
Start with what already encodes 80% of what most projects want:
version: 1
extends:
  - alint://bundled/oss-baseline@v1
  - alint://bundled/<your-language>@v1
  - alint://bundled/ci/github-actions@v1
  - alint://bundled/hygiene/no-tracked-artifacts@v1
rules:
  # your project-specific rules go here
Run alint list --config .alint.yml to see the rule
expansion. Many of the scripts you'd planned to translate will
already be covered.
- Translate the structural subset. Walk your
inventory's "structural" pile, one script at a time, and add a rule
per script. Keep IDs human (team-go-license-header,
team-no-todo-in-prod); they show up in violation
messages and CI logs. Reuse the pattern catalogue above.
- Reconcile with current main. Run alint check against the
current tip. You will surface some mix of: a handful of true
positives (your script's tolerances were slightly off; alint catches
what it silently let through); a flood of false positives (your
paths: scope is too broad; narrow it); or existing live failures
(some bash scripts are silently broken: they ran, exited 0, but
their grep was hitting nothing).
- Wrap surviving scripts in command: rules.
For the AST / runtime / cross-API-version / generator scripts you
deliberately kept, add one command: rule each. Now
both the declarative and survivor checks run under
alint check.
- Update CI. Replace your master
bash hack/verify-all.sh invocation with
alint check. Drop the per-script CI steps; one binary,
one exit code, one output file.
- (Optional) retire the surviving scripts organically.
Surviving scripts that become candidates for new alint primitives
(registry_paths_resolve, import_gate, generated_file_fresh,
cross_file_value_equals, all on the v0.10+ list) will
retire themselves over the next few releases.
What you gain by consolidating
- One config to read. The structural contract becomes
a single artifact, not a directory.
- Faster CI. alint runs declarative rules in
parallel; on a 100k-file repo (the kubernetes shape), the
alint-replaceable subset benchmarks at ~1-2 seconds; the equivalent
shell pipeline measures in tens of seconds dominated by N redundant
walks.
- Output formats CI can read. human, json, sarif, github,
markdown, junit, gitlab, agent. Eight formats, no echo glue.
- Auto-fix where it's safe. 12 mechanically-safe ops
(trim whitespace, normalize line endings, strip BOM/bidi/zero-width,
prepend / append, rename, remove).
- A declarative DSL anyone can edit. Adding a rule
is a 5-line YAML block; editing one is changing a regex.
- Composition. extends: pulls bundled rulesets, local files,
or HTTPS+SRI URLs.
See also
- kubernetes case study: the headline 50-to-17 story, with the full inventory.
- cpython case study: 56 surfaces across Make targets / pre-commit /
Tools/build/* / .gitattributes.
- airflow case study: 109 pre-commit hooks, ~40% map.
- Rule catalogue: every alint rule kind, with examples.
- Bundled rulesets: what extends: ships out of the box.
- How alint compares to other tools, including Repolinter, ls-lint, Megalinter, EditorConfig.
Start with the bundled rulesets, then translate the structural subset
script by script. Most repos see real consolidation in one afternoon.