Migrating custom bash validation scripts to alint

Many repos accumulate a hack/verify-*.sh (or scripts/check-*.py, or tools/lint-*.js) directory over time. The patterns are usually structural, the scripts are usually copy-paste, and a new contributor staring at 30+ files in hack/ has to read all of them to understand the structural contract. Migrating to alint is not a 100% replacement; it consolidates the declarative, structural subset into one config. AST-aware, runtime, generator, and cross-API-version checks stay where they are.

The evidence: three case studies

kubernetes/kubernetes maintains 50 hack/verify-*.sh scripts that gate every PR. After inventory: 17 move to alint (12 map directly to rule kinds like file_header, file_max_size, for_each_dir, pair, filename_regex; 5 more wrap existing tools (shellcheck, gofmt, golangci-lint) via the command rule kind). 7 stay because they need primitives alint deliberately doesn't ship (language-aware import gates, cross-API-version diffs, AST-aware metric-name regex). 18 stay as out-of-scope by design (codegen drift, OpenAPI generation, Go module-graph analysis, runtime binary symbol checks). 6+2 are duplicates / pre-existing. Net: a 50-script directory becomes 33 scripts + one .alint.yml.
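
The tally above can be re-derived with a few lines of shell arithmetic. This is only a sanity check on the figures quoted in the paragraph; the variable names are illustrative and not part of alint.

```shell
#!/usr/bin/env bash
# Re-derive the kubernetes inventory tally from the figures quoted above.
direct=12               # map directly to an alint rule kind
wrapped=5               # wrap an existing tool via the command rule kind
to_alint=$((direct + wrapped))

need_primitives=7       # need primitives alint deliberately doesn't ship
out_of_scope=18         # out of scope by design
duplicates=$((6 + 2))   # duplicates / pre-existing

total=$((to_alint + need_primitives + out_of_scope + duplicates))
remaining=$((total - to_alint))

echo "move to alint: $to_alint"
echo "total scripts: $total"
echo "scripts left:  $remaining"
```

The categories sum to the 50 scripts in the directory, and removing the 17 that move to alint leaves the 33 that remain.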

cpython tells the same story in a different shape: 56 distinct structural-validation surfaces (a 122-target Makefile.pre.in, 35 pre-commit hooks, 9 ruff configs, 7 Tools/build/*.py checkers, .gitattributes, .editorconfig, 25 GitHub workflows). 38% map to alint; the rest are AST-aware, codegen, or binary-symbol checks. apache/airflow tells it from the pre-commit-driven Python ecosystem: 109 pre-commit hooks, of which ~40% map. The most alint-shaped surface (the 101-provider package layout contract) lives today in a 1085-line Python script that imports every provider; alint expresses the same contract in 25 lines of for_each_file + nested require:.

Pattern catalogue

Common bash patterns and their alint equivalents. Every rule kind named here exists today; check the rule catalogue to verify that your version supports the listed options.

Bash pattern: grep -rn 'TODO\|XXX\|FIXME' src/
alint rule: file_content_forbidden
Notes: Direct map. Pair with git_blame_age for "TODOs older than 6 months".

Bash pattern: find . -name '*.tmp' -or -name '*.swp'
alint rule: file_absent
Notes: Direct map. agent-hygiene@v1 bundled ruleset already covers *.bak, *.orig, *~, *.swp.

Bash pattern: find . -name '.DS_Store' -delete
alint rule: file_absent + fix: file_remove
Notes: Auto-fix supported. hygiene/no-tracked-artifacts@v1 ships this.

Bash pattern: for d in packages/*/; do test -f "$d/README.md" || exit 1; done
alint rule: for_each_dir + file_exists
Notes: Direct map. monorepo@v1 ships this for packages/*, crates/*, apps/*, services/*.

Bash pattern: for d in crates/*/; do grep -q 'license =' "$d/Cargo.toml" || exit 1; done
alint rule: for_each_dir + toml_path_matches
Notes: Direct map. RFC 9535 JSONPath against TOML.

Bash pattern: for f in $(find . -name '*.sh'); do [ -x "$f" ] || exit 1; done
alint rule: shebang_has_executable / executable_has_shebang
Notes: Direct map. Pair both for the bidirectional invariant.

Bash pattern: grep -rE '[ \t]+$' --include='*.rs' src/
alint rule: no_trailing_whitespace
Notes: Direct map. Auto-fixable via file_trim_trailing_whitespace.

Bash pattern: find . -name '*.md' | xargs file | grep -v 'CRLF'
alint rule: line_endings (target: lf)
Notes: Direct map. Auto-fixable via file_normalize_line_endings.

Bash pattern: git diff --check (in CI)
alint rule: no_merge_conflict_markers
Notes: Direct map. Already in oss-baseline@v1.

Bash pattern: for f in test/parallel/*.js; do [[ "$f" =~ ^test-.*\.js$ ]] || exit 1; done
alint rule: filename_regex
Notes: Direct map. (Actual nodejs/node convention; silent test discovery on misnamed files.)

Bash pattern: jq -e '.license == "MIT"' package.json
alint rule: json_path_equals
Notes: Direct map. Use *_matches for regex instead of equality.

Bash pattern: python -c "import json; json.load(open('config.json'))"
alint rule: json_schema_passes
Notes: Parse-error becomes a single violation per file rather than a 1-line pass/fail.

Bash pattern: for f in .github/workflows/*.yml; do yq '.permissions.contents' "$f"; done | grep -qv read
alint rule: yaml_path_equals (or gha-workflows-have-permissions)
Notes: Direct map. ci/github-actions@v1 bundled ruleset already encodes this.

Bash pattern: grep -E '^pin: [a-f0-9]{40}' .github/workflows/*.yml
alint rule: yaml_path_matches (with SHA-pin regex)
Notes: Direct map. Bundled gha-workflows-pin-actions.

Bash pattern: head -1 *.rs | grep -q 'SPDX-License-Identifier'
alint rule: file_header
Notes: Direct map. Add a fix block (file_prepend) to auto-insert.

Bash pattern: du -k . | awk '$1 > 5120 {print $2}' (no file > 5 MB)
alint rule: file_max_size
Notes: Direct map. The kubernetes config uses this with paths.exclude for vendor + testdata.

Bash pattern: find packages/ -name package.json -exec ...
alint rule: dir_contains / for_each_dir
Notes: Direct map. dir_contains is sugar for the common shape.

Bash pattern: find . -name README.md | awk -F/ '{print $NF}' | sort | uniq -d
alint rule: unique_by (key: "{stem}")
Notes: Direct map.

Patterns with no clean alint mapping

These are the kinds of script you keep. alint v0.9 does not cover them: some are out of scope by design, and others are on the v0.10+ candidate list.

Bash pattern: kubectl version --client | grep '^Client Version: v1.2'
Why alint can't / won't: Runtime probe. alint is a static checker. Use command: if you want it inside the same gate, but be aware that it is a static tool wrapping a runtime probe.

Bash pattern: Cross-API-version diffs (kubernetes' verify-types-aliases.sh)
Why alint can't / won't: Requires parsed Go code + cross-version semantic comparison. AST territory; out of scope.

Bash pattern: kustomize build ... | diff - (codegen freshness)
Why alint can't / won't: Running the generator + diffing the output is on the v0.10+ candidate list (generated_file_fresh); not in v0.9. Keep your script.

Bash pattern: grep -E 'import "github.com/foo/bar"' --include='*.go' some/path/
Why alint can't / won't: This is the import_gate rule kind on the v0.10+ list. Today: keep your script, or wrap it in command:.

Bash pattern: Most git log / git blame analysis (rebase-graph checks, contributor-stat gates)
Why alint can't / won't: alint has narrow git hygiene support (git_no_denied_paths, git_commit_message, git_blame_age); broader history-aware checks are out of scope.

Bash pattern: Vendor-tree readonly enforcement (kubernetes' verify-readonly-packages.sh)
Why alint can't / won't: Requires hashing each vendored entry against a pinned manifest. pair_hash is on the v0.10+ list; today file_hash covers single files only.

Side-by-side: a real script

Every package has a README

The for d in packages/*/; do …; done shape is one of the most common in OSS monorepos. The bash version:

#!/usr/bin/env bash
set -euo pipefail
fail=0
for d in packages/*/; do
  if [ ! -f "$d/README.md" ]; then
    echo "Missing: $d/README.md" >&2
    fail=1
  fi
done
exit $fail

The alint equivalent (seven lines of YAML) enforces the same invariant and also handles ignore filtering, parallel evaluation, JSON / SARIF / GitHub-annotation output formats, and incremental runs via --changed:

- id: every-package-has-readme
  kind: for_each_dir
  select: "packages/*"
  require:
    - kind: file_exists
      paths: "{path}/README.md"
  level: error

If your monorepo lives under packages/ AND apps/, the monorepo@v1 bundled ruleset already covers this with one extends: line.

Every Cargo.toml declares a license

Slightly more complex. The bash version typically has two subtle bugs: it rejects a crate that uses license-file = (the alternate form), and it accepts a commented-out license line, which grep matches but a TOML parser ignores. The alint version is TOML-aware:

- id: every-crate-declares-license
  kind: for_each_dir
  select: "crates/*"
  when_iter: 'iter.has_file("Cargo.toml")'
  require:
    - kind: toml_path_matches
      paths: "{path}/Cargo.toml"
      path: "$.package.license"
      matches: '^[A-Za-z0-9.+\-]+( OR [A-Za-z0-9.+\-]+)*$'
  level: error

when_iter: filters out non-package directories under crates/ without any extra grep. The TOML path query handles the parsing; the regex pins the license to an SPDX-shaped value. The replacement isn't shorter; it's stricter.
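
The two grep failure modes described in this section can be reproduced in a scratch directory. The sketch below is illustrative only: the file names and the check helper are hypothetical, and the pattern is the grep -q 'license =' form from the pattern catalogue.

```shell
#!/usr/bin/env bash
# Reproduce the two failure modes of `grep -q 'license ='` on Cargo manifests.
tmp=$(mktemp -d)

# Case 1: the license appears only in a comment, so the crate declares none.
cat > "$tmp/commented.toml" <<'EOF'
[package]
name = "demo"
# license = "MIT"
EOF

# Case 2: the crate uses the alternate license-file key.
cat > "$tmp/license-file.toml" <<'EOF'
[package]
name = "demo"
license-file = "LICENSE"
EOF

# Hypothetical helper mirroring the line-oriented bash check.
check() { grep -q 'license =' "$1" && echo pass || echo fail; }

commented_result=$(check "$tmp/commented.toml")
license_file_result=$(check "$tmp/license-file.toml")

echo "commented-out license: $commented_result"
echo "license-file only:     $license_file_result"

rm -rf "$tmp"
```

The first manifest passes even though a TOML parser sees no license key at all, and the second fails because the text license-file = does not contain the substring license =. A line-oriented match cannot tell these cases apart from the ones it was written for.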

Composability for the long tail

For checks that don't map (an in-house tool, an AST-aware Python script, a runtime probe), command: keeps them inside the same gate:

- id: shellcheck-all-scripts
  kind: command
  paths: "**/*.sh"
  command: ["shellcheck", "-x", "{path}"]
  timeout: 30
  level: error

- id: keep-our-ast-checker
  kind: command
  paths: "src/**/*.py"
  command: ["scripts/check-templated-fields.py", "{path}"]
  level: warning

Now alint check runs both the declarative rules and the surviving scripts; CI calls alint check once instead of running bash hack/verify-all.sh.

The command: trust gate

A few notes on the survivor-wrapping pattern:

  • Trust gate. command: rules are only allowed in the user's own top-level config. A kind: command rule introduced via extends: (local file, HTTPS URL, or alint://bundled/) is rejected at load time. Adopting a published ruleset never grants arbitrary process execution.
  • --changed interaction. command is a per-file rule, so under alint check --changed it spawns only for files in the diff. Your expensive shell-out is automatically incremental in CI.
  • Environment threaded into the child. ALINT_PATH, ALINT_ROOT, ALINT_RULE_ID, ALINT_LEVEL, plus any top-level vars: you define. Existing scripts usually need no changes; most read $1 or $ALINT_PATH directly.
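
As a rough illustration of the last point, a surviving script can resolve its target from either the first argument or ALINT_PATH. This is a minimal sketch under stated assumptions: check_file and its tab-character check are hypothetical placeholders for whatever your script really does, and only the environment variable names come from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical survivor check: reject files that contain a tab character.
# The check is a placeholder; the point is how the target path is resolved.
check_file() {
  # Prefer the explicit argument, then fall back to ALINT_PATH.
  local path="${1:-${ALINT_PATH:-}}"
  if [ -z "$path" ]; then
    echo "usage: check_file <path> (or set ALINT_PATH)" >&2
    return 2
  fi
  if grep -q "$(printf '\t')" "$path"; then
    # ALINT_RULE_ID is only set when the script runs as a command: rule.
    echo "${ALINT_RULE_ID:-manual-run}: tab character in $path" >&2
    return 1
  fi
  return 0
}

# Both invocation styles resolve to the same file.
demo=$(mktemp)
printf 'no tabs here\n' > "$demo"
check_file "$demo" && echo "argument form: ok"
ALINT_PATH="$demo" check_file && echo "environment form: ok"
rm -f "$demo"
```

Falling back from $1 to the environment keeps the script usable by hand as well as from a command: rule.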

Step-by-step migration

A real-world plan, scoped to roughly one afternoon for a 30-script directory.

  1. Inventory and categorise. For each script, ask: is this structural (file existence, content patterns, manifest fields, filename grammar)? → maps to alint. AST-aware? → keep, optionally wrap in command:. Runtime (binary version, network probe)? → keep, optionally wrap. Generator (update-*, regen-*)? → not alint's job; keep. Cross-API-version diff or module-graph analysis? → out of scope; keep. The kubernetes case study tags each of its 50 scripts; copy that template.
  2. Write .alint.yml with bundled rulesets first. Start with what already encodes 80% of what most projects want:
    version: 1
    
    extends:
      - alint://bundled/oss-baseline@v1
      - alint://bundled/<your-language>@v1
      - alint://bundled/ci/github-actions@v1
      - alint://bundled/hygiene/no-tracked-artifacts@v1
    
    rules:
      # your project-specific rules go here
    Run alint list --config .alint.yml to see the rule expansion. Many of the scripts you'd planned to translate will already be covered.
  3. Translate the structural subset. Walk your inventory's "structural" pile, one script at a time, and add a rule per script. Keep IDs human (team-go-license-header, team-no-todo-in-prod); they show up in violation messages and CI logs. Reuse the pattern catalogue above.
  4. Reconcile with current main. Run alint check against the current tip. You will surface one or more of: a handful of true positives (your script's tolerances were slightly off, and alint catches what it silently let through); a flood of false positives (your paths: scope is too broad; narrow it); or existing live failures (some bash scripts are silently broken: they ran and exited 0, but their grep was matching nothing).
  5. Wrap surviving scripts in command: rules. For the AST / runtime / cross-API-version / generator scripts you deliberately kept, add one command: rule each. Now both the declarative and survivor checks run under alint check.
  6. Update CI. Replace your master bash hack/verify-all.sh invocation with alint check. Drop the per-script CI steps; one binary, one exit code, one output file.
  7. (Optional) Retire the surviving scripts organically. Some surviving scripts correspond to primitives on the v0.10+ candidate list (registry_paths_resolve, import_gate, generated_file_fresh, cross_file_value_equals); those can be retired as the primitives ship.
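
Step 4 mentions scripts that ran, exited 0, and matched nothing. One way this can happen is sketched below, assuming a check whose target directory was later renamed. The function names and paths are hypothetical; the sketch shows the bash failure mode, not alint behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical legacy check: fail if any file under "$1" contains FIXME.
legacy_check() {
  if grep -rq 'FIXME' "$1" 2>/dev/null; then
    return 1
  fi
  # Also reached when "$1" does not exist: grep exits 2, which the `if`
  # treats the same as "no match".
  return 0
}

# Stricter variant: refuse to pass when the target directory is missing.
strict_check() {
  if [ ! -d "$1" ]; then
    echo "no such directory: $1" >&2
    return 2
  fi
  if grep -rq 'FIXME' "$1"; then
    return 1
  fi
  return 0
}

tmp=$(mktemp -d)
mkdir "$tmp/lib"                        # the code used to live in src/
echo '// FIXME: remove' > "$tmp/lib/a.js"

legacy_status=0; legacy_check "$tmp/src" || legacy_status=$?
strict_status=0; strict_check "$tmp/src" 2>/dev/null || strict_status=$?
lib_status=0;    strict_check "$tmp/lib" || lib_status=$?

echo "legacy on missing src/: exit $legacy_status"
echo "strict on missing src/: exit $strict_status"
echo "strict on lib/:         exit $lib_status"

rm -rf "$tmp"
```

The legacy check keeps passing after the rename because it never looks at the file that holds the FIXME. Expect the reconcile run to expose checks in this state.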

What you gain by consolidating

  • One config to read. The structural contract becomes a single artifact, not a directory.
  • Faster CI. alint runs declarative rules in parallel; on a 100k-file repo (the kubernetes shape), the alint-replaceable subset benchmarks at ~1-2 seconds; the equivalent shell pipeline measures in tens of seconds dominated by N redundant walks.
  • Output formats CI can read. human, json, sarif, github, markdown, junit, gitlab, agent. Eight formats, no echo glue.
  • Auto-fix where it's safe. 12 mechanically-safe ops (trim whitespace, normalize line endings, strip BOM/bidi/zero-width, prepend / append, rename, remove).
  • A declarative DSL anyone can edit. Adding a rule is a 5-line YAML block; editing one is changing a regex.
  • Composition. extends: pulls bundled rulesets, local files, or HTTPS+SRI URLs.

See also

Start with the bundled rulesets, then translate the structural subset script by script. Most repos see real consolidation in one afternoon.