Migrating custom bash validation scripts to alint

Many repos accumulate a hack/verify-*.sh (or scripts/check-*.py, or tools/lint-*.js) directory over time. The patterns are usually structural, the scripts are usually copy-paste, and a new contributor staring at 30+ files in hack/ has to read all of them to understand the structural contract. Migrating to alint is not a 100% replacement; it consolidates the declarative subset. alint takes the structural checks; AST-aware, runtime, generator, and cross-API-version checks stay where they are.

The evidence: three case studies

kubernetes/kubernetes maintains 50 hack/verify-*.sh scripts that gate every PR. After inventory, 17 move to alint: 12 map directly to rule kinds like file_header, file_max_size, for_each_dir, pair, and filename_regex, and 5 more wrap existing tools (shellcheck, gofmt, golangci-lint) via the command rule kind. 7 stay because they need primitives alint deliberately doesn't ship (language-aware import gates, cross-API-version diffs, AST-aware metric-name regex). 18 stay as out-of-scope by design (codegen drift, OpenAPI generation, Go module-graph analysis, runtime binary symbol checks). The remaining 8 (6 + 2) are duplicates or pre-existing. Net: a 50-script directory becomes 33 scripts + one .alint.yml.

cpython tells the same story in a different shape: 56 distinct structural-validation surfaces (a 122-target Makefile.pre.in, 35 pre-commit hooks, 9 ruff configs, 7 Tools/build/*.py checkers, .gitattributes, .editorconfig, 25 GH workflows). 38% map to alint; the rest are AST-aware, codegen, or binary-symbol checks. apache/airflow tells it from the pre-commit-driven Python ecosystem: 109 pre-commit hooks, of which ~40% map. The most alint-shaped surface (the 101-provider package layout contract) lives today in a 1085-line Python script that imports every provider; alint expresses the same contract in 25 lines of for_each_file + nested require:.

Pattern catalogue

Common bash patterns and their alint equivalents. Every rule kind named here exists today; check the rule catalogue to verify that your version supports the listed options.

Common bash pattern	alint rule	Notes
`grep -rn 'TODO\|XXX\|FIXME' src/`	`file_content_forbidden`	Direct map. Pair with `git_blame_age` for "TODOs older than 6 months".
`find . -name '*.tmp' -or -name '*.swp'`	`file_absent`	Direct map. `agent-hygiene@v1` bundled ruleset already covers `*.bak`, `*.orig`, `*~`, `*.swp`.
`find . -name '.DS_Store' -delete`	`file_absent` + `fix: file_remove`	Auto-fix supported. `hygiene/no-tracked-artifacts@v1` ships this.
`for d in packages/*/; do test -f "$d/README.md" || exit 1; done`	`for_each_dir` + `file_exists`	Direct map. `monorepo@v1` ships this for `packages/`, `crates/`, `apps/`, `services/`.
`for d in crates/*/; do grep -q 'license =' "$d/Cargo.toml" || exit 1; done`	`for_each_dir` + `toml_path_matches`	Direct map. RFC 9535 JSONPath against TOML.
`for f in $(find . -name '*.sh'); do [ -x "$f" ] || exit 1; done`	`shebang_has_executable` / `executable_has_shebang`	Direct map. Pair both for the bidirectional invariant.
`grep -rE '[ \t]+$' --include='*.rs' src/`	`no_trailing_whitespace`	Direct map. Auto-fixable via `file_trim_trailing_whitespace`.
`find . -name '*.md' | xargs file | grep -v 'CRLF'`	`line_endings` (`target: lf`)	Direct map. Auto-fixable via `file_normalize_line_endings`.
`git diff --check` (in CI)	`no_merge_conflict_markers`	Direct map. Already in `oss-baseline@v1`.
`for f in test/parallel/*.js; do [[ "$f" =~ ^test-.*\.js$ ]] || exit 1; done`	`filename_regex`	Direct map. (Actual nodejs/node convention; silent test discovery on misnamed files.)
`jq -e '.license == "MIT"' package.json`	`json_path_equals`	Direct map. Use `*_matches` for regex instead of equality.
`python -c "import json; json.load(open('config.json'))"`	`json_schema_passes`	Parse-error becomes a single violation per file rather than a 1-line pass/fail.
`for f in .github/workflows/*.yml; do yq '.permissions.contents' "$f"; done | grep -qv read`	`yaml_path_equals` (or `gha-workflows-have-permissions`)	Direct map. `ci/github-actions@v1` bundled ruleset already encodes this.
`grep -E '^pin: [a-f0-9]{40}' .github/workflows/*.yml`	`yaml_path_matches` (with SHA-pin regex)	Direct map. Bundled `gha-workflows-pin-actions`.
`head -1 *.rs | grep -q 'SPDX-License-Identifier'`	`file_header`	Direct map. Add a fix block (`file_prepend`) to auto-insert.
`du -k . | awk '$1 > 5120 {print $2}'` (no file > 5 MB)	`file_max_size`	Direct map. The kubernetes config uses this with `paths.exclude` for vendor + testdata.
`find packages/ -name package.json -exec ...`	`dir_contains` / `for_each_dir`	Direct map. `dir_contains` is sugar for the common shape.
`find . -name README.md | awk -F/ '{print $NF}' | sort | uniq -d`	`unique_by (key: "{stem}")`	Direct map.

Patterns with no clean alint mapping

These are the kinds of script you keep. alint deliberately doesn't try to do them; many are on the v0.10+ candidate list.

Common bash pattern	Why alint can't / won't
`kubectl version --client | grep '^Client Version: v1.2'`	Runtime probe. alint is a static checker. Use `command:` if you want it inside the same gate, but be aware it's a static-tool wrapping a runtime probe.
Cross-API-version diffs (kubernetes' `verify-types-aliases.sh`)	Requires parsed Go code + cross-version semantic comparison. AST territory; out of scope.
`kustomize build ... | diff -` (codegen freshness)	Running the generator + diffing the output is on the v0.10+ candidate list (`generated_file_fresh`); not in v0.9. Keep your script.
`grep -E 'import "github.com/foo/bar"' --include='*.go' some/path/`	This is the `import_gate` rule kind on the v0.10+ list. Today: keep your script, or wrap it in `command:`.
Most `git log` / `git blame` analysis (rebase-graph checks, contributor-stat gates)	alint has narrow git hygiene support (`git_no_denied_paths`, `git_commit_message`, `git_blame_age`); broader history-aware checks are out of scope.
Vendor-tree readonly enforcement (kubernetes' `verify-readonly-packages.sh`)	Requires hashing each vendored entry against a pinned manifest. `pair_hash` is on the v0.10+ list; today `file_hash` covers single files only.
 
Side-by-side: a real script

Every package has a README

The for d in packages/*/; do …; done shape is one of the most common in OSS monorepos. The bash version:

#!/usr/bin/env bash
set -euo pipefail
fail=0
for d in packages/*/; do
  if [ ! -f "$d/README.md" ]; then
    echo "Missing: $d/README.md" >&2
    fail=1
  fi
done
exit $fail
 
The alint equivalent (seven lines of YAML) enforces the same invariant, and also handles ignore filtering, parallel evaluation, JSON / SARIF / GitHub-annotation output formats, and incremental runs via --changed:

- id: every-package-has-readme
  kind: for_each_dir
  select: "packages/*"
  require:
    - kind: file_exists
      paths: "{path}/README.md"
  level: error

If your monorepo lives under packages/ and apps/, the monorepo@v1 bundled ruleset already covers this with one extends: line.

Every Cargo.toml declares a license
 
Slightly more complex. The bash version typically has two subtle bugs: it misses license-file = (the alternate form), and it accepts a commented-out license line, which grep matches but a TOML parser ignores. The alint version is TOML-aware:

- id: every-crate-declares-license
  kind: for_each_dir
  select: "crates/*"
  when_iter: 'iter.has_file("Cargo.toml")'
  require:
    - kind: toml_path_matches
      paths: "{path}/Cargo.toml"
      path: "$.package.license"
      matches: '^[A-Za-z0-9.+\-]+( OR [A-Za-z0-9.+\-]+)*$'
  level: error

when_iter: filters out non-package directories under crates/ without any extra grep. The TOML path query handles the parsing; the regex pins the license to an SPDX-shaped value. The replacement isn't shorter; it's stricter.
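
The two failure modes of the grep-based check are easy to reproduce in isolation. The following is a minimal sketch, assuming a naive check shaped like the grep -q 'license =' loop from the pattern catalogue; the function name naive_license_check and the two sample manifests are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the grep-based check; not taken from a real repo.
naive_license_check() {
  grep -q 'license =' "$1"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Case 1: the license line is commented out, so the manifest declares no
# license, yet grep still finds the substring and the check passes.
printf '[package]\nname = "a"\n# license = "MIT"\n' > "$tmp/commented.toml"

# Case 2: the manifest uses the alternate license-file key, which the
# substring 'license =' does not match, so the check fails.
printf '[package]\nname = "b"\nlicense-file = "LICENSE"\n' > "$tmp/license-file.toml"

for f in "$tmp"/*.toml; do
  if naive_license_check "$f"; then
    echo "pass: $(basename "$f")"
  else
    echo "fail: $(basename "$f")"
  fi
done
```

The loop reports pass for commented.toml and fail for license-file.toml, which is the wrong answer in both directions. A check that parses the TOML does not have the first problem, because a comment never becomes a key.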
Composability for the long tail

For checks that don't map (an in-house tool, an AST-aware Python script, a runtime probe), command: keeps them inside the same gate:

- id: shellcheck-all-scripts
  kind: command
  paths: "**/*.sh"
  command: ["shellcheck", "-x", "{path}"]
  timeout: 30
  level: error

- id: keep-our-ast-checker
  kind: command
  paths: "src/**/*.py"
  command: ["scripts/check-templated-fields.py", "{path}"]
  level: warning
 
Now alint check runs both the declarative rules and the surviving scripts; CI calls alint check once instead of running bash hack/verify-all.sh.

The command: trust gate

A few notes on the survivor-wrapping pattern:

- Trust gate. command: rules are only allowed in the user's own top-level config. A kind: command rule introduced via extends: (local file, HTTPS URL, or alint://bundled/) is rejected at load time. Adopting a published ruleset never grants arbitrary process execution.
- --changed interaction. command is a per-file rule, so under alint check --changed it spawns only for files in the diff. Your expensive shell-out is automatically incremental in CI.
- Environment threaded into the child. ALINT_PATH, ALINT_ROOT, ALINT_RULE_ID, ALINT_LEVEL, plus any top-level vars: you define. Existing scripts usually need no changes; most read $1 or $ALINT_PATH directly.
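
A surviving script can be written to accept its target either way, so it runs unchanged both standalone and under alint. The sketch below is a minimal illustration, assuming (per the notes above) that the file arrives as the first argument when the command array contains {path} and that ALINT_PATH is exported to the child; the helper name resolve_target is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefer the positional argument, then fall back to
# the ALINT_PATH variable that the text says is exported to the child.
resolve_target() {
  local target="${1:-${ALINT_PATH:-}}"
  if [ -z "$target" ]; then
    echo "usage: $0 <file> (or set ALINT_PATH)" >&2
    return 2
  fi
  printf '%s\n' "$target"
}

# Standalone invocation: the path is passed as an argument.
resolve_target "src/app.py"

# Invocation with only the environment variable set.
ALINT_PATH="src/dag.py" resolve_target
```

The two calls print src/app.py and src/dag.py respectively. Failing with a non-zero status when neither source is available keeps a misconfigured rule from passing silently.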
 
 
Step-by-step migration

A real-world plan, scoped to roughly one afternoon for a 30-script directory.

1. Inventory and categorise. For each script, ask:
   - Is this structural (file existence, content patterns, manifest fields, filename grammar)? → maps to alint.
   - AST-aware? → keep, optionally wrap in command:.
   - Runtime (binary version, network probe)? → keep, optionally wrap.
   - Generator (update-*, regen-*)? → not alint's job; keep.
   - Cross-API-version diff or module-graph analysis? → out of scope; keep.
   The kubernetes case study tags each of its 50 scripts; copy that template.
2. Write .alint.yml with bundled rulesets first. Start with what already encodes 80% of what most projects want:

   version: 1

   extends:
     - alint://bundled/oss-baseline@v1
     - alint://bundled/<your-language>@v1
     - alint://bundled/ci/github-actions@v1
     - alint://bundled/hygiene/no-tracked-artifacts@v1

   rules:
     # your project-specific rules go here

   Run alint list --config .alint.yml to see the rule expansion. Many of the scripts you'd planned to translate will already be covered.
3. Translate the structural subset. Walk your inventory's "structural" pile, one script at a time, and add a rule per script. Keep IDs human (team-go-license-header, team-no-todo-in-prod); they show up in violation messages and CI logs. Reuse the pattern catalogue above.
4. Reconcile with current main. Run alint check against the current tip. You will surface one or more of:
   - a handful of true positives (your script's tolerances were slightly off; alint catches what it silently let through);
   - a flood of false positives (your paths: scope is too broad; narrow it);
   - existing live failures (some bash scripts are silently broken: they ran, exited 0, but their grep was hitting nothing).
5. Wrap surviving scripts in command: rules. For the AST / runtime / cross-API-version / generator scripts you deliberately kept, add one command: rule each. Now both the declarative and survivor checks run under alint check.
6. Update CI. Replace your master bash hack/verify-all.sh invocation with alint check. Drop the per-script CI steps; one binary, one exit code, one output file.
7. (Optional) Retire the surviving scripts organically. Surviving scripts that become candidates for new alint primitives (registry_paths_resolve, import_gate, generated_file_fresh, cross_file_value_equals, all on the v0.10+ list) will retire themselves over the next few releases.
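
The inventory step of the plan above can start from a rough automated first pass before anyone reads the scripts by hand. The sketch below is only a heuristic under stated assumptions: the keyword lists and the function names classify_script and inventory are hypothetical, and the category labels simply mirror the questions in step 1. Treat the output as a starting point for manual review, not a verdict.

```shell
#!/usr/bin/env bash
# Heuristic triage of a validation script by its name and the tools it calls.
# The keyword lists are illustrative assumptions; tune them per repo.
classify_script() {
  local script="$1"
  case "$(basename "$script")" in
    update-*|regen-*) echo "generator"; return ;;
  esac
  if grep -qwE 'kubectl|curl|docker' "$script"; then
    echo "runtime"
  elif grep -qwE 'go (run|list|vet)' "$script"; then
    echo "ast-or-module-graph"
  else
    echo "structural-candidate"
  fi
}

# Print one "category<TAB>script" line per shell script in a directory.
inventory() {
  local dir="$1" f
  for f in "$dir"/*.sh; do
    [ -e "$f" ] || continue   # skip the unexpanded glob when no file matches
    printf '%s\t%s\n' "$(classify_script "$f")" "$f"
  done
}
```

Running inventory hack | sort groups the scripts by category, which gives a first draft of the tagged list that step 1 asks for. Anything labelled structural-candidate still needs a human to confirm that it really is declarative.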
 
 
What you gain by consolidating

- One config to read. The structural contract becomes a single artifact, not a directory.
- Faster CI. alint runs declarative rules in parallel; on a 100k-file repo (the kubernetes shape), the alint-replaceable subset benchmarks at ~1-2 seconds; the equivalent shell pipeline measures in tens of seconds dominated by N redundant walks.
- Output formats CI can read. human, json, sarif, github, markdown, junit, gitlab, agent. Eight formats, no echo glue.
- Auto-fix where it's safe. 12 mechanically-safe ops (trim whitespace, normalize line endings, strip BOM/bidi/zero-width, prepend / append, rename, remove).
- A declarative DSL anyone can edit. Adding a rule is a 5-line YAML block; editing one is changing a regex.
- Composition. extends: pulls bundled rulesets, local files, or HTTPS+SRI URLs.
 
 
See also

- kubernetes case study: the headline 17-of-50 story, with the full inventory.
- cpython case study: 56 surfaces across Make targets / pre-commit / Tools/build/* / .gitattributes.
- airflow case study: 109 pre-commit hooks, ~40% map.
- Rule catalogue: every alint rule kind, with examples.
- Bundled rulesets: what extends: ships out of the box.
- How alint compares to other tools, including Repolinter, ls-lint, Megalinter, EditorConfig.
 
 
  
Start with the bundled rulesets, then translate the structural subset script by script. Most repos see real consolidation in one afternoon.
