
Architecture

Status: Living design document. Describes alint’s internals for contributors and embedders. For the current scope by version, see ROADMAP.md.

alint is a language-agnostic linter for repository structure, file existence, filename conventions, and file content rules. It is a single static Rust binary that reads a declarative YAML config and enforces rules over a repository tree.

Examples of rules in scope:

  • Does file X exist at path Y? Does directory Z contain file W?
  • Do filenames under components/ follow PascalCase?
  • Does every .c file have a matching .h in the same directory?
  • Does every .java file start with a required license-header comment?
  • Is anything binary present under src/?
  • Does package.json’s license field equal "Apache-2.0"?

Out of scope (explicitly — use the named tool instead):

  • Code semantics / AST linting → ESLint, Clippy, ruff
  • Static application security testing → Semgrep, CodeQL
  • Infrastructure-as-code scanning → Checkov, Conftest, tfsec
  • Commit-message linting → commitlint
  • Secret scanning → gitleaks, trufflehog

The clarity of these non-goals is itself a feature.

  1. The repository tree is the input. Every rule sees a unified file/directory index. The walk happens once per invocation.
  2. One DSL, four rule families. Layout (exists/absent, directory contents), content (regex, header, hash, size, text/binary, structured-path queries), naming (case, regex, template), cross-file (pair, for-each, every-matching).
  3. Declarative by default, programmable at the edges. YAML covers typical rules. A bounded expression language gates rules on facts. A plugin surface (command, later WASM) covers user-defined logic.
  4. Walk once, evaluate in parallel. Single-pass walker; shared file index; rayon for rule-level parallelism.
  5. Respect ecosystem defaults. .gitignore is honored by default. YAML is the config format. Case aliases (PascalCase / pascalcase / pascal-case) all parse.
  6. Every rule carries its own story. Severity, message, policy_url, and optional fix are first-class fields.
  7. Modern output formats from day one. Human, JSON, SARIF, GitHub annotations, JUnit.
  8. Single static binary. Rust, no runtime dependency on Node, Ruby, or Python.

Every rule is a record:

  • id — unique kebab-case identifier
  • kind — the primitive rule type, namespaced (file_exists, filename_case, pair, …)
  • level — error | warning | info | off
  • paths — the scope glob(s); accepts a string, an array (with !negation), or {include, exclude}
  • when — optional expression gating rule application on facts
  • message — human message; supports {{vars.*}} and {{ctx.*}} substitution
  • policy_url — optional URL to a human-readable policy justification
  • fix — optional fixer block
  • kind-specific fields

Severity maps to exit codes: error with violations → 1; warning → 0 unless --fail-on-warning; info → 0; off → rule skipped. off is useful when overriding a rule inherited from an extends-ed ruleset.
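The mapping above can be sketched as a small pure function. This is a minimal illustration, not the engine's code: the `Level` enum and `exit_code` function are hypothetical names, and the choice of exit code 1 for warnings under `--fail-on-warning` is an assumption (the text only says the result is no longer 0).

```rust
/// Severity levels as listed in the rule record.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Level {
    Error,
    Warning,
    Info,
    Off,
}

/// Compute the process exit code from the levels of all rules that
/// reported at least one violation. `fail_on_warning` mirrors the
/// `--fail-on-warning` flag.
pub fn exit_code(violated: &[Level], fail_on_warning: bool) -> i32 {
    let failing = violated.iter().any(|level| match level {
        Level::Error => true,
        // Assumption: a failing warning uses the same code as an error.
        Level::Warning => fail_on_warning,
        // `Off` rules are skipped and never produce violations.
        Level::Info | Level::Off => false,
    });
    if failing {
        1
    } else {
        0
    }
}
```

For example, `exit_code(&[Level::Warning], false)` yields 0, while the same call with `fail_on_warning = true` yields 1.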

The Rule trait is the unit of execution:

pub trait Rule: Send + Sync + std::fmt::Debug {
    fn id(&self) -> &str;
    fn level(&self) -> Level;
    fn policy_url(&self) -> Option<&str> { None }
    fn evaluate(&self, ctx: &Context<'_>) -> Result<Vec<Violation>>;
}

Rules produce Violations; the engine aggregates them into a Report.
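To make the flow from rule to violations concrete, here is a self-contained sketch of one rule implementing the trait. `Level`, `Violation`, `Context`, and `Result` are simplified stand-ins defined locally (the real types in alint-core are not shown in this document), and `FileExists` and `run` are hypothetical names chosen for illustration.

```rust
use std::collections::BTreeSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Level { Error, Warning, Info, Off }

#[derive(Debug, PartialEq, Eq)]
pub struct Violation { pub rule_id: String, pub message: String }

/// Stand-in for the engine context: only the set of indexed relative paths.
pub struct Context<'a> { pub files: &'a BTreeSet<String> }

/// Stand-in error type; the real engine presumably has its own.
pub type Result<T> = std::result::Result<T, String>;

pub trait Rule: Send + Sync + std::fmt::Debug {
    fn id(&self) -> &str;
    fn level(&self) -> Level;
    fn policy_url(&self) -> Option<&str> { None }
    fn evaluate(&self, ctx: &Context<'_>) -> Result<Vec<Violation>>;
}

/// Passes when at least one candidate path is present in the index.
#[derive(Debug)]
pub struct FileExists { pub id: String, pub candidates: Vec<String>, pub level: Level }

impl Rule for FileExists {
    fn id(&self) -> &str { &self.id }
    fn level(&self) -> Level { self.level }
    fn evaluate(&self, ctx: &Context<'_>) -> Result<Vec<Violation>> {
        if self.candidates.iter().any(|p| ctx.files.contains(p)) {
            return Ok(Vec::new());
        }
        Ok(vec![Violation {
            rule_id: self.id.clone(),
            message: format!("none of {:?} exist", self.candidates),
        }])
    }
}

/// Aggregate violations from every enabled rule (sequential for brevity).
pub fn run(rules: &[Box<dyn Rule>], ctx: &Context<'_>) -> Result<Vec<Violation>> {
    let mut report = Vec::new();
    for rule in rules {
        if rule.level() == Level::Off {
            continue; // `off` rules are skipped entirely
        }
        report.extend(rule.evaluate(ctx)?);
    }
    Ok(report)
}
```

The `Send + Sync` bound is what allows a collection of `Box<dyn Rule>` to be shared across worker threads during parallel evaluation.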

The config format is YAML, with a JSON Schema (draft 2020-12) maintained at schemas/v1/config.json in this repository and embedded into alint-dsl at build time via include_str! (exposed as alint_dsl::CONFIG_SCHEMA_V1). Integration tests round-trip representative configs through a compliant validator so the schema and the engine’s actual DSL stay in sync.

For editor autocomplete, reference the schema via the YAML language server pragma — either by a relative path (recommended inside this repo) or by the GitHub raw URL (for downstream users, once the repo is public):

# yaml-language-server: $schema=./schemas/v1/config.json
# or: $schema=https://raw.githubusercontent.com/asamarts/alint/main/schemas/v1/config.json
.alint.yml
version: 1
extends:
  - url: https://raw.githubusercontent.com/example/rulesets/base.yaml
    sha256: "a1b2..." # optional subresource integrity
  - path: ./team-policy.alint.yml
ignore:
  - "target/**"
respect_gitignore: true
vars:
  copyright_year: "2026"
facts:
  - id: has_rust
    any_file_exists: ["Cargo.toml"]
rules:
  - id: readme-exists
    kind: file_exists
    paths: ["README.md", "README", "README.rst"]
    root_only: true
    level: error
    message: "A README file is required at the repository root."
  - id: c-requires-h
    kind: pair
    primary: "**/*.c"
    partner: "{dir}/{stem}.h"
    level: error
  - id: components-pascalcase
    kind: filename_case
    paths: "components/**/*.{tsx,jsx}"
    case: pascal
    level: error
  - id: cargo-lock-checked-in
    when: facts.has_rust
    kind: file_exists
    paths: "Cargo.lock"
    root_only: true
    level: error

Not every primitive is available in every release — see ROADMAP.md for which ship in which version. The full taxonomy:

Layout family

| Kind | Purpose |
| --- | --- |
| file_exists | Any file matching paths must exist. root_only: true constrains to repo root. |
| file_absent | No file matching paths may exist. |
| dir_exists / dir_absent | Directory presence / absence. |
| dir_contains | Every directory matching select must contain files matching require. |
| dir_only_contains | Every directory matching select may contain only files matching allow. |

Content family

| Kind | Purpose |
| --- | --- |
| file_content_matches / file_content_forbidden | File contents must (not) match regex. Aliases: content_matches, content_forbidden. |
| file_header / file_footer | First / last N lines must match pattern. Alias for file_header: header. |
| file_starts_with / file_ends_with | Byte-level prefix / suffix check (works on binary files). |
| file_hash | Content SHA-256 matches expected. |
| file_max_size / file_max_lines | Upper bounds. Alias for file_max_size: max_size. |
| file_is_text / file_is_binary / file_is_ascii | Content is detected as text / binary / 7-bit ASCII. Alias for file_is_text: is_text. |
| no_bom | Flag a leading UTF-8 / UTF-16 / UTF-32 byte-order mark. Fixable via file_strip_bom. |
| file_shebang | First line is a shebang matching pattern. |
| json_path_equals / json_path_matches | JSONPath query returns expected value / matches regex. |
| yaml_path_* / toml_path_* | Same for YAML / TOML. |
| json_schema_passes | File validates against a JSON Schema. |
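As an illustration of a byte-level content check, the no_bom rule and its file_strip_bom fixer could be sketched as follows. The byte sequences are the standard Unicode BOM encodings; the `Bom` enum and both function names are hypothetical, not alint's API.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Bom { Utf8, Utf16Le, Utf16Be, Utf32Le, Utf32Be }

/// Detect a leading byte-order mark.
pub fn detect_bom(bytes: &[u8]) -> Option<Bom> {
    // UTF-32 LE must be tested before UTF-16 LE: both start with FF FE.
    // (A UTF-16 LE file whose first character is U+0000 is therefore
    // reported as UTF-32 LE; the bytes alone cannot disambiguate.)
    if bytes.starts_with(&[0xFF, 0xFE, 0x00, 0x00]) {
        Some(Bom::Utf32Le)
    } else if bytes.starts_with(&[0x00, 0x00, 0xFE, 0xFF]) {
        Some(Bom::Utf32Be)
    } else if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some(Bom::Utf8)
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some(Bom::Utf16Le)
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some(Bom::Utf16Be)
    } else {
        None
    }
}

/// Return the content with any leading BOM removed.
pub fn strip_bom(bytes: &[u8]) -> &[u8] {
    let len = match detect_bom(bytes) {
        Some(Bom::Utf8) => 3,
        Some(Bom::Utf16Le) | Some(Bom::Utf16Be) => 2,
        Some(Bom::Utf32Le) | Some(Bom::Utf32Be) => 4,
        None => 0,
    };
    &bytes[len..]
}
```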

Text-hygiene family

| Kind | Purpose |
| --- | --- |
| no_trailing_whitespace | No line may end with space/tab. Fixable via file_trim_trailing_whitespace. |
| final_newline | File must end with \n. Fixable via file_append_final_newline. |
| line_endings | Every line uses the configured target (lf or crlf). Fixable via file_normalize_line_endings. |
| line_max_width | Cap line length in characters (optional tab_width). |
| indent_style | Every non-blank line indents with tabs or spaces (with optional width). Check-only. |
| max_consecutive_blank_lines | Cap runs of blank lines to max. Fixable via file_collapse_blank_lines. |

Security / Unicode-sanity family

| Kind | Purpose |
| --- | --- |
| no_merge_conflict_markers | Flag <<<<<<<, =======, >>>>>>> markers at line start. |
| no_bidi_controls | Flag Trojan-Source bidi controls (U+202A–202E, U+2066–2069). Fixable via file_strip_bidi. |
| no_zero_width_chars | Flag body-internal U+200B/C/D and non-leading U+FEFF. Fixable via file_strip_zero_width. |
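The bidi check reduces to scanning for the two code-point ranges named in the table. The sketch below shows one way to do that and to strip the characters; the function names are hypothetical and the reporting shape (line and column pairs) is an assumption.

```rust
/// True for the Trojan-Source bidi controls: U+202A..=U+202E and U+2066..=U+2069.
pub fn is_bidi_control(c: char) -> bool {
    matches!(c, '\u{202A}'..='\u{202E}' | '\u{2066}'..='\u{2069}')
}

/// Return the 1-based (line, column) of every bidi control, with the
/// column counted in characters rather than bytes.
pub fn find_bidi_controls(text: &str) -> Vec<(usize, usize)> {
    let mut hits = Vec::new();
    for (line_idx, line) in text.lines().enumerate() {
        for (col_idx, c) in line.chars().enumerate() {
            if is_bidi_control(c) {
                hits.push((line_idx + 1, col_idx + 1));
            }
        }
    }
    hits
}

/// Fixer counterpart: drop every bidi control, keep everything else.
pub fn strip_bidi_controls(text: &str) -> String {
    text.chars().filter(|c| !is_bidi_control(*c)).collect()
}
```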

Structure family

| Kind | Purpose |
| --- | --- |
| max_directory_depth | Cap how deep the tree may go. |
| max_files_per_directory | Cap per-directory fanout. |
| no_empty_files | Flag zero-byte files. Fixable via file_remove. |

Naming family

| Kind | Purpose |
| --- | --- |
| filename_case | Basename matches a case convention (lower, upper, pascal, camel, snake, kebab, screaming-snake, flat). |
| filename_regex | Basename matches a regex. |
| filename_matches_template | Basename matches a template with captures. |
| path_case | Every path segment matches a case convention. |
| path_max_depth | Relative path has at most N segments. |
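A case-convention check on a basename might look like the sketch below. The exact definitions alint uses for each convention are not given here, so the predicates are assumptions: PascalCase is taken to mean an ASCII uppercase first letter followed only by ASCII alphanumerics, and kebab-case to mean lowercase alphanumerics joined by single hyphens. All function names are hypothetical.

```rust
use std::path::Path;

/// Assumed PascalCase: uppercase ASCII first letter, then ASCII alphanumerics only.
pub fn is_pascal_case(stem: &str) -> bool {
    let mut chars = stem.chars();
    match chars.next() {
        Some(first) if first.is_ascii_uppercase() => chars.all(|c| c.is_ascii_alphanumeric()),
        _ => false,
    }
}

/// Assumed kebab-case: lowercase ASCII letters and digits joined by single hyphens.
pub fn is_kebab_case(stem: &str) -> bool {
    !stem.is_empty()
        && !stem.starts_with('-')
        && !stem.ends_with('-')
        && !stem.contains("--")
        && stem
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
}

/// Apply the PascalCase predicate to the stem of a path's basename.
/// `Path::file_stem` removes only the final extension, so a name like
/// `Button.test.tsx` has the stem `Button.test` and fails this check.
pub fn basename_is_pascal(path: &str) -> bool {
    Path::new(path)
        .file_stem()
        .and_then(|s| s.to_str())
        .map_or(false, is_pascal_case)
}
```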

Cross-file family

| Kind | Purpose |
| --- | --- |
| pair | For every file matching primary, a file matching the partner template must exist. |
| for_each_file / for_each_dir | For every matching file/dir, evaluate nested require rules with the entry as context. |
| every_matching_has | Sugar for a common for_each shape. |
| unique_by | No two files matching select may share the value of key. |

Portable-metadata family

Portability checks that reject tree shapes which look fine on one OS but break checkouts elsewhere.

| Kind | Purpose |
| --- | --- |
| no_case_conflicts | Flag pairs of paths that differ only by case (e.g. README.md + readme.md). They can’t coexist on macOS HFS+/APFS or Windows NTFS defaults. |
| no_illegal_windows_names | Reject reserved device names (CON, PRN, AUX, NUL, COM1–9, LPT1–9 — case-insensitive, regardless of extension), trailing dots/spaces, and the chars `<>:” |
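Both portability checks are simple enough to sketch with the standard library. The function names are hypothetical. The reserved-name check covers only the device names and trailing dot/space cases listed in the table; the forbidden-character set is left out because it is not fully specified here.

```rust
use std::collections::BTreeMap;

/// Group paths that differ only by case; each returned group has two or
/// more members. Lowercasing is used as an approximation of filesystem
/// case folding.
pub fn case_conflicts(paths: &[&str]) -> Vec<Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for p in paths {
        groups
            .entry(p.to_lowercase())
            .or_default()
            .push((*p).to_string());
    }
    groups.into_values().filter(|g| g.len() > 1).collect()
}

/// True if the basename is a reserved Windows device name (with or
/// without an extension) or ends in a dot or a space.
pub fn is_illegal_windows_name(basename: &str) -> bool {
    if basename.ends_with('.') || basename.ends_with(' ') {
        return true;
    }
    // "Regardless of extension": compare only the part before the first dot.
    let stem = basename.split('.').next().unwrap_or("").to_ascii_uppercase();
    match stem.as_str() {
        "CON" | "PRN" | "AUX" | "NUL" => true,
        s if s.len() == 4 && (s.starts_with("COM") || s.starts_with("LPT")) => {
            matches!(s.as_bytes()[3], b'1'..=b'9')
        }
        _ => false,
    }
}
```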

Unix-metadata family

Unix-filesystem metadata checks. All rules in this family are #[cfg(unix)]-gated at the engine layer: they emit no violations on Windows, so configs remain portable.

| Kind | Purpose |
| --- | --- |
| no_symlinks | Flag tracked paths that are symbolic links (portability footgun on Windows + CI). Fixable via file_remove. |
| executable_bit | Assert every file in scope has (require: true) or lacks (require: false) the +x bit. No fix op — chmod auto-apply deferred. |
| executable_has_shebang | Every +x file must begin with #!. No fix op. |
| shebang_has_executable | Every file starting with #! must have +x set. No fix op. |

Git-hygiene family

| Kind | Purpose |
| --- | --- |
| no_submodules | Flag .gitmodules at the repo root. Always targets .gitmodules (no paths override). Fixable via file_remove. |
| git_tracked_only | Every matching file must be git-tracked. |
| git_no_denied_paths | Git tree must not contain paths matching pattern. |
| git_commit_message | Commits in range must match / not-match patterns. |

Plugin family

| Kind | Purpose |
| --- | --- |
| command | Shell out to an external command with {path}; non-zero exit = failure. |
| wasm | Evaluate a WASM plugin against the file / tree. |

Rules that declare a fix: block opt in to automatic remediation. The op is a discriminated union keyed by op name; each rule kind accepts at most one op.

Path-only ops (ignore fix_size_limit):

| Op | Shape | Rule kinds |
| --- | --- | --- |
| file_create | {content, path?, create_parents?} | file_exists |
| file_remove | {} | file_absent, no_empty_files, no_symlinks, no_submodules |
| file_rename | {} (target derived from rule config) | filename_case |

Content-editing ops (skipped on files over fix_size_limit; default 1 MiB, null disables):

| Op | Shape | Rule kinds |
| --- | --- | --- |
| file_prepend | {content} | file_header |
| file_append | {content} | file_content_matches |
| file_trim_trailing_whitespace | {} | no_trailing_whitespace |
| file_append_final_newline | {} | final_newline |
| file_normalize_line_endings | {} (target read from parent rule) | line_endings |
| file_strip_bidi | {} | no_bidi_controls |
| file_strip_zero_width | {} | no_zero_width_chars |
| file_strip_bom | {} | no_bom |
| file_collapse_blank_lines | {} (max read from parent rule) | max_consecutive_blank_lines |

Over-limit content-editing ops report Skipped with a stderr warning instead of applying. Reads are streaming where possible; otherwise the file is loaded in full. Fixers run serially after parallel evaluation so the tree is mutated from a single thread.
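The size gate described above can be expressed as one decision function. This is a sketch under stated assumptions: `FixOutcome`, `gate_fix`, and the warning text are hypothetical, and "over the limit" is read as strictly greater than `fix_size_limit`.

```rust
/// Default limit from the text: 1 MiB.
pub const DEFAULT_FIX_SIZE_LIMIT: u64 = 1024 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum FixOutcome {
    Apply,
    Skipped,
}

/// Decide whether a fix op may run on a file of `file_size` bytes.
/// `content_editing` is false for path-only ops (create, remove, rename),
/// which ignore the limit. `fix_size_limit = None` models `null` in the
/// config, which disables the limit.
pub fn gate_fix(content_editing: bool, file_size: u64, fix_size_limit: Option<u64>) -> FixOutcome {
    match fix_size_limit {
        Some(limit) if content_editing && file_size > limit => {
            // Over-limit ops are reported on stderr instead of being applied.
            eprintln!(
                "warning: skipping fix, file is {} bytes (limit {})",
                file_size, limit
            );
            FixOutcome::Skipped
        }
        _ => FixOutcome::Apply,
    }
}
```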

Template variables are used in partner, nested require, messages, and rename fixers:

  • {dir} — parent directory of the matched file
  • {path} — full relative path
  • {basename} — filename including extension
  • {stem} — filename without the final extension
  • {ext} — final extension without the dot
  • {parent_name} — immediate parent directory name
  • {stem_kebab}, {stem_snake}, {stem_pascal}, … — transformed stems
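A minimal sketch of how the untransformed variables in the list above could be expanded, using only `std::path`. The function name `expand_template` is hypothetical, the case-transformed stems are omitted, and the behavior for a file at the repository root (where `{dir}` is empty) is an assumption.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Convert an optional path component to an owned string ("" when absent).
fn part(component: Option<&OsStr>) -> String {
    component.and_then(|s| s.to_str()).unwrap_or("").to_string()
}

/// Expand `{dir}`, `{path}`, `{basename}`, `{stem}`, `{ext}`, and
/// `{parent_name}` for one matched relative path.
/// Replacement is sequential, so a path that itself contains a literal
/// `{stem}` would be expanded again; a real implementation would
/// presumably tokenize the template first.
pub fn expand_template(template: &str, path: &str) -> String {
    let p = Path::new(path);
    // For a root-level file the parent is "", so "{dir}/x" becomes "/x".
    let dir = p.parent().and_then(|d| d.to_str()).unwrap_or("").to_string();
    let parent_name = part(p.parent().and_then(|d| d.file_name()));
    template
        .replace("{dir}", &dir)
        .replace("{path}", path)
        .replace("{basename}", &part(p.file_name()))
        .replace("{stem}", &part(p.file_stem()))
        .replace("{ext}", &part(p.extension()))
        .replace("{parent_name}", &parent_name)
}
```

With the pair rule from the config example, `expand_template("{dir}/{stem}.h", "src/net/socket.c")` produces the partner path `src/net/socket.h`.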

Globs compile via globset:

  • * — any run of non-separator chars
  • ** — any number of path segments (own segment only)
  • ? — one non-separator char
  • [abc], [a-z] — character classes
  • {a,b,c} — brace alternation
  • !pattern — negation (arrays only)

Every rule’s paths accepts one of three shapes:

paths: "src/**/*.rs" # single glob
paths: ["src/**/*.rs", "!src/**/testdata/**"] # array with negation
paths: {include: ["src/**"], exclude: ["**/*.test.*"]} # explicit pair
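One way to handle the three shapes is to normalize them into a single include/exclude pair before compiling globs. The sketch below shows only that normalization step (no glob compilation); `PathsSpec`, `Scope`, and `normalize` are hypothetical names.

```rust
/// The three accepted shapes of `paths`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PathsSpec {
    Single(String),
    List(Vec<String>),
    Pair { include: Vec<String>, exclude: Vec<String> },
}

/// Normalized form: globs to include and globs to exclude.
#[derive(Debug, PartialEq, Eq)]
pub struct Scope {
    pub include: Vec<String>,
    pub exclude: Vec<String>,
}

pub fn normalize(spec: PathsSpec) -> Scope {
    match spec {
        PathsSpec::Single(glob) => Scope { include: vec![glob], exclude: vec![] },
        PathsSpec::List(items) => {
            let mut scope = Scope { include: vec![], exclude: vec![] };
            for item in items {
                // A leading `!` marks a negated pattern (arrays only).
                if item.starts_with('!') {
                    scope.exclude.push(item[1..].to_string());
                } else {
                    scope.include.push(item);
                }
            }
            scope
        }
        PathsSpec::Pair { include, exclude } => Scope { include, exclude },
    }
}
```

After normalization, a path is in scope when it matches at least one include glob and no exclude glob.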

.gitignore is honored by default. .alintignore provides alint-specific exclusions. ignore: in config adds to the exclusion set.

Facts are declarative properties of the repository, evaluated once per run and cached. when clauses gate rules on facts.

Fact kinds include any_file_exists, all_files_exist, file_content_matches, detect: linguist (primary languages), detect: askalono (SPDX license), count_files, count_contributors, git_branch, and custom: {command: [...]} (shell out, JSON stdout → value).

The when expression language is deliberately bounded:

  • Operators: ==, !=, <, <=, >, >=, and, or, not, in, matches
  • Identifiers: facts.<name>, vars.<name>, ctx.<name>
  • Literals: strings, numbers, booleans, null, lists

No user-defined functions, no recursion, no I/O. Examples:

when: facts.has_rust
when: facts.primary_language in ["Rust", "Go"]
when: facts.has_rust and not facts.is_workspace_member
when: count_files("**/*.java") > 0
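Because the language has no functions defined by users, no recursion, and no I/O, evaluation can be a plain walk over a small expression tree. The sketch below covers a subset of the operators (not, and, or, ==, in) over an already-parsed tree. The `Value` and `Expr` types, the rule that only `true` is truthy, and the treatment of unknown facts as null are all assumptions for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    List(Vec<Value>),
}

#[derive(Debug, Clone)]
pub enum Expr {
    Lit(Value),
    Fact(String), // `facts.<name>`
    Not(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    In(Box<Expr>, Box<Expr>),
}

/// Assumption: only the boolean `true` is truthy.
fn truthy(v: &Value) -> bool {
    matches!(v, Value::Bool(true))
}

pub fn eval(expr: &Expr, facts: &HashMap<String, Value>) -> Value {
    match expr {
        Expr::Lit(v) => v.clone(),
        // Assumption: an unknown fact evaluates to null rather than erroring.
        Expr::Fact(name) => facts.get(name).cloned().unwrap_or(Value::Null),
        Expr::Not(e) => Value::Bool(!truthy(&eval(e, facts))),
        Expr::And(a, b) => Value::Bool(truthy(&eval(a, facts)) && truthy(&eval(b, facts))),
        Expr::Or(a, b) => Value::Bool(truthy(&eval(a, facts)) || truthy(&eval(b, facts))),
        Expr::Eq(a, b) => Value::Bool(eval(a, facts) == eval(b, facts)),
        Expr::In(needle, haystack) => match eval(haystack, facts) {
            Value::List(items) => Value::Bool(items.contains(&eval(needle, facts))),
            _ => Value::Bool(false),
        },
    }
}
```

Under these assumptions, `facts.has_rust and not facts.is_workspace_member` corresponds to `And(Fact("has_rust"), Not(Fact("is_workspace_member")))`.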

Closest-ancestor scoping (scope_filter:, v0.9.6+)


scope_filter: is a second per-file gate, orthogonal to when: and paths:. Per-file rules can declare scope_filter: { has_ancestor: <list> } to narrow themselves to files that have a specified manifest somewhere in their ancestor directory chain. The engine walks Path::parent() upward (the file’s own directory counts as an ancestor) and consults the v0.9.5 path-index at each step; first-match-wins gates the rule per-file.

- id: rust-sources-no-bidi
  when: facts.has_rust          # tree-level gate
  kind: no_bidi_controls
  paths: "**/*.rs"              # path glob
  scope_filter:                 # ancestor walk
    has_ancestor: Cargo.toml
  level: error

The composition order is:

  1. Tree-level when: (skip rule entirely if false).
  2. Per-file paths: glob.
  3. Per-file scope_filter: ancestor walk.
  4. Per-file git_tracked_only: consult.
  5. Rule-specific evaluate body.
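The ancestor walk used by scope_filter: can be sketched with `Path::parent()` and a set of known paths. The `HashSet<PathBuf>` here is a stand-in for the engine's path index, and `has_ancestor` is a hypothetical function name.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// True if any directory on the chain from the file's own directory up
/// to the repository root contains one of `manifests`. The walk stops
/// at the first match.
pub fn has_ancestor(file: &Path, manifests: &[&str], index: &HashSet<PathBuf>) -> bool {
    // Start at the file's own directory, which counts as an ancestor.
    let mut dir = file.parent();
    while let Some(d) = dir {
        if manifests.iter().any(|m| index.contains(&d.join(m))) {
            return true;
        }
        // For a relative path the chain ends with "" (the repo root),
        // whose parent is `None`, which terminates the loop.
        dir = d.parent();
    }
    false
}
```

In a polyglot monorepo this lets a Rust-specific rule apply to `crates/a/src/lib.rs` (which sits under `crates/a/Cargo.toml`) while skipping a stray `.rs` file in a subtree that has no Cargo.toml above it.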

Cross-file rules (pair, for_each_dir, file_exists, …) reject scope_filter: at build time and direct authors to for_each_dir + when_iter:. Rule-major rules like filename_case silently ignore the field — gate via the rule’s paths: glob instead.

scope_filter: is used by the five bundled ecosystem rulesets (rust@v1, node@v1, python@v1, go@v1, java@v1) so their per-file content rules narrow to files inside their ecosystem’s package subtree in polyglot monorepos. Full design: v0.9/scope-filter.md.

extends accepts local paths and URLs (with optional SHA-256 subresource integrity). Child configs deep-merge over parents on a per-rule-id basis. Setting level: off on an inherited rule disables it.
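Merging on a per-rule-id basis can be pictured as a map keyed by rule id in which fields set by the child replace the parent's. The sketch below uses a deliberately tiny `RuleConfig` with three optional fields; the real rule record has more fields and the exact deep-merge semantics for nested values are not specified here, so treat all names and the field-by-field strategy as assumptions.

```rust
use std::collections::BTreeMap;

/// Simplified rule record: every field optional so that a child entry
/// can override just one of them.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct RuleConfig {
    pub kind: Option<String>,
    pub level: Option<String>,
    pub paths: Option<Vec<String>>,
}

pub type RuleSet = BTreeMap<String, RuleConfig>;

/// Overlay `child` on `parent`: rules are matched by id, and any field
/// the child sets replaces the inherited value.
pub fn merge(parent: &RuleSet, child: &RuleSet) -> RuleSet {
    let mut merged = parent.clone();
    for (id, over) in child {
        let slot = merged.entry(id.clone()).or_default();
        if let Some(kind) = &over.kind {
            slot.kind = Some(kind.clone());
        }
        if let Some(level) = &over.level {
            slot.level = Some(level.clone());
        }
        if let Some(paths) = &over.paths {
            slot.paths = Some(paths.clone());
        }
    }
    merged
}

/// Ids of rules that remain enabled after merging (`level: off` drops a rule).
pub fn active(rules: &RuleSet) -> Vec<&str> {
    rules
        .iter()
        .filter(|(_, rule)| rule.level.as_deref() != Some("off"))
        .map(|(id, _)| id.as_str())
        .collect()
}
```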

Bundled rulesets are referenced via alint://bundled/<name>@v<major>.

  1. Config load. Read .alint.yml; follow extends with caching and cycle detection; validate against JSON Schema.
  2. Facts. Evaluate facts in parallel. Cache keyed on input hashes.
  3. Rule filter. Evaluate when clauses; drop disabled rules.
  4. Walk. One pass over the filesystem via the ignore crate. Build a FileIndex of path → metadata (size, is-text heuristic, extension, parent).
  5. Match. For each rule, resolve matching files/dirs from the index via compiled GlobSet.
  6. Read cache. Files requested by multiple content rules are read once; bytes cached for the run.
  7. Evaluate. Rule evaluation fans out via rayon.
  8. Aggregate. Collect RuleResults into a Report.
  9. Fix (optional). Apply fixers serially; re-run checks.
  10. Emit. Format via selected output.

Invariants: the walk runs exactly once per invocation; any given file’s bytes are read at most once; fact and rule evaluation are both parallelized; fixers run serially (they mutate the tree).
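The "read at most once" invariant can be illustrated with a small shared cache. This is a sketch, not the engine's implementation: `ReadCache` is a hypothetical type, the loader is injected as a closure so the example does not touch the filesystem, and a single mutex is held across the load, which is the simplest way to guarantee one read per path at the cost of serializing reads.

```rust
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex};

/// Caches file bytes for the duration of one run.
pub struct ReadCache<F> {
    loader: F,
    entries: Mutex<HashMap<PathBuf, Arc<Vec<u8>>>>,
}

impl<F> ReadCache<F>
where
    F: Fn(&Path) -> io::Result<Vec<u8>>,
{
    pub fn new(loader: F) -> Self {
        ReadCache { loader, entries: Mutex::new(HashMap::new()) }
    }

    /// Return the bytes for `path`, invoking the loader only on first use.
    /// Callers share the buffer through an `Arc`, so no copy is made.
    pub fn get(&self, path: &Path) -> io::Result<Arc<Vec<u8>>> {
        let mut entries = self.entries.lock().unwrap();
        if let Some(bytes) = entries.get(path) {
            return Ok(Arc::clone(bytes));
        }
        let bytes = Arc::new((self.loader)(path)?);
        entries.insert(path.to_path_buf(), Arc::clone(&bytes));
        Ok(bytes)
    }
}
```

In a real driver the loader would be something like `|p| std::fs::read(p)`, and the cache would be shared by reference across the worker threads that evaluate rules.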

alint is a Cargo workspace — the standard shape for Rust tools (rustc, cargo, tokio, ruff, biome, rust-analyzer, wasmtime, …). The reasons apply here: pre-1.0 breaking changes in the core ripple through the graph, so every such change is one PR rather than a multi-repo release; one Cargo.lock guarantees consistent transitive deps; one CI run (cargo test --workspace) validates the full graph; contributors clone once.

Current crates:

alint/
├── crates/
│ ├── alint/ binary entrypoint; `cargo install alint`
│ ├── alint-core/ engine, walker, rule trait, config AST, errors
│ ├── alint-dsl/ YAML config loader + schema validation + bundled rulesets
│ ├── alint-rules/ built-in rule implementations
│ ├── alint-output/ formatters (human, json, sarif, github, markdown, junit, gitlab, agent)
│ ├── alint-bench/ criterion micro-benches + seeded tree generator
│ ├── alint-testkit/ shared test harness (treespec materializer, scenario runner, proptest strategies)
│ └── alint-e2e/ end-to-end scenarios + coverage audits + cross-cutting invariant tests
├── xtask/ cargo-xtask helpers (bench-release driver, docs-export, publish-benches)
├── ci/ self-hosted runner + per-job shell scripts
├── schemas/v1/ JSON Schema for .alint.yml + report shapes (mirrored under crates/alint-dsl/schemas/)
├── docs/
│ ├── design/ architecture, roadmap, per-cut design passes (v0.7, v0.9, ...)
│ ├── development/ contributor docs (rule-authoring.md)
│ └── benchmarks/ methodology + per-platform published numbers (micro/, macro/, investigations/, archive/)
├── install.sh curl-pipeable platform-detecting installer
├── action.yml official GitHub Action (composite)
├── npm/ npm wrapper that downloads the platform-matched binary
├── Dockerfile distroless image; rebuilt per release
├── .alint.yml dogfood config
└── Cargo.toml workspace manifest

Planned additions (see ROADMAP.md):

  • crates/alint-lsp/ — language-server implementation (v0.10)
  • crates/alint-plugin/ — WASM plugin host (v0.11; the tier-1 command plugin already lives in alint-rules since v0.5.1)
  • editors/ — VS Code, Zed, Helix extensions (paired with v0.10)
  • crates/alint-facts/ — currently subsumed by alint-core::facts; promotion to its own crate is deferred until language and license detectors (PROPOSAL §4.6 detect: linguist, detect: askalono) actually land

The public crate surface is kept narrow so the semver-stable API is small and maintainable.

| Crate | publish | Why |
| --- | --- | --- |
| alint (binary) | public | Enables cargo install alint. Package name matches [[bin]] name. |
| alint-core | public | Embeddable engine for custom drivers — scripts, custom CI gates, third-party hosts. Semver-stable from 1.0. |
| alint-dsl, alint-rules, alint-output, and later-phase crates | publish = false | Internal plumbing. Promotion to public requires a concrete external consumer and a commitment to maintain the API. |

Unpublished crates still ship inside the binary and can be promoted later. The reverse — publishing a crate then realizing you don’t want to maintain it as a stable API — is much harder.

Plugins come in two tiers, introduced across the roadmap:

  • command rule kind. Rule shells out per matched file. Exit code is the verdict; stdout/stderr is the message. Environment variables expose path, rule id, level, vars, and facts. Simple, scriptable, language-agnostic.
  • wasm plugin kind. Plugins implement a stable WIT interface, receive file bytes + metadata, and return a structured result. Distributed as .wasm blobs referenced by URL with SRI. Sandboxed, deterministic, no network by default (opt-in capability).

Native Rust plugins are deliberately out of scope. Dynamic library loading has ABI stability problems and would lock the plugin ecosystem to Rust. WASM is the long-term answer.

Output formats are selected via --format:

  • human (default) — colorized, per-rule grouped output with source snippets.
  • json — stable, versioned schema.
  • sarif — SARIF 2.1.0 for GitHub Code Scanning and Azure DevOps.
  • github — GitHub Actions annotations (::error file=...).
  • gitlab — GitLab Code Quality JSON.
  • junit — JUnit XML for generic CI reporting.
  • markdown — report with TOC, suitable for posting as a GitHub issue body.
  • summary — one-line status per rule.
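As an example of what one formatter has to do, the sketch below renders a single violation as a GitHub Actions annotation using the documented `::error file=...::message` workflow-command syntax. The escaping follows the rules used by GitHub's actions toolkit. The function names and the mapping of alint's info level to GitHub's notice command are assumptions.

```rust
/// Escape the message part of a workflow command.
fn escape_data(s: &str) -> String {
    s.replace('%', "%25").replace('\r', "%0D").replace('\n', "%0A")
}

/// Escape a property value (such as `file=`), which must also protect
/// the `:` and `,` delimiters.
fn escape_property(s: &str) -> String {
    escape_data(s).replace(':', "%3A").replace(',', "%2C")
}

/// Render one violation as an annotation line, or `None` for levels
/// that produce no annotation.
pub fn github_annotation(level: &str, file: &str, message: &str) -> Option<String> {
    let command = match level {
        "error" => "error",
        "warning" => "warning",
        // Assumption: `info` maps to GitHub's `notice` annotation.
        "info" => "notice",
        _ => return None,
    };
    Some(format!(
        "::{} file={}::{}",
        command,
        escape_property(file),
        escape_data(message)
    ))
}
```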

A Rust project dogfood config, showing composition, facts, and multiple rule families:

# yaml-language-server: $schema=./schemas/v1/config.json
version: 1
extends:
  - alint://bundled/oss-baseline@v1
  - alint://bundled/rust@v1
vars:
  copyright_year: "2026"
  org: "Acme Corp"
ignore:
  - "target/**"
facts:
  - id: has_benches
    any_file_exists: ["benches/**/*.rs"]
rules:
  # Override an inherited rule.
  - id: readme-exists
    paths: ["README.md", "README.adoc"]
  # Disable an inherited rule.
  - id: no-todo-comments
    level: off
  # New rules.
  - id: cargo-members-are-kebab
    kind: toml_path_matches
    paths: "Cargo.toml"
    query: "$.workspace.members[*]"
    pattern: "^[a-z][a-z0-9-]+$"
    level: error
  - id: crates-have-readme
    kind: for_each_dir
    select: "crates/*"
    require:
      - kind: file_exists
        paths: "{dir}/README.md"
    level: error
  - id: integration-test-pair
    kind: pair
    primary: "crates/*/src/*.rs"
    partner: "crates/*/tests/{stem}_test.rs"
    level: warning
  - id: bench-gated
    when: facts.has_benches
    kind: file_exists
    paths: "benches/Cargo.toml"
    level: error

A new rule kind typically needs:

  1. A Rule impl in crates/alint-rules/src/<kind>.rs.
  2. Registration in register_builtin in crates/alint-rules/src/lib.rs.
  3. Unit tests alongside the impl (snapshot harness arrives with alint-test).
  4. A row in the primitives tables above.
  5. An entry in the Full Example section if the kind is commonly used.
  6. If the primitive shifts from “planned” to “shipped,” an update in ROADMAP.md.