Changelog
All notable changes to Veritext will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Added
- Project scaffold with pyproject.toml and development tooling
- Core exception hierarchy (VeritextError and subclasses)
- Core types: ValidationContext, CheckResult, ValidationResult
- Word tokeniser with Unicode normalisation support
- Configuration module with pydantic-settings
- Structured logging with structlog
- Metrics module with Metric protocol, AggregateStats, and BatchResult types
- BLEU metric implementation (BLEU-1 through BLEU-4 with brevity penalty)
- Lexical similarity metric (Jaccard similarity and token overlap)
- ROUGE metric (ROUGE-1, ROUGE-2, ROUGE-L with precision/recall/F-measure)
- Flesch-Kincaid readability metrics (grade level and reading ease)
- Batch scoring with aggregate statistics for all metrics
- Validators module with Check protocol for validation checks
- Metric-based validators: BleuValidator, RougeValidator, LexicalValidator
- Constraint validators: LengthValidator, ReadabilityValidator, ContainsValidator, ExcludesValidator
- Composite validators: AllOf (all checks must pass), AnyOf (any check must pass)
- Factory functions for clean validator API (bleu(), rouge(), lexical(), length(), readability(), contains(), excludes(), all_of(), any_of())
- Semantic similarity module with embedding-based text comparison (requires veritext[semantic] extra)
- SemanticSimilarity metric using sentence-transformers for semantic relatedness
- SemanticValidator for threshold-based semantic similarity validation
- semantic() factory function for creating semantic validators
- Embedding caching for performance optimisation in repeated comparisons
- Native pytest plugin for CI/CD integration (entry point: pytest11)
- validate_text() assertion function for expressive test assertions
- text_validation marker for filtering validation tests
- Pytest fixtures: text_validator factory and validation_context helper
- Detailed failure messages with text preview and check diagnostics
- Benchmark module for quality tracking and regression detection
- Benchmark class for evaluating text quality over time with metric storage
- BenchmarkRun and RegressionReport data models for tracking runs
- SQLite storage backend with WAL mode for concurrent access
- Rolling window baseline computation for historical comparison
- check_regression() for statistical comparison against baseline
- assert_no_regression() raises RegressionDetectedError for CI integration
- Customisable tolerance threshold and window size for regression detection
- Metadata support for tracking git SHA, model versions, etc.
- Command-line interface (CLI) via veritext command
- veritext validate command for inline and file-based text validation
- JSONL input format support for batch validation (--file option)
- Separate candidate/reference file support (--reference-file option)
- Multiple output formats: table (default), JSON, and simple text
- veritext benchmark run command for running evaluations and storing results
- veritext benchmark show command for viewing benchmark history
- veritext benchmark check command for regression detection with exit code 1 on failure
- Rich-formatted terminal output with tables and coloured panels
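
The regression-detection entries above (rolling window baseline, tolerance threshold, exit code 1 on failure) can be illustrated with a minimal Python sketch. It is not Veritext's implementation: the function names, the default window of 5, the relative 5% tolerance, and the use of a plain mean as the baseline are all assumptions made for illustration, and the actual statistical comparison may differ.

```python
from statistics import mean


def rolling_baseline(scores, window=5):
    """Mean of the most recent `window` historical scores (hypothetical helper)."""
    if not scores:
        raise ValueError("no historical scores to build a baseline from")
    return mean(scores[-window:])


def is_regression(current, history, window=5, tolerance=0.05):
    """True when `current` is more than `tolerance` (relative) below the baseline.

    Assumes higher scores are better, as with BLEU or ROUGE.
    """
    baseline = rolling_baseline(history, window)
    return current < baseline * (1 - tolerance)


def exit_code(current, history, window=5, tolerance=0.05):
    """Map the check to a process exit status: 1 on regression, 0 otherwise."""
    return 1 if is_regression(current, history, window, tolerance) else 0


# Illustrative scores only, oldest first.
history = [0.70, 0.72, 0.71, 0.73, 0.74, 0.75]
print(exit_code(0.72, history))  # within tolerance of the baseline
print(exit_code(0.60, history))  # well below the baseline
```

With these illustrative numbers the baseline is the mean of the last five scores (0.73), so any score below 0.6935 counts as a regression. A CI job would pass the returned status to `sys.exit()` so that a regression fails the build.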