veritext/changelog.md
Kai Chappell 0699e97e1d refactor: CLI cleanup and documentation updates
2026-02-04 15:38:46 +00:00

Changelog

All notable changes to Veritext will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Changed

  • Refactored CLI metric computation to eliminate code duplication
  • Version format updated from 0.1.0-dev to 0.1.0.dev0 (PEP 440 compliance)
  • Settings instance is now cached via @lru_cache for better performance
  • Documented composite validators' intentional deviation from Check protocol return type
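
The Settings caching change follows a common pattern. A minimal sketch, assuming a hypothetical `Settings` dataclass and `get_settings()` accessor; the changelog does not show the fields or accessor name of the project's real pydantic-settings class:

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the project's real settings class."""

    log_level: str = "INFO"


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    # Built once on the first call; later calls return the same cached instance.
    # EXAMPLE_LOG_LEVEL is an illustrative variable name, not a Veritext setting.
    return Settings(log_level=os.environ.get("EXAMPLE_LOG_LEVEL", "INFO"))
```

Calling `get_settings.cache_clear()` drops the cached instance, which is useful in tests that change environment variables between cases.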

Fixed

  • Consolidated redundant empty checks in ROUGE-L computation
  • Fixed README example using incorrect property names (grade_level → flesch_kincaid_grade, reading_ease → flesch_reading_ease)
  • Fixed potential crash in ROUGE metric when all references are empty after tokenisation
  • Fixed potential division by zero in readability metric when text has no sentence endings
  • Fixed unbounded cache growth in SemanticSimilarity by implementing LRU eviction with configurable max size
  • Fixed mutable list aliasing in AllOf and AnyOf composite validators
  • Fixed regex pattern validation in ContainsValidator and ExcludesValidator to fail at init time rather than during check()
  • Fixed pytest plugin tests failing with duplicate plugin registration error

Documentation

  • Added Phase 10 (Portfolio Demos) to implementation plan: Streamlit demo and Jupyter notebooks
  • Updated project plan with portfolio demo section
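
The init-time regex validation fix in this release can be illustrated with a small sketch. The `ContainsPattern` class below is hypothetical and is not the actual ContainsValidator; it only shows the idea of compiling the pattern in the constructor so that a bad pattern fails immediately:

```python
import re


class ContainsPattern:
    """Hypothetical validator used only to illustrate fail-fast compilation."""

    def __init__(self, pattern: str) -> None:
        try:
            # Compile eagerly so an invalid pattern is rejected at construction.
            self._regex = re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"invalid regex pattern {pattern!r}: {exc}") from exc

    def check(self, text: str) -> bool:
        # By the time check() runs, the pattern is known to be valid.
        return self._regex.search(text) is not None
```

With this shape, a malformed pattern such as `"("` raises when the validator is built, not later in the middle of a validation run.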

Added

  • Added .score property to LexicalResult for API consistency with other result types
  • Added cache_max_size parameter to SemanticSimilarity (default: 1000 embeddings)
  • Added test coverage for core/config.py and core/logging.py modules
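
A bounded embedding cache with LRU eviction could look roughly like the following. This is a sketch under assumptions: `EmbeddingCache` and its `embed` callable are hypothetical names, and the real SemanticSimilarity internals may differ; only the `cache_max_size` default of 1000 comes from the entry above.

```python
from collections import OrderedDict
from typing import Callable, List


class EmbeddingCache:
    """Hypothetical LRU cache keyed by input text."""

    def __init__(self, embed: Callable[[str], List[float]], max_size: int = 1000) -> None:
        self._embed = embed
        self._max_size = max_size
        self._store: "OrderedDict[str, List[float]]" = OrderedDict()

    def get(self, text: str) -> List[float]:
        if text in self._store:
            self._store.move_to_end(text)  # mark as most recently used
            return self._store[text]
        vector = self._embed(text)
        self._store[text] = vector
        if len(self._store) > self._max_size:
            self._store.popitem(last=False)  # evict the least recently used entry
        return vector

    def __len__(self) -> int:
        return len(self._store)
```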

[0.1.0] — 2026-02-03

Initial release of Veritext, a semantic text validation framework for Python.

Added

Core

  • Project scaffold with pyproject.toml and development tooling
  • Core exception hierarchy (VeritextError and subclasses)
  • Core types: ValidationContext, CheckResult, ValidationResult
  • Word tokeniser with Unicode normalisation support
  • Configuration module with pydantic-settings
  • Structured logging with structlog

Metrics

  • Metrics module with Metric protocol, AggregateStats, and BatchResult types
  • BLEU metric implementation (BLEU-1 through BLEU-4 with brevity penalty)
  • ROUGE metric (ROUGE-1, ROUGE-2, ROUGE-L with precision/recall/F-measure)
  • Lexical similarity metric (Jaccard similarity and token overlap)
  • Flesch-Kincaid readability metrics (grade level and reading ease)
  • Batch scoring with aggregate statistics for all metrics
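
To make the ROUGE-L entry concrete, here is a simplified sketch of precision, recall, and F-measure based on the longest common subsequence (LCS). It assumes whitespace tokenisation and equal weighting of precision and recall; the function names are illustrative and this is not the project's implementation.

```python
from typing import List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    # Dynamic-programming LCS that keeps only the previous row in memory.
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> Tuple[float, float, float]:
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        # A single empty check up front avoids division by zero below.
        return 0.0, 0.0, 0.0
    lcs = lcs_length(cand, ref)
    precision, recall = lcs / len(cand), lcs / len(ref)
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```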

Validators

  • Validators module with Check protocol for validation checks
  • Metric-based validators: BleuValidator, RougeValidator, LexicalValidator
  • Constraint validators: LengthValidator, ReadabilityValidator, ContainsValidator, ExcludesValidator
  • Composite validators: AllOf (all checks must pass), AnyOf (at least one check must pass)
  • Factory functions for clean validator API (bleu(), rouge(), lexical(), length(), readability(), contains(), excludes(), all_of(), any_of())
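
The composite semantics can be sketched as follows. This is a simplified illustration, not the library's code: `SimpleAllOf` and `SimpleAnyOf` are hypothetical names, and a check is modelled as a plain callable returning `bool`, whereas the real Check protocol and its return type are not shown in this changelog.

```python
from typing import Callable, Iterable, Tuple

# Simplified stand-in for a validation check: text in, pass/fail out.
Check = Callable[[str], bool]


class SimpleAllOf:
    def __init__(self, checks: Iterable[Check]) -> None:
        # Copy into a tuple so later mutation of the caller's list has no effect.
        self._checks: Tuple[Check, ...] = tuple(checks)

    def __call__(self, text: str) -> bool:
        return all(check(text) for check in self._checks)


class SimpleAnyOf:
    def __init__(self, checks: Iterable[Check]) -> None:
        self._checks: Tuple[Check, ...] = tuple(checks)

    def __call__(self, text: str) -> bool:
        return any(check(text) for check in self._checks)
```

Storing a tuple copy is one way to avoid the kind of list aliasing mentioned in the Unreleased fixes: appending to the original list after construction does not change what the composite evaluates.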

Semantic Similarity

  • Semantic similarity module with embedding-based text comparison (requires veritext[semantic] extra)
  • SemanticSimilarity metric using sentence-transformers for semantic relatedness
  • SemanticValidator for threshold-based semantic similarity validation
  • semantic() factory function for creating semantic validators
  • Embedding caching for performance optimisation in repeated comparisons
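
Threshold-based validation on embeddings can be sketched without any model dependency. The example below assumes cosine similarity as the comparison function and a threshold of 0.8; both are assumptions for illustration, since the changelog does not state which similarity function or default threshold is used.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    # Treat a zero-length vector as having no similarity to anything.
    return dot / norm if norm else 0.0


def passes_threshold(u: Sequence[float], v: Sequence[float], threshold: float = 0.8) -> bool:
    # The check passes when the two embeddings are at least `threshold` similar.
    return cosine_similarity(u, v) >= threshold
```

In practice the vectors would come from an embedding model such as one loaded through sentence-transformers; plain lists of floats are used here to keep the sketch self-contained.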

Pytest Plugin

  • Native pytest plugin for CI/CD integration (entry point: pytest11)
  • validate_text() assertion function for expressive test assertions
  • text_validation marker for filtering validation tests
  • Pytest fixtures: text_validator factory and validation_context helper
  • Detailed failure messages with text preview and check diagnostics
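
For readers unfamiliar with how a pytest plugin exposes a marker, the usual mechanism is the `pytest_configure` hook. The snippet below is a generic sketch of that hook and may not match how Veritext registers `text_validation`; the marker description string is made up for illustration.

```python
def pytest_configure(config) -> None:
    # Registering the marker avoids "unknown marker" errors under --strict-markers.
    config.addinivalue_line(
        "markers",
        "text_validation: marks tests that validate generated text",
    )
```

Tests decorated with `@pytest.mark.text_validation` can then be selected with `pytest -m text_validation`.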

Benchmarking

  • Benchmark module for quality tracking and regression detection
  • Benchmark class for evaluating text quality over time with metric storage
  • BenchmarkRun and RegressionReport data models for tracking runs
  • SQLite storage backend with WAL mode for concurrent access
  • Rolling window baseline computation for historical comparison
  • check_regression() for statistical comparison against baseline
  • assert_no_regression() raises RegressionDetectedError for CI integration
  • Customisable tolerance threshold and window size for regression detection
  • Metadata support for tracking git SHA, model versions, etc.
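
The rolling-window baseline and tolerance threshold can be illustrated with a short sketch. It assumes that higher scores are better, that the baseline is the mean of the most recent runs, and that the tolerance is relative; these are assumptions, and the actual `check_regression()` logic may differ. The function names and defaults are hypothetical.

```python
from statistics import mean
from typing import Sequence


def rolling_baseline(history: Sequence[float], window: int = 5) -> float:
    if window < 1:
        raise ValueError("window must be at least 1")
    if not history:
        raise ValueError("no previous runs to compare against")
    # Only the most recent `window` scores contribute to the baseline.
    return mean(history[-window:])


def is_regression(
    current: float,
    history: Sequence[float],
    window: int = 5,
    tolerance: float = 0.05,
) -> bool:
    baseline = rolling_baseline(history, window)
    # Flag a regression when the score drops more than `tolerance` below baseline.
    return current < baseline * (1 - tolerance)
```

For example, with a history of `[0.80, 0.82, 0.81, 0.79, 0.80]` the baseline is about 0.804, so a new score of 0.70 is flagged while 0.79 is within tolerance.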

CLI

  • Command-line interface (CLI) via the veritext command
  • veritext validate command for inline and file-based text validation
  • JSONL input format support for batch validation (--file option)
  • Separate candidate/reference file support (--reference-file option)
  • Multiple output formats: table (default), JSON, and simple text
  • veritext benchmark run command for running evaluations and storing results
  • veritext benchmark show command for viewing benchmark history
  • veritext benchmark check command for regression detection with exit code 1 on failure
  • Rich-formatted terminal output with tables and coloured panels
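
JSONL batch input is one JSON object per line. The sketch below shows how such a file might be read into candidate/reference pairs; the field names `candidate` and `reference` are assumptions for illustration, as the changelog does not document the expected schema.

```python
import io
import json
from typing import Iterator, TextIO, Tuple


def read_pairs(stream: TextIO) -> Iterator[Tuple[str, str]]:
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
        # Hypothetical field names; adjust to the schema the CLI actually expects.
        yield record["candidate"], record["reference"]


# Usage with an in-memory file standing in for the --file input.
sample = io.StringIO('{"candidate": "a cat", "reference": "the cat"}\n\n')
pairs = list(read_pairs(sample))
```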

Documentation

  • Comprehensive README with usage examples
  • Example scripts: basic validation, chatbot testing, benchmark regression