# Changelog

All notable changes to Veritext will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Changed

- Refactored CLI metric computation to eliminate code duplication
- Version format updated from `0.1.0-dev` to `0.1.0.dev0` (PEP 440 compliance)
- Settings instance is now cached via `@lru_cache` for better performance
- Documented composite validators' intentional deviation from `Check` protocol return type

### Fixed

- Consolidated redundant empty checks in ROUGE-L computation
- Fixed README example using incorrect property names (`grade_level` → `flesch_kincaid_grade`, `reading_ease` → `flesch_reading_ease`)
- Fixed potential crash in ROUGE metric when all references are empty after tokenisation
- Fixed potential division by zero in readability metric when text has no sentence endings
- Fixed unbounded cache growth in `SemanticSimilarity` by implementing LRU eviction with configurable max size
- Fixed mutable list aliasing in `AllOf` and `AnyOf` composite validators
- Fixed regex pattern validation in `ContainsValidator` and `ExcludesValidator` to fail at init time rather than during `check()`
- Fixed pytest plugin tests failing with duplicate plugin registration error

### Documentation

- Added Phase 10 (Portfolio Demos) to implementation plan: Streamlit demo and Jupyter notebooks
- Updated project plan with portfolio demo section

### Added

- Added `.score` property to `LexicalResult` for API consistency with other result types
- Added `cache_max_size` parameter to `SemanticSimilarity` (default: 1000 embeddings)
- Added test coverage for `core/config.py` and `core/logging.py` modules

## [0.1.0] - 2026-02-03

Initial release of Veritext, a semantic text validation framework for Python.

### Added

#### Core

- Project scaffold with pyproject.toml and development tooling
- Core exception hierarchy (`VeritextError` and subclasses)
- Core types: `ValidationContext`, `CheckResult`, `ValidationResult`
- Word tokeniser with Unicode normalisation support
- Configuration module with pydantic-settings
- Structured logging with structlog

#### Metrics

- Metrics module with `Metric` protocol, `AggregateStats`, and `BatchResult` types
- BLEU metric implementation (BLEU-1 through BLEU-4 with brevity penalty)
- ROUGE metric (ROUGE-1, ROUGE-2, ROUGE-L with precision/recall/F-measure)
- Lexical similarity metric (Jaccard similarity and token overlap)
- Flesch-Kincaid readability metrics (grade level and reading ease)
- Batch scoring with aggregate statistics for all metrics

#### Validators

- Validators module with `Check` protocol for validation checks
- Metric-based validators: `BleuValidator`, `RougeValidator`, `LexicalValidator`
- Constraint validators: `LengthValidator`, `ReadabilityValidator`, `ContainsValidator`, `ExcludesValidator`
- Composite validators: `AllOf` (all checks must pass), `AnyOf` (any check must pass)
- Factory functions for clean validator API (`bleu()`, `rouge()`, `lexical()`, `length()`, `readability()`, `contains()`, `excludes()`, `all_of()`, `any_of()`)

#### Semantic Similarity

- Semantic similarity module with embedding-based text comparison (requires `veritext[semantic]` extra)
- `SemanticSimilarity` metric using sentence-transformers for semantic relatedness
- `SemanticValidator` for threshold-based semantic similarity validation
- `semantic()` factory function for creating semantic validators
- Embedding caching for performance optimisation in repeated comparisons

#### Pytest Plugin

- Native pytest plugin for CI/CD integration (entry point: `pytest11`)
- `validate_text()` assertion function for expressive test assertions
- `text_validation` marker for filtering validation tests
- Pytest fixtures: `text_validator` factory and `validation_context` helper
- Detailed failure messages with text preview and check diagnostics

#### Benchmarking

- Benchmark module for quality tracking and regression detection
- `Benchmark` class for evaluating text quality over time with metric storage
- `BenchmarkRun` and `RegressionReport` data models for tracking runs
- SQLite storage backend with WAL mode for concurrent access
- Rolling window baseline computation for historical comparison
- `check_regression()` for statistical comparison against baseline
- `assert_no_regression()` raises `RegressionDetectedError` for CI integration
- Customisable tolerance threshold and window size for regression detection
- Metadata support for tracking git SHA, model versions, etc.

#### CLI

- Command-line interface (CLI) via `veritext` command
- `veritext validate` command for inline and file-based text validation
- JSONL input format support for batch validation (`--file` option)
- Separate candidate/reference file support (`--reference-file` option)
- Multiple output formats: table (default), JSON, and simple text
- `veritext benchmark run` command for running evaluations and storing results
- `veritext benchmark show` command for viewing benchmark history
- `veritext benchmark check` command for regression detection with exit code 1 on failure
- Rich-formatted terminal output with tables and coloured panels

#### Documentation

- Comprehensive README with usage examples
- Example scripts: basic validation, chatbot testing, benchmark regression