project setup: pyproject.toml, deps, tooling

Configure Python project with pydantic, structlog, typer, rich dependencies.
Set up ruff, mypy, pytest tooling with strict type checking.

2025-03-08 14:03:32 +00:00

5.3 KiB

Changelog

All notable changes to Veritext will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Changed

Refactored CLI metric computation to eliminate code duplication
Version format updated from 0.1.0-dev to 0.1.0.dev0 (PEP 440 compliance)
Settings instance is now cached via @lru_cache for better performance
Documented composite validators' intentional deviation from Check protocol return type
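
The @lru_cache settings change can be pictured with a minimal sketch. It assumes a module-level accessor named `get_settings()` and uses a frozen dataclass as a stand-in for the real pydantic-settings model. Both names, and the environment variable, are hypothetical and not Veritext's actual API.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for Veritext's pydantic-settings model."""

    log_level: str = "INFO"


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Hypothetical accessor: builds Settings once, then returns the cached instance."""
    # VERITEXT_LOG_LEVEL is an assumed variable name; it is read only on the first call.
    return Settings(log_level=os.environ.get("VERITEXT_LOG_LEVEL", "INFO"))
```

Because the object is cached, environment changes made after the first call are not seen until `get_settings.cache_clear()` runs. Test suites commonly call it in a fixture.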

Fixed

Consolidated redundant empty checks in ROUGE-L computation
Fixed README example using incorrect property names (grade_level → flesch_kincaid_grade, reading_ease → flesch_reading_ease)
Fixed potential crash in ROUGE metric when all references are empty after tokenisation
Fixed potential division by zero in readability metric when text has no sentence endings
Fixed unbounded cache growth in SemanticSimilarity by implementing LRU eviction with configurable max size
Fixed mutable list aliasing in AllOf and AnyOf composite validators
Regex patterns in ContainsValidator and ExcludesValidator are now validated at init time, so invalid patterns fail immediately rather than during check()
Fixed pytest plugin tests failing with duplicate plugin registration error

Documentation

Added Phase 10 (Portfolio Demos) to the implementation plan: Streamlit demo and Jupyter notebooks
Updated project plan with portfolio demo section

Added

Added .score property to LexicalResult for API consistency with other result types
Added cache_max_size parameter to SemanticSimilarity (default: 1000 embeddings)
Added test coverage for core/config.py and core/logging.py modules
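
The LRU eviction fix and the cache_max_size parameter can be illustrated with a small `OrderedDict`-based cache. This is a sketch under stated assumptions. The class name `EmbeddingCache`, the text-keyed lookup, and the list-of-floats embedding type are hypothetical. Only the default of 1000 entries comes from the entry above.

```python
from collections import OrderedDict
from collections.abc import Callable


class EmbeddingCache:
    """Hypothetical LRU cache mapping a text to its embedding vector."""

    def __init__(self, max_size: int = 1000) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._store: OrderedDict[str, list[float]] = OrderedDict()

    def get_or_compute(self, text: str, embed: Callable[[str], list[float]]) -> list[float]:
        if text in self._store:
            # Cache hit: mark the entry as most recently used.
            self._store.move_to_end(text)
            return self._store[text]
        vector = embed(text)
        self._store[text] = vector
        if len(self._store) > self.max_size:
            # Evict the least recently used entry, which sits at the front.
            self._store.popitem(last=False)
        return vector

    def __len__(self) -> int:
        return len(self._store)
```

`functools.lru_cache` would also bound memory. An explicit class, however, keeps the size configurable per instance and makes hits and evictions straightforward to test.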

[0.1.0] - 2025-05-17

Initial release of Veritext, a semantic text validation framework for Python.

Added

Core

Project scaffold with pyproject.toml and development tooling
Core exception hierarchy (VeritextError and subclasses)
Core types: ValidationContext, CheckResult, ValidationResult
Word tokeniser with Unicode normalisation support
Configuration module with pydantic-settings
Structured logging with structlog

Metrics

Metrics module with Metric protocol, AggregateStats, and BatchResult types
BLEU metric implementation (BLEU-1 through BLEU-4 with brevity penalty)
ROUGE metric (ROUGE-1, ROUGE-2, ROUGE-L with precision/recall/F-measure)
Lexical similarity metric (Jaccard similarity and token overlap)
Flesch-Kincaid readability metrics (grade level and reading ease)
Batch scoring with aggregate statistics for all metrics
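
The Flesch-Kincaid metrics use fixed coefficients. The sketch below applies the standard grade-level and reading-ease formulas and exposes the flesch_kincaid_grade and flesch_reading_ease property names. The regex tokenisation and the vowel-group syllable counter are crude assumptions, and Veritext's own tokeniser will differ. Counting text with no terminal punctuation as one sentence is one possible guard against the division by zero listed under [Unreleased], not necessarily the one the project adopted. Raising on empty input is also a choice made only in this sketch.

```python
import re
from dataclasses import dataclass

WORD_RE = re.compile(r"[A-Za-z']+")
SENTENCE_END_RE = re.compile(r"[.!?]+")
VOWEL_GROUP_RE = re.compile(r"[aeiouy]+")


@dataclass(frozen=True)
class Readability:
    flesch_kincaid_grade: float
    flesch_reading_ease: float


def count_syllables(word: str) -> int:
    """Crude heuristic: one syllable per vowel group, minimum one."""
    return max(1, len(VOWEL_GROUP_RE.findall(word.lower())))


def readability(text: str) -> Readability:
    words = WORD_RE.findall(text)
    if not words:
        raise ValueError("text contains no words")
    # Guard: text without ., ! or ? still counts as one sentence,
    # so the words-per-sentence ratio never divides by zero.
    sentences = max(1, len(SENTENCE_END_RE.findall(text)))
    syllables = sum(count_syllables(w) for w in words)

    words_per_sentence = len(words) / sentences
    syllables_per_word = syllables / len(words)
    return Readability(
        flesch_kincaid_grade=0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        flesch_reading_ease=206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
    )
```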

Validators

Validators module with Check protocol for validation checks
Metric-based validators: BleuValidator, RougeValidator, LexicalValidator
Constraint validators: LengthValidator, ReadabilityValidator, ContainsValidator, ExcludesValidator
Composite validators: AllOf (all checks must pass), AnyOf (any check must pass)
Factory functions for clean validator API (bleu(), rouge(), lexical(), length(), readability(), contains(), excludes(), all_of(), any_of())
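
The validator layout above can be sketched together with two of the [Unreleased] fixes: regex patterns compiled at construction, and AllOf/AnyOf copying their input instead of aliasing the caller's list. This is a simplified illustration. Here `check()` returns a plain `bool`, whereas Veritext's Check protocol works with result types such as CheckResult. The constructor signatures, the `regex` keyword, and the `_Composite` base class are assumptions.

```python
import re
from collections.abc import Iterable
from typing import Protocol


class Check(Protocol):
    # Simplified: the real protocol returns a result object, not a bool.
    def check(self, text: str) -> bool: ...


class ContainsValidator:
    """Passes when the pattern occurs somewhere in the text."""

    def __init__(self, pattern: str, *, regex: bool = False) -> None:
        # Compile eagerly so an invalid pattern fails here, at init time,
        # instead of on the first call to check(). `regex` is a hypothetical flag.
        self._pattern = re.compile(pattern if regex else re.escape(pattern))

    def check(self, text: str) -> bool:
        return self._pattern.search(text) is not None


class ExcludesValidator(ContainsValidator):
    """Passes when the pattern does not occur in the text."""

    def check(self, text: str) -> bool:
        return not super().check(text)


class _Composite:
    def __init__(self, checks: Iterable[Check]) -> None:
        # Copy into a tuple so later changes to the caller's list
        # cannot alter this validator (the aliasing bug).
        self.checks: tuple[Check, ...] = tuple(checks)


class AllOf(_Composite):
    def check(self, text: str) -> bool:
        return all(c.check(text) for c in self.checks)


class AnyOf(_Composite):
    def check(self, text: str) -> bool:
        return any(c.check(text) for c in self.checks)


# Factory functions in the style of contains(), excludes(), all_of(), any_of().
def contains(pattern: str, *, regex: bool = False) -> ContainsValidator:
    return ContainsValidator(pattern, regex=regex)


def excludes(pattern: str, *, regex: bool = False) -> ExcludesValidator:
    return ExcludesValidator(pattern, regex=regex)


def all_of(*checks: Check) -> AllOf:
    return AllOf(checks)


def any_of(*checks: Check) -> AnyOf:
    return AnyOf(checks)
```

Without `regex=True`, the pattern is escaped and matched literally, so `contains("a.b")` does not match `"axb"`.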

Semantic Similarity

Semantic similarity module with embedding-based text comparison (requires veritext[semantic] extra)
SemanticSimilarity metric using sentence-transformers for semantic relatedness
SemanticValidator for threshold-based semantic similarity validation
semantic() factory function for creating semantic validators
Embedding caching for performance optimisation in repeated comparisons
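
Threshold-based semantic validation reduces to comparing embedding vectors by cosine similarity. In the sketch below, the vectors are passed in directly. In practice they would come from a sentence-transformers model, for example via `SentenceTransformer.encode`. The function names and the default threshold of 0.8 are assumptions, not values taken from Veritext.

```python
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def passes_semantic_threshold(
    candidate: Sequence[float], reference: Sequence[float], threshold: float = 0.8
) -> bool:
    """Hypothetical helper: True when the embeddings are at least `threshold` similar."""
    return cosine_similarity(candidate, reference) >= threshold
```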

Pytest Plugin

Native pytest plugin for CI/CD integration (entry point: pytest11)
validate_text() assertion function for expressive test assertions
text_validation marker for filtering validation tests
Pytest fixtures: text_validator factory and validation_context helper
Detailed failure messages with text preview and check diagnostics
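
A pytest plugin loaded through the pytest11 entry point typically registers its markers in a `pytest_configure` hook. This lets `--strict-markers` runs accept the marker and lists it in `pytest --markers`. The sketch below shows that pattern for the text_validation marker. The module path in the comment and the marker description are hypothetical.

```python
# Hypothetical plugin module, referenced from pyproject.toml as:
#   [project.entry-points.pytest11]
#   veritext = "veritext.pytest_plugin"
from typing import Any


def pytest_configure(config: Any) -> None:
    """pytest hook, called once after command-line options are parsed.

    `config` is a pytest.Config at runtime; typed as Any to keep the sketch import-free.
    """
    # Registering the marker documents it and keeps --strict-markers from rejecting it.
    config.addinivalue_line(
        "markers",
        "text_validation: mark a test as a text validation check",
    )
```

With the marker registered, `pytest -m text_validation` selects only the validation tests.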

Benchmarking

Benchmark module for quality tracking and regression detection
Benchmark class for evaluating text quality over time with metric storage
BenchmarkRun and RegressionReport data models for tracking runs
SQLite storage backend with WAL mode for concurrent access
Rolling window baseline computation for historical comparison
check_regression() for statistical comparison against baseline
assert_no_regression() raises RegressionDetectedError for CI integration
Customisable tolerance threshold and window size for regression detection
Metadata support for tracking git SHA, model versions, etc.
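
Rolling-window regression detection can be sketched with the standard library alone. Each run's score is stored in SQLite in WAL journal mode, the mean of the most recent `window` runs serves as the baseline, and a new score counts as a regression when it falls more than `tolerance` (relative) below that baseline. The table schema, the relative-tolerance rule, the defaults, and the local `RegressionDetectedError` class are assumptions for illustration. Veritext's check_regression() may use a different statistical comparison.

```python
from __future__ import annotations

import sqlite3
from statistics import mean


class RegressionDetectedError(Exception):
    """Local stand-in for Veritext's exception of the same name."""


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL mode lets readers proceed while another connection writes.
    # (In-memory databases report "memory" instead of "wal".)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " metric TEXT NOT NULL,"
        " score REAL NOT NULL)"
    )
    return conn


def record_run(conn: sqlite3.Connection, metric: str, score: float) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO runs (metric, score) VALUES (?, ?)", (metric, score))


def baseline(conn: sqlite3.Connection, metric: str, window: int = 5) -> float | None:
    """Mean of the last `window` scores for `metric`, or None with no history."""
    rows = conn.execute(
        "SELECT score FROM runs WHERE metric = ? ORDER BY id DESC LIMIT ?",
        (metric, window),
    ).fetchall()
    return mean(r[0] for r in rows) if rows else None


def assert_no_regression(
    conn: sqlite3.Connection,
    metric: str,
    score: float,
    window: int = 5,
    tolerance: float = 0.05,
) -> None:
    base = baseline(conn, metric, window)
    if base is None:
        return  # no history yet, nothing to compare against
    # Relative drop: a score more than `tolerance` below the baseline fails.
    if score < base * (1 - tolerance):
        raise RegressionDetectedError(
            f"{metric}: {score:.4f} is below baseline {base:.4f} (tolerance {tolerance:.0%})"
        )
```

Checking before recording keeps the current run out of its own baseline. A CI job would call `assert_no_regression()` first and `record_run()` only when it passes.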

CLI

Command-line interface (CLI) via the veritext command
veritext validate command for inline and file-based text validation
JSONL input format support for batch validation (--file option)
Separate candidate/reference file support (--reference-file option)
Multiple output formats: table (default), JSON, and simple text
veritext benchmark run command for running evaluations and storing results
veritext benchmark show command for viewing benchmark history
veritext benchmark check command for regression detection with exit code 1 on failure
Rich-formatted terminal output with tables and coloured panels
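
JSONL input for the --file option means one JSON object per line. A minimal reader might look like the following. The `candidate` and `reference` field names, the `Pair` type, and both helper functions are hypothetical. The exit-code helper mirrors the "exit code 1 on failure" convention described for benchmark check, but it is illustrative only.

```python
from __future__ import annotations

import json
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One candidate text and its optional reference (hypothetical field names)."""

    candidate: str
    reference: str | None = None


def load_jsonl(lines: Iterable[str]) -> Iterator[Pair]:
    """Parse JSONL: one JSON object per line, blank lines skipped."""
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict) or "candidate" not in record:
            raise ValueError(f"line {lineno}: expected an object with a 'candidate' field")
        yield Pair(record["candidate"], record.get("reference"))


def exit_code(passed: Iterable[bool]) -> int:
    """0 when every record passed, 1 otherwise, so CI jobs fail on any miss."""
    return 0 if all(passed) else 1
```

An open text file can be passed straight to `load_jsonl()`, since iterating a file yields its lines.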

Documentation

README with usage examples
Example scripts: basic validation, chatbot testing, benchmark regression
