project setup: pyproject.toml, deps, tooling

Configure the Python project with pydantic, pydantic-settings, structlog, typer, and rich as dependencies. Set up ruff, mypy, and pytest, with mypy running in strict mode.
2025-03-08 14:03:32 +00:00
commit bf5884cb27
4 changed files with 1924 additions and 0 deletions
@@ -0,0 +1,114 @@
 # Changelog
 All notable changes to Veritext will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
 ### Changed
 - Refactored CLI metric computation to eliminate code duplication
 - Version format updated from `0.1.0-dev` to `0.1.0.dev0` (PEP 440 compliance)
 - Settings instance is now cached via `@lru_cache` for better performance
 - Documented composite validators' intentional deviation from `Check` protocol return type
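The settings caching noted above can be pictured with a minimal sketch. The `Settings` class, its fields, and the `get_settings()` name here are hypothetical stand-ins, not the project's actual configuration module; only the use of `functools.lru_cache` comes from the entry itself.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the real settings class."""

    log_level: str = "INFO"
    cache_max_size: int = 1000


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Build the settings once; later calls return the cached instance."""
    return Settings()
```

Calling `get_settings.cache_clear()` forces the next call to build a fresh instance, which is usually what tests need after changing environment variables.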
 ### Fixed
 - Consolidated redundant empty checks in ROUGE-L computation
 - Fixed README example using incorrect property names (`grade_level` → `flesch_kincaid_grade`, `reading_ease` → `flesch_reading_ease`)
 - Fixed potential crash in ROUGE metric when all references are empty after tokenisation
 - Fixed potential division by zero in readability metric when text has no sentence endings
 - Fixed unbounded cache growth in `SemanticSimilarity` by implementing LRU eviction with configurable max size
 - Fixed mutable list aliasing in `AllOf` and `AnyOf` composite validators
 - Fixed regex pattern validation in `ContainsValidator` and `ExcludesValidator` to fail at init time rather than during `check()`
 - Fixed pytest plugin tests failing with duplicate plugin registration error
 ### Documentation
 - Added Phase 10 (Portfolio Demos) to implementation plan: Streamlit demo and Jupyter notebooks
 - Updated project plan with portfolio demo section
 ### Added
 - Added `.score` property to `LexicalResult` for API consistency with other result types
 - Added `cache_max_size` parameter to `SemanticSimilarity` (default: 1000 embeddings)
 - Added test coverage for `core/config.py` and `core/logging.py` modules
 ## [0.1.0] - 2025-05-17
 Initial release of Veritext, a semantic text validation framework for Python.
 ### Added
 #### Core
 - Project scaffold with pyproject.toml and development tooling
 - Core exception hierarchy (`VeritextError` and subclasses)
 - Core types: `ValidationContext`, `CheckResult`, `ValidationResult`
 - Word tokeniser with Unicode normalisation support
 - Configuration module with pydantic-settings
 - Structured logging with structlog
 #### Metrics
 - Metrics module with `Metric` protocol, `AggregateStats`, and `BatchResult` types
 - BLEU metric implementation (BLEU-1 through BLEU-4 with brevity penalty)
 - ROUGE metric (ROUGE-1, ROUGE-2, ROUGE-L with precision/recall/F-measure)
 - Lexical similarity metric (Jaccard similarity and token overlap)
 - Flesch-Kincaid readability metrics (grade level and reading ease)
 - Batch scoring with aggregate statistics for all metrics
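To make the lexical similarity entry concrete, here is a rough sketch of Jaccard similarity over word tokens. The `tokenize()` and `jaccard_similarity()` helpers are hypothetical and use a simple regex tokeniser; the project's own tokeniser (with Unicode normalisation) is likely more involved.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard_similarity(candidate: str, reference: str) -> float:
    """Return the size of the token intersection divided by the size of the union."""
    a, b = tokenize(candidate), tokenize(reference)
    if not a and not b:
        return 1.0  # assumption: two empty texts count as identical
    return len(a & b) / len(a | b)
```

For "The cat sat on the mat." against "A cat is sitting on the mat.", the two token sets share 4 of 8 distinct tokens, so the score is 0.5.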
 #### Validators
 - Validators module with `Check` protocol for validation checks
 - Metric-based validators: `BleuValidator`, `RougeValidator`, `LexicalValidator`
 - Constraint validators: `LengthValidator`, `ReadabilityValidator`, `ContainsValidator`, `ExcludesValidator`
 - Composite validators: `AllOf` (every check must pass), `AnyOf` (at least one check must pass)
 - Factory functions for clean validator API (`bleu()`, `rouge()`, `lexical()`, `length()`, `readability()`, `contains()`, `excludes()`, `all_of()`, `any_of()`)
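The composite validators can be illustrated with a deliberately simplified sketch. It assumes a check is just a callable from text to `bool`, which is not the project's `Check` protocol, and the class bodies are hypothetical. Copying the checks into a tuple is one way to avoid aliasing a caller's mutable list.

```python
from collections.abc import Callable, Iterable

# Simplifying assumption: a check is any callable that maps text to pass/fail.
Check = Callable[[str], bool]


class AllOf:
    """Passes only when every wrapped check passes."""

    def __init__(self, checks: Iterable[Check]) -> None:
        # Copy into a tuple so later changes to the caller's list do not leak in.
        self._checks = tuple(checks)

    def __call__(self, text: str) -> bool:
        return all(check(text) for check in self._checks)


class AnyOf:
    """Passes when at least one wrapped check passes."""

    def __init__(self, checks: Iterable[Check]) -> None:
        self._checks = tuple(checks)

    def __call__(self, text: str) -> bool:
        return any(check(text) for check in self._checks)
```

With this shape, clearing or extending the original list after construction has no effect on the composite.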
 #### Semantic Similarity
 - Semantic similarity module with embedding-based text comparison (requires `veritext[semantic]` extra)
 - `SemanticSimilarity` metric using sentence-transformers for semantic relatedness
 - `SemanticValidator` for threshold-based semantic similarity validation
 - `semantic()` factory function for creating semantic validators
 - Embedding caching for performance optimisation in repeated comparisons
 #### Pytest Plugin
 - Native pytest plugin for CI/CD integration (entry point: `pytest11`)
 - `validate_text()` assertion function for expressive test assertions
 - `text_validation` marker for filtering validation tests
 - Pytest fixtures: `text_validator` factory and `validation_context` helper
 - Detailed failure messages with text preview and check diagnostics
 #### Benchmarking
 - Benchmark module for quality tracking and regression detection
 - `Benchmark` class for evaluating text quality over time with metric storage
 - `BenchmarkRun` and `RegressionReport` data models for tracking runs
 - SQLite storage backend with WAL mode for concurrent access
 - Rolling window baseline computation for historical comparison
 - `check_regression()` for statistical comparison against baseline
 - `assert_no_regression()` raises `RegressionDetectedError` for CI integration
 - Customisable tolerance threshold and window size for regression detection
 - Metadata support for tracking git SHA, model versions, etc.
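As a rough illustration of the rolling-window baseline and tolerance threshold listed above, the sketch below flags a regression when the current score falls more than a fixed tolerance below the mean of recent runs. The function names, defaults, and the plain-mean comparison are assumptions; the actual `check_regression()` may use a different statistic.

```python
from statistics import mean


def rolling_baseline(history: list[float], window: int = 5) -> float:
    """Mean of the most recent `window` scores."""
    if not history:
        raise ValueError("history is empty")
    if window < 1:
        raise ValueError("window must be at least 1")
    return mean(history[-window:])


def is_regression(
    current: float,
    history: list[float],
    *,
    window: int = 5,
    tolerance: float = 0.05,
) -> bool:
    """True when `current` is more than `tolerance` below the baseline."""
    return current < rolling_baseline(history, window) - tolerance
```

A CI job could call `is_regression()` on the latest score and exit non-zero when it returns `True`.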
 #### CLI
 - Command-line interface (CLI) via `veritext` command
 - `veritext validate` command for inline and file-based text validation
 - JSONL input format support for batch validation (`--file` option)
 - Separate candidate/reference file support (`--reference-file` option)
 - Multiple output formats: table (default), JSON, and simple text
 - `veritext benchmark run` command for running evaluations and storing results
 - `veritext benchmark show` command for viewing benchmark history
 - `veritext benchmark check` command for regression detection with exit code 1 on failure
 - Rich-formatted terminal output with tables and coloured panels
 #### Documentation
 - README with usage examples
 - Example scripts: basic validation, chatbot testing, benchmark regression
@@ -0,0 +1,121 @@
 [project]
 name = "veritext"
 version = "0.1.0.dev0"
 description = "Semantic text validation framework"
 readme = "readme.md"
 requires-python = ">=3.11"
 license = "MIT"
 authors = [{ name = "Kai Chappell", email = "git@kschappell.com" }]
 keywords = ["validation", "text", "nlp", "testing", "quality"]
 classifiers = [
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
    "Topic :: Software Development :: Testing",
    "Topic :: Text Processing",
    "Typing :: Typed",
 ]
 dependencies = [
    "pydantic>=2.0",
    "pydantic-settings>=2.0",
    "structlog>=23.0",
    "typer>=0.9",
    "rich>=13.0",
 ]
 [project.optional-dependencies]
 semantic = ["sentence-transformers>=2.2"]
 dev = [
    "pytest>=7.0",
    "pytest-cov>=4.0",
    "mypy>=1.0",
    "ruff>=0.1",
 ]
 all = ["veritext[semantic]"]
 [project.scripts]
 veritext = "veritext.cli.main:app"
 [project.entry-points.pytest11]
 veritext = "veritext.pytest_plugin"
 [project.urls]
 Repository = "https://gitea.kschappell.com/kschappell/veritext"
 [build-system]
 requires = ["hatchling"]
 build-backend = "hatchling.build"
 [tool.hatch.build.targets.wheel]
 packages = ["src/veritext"]
 [tool.ruff]
 line-length = 88
 target-version = "py311"
 src = ["src", "tests"]
 [tool.ruff.lint]
 select = [
    "E",      # pycodestyle errors
    "W",      # pycodestyle warnings
    "F",      # pyflakes
    "I",      # isort
    "B",      # flake8-bugbear
    "C4",     # flake8-comprehensions
    "UP",     # pyupgrade
    "ARG",    # flake8-unused-arguments
    "SIM",    # flake8-simplify
    "TCH",    # flake8-type-checking
    "PTH",    # flake8-use-pathlib
    "RUF",    # ruff-specific
 ]
 ignore = [
    "E501",   # line too long (handled by formatter)
 ]
 [tool.ruff.lint.isort]
 known-first-party = ["veritext"]
 [tool.mypy]
 python_version = "3.11"
 mypy_path = ["src"]
 strict = true
 warn_return_any = true
 warn_unused_ignores = true
 disallow_untyped_defs = true
 disallow_incomplete_defs = true
 check_untyped_defs = true
 disallow_untyped_decorators = true
 no_implicit_optional = true
 warn_redundant_casts = true
 warn_unused_configs = true
 show_error_codes = true
 files = ["src/veritext"]
 [[tool.mypy.overrides]]
 module = ["sentence_transformers.*"]
 ignore_missing_imports = true
 [[tool.mypy.overrides]]
 module = ["structlog", "structlog.*"]
 ignore_missing_imports = true
 [tool.pytest.ini_options]
 testpaths = ["tests"]
 addopts = "-v --tb=short"
 pythonpath = ["src"]
 [tool.coverage.run]
 source = ["src/veritext"]
 branch = true
 [tool.coverage.report]
 exclude_lines = [
    "pragma: no cover",
    "if TYPE_CHECKING:",
    "raise NotImplementedError",
 ]
@@ -0,0 +1,50 @@
 # Veritext
 Semantic text validation framework for Python.
 Validates text outputs against quality criteria using metrics like BLEU, ROUGE,
 and semantic similarity. Designed for developers building systems that produce
 text (chatbots, content generators, summarisation tools) who need automated
 quality assurance beyond simple string matching.
 ## Status
 Under active development. See [changelog.md](changelog.md) for progress.
 ## Installation
 ```bash
 pip install veritext
 # With semantic similarity support
 pip install "veritext[semantic]"
 ```
 ## Quick Start
 ```python
 from veritext import validators as v
 from veritext.core.types import ValidationContext
 # Create validators
 validator = v.all_of([
    v.bleu(min_score=0.7),
    v.length(max_chars=500),
 ])
 # Validate text
 context = ValidationContext(reference="The cat sat on the mat.")
 result = validator.check("A cat is sitting on the mat.", context)
 if not result.passed:
    print(result.failure_summary)
 ```
 ## Documentation
 - [Project Plan](docs/project-plan.md)
 - [Implementation Plan](docs/implementation-plan.md)
 ## Licence
 MIT