project setup: pyproject.toml, deps, tooling

Configure the Python project with pydantic, structlog, typer, and rich as dependencies. Set up ruff, mypy, and pytest tooling with strict type checking.
2025-03-08 14:03:32 +00:00
commit bf5884cb27
4 changed files with 1924 additions and 0 deletions
@@ -0,0 +1,114 @@
+# Changelog
+
+All notable changes to Veritext will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Changed
+
+- Refactored CLI metric computation to eliminate code duplication
+- Version format updated from `0.1.0-dev` to `0.1.0.dev0` (PEP 440 compliance)
+- Settings instance is now cached via `@lru_cache` for better performance
+- Documented composite validators' intentional deviation from `Check` protocol return type
+
+### Fixed
+
+- Consolidated redundant empty checks in ROUGE-L computation
+- Fixed README example using incorrect property names (`grade_level` → `flesch_kincaid_grade`, `reading_ease` → `flesch_reading_ease`)
+- Fixed potential crash in ROUGE metric when all references are empty after tokenisation
+- Fixed potential division by zero in readability metric when text has no sentence endings
+- Fixed unbounded cache growth in `SemanticSimilarity` by implementing LRU eviction with configurable max size
+- Fixed mutable list aliasing in `AllOf` and `AnyOf` composite validators
+- Fixed regex pattern validation in `ContainsValidator` and `ExcludesValidator` to fail at init time rather than during `check()`
+- Fixed pytest plugin tests failing with duplicate plugin registration error
+
+### Documentation
+
+- Added Phase 10 (Portfolio Demos) to implementation plan: Streamlit demo and Jupyter notebooks
+- Updated project plan with portfolio demo section
+
+### Added
+
+- Added `.score` property to `LexicalResult` for API consistency with other result types
+- Added `cache_max_size` parameter to `SemanticSimilarity` (default: 1000 embeddings)
+- Added test coverage for `core/config.py` and `core/logging.py` modules
+
+## [0.1.0] - 2025-05-17
+
+Initial release of Veritext, a semantic text validation framework for Python.
+
+### Added
+
+#### Core
+
+- Project scaffold with pyproject.toml and development tooling
+- Core exception hierarchy (`VeritextError` and subclasses)
+- Core types: `ValidationContext`, `CheckResult`, `ValidationResult`
+- Word tokeniser with Unicode normalisation support
+- Configuration module with pydantic-settings
+- Structured logging with structlog
+
+#### Metrics
+
+- Metrics module with `Metric` protocol, `AggregateStats`, and `BatchResult` types
+- BLEU metric implementation (BLEU-1 through BLEU-4 with brevity penalty)
+- ROUGE metric (ROUGE-1, ROUGE-2, ROUGE-L with precision/recall/F-measure)
+- Lexical similarity metric (Jaccard similarity and token overlap)
+- Flesch-Kincaid readability metrics (grade level and reading ease)
+- Batch scoring with aggregate statistics for all metrics
+
+#### Validators
+
+- Validators module with `Check` protocol for validation checks
+- Metric-based validators: `BleuValidator`, `RougeValidator`, `LexicalValidator`
+- Constraint validators: `LengthValidator`, `ReadabilityValidator`, `ContainsValidator`, `ExcludesValidator`
+- Composite validators: `AllOf` (all checks must pass), `AnyOf` (any check must pass)
+- Factory functions for clean validator API (`bleu()`, `rouge()`, `lexical()`, `length()`, `readability()`, `contains()`, `excludes()`, `all_of()`, `any_of()`)
+
+#### Semantic Similarity
+
+- Semantic similarity module with embedding-based text comparison (requires `veritext[semantic]` extra)
+- `SemanticSimilarity` metric using sentence-transformers for semantic relatedness
+- `SemanticValidator` for threshold-based semantic similarity validation
+- `semantic()` factory function for creating semantic validators
+- Embedding caching for performance optimisation in repeated comparisons
+
+#### Pytest Plugin
+
+- Native pytest plugin for CI/CD integration (entry point: `pytest11`)
+- `validate_text()` assertion function for expressive test assertions
+- `text_validation` marker for filtering validation tests
+- Pytest fixtures: `text_validator` factory and `validation_context` helper
+- Detailed failure messages with text preview and check diagnostics
+
+#### Benchmarking
+
+- Benchmark module for quality tracking and regression detection
+- `Benchmark` class for evaluating text quality over time with metric storage
+- `BenchmarkRun` and `RegressionReport` data models for tracking runs
+- SQLite storage backend with WAL mode for concurrent access
+- Rolling window baseline computation for historical comparison
+- `check_regression()` for statistical comparison against baseline
+- `assert_no_regression()` raises `RegressionDetectedError` for CI integration
+- Customisable tolerance threshold and window size for regression detection
+- Metadata support for tracking git SHA, model versions, etc.
+
+#### CLI
+
+- Command-line interface (CLI) via `veritext` command
+- `veritext validate` command for inline and file-based text validation
+- JSONL input format support for batch validation (`--file` option)
+- Separate candidate/reference file support (`--reference-file` option)
+- Multiple output formats: table (default), JSON, and simple text
+- `veritext benchmark run` command for running evaluations and storing results
+- `veritext benchmark show` command for viewing benchmark history
+- `veritext benchmark check` command for regression detection with exit code 1 on failure
+- Rich-formatted terminal output with tables and coloured panels
+
+#### Documentation
+
+- Readme with usage examples
+- Example scripts: basic validation, chatbot testing, benchmark regression
@@ -0,0 +1,121 @@
+[project]
+name = "veritext"
+version = "0.1.0.dev0"
+description = "Semantic text validation framework"
+readme = "readme.md"
+requires-python = ">=3.11"
+license = "MIT"
+authors = [{ name = "Kai Chappell", email = "git@kschappell.com" }]
+keywords = ["validation", "text", "nlp", "testing", "quality"]
+classifiers = [
+    "Development Status :: 3 - Alpha",
+    "Intended Audience :: Developers",
+    "License :: OSI Approved :: MIT License",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Topic :: Software Development :: Testing",
+    "Topic :: Text Processing",
+    "Typing :: Typed",
+]
+dependencies = [
+    "pydantic>=2.0",
+    "pydantic-settings>=2.0",
+    "structlog>=23.0",
+    "typer>=0.9",
+    "rich>=13.0",
+]
+
+[project.optional-dependencies]
+semantic = ["sentence-transformers>=2.2"]
+dev = [
+    "pytest>=7.0",
+    "pytest-cov>=4.0",
+    "mypy>=1.0",
+    "ruff>=0.1",
+]
+all = ["veritext[semantic]"]
+
+[project.scripts]
+veritext = "veritext.cli.main:app"
+
+[project.entry-points.pytest11]
+veritext = "veritext.pytest_plugin"
+
+[project.urls]
+Repository = "https://gitea.kschappell.com/kschappell/veritext"
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[tool.hatch.build.targets.wheel]
+packages = ["src/veritext"]
+
+[tool.ruff]
+line-length = 88
+target-version = "py311"
+src = ["src", "tests"]
+
+[tool.ruff.lint]
+select = [
+    "E",      # pycodestyle errors
+    "W",      # pycodestyle warnings
+    "F",      # pyflakes
+    "I",      # isort
+    "B",      # flake8-bugbear
+    "C4",     # flake8-comprehensions
+    "UP",     # pyupgrade
+    "ARG",    # flake8-unused-arguments
+    "SIM",    # flake8-simplify
+    "TCH",    # flake8-type-checking
+    "PTH",    # flake8-use-pathlib
+    "RUF",    # ruff-specific
+]
+ignore = [
+    "E501",   # line too long (handled by formatter)
+]
+
+[tool.ruff.lint.isort]
+known-first-party = ["veritext"]
+
+[tool.mypy]
+python_version = "3.11"
+mypy_path = ["src"]
+strict = true
+warn_return_any = true
+warn_unused_ignores = true
+disallow_untyped_defs = true
+disallow_incomplete_defs = true
+check_untyped_defs = true
+disallow_untyped_decorators = true
+no_implicit_optional = true
+warn_redundant_casts = true
+warn_unused_configs = true
+show_error_codes = true
+files = ["src/veritext"]
+
+[[tool.mypy.overrides]]
+module = ["sentence_transformers.*"]
+ignore_missing_imports = true
+
+[[tool.mypy.overrides]]
+module = ["structlog", "structlog.*"]
+ignore_missing_imports = true
+
+[tool.pytest.ini_options]
+testpaths = ["tests"]
+addopts = "-v --tb=short"
+pythonpath = ["src"]
+
+[tool.coverage.run]
+source = ["src/veritext"]
+branch = true
+
+[tool.coverage.report]
+exclude_lines = [
+    "pragma: no cover",
+    "if TYPE_CHECKING:",
+    "raise NotImplementedError",
+]
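The `[project.scripts]` and `[project.entry-points.pytest11]` tables above use object references of the form `module` or `module:attribute`. As a rough illustration of how such a reference is resolved at runtime, here is a minimal sketch. The helper name `resolve_object_reference` is hypothetical and written only for this note; installed tooling would normally load entry points through `importlib.metadata` rather than a hand-rolled function.

```python
import importlib
from typing import Any


def resolve_object_reference(reference: str) -> Any:
    """Resolve 'package.module' or 'package.module:attr.sub' to an object.

    Hypothetical helper for illustration only.
    """
    module_name, _, attr_path = reference.partition(":")
    obj: Any = importlib.import_module(module_name)
    if attr_path:
        # Walk dotted attributes after the colon, e.g. "main:app" -> module.app
        for part in attr_path.split("."):
            obj = getattr(obj, part)
    return obj


# Usage with standard-library modules, since Veritext may not be installed:
dumps = resolve_object_reference("json:dumps")
json_module = resolve_object_reference("json")
```

Under this reading, `veritext.cli.main:app` names the `app` attribute of the `veritext.cli.main` module, while `veritext.pytest_plugin` names the module itself.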
@@ -0,0 +1,50 @@
+# Veritext
+
+Semantic text validation framework for Python.
+
+Validates text outputs against quality criteria using metrics like BLEU, ROUGE,
+and semantic similarity. Designed for developers building systems that produce
+text (chatbots, content generators, summarisation tools) who need automated
+quality assurance beyond simple string matching.
+
+## Status
+
+Under active development. See [changelog.md](changelog.md) for progress.
+
+## Installation
+
+```bash
+pip install veritext
+
+# With semantic similarity support
+pip install "veritext[semantic]"
+```
+
+## Quick Start
+
+```python
+from veritext import validators as v
+from veritext.core.types import ValidationContext
+
+# Create validators
+validator = v.all_of([
+    v.bleu(min_score=0.7),
+    v.length(max_chars=500),
+])
+
+# Validate text
+context = ValidationContext(reference="The cat sat on the mat.")
+result = validator.check("A cat is sitting on the mat.", context)
+
+if not result.passed:
+    print(result.failure_summary)
+```
+
+## Documentation
+
+- [Project Plan](docs/project-plan.md)
+- [Implementation Plan](docs/implementation-plan.md)
+
+## Licence
+
+MIT
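The Quick Start above combines checks with `v.all_of([...])`, and the changelog describes `AllOf` as "all checks must pass". The implementation is not part of this diff, so the following is only a simplified sketch of that composition idea. Every name here (`Outcome`, `max_chars`, `must_contain`, and this `all_of`) is hypothetical, and the sketch omits the `ValidationContext` argument that the real `check()` call takes.

```python
from collections.abc import Callable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical result type: pass/fail plus failure messages."""

    passed: bool
    failures: tuple[str, ...] = ()


Check = Callable[[str], Outcome]


def max_chars(limit: int) -> Check:
    def check(text: str) -> Outcome:
        if len(text) <= limit:
            return Outcome(passed=True)
        return Outcome(False, (f"length {len(text)} exceeds {limit} characters",))

    return check


def must_contain(needle: str) -> Check:
    def check(text: str) -> Outcome:
        if needle in text:
            return Outcome(passed=True)
        return Outcome(False, (f"missing required text {needle!r}",))

    return check


def all_of(checks: Sequence[Check]) -> Check:
    # Copy into a tuple so later mutation of the caller's list has no effect.
    frozen = tuple(checks)

    def check(text: str) -> Outcome:
        outcomes = [c(text) for c in frozen]
        failures = tuple(msg for o in outcomes for msg in o.failures)
        return Outcome(passed=all(o.passed for o in outcomes), failures=failures)

    return check
```

Copying the incoming sequence is one way to avoid the kind of mutable list aliasing the changelog lists as fixed: the composite keeps working even if the caller later clears or reuses the original list.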