Commit Graph

52 Commits

Author SHA1 Message Date
69966d171c docs(examples): add basic validation example
Demonstrates core Veritext functionality: metrics, validators, composites,
and constraint validators with runnable code.
2026-02-03 19:13:47 +00:00
d5df8b52e6 docs: add branch creation instruction to git workflow
Explicitly documents the requirement to create a new branch before starting
work from a plan, consistent with the parent workspace CLAUDE.md instruction.
2026-02-03 19:06:45 +00:00
8b7c087de7 docs(changelog): add CLI entries
Document command-line interface including validate command,
benchmark subcommands, and output formatting options.
2026-02-03 18:22:50 +00:00
c54f8c3f6f test(cli): add CLI tests
Add comprehensive test suite for validate command, benchmark commands,
input readers, and output formatters using Typer CliRunner.
2026-02-03 18:22:31 +00:00
0cadfd4d23 feat(cli): add benchmark subcommands
Add benchmark run, show, and check commands for quality tracking
with regression detection supporting CI integration.
2026-02-03 18:20:28 +00:00
e128720917 feat(cli): add validate command
Implement validate command with inline and file-based modes
supporting BLEU, ROUGE, and lexical metrics with multiple output formats.
2026-02-03 18:19:20 +00:00
f713d5e8a6 feat(cli): add Rich output formatters
Add formatters for validation results (table/json/simple) and
benchmark history display with regression report panels.
2026-02-03 18:17:33 +00:00
9853b57843 feat(cli): add JSONL and directory input readers
Add TextPair dataclass and read_jsonl/read_paired_jsonl functions
for parsing candidate-reference pairs from JSONL files.
2026-02-03 18:16:34 +00:00
55faae3e1b feat(cli): add CLI entry point with version command
Initialise Typer app with --version flag and help text.
2026-02-03 18:16:07 +00:00
07ac70e835 docs(changelog): add benchmark entries
Document benchmark module features in changelog.
2026-02-03 18:10:19 +00:00
6d1bece815 test(benchmark): add benchmark module tests
Comprehensive tests for models, storage, regression detection, and runner.
2026-02-03 18:10:13 +00:00
40fa39485e feat(benchmark): add module exports
Public API exports for the benchmark module.
2026-02-03 18:10:07 +00:00
9115f0c25b feat(benchmark): add Benchmark runner class
Main Benchmark class for evaluating text quality and tracking regressions.
2026-02-03 18:10:01 +00:00
83c4b4bee5 feat(benchmark): add regression detection
Rolling window baseline computation and statistical regression detection.
2026-02-03 18:09:55 +00:00
44e3e8f4ea feat(benchmark): add SQLite storage backend
Persistent storage for benchmark history with WAL mode for concurrent access.
2026-02-03 18:09:49 +00:00
45dfe07772 feat(benchmark): add BenchmarkRun and RegressionReport models
Data models for benchmark runs and regression reports using Pydantic.
2026-02-03 18:09:43 +00:00
6bafc43754 docs(changelog): add pytest plugin entries
2026-02-03 17:40:52 +00:00
012b306749 test(pytest-plugin): add plugin tests
Cover validate_text assertions, fixture factories, marker registration,
and pytest integration using pytester for subprocess testing.
2026-02-03 17:40:46 +00:00
ac7c5c69cf feat(pytest-plugin): add validate_text assertion
Primary API for text validation in pytest with keyword arguments
for BLEU, ROUGE, semantic similarity, length, readability, and
pattern matching. Includes detailed failure formatting.
2026-02-03 17:40:40 +00:00
cd36c54e22 feat(pytest-plugin): add plugin hooks and markers
Register text_validation marker via pytest_configure hook.
2026-02-03 17:40:33 +00:00
107fc4e275 docs(changelog): add semantic similarity entries
2026-02-03 17:31:14 +00:00
571b770281 test(semantic): add semantic similarity tests
2026-02-03 17:31:07 +00:00
8b3536873e feat(validators): add SemanticValidator
2026-02-03 17:31:01 +00:00
9a4ac359a3 feat(semantic): add SemanticSimilarity metric
2026-02-03 17:30:56 +00:00
de5ad93524 feat(metrics): add SemanticResult type
2026-02-03 17:30:50 +00:00
cab8099d06 docs(changelog): add validator entries
Document validators module with Check protocol, metric validators,
constraint validators, composite validators, and factory functions.
2026-02-03 17:14:37 +00:00
e2be3daffd test(validators): add validator tests
Add comprehensive tests for metric validators, constraint validators,
and composite validators covering pass/fail cases and error handling.
2026-02-03 17:14:32 +00:00
9239300fd9 feat(validators): add factory functions and exports
Export all validators and provide factory functions for clean API:
bleu(), rouge(), lexical(), length(), readability(), contains(),
excludes(), all_of(), any_of().
2026-02-03 17:14:26 +00:00
b9f805b2f4 feat(validators): add composite validators
Implement AllOf and AnyOf for combining multiple checks into
composite validation rules.
2026-02-03 17:14:20 +00:00
75cd7b68de feat(validators): add constraint validators
Implement LengthValidator, ReadabilityValidator, ContainsValidator, and
ExcludesValidator for text constraints without reference text.
2026-02-03 17:14:14 +00:00
b2b5eb1518 feat(validators): add metric-based validators
Implement BleuValidator, RougeValidator, and LexicalValidator for
validating text against reference using metric thresholds.
2026-02-03 17:14:09 +00:00
9e7b0131b3 feat(validators): add Check protocol and base types
Define the Check protocol for validation checks that compute a score
and return pass/fail results with diagnostics.
2026-02-03 17:14:03 +00:00
b8ab5811dd docs(changelog): add ROUGE and readability entries
2026-02-03 17:03:39 +00:00
62fac688e4 test(metrics): add ROUGE and readability tests
2026-02-03 17:03:34 +00:00
14ac7dbbb9 feat(metrics): export ROUGE and readability from module
2026-02-03 17:03:28 +00:00
aad933f9c4 feat(metrics): add readability implementation
2026-02-03 17:03:24 +00:00
2a7476046d feat(metrics): add ROUGE implementation
2026-02-03 17:03:19 +00:00
914c738013 feat(metrics): add ROUGE and readability result types
2026-02-03 17:03:14 +00:00
a4f5fa4cc6 docs(changelog): add metrics module entries
2026-02-03 16:46:03 +00:00
027d2d3beb test(metrics): add BLEU and lexical tests
Add comprehensive tests for BLEU and lexical metrics including edge
cases, batch scoring, and aggregate statistics.
2026-02-03 16:45:57 +00:00
74ee8c2e7b feat(metrics): add lexical similarity metrics
Implement Jaccard similarity and token overlap metrics with batch
scoring support.
2026-02-03 16:45:51 +00:00
e1c8c25142 feat(metrics): add BLEU implementation
Implement BLEU-1 through BLEU-4 with modified n-gram precision,
brevity penalty, and support for multiple references.
2026-02-03 16:45:45 +00:00
e6167005e5 feat(metrics): add metric protocol and batch types
Add Metric protocol, AggregateStats for statistical summaries, and
BatchResult for batch processing support.
2026-02-03 16:45:38 +00:00
14dcddcbba chore: add gitignore and remove cached files
Add comprehensive gitignore for Python projects. Remove accidentally
committed __pycache__ directories.
2026-02-03 16:16:33 +00:00
1e3618e637 test(core): add tokenisation and types tests
Cover WordTokeniser (Unicode, empty input, punctuation, multiple scripts)
and validation types (immutability, edge cases, failure summary).
2026-02-03 16:16:20 +00:00
a65249fa44 feat(core): add config and structured logging
Implement pydantic-settings based configuration with environment variable
support and structlog integration for JSON/console output modes.
2026-02-03 16:16:13 +00:00
697b1ddfeb feat(core): add tokenisation with unicode support
Implement Tokeniser protocol and WordTokeniser class with NFC Unicode
normalisation, optional lowercasing, and punctuation removal.
2026-02-03 16:16:07 +00:00
efc6a031a3 feat(core): add validation types
Implement ValidationContext, CheckResult, and ValidationResult models
using Pydantic with frozen (immutable) configuration.
2026-02-03 16:16:01 +00:00
a1e862550c feat(core): add exception hierarchy
Implement VeritextError base class and specialised exceptions:
MetricError, ValidationError, BenchmarkError, ConfigurationError, DependencyError.
2026-02-03 16:15:55 +00:00
60aaa33327 chore(project): add pyproject.toml and project configuration
Configure Python project with pydantic, structlog, typer, rich dependencies.
Set up ruff, mypy, pytest tooling with strict type checking.
2026-02-03 16:15:48 +00:00
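
Note on commit 74ee8c2e7b (lexical similarity metrics): the Jaccard similarity it mentions can be sketched roughly as below. This is a minimal illustration, not the Veritext implementation. The function name `jaccard_similarity`, the whitespace tokenisation, and the empty-input convention are all assumptions made for this example; the project's own WordTokeniser is not used here.

```python
def jaccard_similarity(candidate: str, reference: str) -> float:
    """Jaccard similarity between the token sets of two strings.

    Hypothetical helper for illustration; not part of the Veritext API.
    """
    # Assumption: lowercase + whitespace split stands in for a real tokeniser.
    cand_tokens = set(candidate.lower().split())
    ref_tokens = set(reference.lower().split())

    union = cand_tokens | ref_tokens
    if not union:
        # Assumption: two empty inputs are treated as identical.
        return 1.0

    # |intersection| / |union| of the two token sets.
    return len(cand_tokens & ref_tokens) / len(union)


# Two of the four distinct tokens are shared, so the score is 2 / 4.
score = jaccard_similarity("the cat sat", "the cat ran")
```

Because the score is computed over sets, repeated tokens and word order do not affect it; a metric that should be sensitive to either would need n-gram counts instead, as BLEU uses.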