ffa8658189
cli skeleton + version command
Initialise Typer app with --version flag and help text.
2025-05-07 20:02:07 +00:00
434f40f69b
changelog: benchmark module
Document benchmark module features in changelog.
2025-04-20 15:52:08 +00:00
9afa499af3
benchmark tests
Comprehensive tests for models, storage, regression detection, and runner.
2025-04-20 15:04:33 +00:00
127eb9cac6
wire up benchmark exports
Public API exports for the benchmark module.
2025-04-19 12:14:37 +00:00
73a65656a1
benchmark runner
Main Benchmark class for evaluating text quality and tracking regressions.
2025-04-19 12:08:56 +00:00
32fec2b6d5
regression detection logic
Rolling window baseline computation and statistical regression detection.
2025-04-19 11:13:01 +00:00
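A minimal sketch of the idea behind 32fec2b6d5, not the committed code: the baseline is the mean of the last few scores, and a run is flagged when it falls too many standard deviations below that baseline. The function name, the window size, the z-score threshold, and the assumption that higher scores are better are all illustrative.

```python
from statistics import mean, stdev


def detect_regression(history, current, window=5, z_threshold=2.0):
    """Return True if `current` is a regression against a rolling baseline.

    Hypothetical helper: `history` is a list of past scores, oldest first,
    and higher scores are assumed to be better.
    """
    recent = history[-window:]  # rolling window over the newest runs
    if len(recent) < 2:
        return False  # too few runs to estimate a spread
    baseline = mean(recent)
    spread = stdev(recent)
    if spread == 0:
        # Perfectly stable history: any drop counts as a regression.
        return current < baseline
    # z-score of the drop below the baseline
    return (baseline - current) / spread > z_threshold
```

A threshold on the z-score rather than on the raw difference keeps noisy metrics from raising alerts on ordinary run-to-run variation.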
e6f55c2781
sqlite storage for benchmarks
Persistent storage for benchmark history with WAL mode for concurrent access.
2025-04-15 19:06:55 +00:00
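For e6f55c2781, a sketch of what opening a SQLite store in WAL mode can look like with the standard library. The table name, the columns, and the open_store helper are assumptions made for illustration; only the use of SQLite and WAL mode comes from the commit message.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (or create) a benchmark database with write-ahead logging.

    Hypothetical helper; the schema below is illustrative only.
    """
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a single writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " score REAL NOT NULL,"
        " created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_run(conn, name, score):
    """Insert one benchmark run and commit it."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO runs (name, score) VALUES (?, ?)", (name, score))
```

WAL mode applies to file-backed databases; an in-memory database keeps its own journal mode, so the sketch expects a real path.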
aad6c3497d
benchmark data models
Data models for benchmark runs and regression reports using Pydantic.
2025-04-09 20:42:03 +00:00
4b0a05d00d
changelog: pytest plugin
2025-04-09 20:10:02 +00:00
4d103cbe52
test pytest plugin
Cover validate_text assertions, fixture factories, marker registration,
and pytest integration using pytester for subprocess testing.
2025-04-06 15:50:41 +00:00
9f9a3da6cc
validate_text assertion helper
Primary API for text validation in pytest with keyword arguments
for BLEU, ROUGE, semantic similarity, length, readability, and
pattern matching. Includes detailed failure formatting.
2025-04-06 15:13:34 +00:00
7528c44dcc
pytest plugin: hooks and markers
Register text_validation marker via pytest_configure hook.
2025-04-06 14:03:06 +00:00
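For 7528c44dcc, marker registration through the pytest_configure hook usually takes the shape below. The marker name comes from the commit message; the description string is an assumption, and the plugin's real hook may do more.

```python
def pytest_configure(config):
    """Register the text_validation marker so `--strict-markers` accepts it."""
    config.addinivalue_line(
        "markers",
        # Description text is illustrative, not taken from the project.
        "text_validation: mark a test as a text validation test",
    )
```

Once registered, the marker shows up in `pytest --markers` and can be applied with `@pytest.mark.text_validation`.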
4d9a2f3e0c
changelog: semantic similarity
2025-04-05 12:35:31 +00:00
06124c12ae
test semantic similarity
2025-04-05 11:51:16 +00:00
0dffaa4817
semantic validator
2025-04-05 11:09:54 +00:00
b6c4bad96a
feat: semantic similarity metric
2025-04-05 10:03:52 +00:00
40674929b9
semantic result type
2025-04-03 19:58:08 +00:00
7c508a592f
changelog: validators
Document validators module with Check protocol, metric validators,
constraint validators, composite validators, and factory functions.
2025-04-03 19:05:21 +00:00
7f23586406
validator tests
Add comprehensive tests for metric validators, constraint validators,
and composite validators covering pass/fail cases and error handling.
2025-03-29 13:30:23 +00:00
8fd1dc4cd3
validator factory functions
Export all validators and provide factory functions for clean API:
bleu(), rouge(), lexical(), length(), readability(), contains(),
excludes(), all_of(), any_of().
2025-03-29 13:12:53 +00:00
62d78ab699
composite validators (all/any/pipeline)
Implement AllOf and AnyOf for combining multiple checks into
composite validation rules.
2025-03-26 18:54:51 +00:00
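A rough sketch of the composition pattern named in 62d78ab699. The class names AllOf and AnyOf come from the commit message; treating a check as a plain callable from text to bool is a simplifying assumption, and the real classes presumably return richer results.

```python
class AllOf:
    """Passes only if every wrapped check passes (illustrative sketch)."""

    def __init__(self, *checks):
        self.checks = checks

    def __call__(self, text):
        return all(check(text) for check in self.checks)


class AnyOf:
    """Passes if at least one wrapped check passes (illustrative sketch)."""

    def __init__(self, *checks):
        self.checks = checks

    def __call__(self, text):
        return any(check(text) for check in self.checks)


# Hypothetical leaf checks used only for the example.
def is_short(text):
    return len(text.split()) <= 5


def mentions_python(text):
    return "python" in text.lower()
```

Because the composites are themselves callables, they nest: AnyOf(AllOf(is_short, mentions_python), other_check) is a valid rule.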
067cd74566
constraint validators (length, regex, contains)
Implement LengthValidator, ReadabilityValidator, ContainsValidator, and
ExcludesValidator for text constraints without reference text.
2025-03-26 18:06:03 +00:00
3ef262d357
metric validators (threshold checks)
Implement BleuValidator, RougeValidator, and LexicalValidator for
validating text against reference using metric thresholds.
2025-03-22 11:59:22 +00:00
d17d3de06d
validator protocol + base types
Define the Check protocol for validation checks that compute a score
and return pass/fail results with diagnostics.
2025-03-22 10:48:29 +00:00
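To illustrate the kind of protocol d17d3de06d describes, here is a sketch using typing.Protocol. Only the name Check comes from the commit message; the method name, the Outcome result type with its fields, and the MinLength example are assumptions.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Outcome:
    """Hypothetical result type: a score, a verdict, and a diagnostic."""
    name: str
    score: float
    passed: bool
    message: str = ""


@runtime_checkable
class Check(Protocol):
    """Anything with a matching check() method satisfies the protocol."""

    def check(self, text: str) -> Outcome: ...


class MinLength:
    """Example check; note that it does not inherit from Check."""

    def __init__(self, minimum: int) -> None:
        self.minimum = minimum

    def check(self, text: str) -> Outcome:
        words = len(text.split())
        passed = words >= self.minimum
        message = "" if passed else f"expected at least {self.minimum} words, got {words}"
        return Outcome("min_length", float(words), passed, message)
```

Structural typing means third-party checks can plug in without importing a base class; isinstance against a runtime_checkable protocol only verifies that the method exists, not its signature.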
9f53446ca7
changelog: ROUGE and readability
2025-03-22 10:04:35 +00:00
ec48eb5bf5
tests for ROUGE and readability
2025-03-20 20:32:41 +00:00
5c2d626208
export ROUGE + readability
2025-03-20 20:13:05 +00:00
2ef8265754
readability metrics (flesch, gunning fog, etc)
2025-03-16 16:03:32 +00:00
0032f89b17
feat: ROUGE-L scorer
2025-03-16 15:50:45 +00:00
d2a9f28335
ROUGE and readability result types
2025-03-16 15:00:13 +00:00
856bcbccbb
changelog: metrics module
2025-03-15 13:30:08 +00:00
afb39cf177
test BLEU and lexical metrics
Add comprehensive tests for BLEU and lexical metrics including edge
cases, batch scoring, and aggregate statistics.
2025-03-15 12:14:45 +00:00
f26e14bf20
lexical similarity (jaccard, overlap, cosine)
Implement Jaccard similarity and token overlap metrics with batch
scoring support.
2025-03-15 12:09:50 +00:00
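For reference, the two set-based measures mentioned in f26e14bf20 can be sketched as follows. The function names are illustrative, and reading "token overlap" as the overlap coefficient (intersection over the smaller set) is an assumption; the project may define it differently.

```python
def jaccard(a, b):
    """Jaccard similarity of two token lists: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # convention chosen here: two empty inputs are identical
    return len(sa & sb) / len(sa | sb)


def overlap_coefficient(a, b):
    """Overlap coefficient: |A & B| / min(|A|, |B|)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))


def score_batch(metric, pairs):
    """Apply a metric to each (candidate, reference) pair."""
    return [metric(candidate, reference) for candidate, reference in pairs]
```

Both measures ignore token order and repetition, which is what separates them from n-gram metrics such as BLEU.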
82b6ffea79
feat: BLEU scorer
Implement BLEU-1 through BLEU-4 with modified n-gram precision,
brevity penalty, and support for multiple references.
2025-03-15 11:08:36 +00:00
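The three ingredients named in 82b6ffea79 (modified n-gram precision, brevity penalty, multiple references) fit together roughly as below. This follows the textbook BLEU definition without smoothing, not necessarily the committed implementation, and all function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped precision: a candidate n-gram is credited at most as many
    times as it occurs in the reference where it is most frequent."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalise candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Geometric mean of the 1..max_n precisions times the brevity penalty."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)
```

Clipping is what stops a degenerate candidate such as "the the the the the the the" from scoring a perfect unigram precision against references that contain "the" only once or twice.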
7832fa3d59
metric protocol and batch scoring types
Add Metric protocol, AggregateStats for statistical summaries, and
BatchResult for batch processing support.
2025-03-12 20:13:11 +00:00
c53cdd2536
gitignore, clean cached files
Add comprehensive gitignore for Python projects. Remove accidentally
committed __pycache__ directories.
2025-03-12 19:13:31 +00:00
2827dcdf4e
tests for tokeniser and types
Cover WordTokeniser (Unicode, empty input, punctuation, multiple scripts)
and validation types (immutability, edge cases, failure summary).
2025-03-09 11:42:26 +00:00
1fb9e1f835
config + structured logging
Implement pydantic-settings based configuration with environment variable
support and structlog integration for JSON/console output modes.
2025-03-09 10:32:16 +00:00
3e88705404
tokeniser with unicode handling
Implement Tokeniser protocol and WordTokeniser class with NFC Unicode
normalisation, optional lowercasing, and punctuation removal.
2025-03-09 10:06:28 +00:00
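The pipeline described in 3e88705404 (NFC normalisation, optional lowercasing, punctuation removal) might look like the sketch below. It is a free function rather than the WordTokeniser class, and the parameter names and whitespace splitting are assumptions.

```python
import unicodedata


def tokenise(text, lowercase=True, strip_punctuation=True):
    """Split text into word tokens after Unicode normalisation."""
    # NFC composes sequences such as "e" + combining acute into one code point,
    # so visually identical strings compare equal.
    text = unicodedata.normalize("NFC", text)
    if lowercase:
        text = text.lower()
    if strip_punctuation:
        # Unicode general categories starting with "P" are punctuation.
        text = "".join(
            " " if unicodedata.category(ch).startswith("P") else ch for ch in text
        )
    return text.split()
```

Normalising before comparison matters for metrics like BLEU and Jaccard, where a decomposed and a precomposed form of the same word would otherwise count as different tokens.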
494f5d0c85
add validation result types
Implement ValidationContext, CheckResult, and ValidationResult models
using Pydantic with frozen (immutable) configuration.
2025-03-08 15:30:31 +00:00
d20ea7c4ff
core exceptions
Implement VeritextError base class and specialised exceptions:
MetricError, ValidationError, BenchmarkError, ConfigurationError, DependencyError.
2025-03-08 15:00:29 +00:00
bf5884cb27
project setup: pyproject.toml, deps, tooling
Configure Python project with pydantic, structlog, typer, rich dependencies.
Set up ruff, mypy, pytest tooling with strict type checking.
2025-03-08 14:03:32 +00:00