7de4505e31
fix(pytest-plugin): remove duplicate plugin registration in tests
...
The pytest plugin is already loaded via the entry point, so explicitly
declaring it in conftest causes a duplicate registration error.
2026-02-04 00:43:20 +00:00
564d663c78
docs(changelog): update for QA fixes
2026-02-04 00:23:06 +00:00
0b2bc6c688
test(core): add coverage for config and logging modules
...
Adds tests for VeritextSettings defaults, env var overrides, and the
get_logger/configure_logging functions.
2026-02-04 00:22:57 +00:00
aa687f43cd
fix(validators): validate regex patterns at init time
...
ContainsValidator and ExcludesValidator now pre-compile regex patterns
during initialisation and raise InvalidThresholdError if a pattern is invalid.
2026-02-04 00:22:47 +00:00
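A minimal sketch of the fail-fast idea in aa687f43cd, assuming nothing about the real validator signatures: `ContainsCheck` and `InvalidPatternError` are hypothetical stand-ins for `ContainsValidator` and `InvalidThresholdError`. Compiling in `__init__` surfaces a bad pattern when the validator is built rather than on first use.

```python
import re


class InvalidPatternError(ValueError):
    """Hypothetical stand-in for the project's InvalidThresholdError."""


class ContainsCheck:
    """Hypothetical validator: passes when every pattern matches the text."""

    def __init__(self, patterns):
        self._compiled = []
        for pattern in patterns:
            try:
                # Compile up front so a bad pattern fails at construction time.
                self._compiled.append(re.compile(pattern))
            except re.error as exc:
                raise InvalidPatternError(
                    f"invalid regex {pattern!r}: {exc}"
                ) from exc

    def passes(self, text):
        return all(p.search(text) for p in self._compiled)
```

Pre-compiling also means each pattern is parsed once rather than on every call.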
f18427e123
fix: QA review fixes for 0.1.0 release
...
- Fix README readability example property names
- Add validation for empty references after tokenisation in ROUGE
- Guard against zero sentence count in readability metric
- Implement LRU cache with max size for semantic embeddings
- Add .score property to LexicalResult for API consistency
- Use defensive list copy in composite validators
2026-02-03 21:31:48 +00:00
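The bounded embedding cache in f18427e123 could look roughly like the sketch below. The class name, method names, and default size are assumptions made for illustration; the commit only states that an LRU cache with a maximum size was implemented.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical bounded cache: evicts the least recently used entry."""

    def __init__(self, max_size=128):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Mark the entry as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._max_size:
            # Drop the oldest entry (front of the ordered dict).
            self._items.popitem(last=False)

    def __len__(self):
        return len(self._items)
```

Without a size limit, caching one embedding per distinct input string would grow memory for as long as the process runs.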
1754556c99
docs(changelog): release 0.1.0
...
Initial release with metrics, validators, pytest plugin, benchmark
module, CLI, and comprehensive documentation.
2026-02-03 19:16:37 +00:00
13c869f5d6
docs(readme): comprehensive documentation
...
Expands the README with detailed coverage of metrics, validators, pytest
plugin, benchmark module, CLI commands, and development setup.
2026-02-03 19:16:14 +00:00
93515707cc
docs(examples): add benchmark regression example
...
Demonstrates benchmark quality tracking with historical comparison and
CI integration using assert_no_regression() for exit code control.
2026-02-03 19:15:12 +00:00
3cde5aba77
docs(examples): add chatbot testing example
...
Demonstrates pytest integration for chatbot QA with validate_text()
assertions, fixtures, and parametrised content safety tests.
2026-02-03 19:14:25 +00:00
69966d171c
docs(examples): add basic validation example
...
Demonstrates core Veritext functionality: metrics, validators, composites,
and constraint validators with runnable code.
2026-02-03 19:13:47 +00:00
d5df8b52e6
docs: add branch creation instruction to git workflow
...
Explicitly documents the requirement to create a new branch before starting
work from a plan, consistent with the parent workspace CLAUDE.md instruction.
2026-02-03 19:06:45 +00:00
8b7c087de7
docs(changelog): add CLI entries
...
Document command-line interface including validate command,
benchmark subcommands, and output formatting options.
2026-02-03 18:22:50 +00:00
c54f8c3f6f
test(cli): add CLI tests
...
Add comprehensive test suite for validate command, benchmark commands,
input readers, and output formatters using Typer CliRunner.
2026-02-03 18:22:31 +00:00
0cadfd4d23
feat(cli): add benchmark subcommands
...
Add benchmark run, show, and check commands for quality tracking
with regression detection supporting CI integration.
2026-02-03 18:20:28 +00:00
e128720917
feat(cli): add validate command
...
Implement validate command with inline and file-based modes
supporting BLEU, ROUGE, and lexical metrics with multiple output formats.
2026-02-03 18:19:20 +00:00
f713d5e8a6
feat(cli): add Rich output formatters
...
Add formatters for validation results (table/json/simple) and
benchmark history display with regression report panels.
2026-02-03 18:17:33 +00:00
9853b57843
feat(cli): add JSONL and directory input readers
...
Add TextPair dataclass and read_jsonl/read_paired_jsonl functions
for parsing candidate-reference pairs from JSONL files.
2026-02-03 18:16:34 +00:00
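One way the JSONL parsing in 9853b57843 might be structured, as a sketch only. The field names `candidate` and `reference` and the helper `parse_pairs` are assumptions; the actual `read_jsonl` and `read_paired_jsonl` signatures are not shown in the log.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TextPair:
    # Field names are assumed for illustration.
    candidate: str
    reference: str


def parse_pairs(lines):
    """Parse an iterable of JSONL lines into TextPair objects."""
    pairs = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
            pairs.append(TextPair(record["candidate"], record["reference"]))
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            # Report the line number so a bad record is easy to locate.
            raise ValueError(f"line {number}: invalid record") from exc
    return pairs
```

Taking an iterable of lines rather than a path keeps the parser usable with open files, standard input, or in-memory test data.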
55faae3e1b
feat(cli): add CLI entry point with version command
...
Initialise Typer app with --version flag and help text.
2026-02-03 18:16:07 +00:00
07ac70e835
docs(changelog): add benchmark entries
...
Document benchmark module features in changelog.
2026-02-03 18:10:19 +00:00
6d1bece815
test(benchmark): add benchmark module tests
...
Comprehensive tests for models, storage, regression detection, and runner.
2026-02-03 18:10:13 +00:00
40fa39485e
feat(benchmark): add module exports
...
Public API exports for the benchmark module.
2026-02-03 18:10:07 +00:00
9115f0c25b
feat(benchmark): add Benchmark runner class
...
Main Benchmark class for evaluating text quality and tracking regressions.
2026-02-03 18:10:01 +00:00
83c4b4bee5
feat(benchmark): add regression detection
...
Rolling window baseline computation and statistical regression detection.
2026-02-03 18:09:55 +00:00
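The log does not say which statistical test 83c4b4bee5 uses. The sketch below assumes the simplest plausible scheme: the baseline is the mean of the last `window` scores, and a run counts as a regression when it falls more than `sigmas` sample standard deviations below that mean. The function name and both parameters are hypothetical.

```python
from statistics import mean, stdev


def detect_regression(history, current, window=5, sigmas=2.0):
    """Return True when `current` falls well below the rolling baseline."""
    recent = history[-window:]
    if len(recent) < 2:
        # Not enough data to estimate spread; do not flag anything.
        return False
    baseline = mean(recent)
    spread = stdev(recent)
    return current < baseline - sigmas * spread
```

With five prior scores near 0.80, a new score of 0.70 is flagged while 0.80 is not.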
44e3e8f4ea
feat(benchmark): add SQLite storage backend
...
Persistent storage for benchmark history with WAL mode for concurrent access.
2026-02-03 18:09:49 +00:00
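Enabling WAL mode with the standard library `sqlite3` module is a single pragma, as the sketch below shows. The `open_store` helper and the `runs` table schema are assumptions for illustration, not the project's storage layout.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (or create) a benchmark database with WAL journaling."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a single writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, "
        "score REAL NOT NULL, "
        "created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.commit()
    return conn


# Usage: WAL needs a file-backed database, so use a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    conn = open_store(os.path.join(tmp, "bench.db"))
    conn.execute("INSERT INTO runs (name, score) VALUES (?, ?)", ("demo", 0.9))
    conn.commit()
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    row_count = conn.execute("SELECT COUNT(*) FROM runs").fetchone()[0]
    conn.close()
```

The journal mode is persistent for a database file, so later connections to the same file stay in WAL mode without reissuing the pragma.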
45dfe07772
feat(benchmark): add BenchmarkRun and RegressionReport models
...
Data models for benchmark runs and regression reports using Pydantic.
2026-02-03 18:09:43 +00:00
6bafc43754
docs(changelog): add pytest plugin entries
2026-02-03 17:40:52 +00:00
012b306749
test(pytest-plugin): add plugin tests
...
Cover validate_text assertions, fixture factories, marker registration,
and pytest integration using pytester for subprocess testing.
2026-02-03 17:40:46 +00:00
ac7c5c69cf
feat(pytest-plugin): add validate_text assertion
...
Primary API for text validation in pytest with keyword arguments
for BLEU, ROUGE, semantic similarity, length, readability, and
pattern matching. Includes detailed failure formatting.
2026-02-03 17:40:40 +00:00
cd36c54e22
feat(pytest-plugin): add plugin hooks and markers
...
Register text_validation marker via pytest_configure hook.
2026-02-03 17:40:33 +00:00
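Registering a marker from a plugin typically takes one call in the `pytest_configure` hook. In the sketch below the marker name comes from cd36c54e22, while the description text is an assumption.

```python
def pytest_configure(config):
    """pytest hook: register the marker so `--strict-markers` accepts it."""
    config.addinivalue_line(
        "markers",
        "text_validation: mark a test as a text validation test",
    )
```

Tests can then be selected with `pytest -m text_validation`.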
107fc4e275
docs(changelog): add semantic similarity entries
2026-02-03 17:31:14 +00:00
571b770281
test(semantic): add semantic similarity tests
2026-02-03 17:31:07 +00:00
8b3536873e
feat(validators): add SemanticValidator
2026-02-03 17:31:01 +00:00
9a4ac359a3
feat(semantic): add SemanticSimilarity metric
2026-02-03 17:30:56 +00:00
de5ad93524
feat(metrics): add SemanticResult type
2026-02-03 17:30:50 +00:00
cab8099d06
docs(changelog): add validator entries
...
Document validators module with Check protocol, metric validators,
constraint validators, composite validators, and factory functions.
2026-02-03 17:14:37 +00:00
e2be3daffd
test(validators): add validator tests
...
Add comprehensive tests for metric validators, constraint validators,
and composite validators covering pass/fail cases and error handling.
2026-02-03 17:14:32 +00:00
9239300fd9
feat(validators): add factory functions and exports
...
Export all validators and provide factory functions for clean API:
bleu(), rouge(), lexical(), length(), readability(), contains(),
excludes(), all_of(), any_of().
2026-02-03 17:14:26 +00:00
b9f805b2f4
feat(validators): add composite validators
...
Implement AllOf and AnyOf for combining multiple checks into
composite validation rules.
2026-02-03 17:14:20 +00:00
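A simplified sketch of how composites such as AllOf and AnyOf can be built on a shared protocol. It assumes a check reduces to a single boolean method, whereas the real Check protocol also returns a score and diagnostics; `MinLength` and the `check` method name are hypothetical.

```python
from typing import Protocol, Sequence


class Check(Protocol):
    """Simplified protocol: the real one also reports scores."""

    def check(self, text: str) -> bool: ...


class MinLength:
    """Hypothetical leaf check used only for this example."""

    def __init__(self, minimum: int) -> None:
        self.minimum = minimum

    def check(self, text: str) -> bool:
        return len(text) >= self.minimum


class AllOf:
    def __init__(self, checks: Sequence[Check]) -> None:
        # Copy so later mutation of the caller's list has no effect.
        self._checks = list(checks)

    def check(self, text: str) -> bool:
        return all(c.check(text) for c in self._checks)


class AnyOf:
    def __init__(self, checks: Sequence[Check]) -> None:
        self._checks = list(checks)

    def check(self, text: str) -> bool:
        return any(c.check(text) for c in self._checks)
```

Because the composites satisfy the same protocol as the leaves, they nest: an `AllOf` can contain an `AnyOf` and vice versa.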
75cd7b68de
feat(validators): add constraint validators
...
Implement LengthValidator, ReadabilityValidator, ContainsValidator, and
ExcludesValidator for text constraints without reference text.
2026-02-03 17:14:14 +00:00
b2b5eb1518
feat(validators): add metric-based validators
...
Implement BleuValidator, RougeValidator, and LexicalValidator for
validating text against reference using metric thresholds.
2026-02-03 17:14:09 +00:00
9e7b0131b3
feat(validators): add Check protocol and base types
...
Define the Check protocol for validation checks that compute a score
and return pass/fail results with diagnostics.
2026-02-03 17:14:03 +00:00
b8ab5811dd
docs(changelog): add ROUGE and readability entries
2026-02-03 17:03:39 +00:00
62fac688e4
test(metrics): add ROUGE and readability tests
2026-02-03 17:03:34 +00:00
14ac7dbbb9
feat(metrics): export ROUGE and readability from module
2026-02-03 17:03:28 +00:00
aad933f9c4
feat(metrics): add readability implementation
2026-02-03 17:03:24 +00:00
2a7476046d
feat(metrics): add ROUGE implementation
2026-02-03 17:03:19 +00:00
914c738013
feat(metrics): add ROUGE and readability result types
2026-02-03 17:03:14 +00:00
a4f5fa4cc6
docs(changelog): add metrics module entries
2026-02-03 16:46:03 +00:00
027d2d3beb
test(metrics): add BLEU and lexical tests
...
Add comprehensive tests for BLEU and lexical metrics including edge
cases, batch scoring, and aggregate statistics.
2026-02-03 16:45:57 +00:00
74ee8c2e7b
feat(metrics): add lexical similarity metrics
...
Implement Jaccard similarity and token overlap metrics with batch
scoring support.
2026-02-03 16:45:51 +00:00
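For reference, Jaccard similarity over token sets is the size of the intersection divided by the size of the union. The sketch below assumes lowercase whitespace tokenisation and treats two empty texts as identical; the project's tokeniser and edge-case policy may differ.

```python
def jaccard(candidate, reference):
    """Jaccard similarity of the two texts' token sets, in [0.0, 1.0]."""
    a = set(candidate.lower().split())
    b = set(reference.lower().split())
    if not a and not b:
        # Assumed convention: two empty texts count as a perfect match.
        return 1.0
    return len(a & b) / len(a | b)
```

For "the cat sat" against "the cat ran", the shared tokens are {the, cat} out of four distinct tokens, giving 0.5.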