refactor: CLI cleanup and documentation updates

- Refactor CLI metric computation to eliminate code duplication
- Update version format to PEP 440 compliance (0.1.0.dev0)
- Cache Settings instance via @lru_cache for performance
- Document composite validators' protocol deviation
- Consolidate redundant empty checks in ROUGE-L computation
- Add Phase 10 (Portfolio Demos) to implementation plan
@@ -488,3 +488,47 @@ benchmark.assert_no_regression(tolerance=0.03)
5. **Natural portfolio narrative** — "I was building X and needed a better way to test
it, so I built this tool." Every interviewer has faced similar problems.

---

## Portfolio Demos (Future)

Interactive demos to showcase Veritext without requiring installation.

### Streamlit Demo

A quick interactive web UI for general visitors and recruiters.

**Features:**
- Text input boxes (candidate + reference)
- Metric selector (BLEU, ROUGE, lexical, readability)
- Threshold sliders for pass/fail validation
- Results table with scores and status

**Deployment:** Self-hosted on homeserver (e.g., `veritext.kschappell.com`)

**Effort:** ~30 minutes

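
The feature list above maps onto a small amount of Streamlit code. What follows is a minimal sketch under stated assumptions, not Veritext's implementation: `unigram_precision`, `unigram_recall`, `METRICS`, and `evaluate` are hypothetical names, and the two scoring functions are simplified stand-ins for the BLEU and ROUGE metrics, because Veritext's own API is not shown here. The Streamlit calls (`st.text_area`, `st.multiselect`, `st.slider`, `st.table`) are the only real library functions used.

```python
from collections import Counter
from typing import Callable


def _overlap(candidate: str, reference: str) -> tuple[int, int, int]:
    """Return (clipped unigram matches, candidate length, reference length)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Counter intersection keeps the minimum count per token (clipped matches).
    matches = sum((Counter(cand) & Counter(ref)).values())
    return matches, len(cand), len(ref)


def unigram_precision(candidate: str, reference: str) -> float:
    """Share of candidate tokens found in the reference (stand-in for BLEU)."""
    matches, cand_len, _ = _overlap(candidate, reference)
    return matches / cand_len if cand_len else 0.0


def unigram_recall(candidate: str, reference: str) -> float:
    """Share of reference tokens found in the candidate (stand-in for ROUGE)."""
    matches, _, ref_len = _overlap(candidate, reference)
    return matches / ref_len if ref_len else 0.0


# Hypothetical registry: illustrative stand-ins, not Veritext's actual metrics.
METRICS: dict[str, Callable[[str, str], float]] = {
    "unigram precision": unigram_precision,
    "unigram recall": unigram_recall,
}


def evaluate(candidate: str, reference: str,
             thresholds: dict[str, float]) -> list[dict]:
    """Score each selected metric and mark it pass/fail against its threshold."""
    rows = []
    for name, threshold in thresholds.items():
        score = METRICS[name](candidate, reference)
        rows.append({
            "metric": name,
            "score": round(score, 3),
            "threshold": threshold,
            "status": "pass" if score >= threshold else "fail",
        })
    return rows


def render() -> None:
    """Streamlit front end covering the four features listed above."""
    import streamlit as st  # lazy import so the helpers work without Streamlit

    st.title("Veritext demo (sketch)")
    candidate = st.text_area("Candidate text", value="the cat sat on the mat")
    reference = st.text_area("Reference text", value="the cat is on the mat")
    selected = st.multiselect("Metrics", list(METRICS), default=list(METRICS))
    # One slider per selected metric; labels must be unique per widget.
    thresholds = {
        name: st.slider(f"Minimum {name}", min_value=0.0, max_value=1.0,
                        value=0.5, step=0.05)
        for name in selected
    }
    rows = evaluate(candidate, reference, thresholds)
    if rows:
        st.table(rows)


if __name__ == "__main__":
    try:
        render()
    except ImportError:
        print("streamlit is not installed; only the helper functions are available")
```

Saved as, say, `demo.py`, this would be launched with `streamlit run demo.py`. Keeping `evaluate` free of Streamlit imports means the pass/fail logic can be unit-tested without a browser.
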
### Jupyter Notebook Collection

Deep-dive notebooks targeting data science and ML recruiters.

**Notebooks:**

| Notebook | Purpose |
|----------|---------|
| `01-metrics-overview.ipynb` | Introduction to each metric with visualisations |
| `02-batch-evaluation.ipynb` | Evaluating model outputs at scale, statistical analysis |
| `03-regression-detection.ipynb` | Tracking quality over time, detecting degradation |
| `04-chatbot-validation.ipynb` | Real-world use case: validating chatbot responses |

**Hosting:** JupyterLite (static files, runs in browser via WebAssembly)

**Deployment:** Self-hosted alongside Streamlit demo

**Why both:**

| Demo Type | Audience | Value |
|-----------|----------|-------|
| Streamlit | General visitors | Quick, interactive, no friction |
| Notebooks | Data/ML recruiters | Shows analytical depth, speaks their language |