xlm.tasks.owt.mauve_text_eval
MAUVE post-hoc text evaluation for xlm-core ``Harness`` / ``LogPredictions``.

Computes `MAUVE <https://arxiv.org/abs/2102.01454>`_ between human/reference
strings and model generations using `mauve-text`_ (``import mauve``).
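As a rough illustration of what the score measures, the sketch below computes a MAUVE-style number from two already-quantized histograms ``p`` (human) and ``q`` (model). It forms mixtures ``R = lam * P + (1 - lam) * Q``, maps each one to the point ``(exp(-c * KL(Q || R)), exp(-c * KL(P || R)))``, and takes the area under the resulting divergence frontier. This is a simplified from-scratch sketch of the idea in the paper, not mauve-text's implementation. The scaling constant ``c = 5`` and the 25 mixture weights are assumptions, and the featurization and k-means quantization that mauve-text applies to raw text are omitted. The helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def kl_divergence(p: Sequence[float], r: Sequence[float]) -> float:
    """KL(p || r) for discrete distributions; buckets with p_i == 0 contribute 0."""
    total = 0.0
    for p_i, r_i in zip(p, r):
        if p_i > 0.0:
            if r_i == 0.0:
                return math.inf
            total += p_i * math.log(p_i / r_i)
    return total


def mauve_from_histograms(
    p: Sequence[float],
    q: Sequence[float],
    scaling_factor: float = 5.0,
    num_mixtures: int = 25,
) -> float:
    """Area under the divergence frontier of two normalized histograms.

    p: bucket histogram of human/reference text; q: bucket histogram of model text.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of buckets")
    eps = 1e-6
    # Mixture weights lambda, evenly spaced in (0, 1) and ascending.
    lambdas = [eps + (1 - 2 * eps) * i / (num_mixtures - 1) for i in range(num_mixtures)]
    # Close the curve at the axes: (1, 0) before the first mixture ...
    curve: List[Tuple[float, float]] = [(1.0, 0.0)]
    for lam in lambdas:
        r = [lam * p_i + (1 - lam) * q_i for p_i, q_i in zip(p, q)]
        x = math.exp(-scaling_factor * kl_divergence(q, r))  # exp(-c * KL(Q || R))
        y = math.exp(-scaling_factor * kl_divergence(p, r))  # exp(-c * KL(P || R))
        curve.append((x, y))
    curve.append((0.0, 1.0))  # ... and (0, 1) after the last one.
    # As lambda grows, x falls and y rises, so walk the curve backwards
    # (x increasing) and integrate y dx with the trapezoid rule.
    curve.reverse()
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


print(round(mauve_from_histograms([0.25] * 4, [0.25] * 4), 6))  # identical histograms -> 1.0
```

Identical histograms give a score of 1.0. Histograms with disjoint support, such as ``[1, 0]`` versus ``[0, 1]``, give a score close to 0.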
Human / reference text can come from:

- Each prediction row (``truth``, ``reference``, …, or ``reference_field``), including decoded ``target_ids`` when the tokenizer is passed; or
- ``human_text_source: hf_streaming``: stream strings from a HuggingFace dataset split (default: OWT validation). This follows the same idea as the standalone Proseco eval, where the human side comes from the val loader and not from the JSONL.
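The two human-text sources above can be sketched as follows. Everything in this block is hypothetical: the helper names, the fallback keys (only ``truth`` and ``reference`` are shown, since the full list is elided above), checking ``reference_field`` before the fallbacks, and requiring at least ``min_chars`` non-blank characters are assumptions, not the evaluator's actual code. The streaming helper uses the real ``datasets.load_dataset(..., streaming=True)`` and ``IterableDataset.shuffle(seed=..., buffer_size=...)`` calls, with defaults copied from the parameter table.

```python
from itertools import islice
from typing import Any, Iterable, List, Mapping, Optional, Sequence

# Hypothetical fallback keys; the evaluator's real list is longer ("truth", "reference", ...).
FALLBACK_REFERENCE_KEYS: Sequence[str] = ("truth", "reference")


def resolve_reference_text(
    row: Mapping[str, Any],
    reference_field: Optional[str] = None,
    tokenizer: Any = None,
) -> Optional[str]:
    """Pick human text from one prediction row (hypothetical helper)."""
    # Assumption: an explicit reference_field is tried before the fallback keys.
    keys = ([reference_field] if reference_field else []) + list(FALLBACK_REFERENCE_KEYS)
    for key in keys:
        value = row.get(key)
        if isinstance(value, str) and value.strip():
            return value
    # Otherwise decode target token ids with a HF-style tokenizer, if one was passed.
    target_ids = row.get("target_ids")
    if tokenizer is not None and target_ids is not None:
        return tokenizer.decode(target_ids, skip_special_tokens=True)
    return None


def stream_hf_rows(
    path: str = "dhruveshpatel/owt-gpt2-1024-split",
    split: str = "validation",
    shuffle_seed: int = 42,
    shuffle_buffer_size: int = 10000,
) -> Iterable[Mapping[str, Any]]:
    """Stream and shuffle a HF dataset split (requires the `datasets` package)."""
    from datasets import load_dataset  # lazy import: take_human_texts works without it

    ds = load_dataset(path, split=split, streaming=True)
    return ds.shuffle(seed=shuffle_seed, buffer_size=shuffle_buffer_size)


def take_human_texts(
    rows: Iterable[Mapping[str, Any]],
    n: int,
    text_column: str = "text",
    min_chars: int = 8,
) -> List[str]:
    """Return the first n texts that have at least min_chars non-blank characters."""
    texts = (
        row[text_column]
        for row in rows
        if isinstance(row.get(text_column), str) and len(row[text_column].strip()) >= min_chars
    )
    return list(islice(texts, n))  # stops reading the stream after n matches
```

Calling ``take_human_texts(stream_hf_rows(), n=len(generations))`` would pull one human snippet per generation. Matching the two counts is also an assumption here.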
Example Hydra defaults::

    defaults:
      - your_experiment
      - /post_hoc_evaluator@post_hoc_evaluator.evaluators.prediction.mauve: mauve_text

Or instantiate explicitly (no composite)::

    post_hoc_evaluator:
      _target_: xlm.tasks.owt.mauve_text_eval.MauveTextEval

Install::

    pip install "xlm-core[mauve]"

(or ``pip install mauve-text``).

.. _mauve-text: https://pypi.org/project/mauve-text/
MauveTextEval
Post-hoc evaluator: MAUVE between references and model text.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| ``reference_field`` | ``Optional[str]`` | Batch / prediction key for human text. If … | ``None`` |
| ``generated_field`` | ``str`` | Key for model output (default ``'text'``). | ``'text'`` |
| ``featurize_model_name`` | ``str`` | HF model name for MAUVE features (see …). | ``'gpt2-large'`` |
| ``device_id`` | ``int`` | GPU id for featurization, or … | ``0`` |
| ``max_text_length`` | ``int`` | Max tokens per string for the featurizer. | ``256`` |
| ``batch_size`` | ``int`` | Featurization batch size. | ``8`` |
| ``verbose`` | ``bool`` | Forwarded to … | ``False`` |
| ``num_buckets`` | ``Any`` | Histogram size (…). | ``'auto'`` |
| ``seed`` | ``int`` | RNG seed for k-means. | ``25`` |
| ``swap_p_q`` | ``bool`` | If … | ``False`` |
| ``human_text_source`` | ``Optional[str]`` | Source of human text. If … (``'hf_streaming'`` streams it from the dataset set by the ``hf_*`` parameters). | ``None`` |
| ``hf_dataset_path`` | ``str`` | Dataset id for streaming (OWT default). | ``'dhruveshpatel/owt-gpt2-1024-split'`` |
| ``hf_split`` | ``str`` | Split name, e.g. ``'validation'``. | ``'validation'`` |
| ``hf_text_column`` | ``str`` | Column with raw text. | ``'text'`` |
| ``hf_shuffle_seed`` | ``int`` | Seed for the streaming shuffle. | ``42`` |
| ``hf_shuffle_buffer_size`` | ``int`` | Shuffle buffer size for streaming. | ``10000`` |
| ``hf_min_chars`` | ``int`` | Skip streamed snippets shorter than this many characters. | ``8`` |
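Several of these featurization parameters correspond to keyword arguments of ``mauve.compute_mauve`` in mauve-text: ``featurize_model_name``, ``device_id``, ``max_text_length``, ``batch_size``, ``verbose``, ``num_buckets`` and ``seed``, plus ``p_text`` / ``q_text`` for the two text lists. The sketch below shows one plausible way to forward them. The helper names are hypothetical. Passing human text as ``p`` and model text as ``q``, with ``swap_p_q`` exchanging the two, is an assumption, because that row's description is truncated.

```python
from typing import Any, Dict, Sequence


def build_mauve_kwargs(
    human_texts: Sequence[str],
    generated_texts: Sequence[str],
    *,
    featurize_model_name: str = "gpt2-large",
    device_id: int = 0,
    max_text_length: int = 256,
    batch_size: int = 8,
    verbose: bool = False,
    num_buckets: Any = "auto",
    seed: int = 25,
    swap_p_q: bool = False,
) -> Dict[str, Any]:
    """Keyword arguments for mauve.compute_mauve (hypothetical helper)."""
    p_text, q_text = list(human_texts), list(generated_texts)
    if swap_p_q:  # assumption: swap_p_q exchanges the human (p) and model (q) sides
        p_text, q_text = q_text, p_text
    return dict(
        p_text=p_text,
        q_text=q_text,
        featurize_model_name=featurize_model_name,
        device_id=device_id,
        max_text_length=max_text_length,
        batch_size=batch_size,
        verbose=verbose,
        num_buckets=num_buckets,
        seed=seed,
    )


def mauve_score(human_texts: Sequence[str], generated_texts: Sequence[str], **params: Any) -> float:
    """Featurize both sides with the HF model and return the scalar MAUVE score."""
    import mauve  # provided by the mauve-text package

    out = mauve.compute_mauve(**build_mauve_kwargs(human_texts, generated_texts, **params))
    return float(out.mauve)
```

Keeping the argument mapping in a pure function lets it be checked without downloading ``gpt2-large``. Only ``mauve_score`` needs mauve-text and the featurizer.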