`xlm.tasks.owt.mauve_text_eval`

MAUVE post-hoc text evaluation for xlm-core Harness / LogPredictions.

Computes `MAUVE <https://arxiv.org/abs/2102.01454>`_ between human/reference strings and model generations using `mauve-text`_ (`import mauve`).

Human / reference text can come from:

- each prediction row (`truth`, `reference`, …, or `reference_field`), including decoded `target_ids` when the tokenizer is passed; or
- `human_text_source: hf_streaming`: stream strings from a Hugging Face dataset split (default: OWT validation), the same idea as the standalone Proseco eval (human side from the val loader, not the JSONL).
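
The two reference sources above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the evaluator's actual code: `pick_reference`, `filter_streamed`, and `CANDIDATE_KEYS` are hypothetical names, the fallback key order is assumed from the `reference_field` description, and decoding of `target_ids` is left out.

```python
from itertools import islice
from typing import Iterable, Iterator, Mapping, Optional

# Hypothetical fallback order; the real evaluator may check more keys.
CANDIDATE_KEYS = ("truth", "reference", "target_text", "ground_truth_middle")


def pick_reference(
    row: Mapping[str, object], reference_field: Optional[str] = None
) -> Optional[str]:
    """Return the human text for one prediction row, or None if none is usable."""
    # An explicit reference_field wins; otherwise try the fallback keys in order.
    keys = (reference_field,) if reference_field is not None else CANDIDATE_KEYS
    for key in keys:
        value = row.get(key)
        if isinstance(value, str) and value.strip():
            return value
    return None


def filter_streamed(
    texts: Iterable[str], min_chars: int = 8, limit: Optional[int] = None
) -> Iterator[str]:
    """Yield snippets with at least min_chars characters, up to limit items."""
    kept = (t for t in texts if len(t.strip()) >= min_chars)
    return islice(kept, limit)  # limit=None means no cap


# Per-row source: the empty `truth` is skipped and `reference` is used.
row = {"text": "model output", "truth": "", "reference": "a human sentence"}
print(pick_reference(row))

# Streaming source: any iterable of strings stands in for the dataset stream here.
print(list(filter_streamed(["hi", "long enough snippet"], min_chars=8)))
```

In the streaming case the human texts are not paired with individual generations, which is acceptable because MAUVE compares two distributions rather than aligned pairs.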

Example Hydra defaults::

    defaults:
      - your_experiment
      - /post_hoc_evaluator@post_hoc_evaluator.evaluators.prediction.mauve: mauve_text

Or instantiate explicitly (no composite)::

    post_hoc_evaluator:
      _target_: xlm.tasks.owt.mauve_text_eval.MauveTextEval

Install::

    pip install "xlm-core[mauve]"

(or `pip install mauve-text`).

.. _mauve-text: https://pypi.org/project/mauve-text/

`MauveTextEval`

Post-hoc evaluator: MAUVE between references and model text.

Parameters:

Name	Type	Description	Default
`reference_field`	`Optional[str]`	Batch / prediction key for human text. If `None`, the first non-empty among `truth`, `reference`, `target_text`, `ground_truth_middle`, etc., or decoded `target_ids` when `tokenizer` is passed.	`None`
`generated_field`	`str`	Key for the model output text.	`'text'`
`featurize_model_name`	`str`	HF model name for MAUVE features (see `mauve-text`).	`'gpt2-large'`
`device_id`	`int`	GPU id for featurization, or `-1` for CPU.	`0`
`max_text_length`	`int`	Max tokens per string for the featurizer.	`256`
`batch_size`	`int`	Featurization batch size.	`8`
`verbose`	`bool`	Forwarded to `mauve.compute_mauve`.	`False`
`num_buckets`	`Any`	Number of quantization buckets (histogram size) used by MAUVE; `"auto"` or an int.	`'auto'`
`seed`	`int`	RNG seed for k-means.	`25`
`swap_p_q`	`bool`	If `True`, treat generations as `p_text` and references as `q_text` (library default is human `p`, machine `q`).	`False`
`human_text_source`	`Optional[str]`	If `"hf_streaming"`, build `p_text` from a HF dataset split instead of per-row references (Proseco-style). If `None`, use `truth` / `reference_field` / etc. on each row.	`None`
`hf_dataset_path`	`str`	Dataset id for streaming (OWT default).	`'dhruveshpatel/owt-gpt2-1024-split'`
`hf_split`	`str`	Split name, e.g. `validation`.	`'validation'`
`hf_text_column`	`str`	Column with raw text.	`'text'`
`hf_shuffle_seed`	`int`	Seed for streaming shuffle.	`42`
`hf_shuffle_buffer_size`	`int`	Shuffle buffer for streaming.	`10000`
`hf_min_chars`	`int`	Skip shorter snippets.	`8`
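
As a rough illustration of how the parameters above could map onto `mauve.compute_mauve`, the sketch below builds the keyword arguments without importing the library. `build_mauve_kwargs` is a hypothetical helper, not part of xlm-core, and the evaluator's real wiring may differ. The keyword names follow the `compute_mauve` signature in `mauve-text`, and the defaults mirror the table.

```python
from typing import Any, Dict, Sequence


def build_mauve_kwargs(
    references: Sequence[str],
    generations: Sequence[str],
    *,
    swap_p_q: bool = False,
    featurize_model_name: str = "gpt2-large",
    device_id: int = 0,
    max_text_length: int = 256,
    batch_size: int = 8,
    verbose: bool = False,
    num_buckets: Any = "auto",
    seed: int = 25,
) -> Dict[str, Any]:
    """Assemble keyword arguments for mauve.compute_mauve (hypothetical helper)."""
    # Library convention: p is human text and q is machine text.
    # swap_p_q=True reverses the roles.
    if swap_p_q:
        p_text, q_text = generations, references
    else:
        p_text, q_text = references, generations
    return {
        "p_text": list(p_text),
        "q_text": list(q_text),
        "featurize_model_name": featurize_model_name,
        "device_id": device_id,  # -1 selects CPU
        "max_text_length": max_text_length,
        "batch_size": batch_size,
        "verbose": verbose,
        "num_buckets": num_buckets,
        "seed": seed,
    }


# With mauve-text installed, the result could be used like this:
#   import mauve
#   out = mauve.compute_mauve(**build_mauve_kwargs(refs, gens, device_id=-1))
#   print(out.mauve)
```

Keeping the argument assembly separate from the library call makes the `swap_p_q` behaviour easy to check without loading the featurizer model.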