xlm.tasks.composite_eval

Composite post-hoc evaluator that routes to task-specific evaluators.

Usage in a Hydra config:

post_hoc_evaluator:
  _target_: xlm.tasks.composite_eval.CompositePostHocEvaluator
  evaluators:
    math500_prediction:
      _target_: xlm.tasks.math500.Math500Eval
    denovo_prediction:
      _target_: xlm.tasks.molgen.DeNovoEval
      use_bracket_safe: true

CompositePostHocEvaluator

Routes eval() calls to a task-specific evaluator chosen by dataloader name.

The evaluators dict maps a pattern (substring) to an evaluator instance. When eval() is called with a dataloader_name, the first evaluator whose key is a substring of the name is selected. If nothing matches, the predictions are returned unchanged with empty metrics.
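The routing described above can be sketched as follows. This is a minimal illustration, not the xlm implementation: the class names `SketchCompositeEvaluator` and `CountEval` are hypothetical, and the `(predictions, metrics)` tuple return shape is an assumption, since the source only says that unmatched predictions come back unchanged with empty metrics.

```python
from typing import Any, Dict, List, Optional, Tuple


class SketchCompositeEvaluator:
    """Illustrative stand-in for substring-based routing (hypothetical)."""

    def __init__(self, evaluators: Dict[str, Any]) -> None:
        # Keys are substrings to look for in the dataloader name.
        self.evaluators = evaluators

    def eval(
        self,
        predictions: List[Any],
        tokenizer: Any = None,
        dataloader_name: Optional[str] = None,
        **kwargs: Any,
    ) -> Tuple[List[Any], Dict[str, Any]]:
        name = dataloader_name or ""
        # Dicts preserve insertion order, so the first matching key wins.
        for pattern, evaluator in self.evaluators.items():
            if pattern in name:
                return evaluator.eval(
                    predictions,
                    tokenizer=tokenizer,
                    dataloader_name=dataloader_name,
                    **kwargs,
                )
        # No match: assumed fallback of unchanged predictions, empty metrics.
        return predictions, {}


class CountEval:
    """Hypothetical task evaluator that only counts predictions."""

    def eval(self, predictions, tokenizer=None, **kwargs):
        return predictions, {"count": len(predictions)}


composite = SketchCompositeEvaluator({"math500_prediction": CountEval()})
matched = composite.eval(["a", "b"], dataloader_name="val/math500_prediction")
unmatched = composite.eval(["a"], dataloader_name="val/other_task")
```

Because matching is by substring and the first hit wins, overlapping keys (for example `math` and `math500_prediction`) would make the result depend on the order of the mapping in this sketch.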

This is a drop-in replacement for a single evaluator: the existing Harness.compute_post_hoc_metrics passes dataloader_name through, and evaluators that don't use it simply ignore the kwarg.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| evaluators | Dict[str, Any] | Mapping from dataloader-name substring to evaluator. Each evaluator must implement eval(predictions, tokenizer=..., **kwargs). | required |
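As a rough illustration of the evaluator contract, the sketch below shows an evaluator whose `eval` accepts `predictions`, a `tokenizer` keyword, and `**kwargs`, so an extra `dataloader_name` argument is absorbed and ignored. The class `ExactMatchEval`, the prediction format (dicts with `text` and `truth` keys), and the `(predictions, metrics)` return shape are all assumptions made for the example; the source does not specify them.

```python
from typing import Any, Dict, List, Tuple


class ExactMatchEval:
    """Hypothetical evaluator; not part of xlm."""

    def eval(
        self,
        predictions: List[Dict[str, Any]],
        tokenizer: Any = None,
        **kwargs: Any,
    ) -> Tuple[List[Dict[str, Any]], Dict[str, float]]:
        # dataloader_name, if passed, lands in kwargs and is not used here.
        correct = sum(1 for p in predictions if p["text"] == p["truth"])
        accuracy = correct / len(predictions) if predictions else 0.0
        return predictions, {"exact_match": accuracy}


preds = [
    {"text": "4", "truth": "4"},
    {"text": "5", "truth": "6"},
]
out, metrics = ExactMatchEval().eval(
    preds, tokenizer=None, dataloader_name="math500_prediction"
)
```

Accepting `**kwargs` is what lets the same evaluator be called directly or through a composite without changing its signature.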