
ILM - Infilling Language Model for XLM Framework

This package implements the ILM model with all necessary components:

- Model architecture (model_ilm.py)
- Loss function (loss_ilm.py)
- Predictor for inference (predictor_ilm.py)
- Data module (datamodule_ilm.py)
- Metrics computation (metrics_ilm.py)
- Type definitions (types_ilm.py)
- Neural network utilities (nn.py)

This model was migrated from xlm.lm.ilm to be an external model.

RotaryTransformerILMModelWithClassification

Bases: BaseRotaryTransformerILMModel

Rotary-embedding-based transformer decoder.

forward(x_t, attention_mask=None, positions=None, token_type_ids=None, cls_position=None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x_t | Integer[Tensor, ' *batch seq_len'] | The input tokens of shape (*batch, seq_len). | required |
| t | | The timesteps of shape (*batch). | required |
| attention_mask | Optional[Bool[Tensor, ' *batch seq_len']] | The attention mask of shape (*batch, seq_len), which is True for non-padding tokens. | None |
| positions | Optional[Integer[Tensor, ' *batch seq_len']] | The positions of the tokens, of shape (*batch, seq_len). | None |
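The input tensors described above can be built as follows. This is an illustrative sketch: the batch size, sequence length, and vocabulary size are arbitrary assumptions, not values prescribed by the model.

```python
import torch

# Illustrative sizes only; the real values come from the tokenizer/config.
batch, seq_len, vocab_size = 2, 8, 100

# Integer[Tensor, '*batch seq_len']: the input tokens.
x_t = torch.randint(0, vocab_size, (batch, seq_len))

# Bool[Tensor, '*batch seq_len']: True for non-padding tokens.
attention_mask = torch.ones(batch, seq_len, dtype=torch.bool)
attention_mask[1, 6:] = False  # pad the tail of the second example

# Integer[Tensor, '*batch seq_len']: default contiguous positions.
positions = torch.arange(seq_len).expand(batch, seq_len)
```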

GPT2ILMModelWithClassification

Bases: BaseGPT2ILMModel

forward(x_t, attention_mask=None, positions=None, token_type_ids=None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x_t | Integer[Tensor, ' *batch seq_len'] | The input tokens of shape (*batch, seq_len). | required |
| t | | The timesteps of shape (*batch). | required |
| attention_mask | Optional[Bool[Tensor, ' *batch seq_len']] | The attention mask of shape (*batch, seq_len), which is True for non-padding tokens. | None |
| positions | Optional[Integer[Tensor, ' *batch seq_len']] | The positions of the tokens, of shape (*batch, seq_len). | None |

ILMPredictor

Bases: Module, ILMPredictorUtilitiesMixin, Predictor[ILMBatch, ILMPredictionDict]

__init__(max_steps, max_length, tokenizer=None, noise_schedule=None, tokens_to_suppress=None, return_history=False, sampling_method='sample', top=1000, p=0.9, second_sampling_method=None, second_top=1000, second_p=0.9, model=None, input_constraint=False)

Constructor for ILMPredictor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| max_steps | int | The maximum number of steps to take. | required |
| max_length | int | The maximum length (excluding special tokens like PAD and MASK) of the generated text. | required |
| tokenizer | Tokenizer | The tokenizer. Typically set after initialization but before calling predict. | None |
| noise_schedule | NoiseSchedule | The noise schedule. Typically set after initialization but before calling predict. | None |
| tokens_to_suppress | List[str] | The tokens to suppress during generation. | None |
| return_history | bool | Whether to return the history. | False |
| sampling_method | Literal['sample', 'sample_top_k', 'sample_top_p'] | The sampling method. When second_sampling_method is not provided, this method samples from the joint distribution of positions and tokens. When second_sampling_method is provided, this method samples from the conditional token distribution given the positions sampled using second_sampling_method. "sample" means vanilla sampling from the distribution, "sample_top_k" means sampling from the top-k distribution, and "sample_top_p" means sampling from the top-p distribution (nucleus sampling). | 'sample' |
| top | int | The top-k sampling parameter for sampling_method. | 1000 |
| p | float | The top-p sampling parameter for sampling_method. | 0.9 |
| second_sampling_method | Optional[Literal['sample', 'sample_top_k', 'sample_top_p']] | The second sampling method. | None |
| second_top | int | The second top-k sampling parameter for second_sampling_method. | 1000 |
| second_p | float | The second top-p sampling parameter for second_sampling_method. | 0.9 |
| model | Optional[ILMModel] | The model. Typically set after initialization but before calling predict. | None |
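The "sample_top_p" option refers to standard nucleus sampling. As a point of reference, here is a minimal, self-contained sketch of top-p sampling over a logits tensor; it illustrates the technique only and is not the predictor's actual implementation:

```python
import torch

def sample_top_p(logits: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    """Sample one id per row from the smallest set of tokens whose
    cumulative probability exceeds p (nucleus sampling)."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, dim=-1, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Mask tokens outside the nucleus; the top token is always kept
    # because its own mass is excluded from the comparison.
    outside = cumulative - sorted_probs > p
    sorted_probs = sorted_probs.masked_fill(outside, 0.0)
    sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx.gather(-1, choice).squeeze(-1)
```

With a sharply peaked distribution and a small p, only the top token survives the nucleus filter, so sampling becomes deterministic.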

ILMPredictorWithLengthClassification

Bases: Module, ILMPredictorUtilitiesMixin, Predictor[ILMBatch, ILMPredictionDict]

__init__(max_steps, max_length, stopping_threshold=0.5, tokenizer=None, noise_schedule=None, tokens_to_suppress=None, return_history=False, sampling_method='sample', top=1000, p=0.9, second_sampling_method=None, second_top=1000, second_p=0.9, model=None, force_predict_first_step=False, input_constraint=False, use_high_precision=False, stopping_temperature=1.0)

Constructor for ILMPredictorWithLengthClassification.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| max_steps | int | The maximum number of steps to take. | required |
| max_length | int | The maximum length (excluding special tokens like PAD and MASK) of the generated text. | required |
| stopping_threshold | float | The threshold on the length classification scores used to decide when to stop. | 0.5 |
| tokenizer | Tokenizer | The tokenizer. Typically set after initialization but before calling predict. | None |
| noise_schedule | NoiseSchedule | The noise schedule. Typically set after initialization but before calling predict. | None |
| tokens_to_suppress | List[str] | The tokens to suppress during generation. | None |
| return_history | bool | Whether to return the history. | False |
| sampling_method | Literal['sample', 'sample_top_k', 'sample_top_p'] | The sampling method. When second_sampling_method is not provided, this method samples from the joint distribution of positions and tokens. When second_sampling_method is provided, this method samples from the conditional token distribution given the positions sampled using second_sampling_method. "sample" means vanilla sampling from the distribution, "sample_top_k" means sampling from the top-k distribution, and "sample_top_p" means sampling from the top-p distribution (nucleus sampling). | 'sample' |
| top | int | The top-k sampling parameter for sampling_method. | 1000 |
| p | float | The top-p sampling parameter for sampling_method. | 0.9 |
| second_sampling_method | Optional[Literal['sample', 'sample_top_k', 'sample_top_p']] | The second sampling method. | None |
| second_top | int | The second top-k sampling parameter for second_sampling_method. | 1000 |
| second_p | float | The second top-p sampling parameter for second_sampling_method. | 0.9 |
| model | Optional[ILMModel] | The model. Typically set after initialization but before calling predict. | None |
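To make the interaction between stopping_threshold and stopping_temperature concrete, here is a hypothetical sketch of such a stopping rule: a temperature-scaled sigmoid of the length classification logit compared against the threshold. The function name and the exact rule are illustrative assumptions; the predictor's actual logic may differ.

```python
import torch

def should_stop(length_logit: torch.Tensor,
                stopping_threshold: float = 0.5,
                stopping_temperature: float = 1.0) -> torch.Tensor:
    """Hypothetical stopping rule: temperature-scale the length
    classification logit, squash to a score in (0, 1), and stop
    once the score reaches the threshold."""
    score = torch.sigmoid(length_logit / stopping_temperature)
    return score >= stopping_threshold
```

A higher stopping_temperature flattens the score toward 0.5, making the rule less decisive in either direction.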

DefaultILMCollator

Bases: Collator

Used for pre-training.

ILMSeq2SeqCollator

Drops tokens from the suffix only.

ILMSeq2SeqPredCollator

Bases: ILMSeq2SeqCollator

Drops all suffix/target tokens and passes them as target_ids of shape (batch_size, target_seq_len).
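A minimal sketch of what this collation step amounts to, assuming token ids already split at a known prefix length; the function name, the prefix_len argument, and the flat dict layout are illustrative assumptions, not the collator's real interface:

```python
import torch

def collate_pred(ids: torch.Tensor, prefix_len: int) -> dict:
    """Keep only the prefix as model input; move the entire suffix
    into target_ids (batch_size, target_seq_len)."""
    return {
        "input_ids": ids[:, :prefix_len],
        "target_ids": ids[:, prefix_len:],
    }

batch = collate_pred(torch.tensor([[101, 5, 6, 7, 8, 102]]), prefix_len=3)
```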

ILMBatch

Bases: BaseBatch

Input to the ILM.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| input_ids | Integer[Tensor, ' batch post_seq_len'] | The input ids to the model. |
| attention_mask | Integer[Tensor, ' batch post_seq_len'] | 1 for tokens that are not padding. |
| token_type_ids | Integer[Tensor, ' batch post_seq_len'] | 0 for CLS, 1 for BOS and prefix, 2 for other tokens. |
| n_drops | Bool[Tensor, ' batch post_seq_len'] | True for tokens that are dropped. |
| target_ids | Integer[Tensor, ' batch post_seq_len vocab_size'] | The target ids for the model. |
| constraint | Optional[Bool[Tensor, ' batch post_seq_len']] | True for tokens that should not be predicted. Used mostly during prediction. |
| cls_position | Optional[Integer[Tensor, ' batch']] | The position of the CLS token. |
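The token_type_ids layout (0 for CLS, 1 for BOS and prefix, 2 for everything else) can be constructed as follows. The sequence layout and lengths are illustrative assumptions:

```python
import torch

# Hypothetical layout: [CLS] [BOS] prefix(3 tokens) other(3 tokens)
prefix_len, total_len = 3, 8

token_type_ids = torch.full((1, total_len), 2)   # default: other tokens -> 2
token_type_ids[0, 0] = 0                          # CLS -> 0
token_type_ids[0, 1 : 2 + prefix_len] = 1         # BOS + prefix -> 1

cls_position = torch.tensor([0])                  # CLS sits at index 0 here
```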

ILMSeq2SeqPredictionBatch

Bases: TypedDict

Input to the ILM for predicting the suffix given the prefix. Note that target_ids differs from the one in ILMBatch.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| input_ids | Integer[Tensor, ' batch prefix_seq_len'] | The input ids to the model. |
| attention_mask | Integer[Tensor, ' batch prefix_seq_len'] | 1 for tokens that are not padding. |
| token_type_ids | Integer[Tensor, ' batch prefix_seq_len'] | 0 for CLS, 1 for BOS and prefix, 2 for other tokens. |
| target_ids | Integer[Tensor, ' batch suffix_seq_len'] | The target ids for the model. |

ILMUncondtionalPredictionBatch

Bases: TypedDict

Input to the ILM for unconditional generation.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| input_ids | Integer[Tensor, ' batch prefix_seq_len'] | The input ids to the model. |
| attention_mask | Integer[Tensor, ' batch prefix_seq_len'] | 1 for tokens that are not padding. |
| token_type_ids | Integer[Tensor, ' batch prefix_seq_len'] | 0 for CLS, 1 for BOS and prefix, 2 for other tokens. |

ILMInfillPredictionBatch

Bases: TypedDict

Input to the ILM for infilling.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| input_ids | Integer[Tensor, ' batch prefix_seq_len'] | The input ids to the model, with the tokens to be infilled dropped. |
| attention_mask | Integer[Tensor, ' batch prefix_seq_len'] | 1 for tokens that are not padding. |
| gap_positions | Integer[Tensor, ' batch max_gap_positions'] | The positions of the gaps in input_ids, which specify the locations to be filled in. Padded with the value -1. |
| target_ids | Integer[Tensor, ' total_gaps_in_batch max_infill_length'] | The target ids to be filled in. One can map the targets to the exact gap using gap_positions. |
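One way to map each row of target_ids back to its gap is to enumerate the valid (non -1) entries of gap_positions in row-major order. The row-major pairing and the concrete values below are assumptions for illustration:

```python
import torch

# Two sequences; the second has one real gap and one -1 padding slot.
gap_positions = torch.tensor([[2, 5], [1, -1]])               # (batch, max_gap_positions)
target_ids = torch.tensor([[7, 8, 0], [9, 0, 0], [4, 6, 2]])  # (total_gaps_in_batch, max_infill_length)

# Pair each valid gap with its targets, assuming row-major ordering.
pairs = []
gap_idx = 0
for b in range(gap_positions.shape[0]):
    for g in gap_positions[b]:
        if g.item() == -1:
            continue  # skip padding slots
        pairs.append((b, g.item(), target_ids[gap_idx].tolist()))
        gap_idx += 1
```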

ILMLossDict

Bases: TypedDict

Output of the LossFunction Callable.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| loss | Float[Tensor, ''] | The total loss value. |
| batch_loss | Float[Tensor, ' batch'] | The loss value for each example in the batch. |

ILMPredictionDict

Bases: TypedDict

Output of the Predictor for ILM.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| loss | Optional[Float[Tensor, ' batch']] | The loss value. Typically None. |
| text | List[str] | The batch of generated text without special tokens. |
| text_with_spl_tokens | List[str] | The batch of generated text with special tokens. |
| ids | Integer[Tensor, ' batch seq_len'] | The batch of generated token_ids. |
| attention_mask | Bool[Tensor, ' batch seq_len'] | The attention mask accompanying the generated ids. |
| positions | Integer[Tensor, ' batch seq_len'] | The positions of the generated tokens accompanying the ids. |
| history | List[List[Tuple[str, float, int]]] | The batch of histories. Each entry is a list of (current_string, time, step_number) tuples, one recorded whenever the generated string changes. |