
ILM - Infilling Language Model for XLM Framework

This package implements the ILM model with all necessary components:

- Model architecture (model_ilm.py)
- Loss function (loss_ilm.py)
- Predictor for inference (predictor_ilm.py)
- Data module (datamodule_ilm.py)
- Metrics computation (metrics_ilm.py)
- Type definitions (types_ilm.py)
- Neural network utilities (nn.py)

This model was migrated from xlm.lm.ilm to be an external model.

RotaryTransformerILMModelWithClassification

Bases: BaseRotaryTransformerILMModel

Rotary-embedding-based transformer decoder.

forward(x_t, attention_mask=None, positions=None, token_type_ids=None, cls_position=None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x_t | Integer[Tensor, ' *batch seq_len'] | The input tokens of shape (*batch, seq_len). | required |
| t | | The timesteps of shape (*batch). | required |
| attention_mask | Optional[Bool[Tensor, ' *batch seq_len']] | The attention mask of shape (*batch, seq_len), which is True for non-padding tokens. | None |
| positions | Optional[Integer[Tensor, ' *batch seq_len']] | The positions of the tokens, of shape (*batch, seq_len). | None |
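The input tensors described above can be built as follows. This is an illustrative sketch: the batch size, sequence length, and vocabulary size are arbitrary assumptions, not values prescribed by the model.

```python
import torch

# Illustrative sizes only; the real values come from the tokenizer/config.
batch, seq_len, vocab_size = 2, 8, 100

# Integer[Tensor, '*batch seq_len']: the input tokens.
x_t = torch.randint(0, vocab_size, (batch, seq_len))

# Bool[Tensor, '*batch seq_len']: True for non-padding tokens.
attention_mask = torch.ones(batch, seq_len, dtype=torch.bool)
attention_mask[1, 6:] = False  # pad the tail of the second example

# Integer[Tensor, '*batch seq_len']: default contiguous positions.
positions = torch.arange(seq_len).expand(batch, seq_len)
```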

GPT2ILMModelWithClassification

Bases: BaseGPT2ILMModel

forward(x_t, attention_mask=None, positions=None, token_type_ids=None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x_t | Integer[Tensor, ' *batch seq_len'] | The input tokens of shape (*batch, seq_len). | required |
| t | | The timesteps of shape (*batch). | required |
| attention_mask | Optional[Bool[Tensor, ' *batch seq_len']] | The attention mask of shape (*batch, seq_len), which is True for non-padding tokens. | None |
| positions | Optional[Integer[Tensor, ' *batch seq_len']] | The positions of the tokens, of shape (*batch, seq_len). | None |

ILMPredictor

Bases: Module, ILMPredictorUtilitiesMixin, Predictor[ILMBatch, ILMPredictionDict]

__init__(max_steps, max_length, tokenizer=None, noise_schedule=None, tokens_to_suppress=None, return_history=False, sampling_method='sample', top=1000, p=0.9, second_sampling_method=None, second_top=1000, second_p=0.9, model=None, input_constraint=False)

Constructor for ILMPredictor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| max_steps | int | The maximum number of steps to take. | required |
| max_length | int | The maximum length (excluding special tokens like PAD and MASK) of the generated text. | required |
| tokenizer | Tokenizer | The tokenizer. Typically set after initialization but before calling predict. | None |
| noise_schedule | NoiseSchedule | The noise schedule. Typically set after initialization but before calling predict. | None |
| tokens_to_suppress | List[str] | The tokens to suppress during generation. | None |
| return_history | bool | Whether to return the history. | False |
| sampling_method | Literal['sample', 'sample_top_k', 'sample_top_p'] | The sampling method. When second_sampling_method is not provided, this method samples from the joint distribution of positions and tokens. When second_sampling_method is provided, this method samples from the conditional token distribution given the positions sampled using second_sampling_method. "sample" means vanilla sampling from the distribution, "sample_top_k" means sampling from the top-k distribution, and "sample_top_p" means sampling from the top-p distribution (nucleus sampling). | 'sample' |
| top | int | The top-k sampling parameter for sampling_method. | 1000 |
| p | float | The top-p sampling parameter for sampling_method. | 0.9 |
| second_sampling_method | Optional[Literal['sample', 'sample_top_k', 'sample_top_p']] | The second sampling method. | None |
| second_top | int | The second top-k sampling parameter for second_sampling_method. | 1000 |
| second_p | float | The second top-p sampling parameter for second_sampling_method. | 0.9 |
| model | Optional[ILMModel] | The model. Typically set after initialization but before calling predict. | None |
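The "sample_top_p" option refers to standard nucleus sampling. As a point of reference, here is a minimal, self-contained sketch of top-p sampling over a logits tensor; it illustrates the technique only and is not the predictor's actual implementation:

```python
import torch

def sample_top_p(logits: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    """Sample one id per row from the smallest set of tokens whose
    cumulative probability exceeds p (nucleus sampling)."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, dim=-1, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Mask tokens outside the nucleus; the top token is always kept
    # because its own mass is excluded from the comparison.
    outside = cumulative - sorted_probs > p
    sorted_probs = sorted_probs.masked_fill(outside, 0.0)
    sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx.gather(-1, choice).squeeze(-1)
```

With a sharply peaked distribution and a small p, only the top token survives the nucleus filter, so sampling becomes deterministic.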

ILMPredictorWithLengthClassification

Bases: Module, ILMPredictorUtilitiesMixin, Predictor[ILMBatch, ILMPredictionDict]

__init__(max_steps, max_length, stopping_threshold=0.5, tokenizer=None, noise_schedule=None, tokens_to_suppress=None, return_history=False, sampling_method='sample', top=1000, p=0.9, second_sampling_method=None, second_top=1000, second_p=0.9, model=None, force_predict_first_step=False, input_constraint=False, use_high_precision=False, stopping_temperature=1.0)

Constructor for ILMPredictorWithLengthClassification.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| max_steps | int | The maximum number of steps to take. | required |
| max_length | int | The maximum length (excluding special tokens like PAD and MASK) of the generated text. | required |
| stopping_threshold | float | The threshold on the length classification scores used to decide when to stop. | 0.5 |
| tokenizer | Tokenizer | The tokenizer. Typically set after initialization but before calling predict. | None |
| noise_schedule | NoiseSchedule | The noise schedule. Typically set after initialization but before calling predict. | None |
| tokens_to_suppress | List[str] | The tokens to suppress during generation. | None |
| return_history | bool | Whether to return the history. | False |
| sampling_method | Literal['sample', 'sample_top_k', 'sample_top_p'] | The sampling method. When second_sampling_method is not provided, this method samples from the joint distribution of positions and tokens. When second_sampling_method is provided, this method samples from the conditional token distribution given the positions sampled using second_sampling_method. "sample" means vanilla sampling from the distribution, "sample_top_k" means sampling from the top-k distribution, and "sample_top_p" means sampling from the top-p distribution (nucleus sampling). | 'sample' |
| top | int | The top-k sampling parameter for sampling_method. | 1000 |
| p | float | The top-p sampling parameter for sampling_method. | 0.9 |
| second_sampling_method | Optional[Literal['sample', 'sample_top_k', 'sample_top_p']] | The second sampling method. | None |
| second_top | int | The second top-k sampling parameter for second_sampling_method. | 1000 |
| second_p | float | The second top-p sampling parameter for second_sampling_method. | 0.9 |
| model | Optional[ILMModel] | The model. Typically set after initialization but before calling predict. | None |
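To make the interaction between stopping_threshold and stopping_temperature concrete, here is a hypothetical sketch of such a stopping rule: a temperature-scaled sigmoid of the length classification logit compared against the threshold. The function name and the exact rule are illustrative assumptions; the predictor's actual logic may differ.

```python
import torch

def should_stop(length_logit: torch.Tensor,
                stopping_threshold: float = 0.5,
                stopping_temperature: float = 1.0) -> torch.Tensor:
    """Hypothetical stopping rule: temperature-scale the length
    classification logit, squash to a score in (0, 1), and stop
    once the score reaches the threshold."""
    score = torch.sigmoid(length_logit / stopping_temperature)
    return score >= stopping_threshold
```

A higher stopping_temperature flattens the score toward 0.5, making the rule less decisive in either direction.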

DefaultILMCollator

Bases: Collator

Used for pre-training.

ILMSeq2SeqCollator

Drops tokens from the suffix only.

ILMSeq2SeqPredCollator

Bases: ILMSeq2SeqCollator

Drops all suffix/target tokens and passes them as target_ids of shape (batch_size, target_seq_len).
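A minimal sketch of what this collation step amounts to, assuming token ids already split at a known prefix length; the function name, the prefix_len argument, and the flat dict layout are illustrative assumptions, not the collator's real interface:

```python
import torch

def collate_pred(ids: torch.Tensor, prefix_len: int) -> dict:
    """Keep only the prefix as model input; move the entire suffix
    into target_ids (batch_size, target_seq_len)."""
    return {
        "input_ids": ids[:, :prefix_len],
        "target_ids": ids[:, prefix_len:],
    }

batch = collate_pred(torch.tensor([[101, 5, 6, 7, 8, 102]]), prefix_len=3)
```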

ILMBatch

Bases: BaseBatch

Input to the ILM.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| input_ids | Integer[Tensor, ' batch post_seq_len'] | The input ids to the model. |
| attention_mask | Integer[Tensor, ' batch post_seq_len'] | 1 for tokens that are not padding. |
| token_type_ids | Integer[Tensor, ' batch post_seq_len'] | 0 for CLS, 1 for BOS and prefix, 2 for other tokens. |
| n_drops | Bool[Tensor, ' batch post_seq_len'] | True for tokens that are dropped. |
| target_ids | Integer[Tensor, ' batch post_seq_len vocab_size'] | The target ids for the model. |
| constraint | Optional[Bool[Tensor, ' batch post_seq_len']] | True for tokens that should not be predicted. Used mostly during prediction. |
| cls_position | Optional[Integer[Tensor, ' batch']] | The position of the CLS token. |
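The token_type_ids layout (0 for CLS, 1 for BOS and prefix, 2 for everything else) can be constructed as follows. The sequence layout and lengths are illustrative assumptions:

```python
import torch

# Hypothetical layout: [CLS] [BOS] prefix(3 tokens) other(3 tokens)
prefix_len, total_len = 3, 8

token_type_ids = torch.full((1, total_len), 2)   # default: other tokens -> 2
token_type_ids[0, 0] = 0                          # CLS -> 0
token_type_ids[0, 1 : 2 + prefix_len] = 1         # BOS + prefix -> 1

cls_position = torch.tensor([0])                  # CLS sits at index 0 here
```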

ILMSeq2SeqPredictionBatch

Bases: TypedDict

Input to the ILM for predicting the suffix given the prefix. Note that target_ids differs from the one in ILMBatch.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| input_ids | Integer[Tensor, ' batch prefix_seq_len'] | The input ids to the model. |
| attention_mask | Integer[Tensor, ' batch prefix_seq_len'] | 1 for tokens that are not padding. |
| token_type_ids | Integer[Tensor, ' batch prefix_seq_len'] | 0 for CLS, 1 for BOS and prefix, 2 for other tokens. |
| target_ids | Integer[Tensor, ' batch suffix_seq_len'] | The target ids for the model. |

ILMUncondtionalPredictionBatch

Bases: TypedDict

Input to the ILM for unconditional generation.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| input_ids | Integer[Tensor, ' batch prefix_seq_len'] | The input ids to the model. |
| attention_mask | Integer[Tensor, ' batch prefix_seq_len'] | 1 for tokens that are not padding. |
| token_type_ids | Integer[Tensor, ' batch prefix_seq_len'] | 0 for CLS, 1 for BOS and prefix, 2 for other tokens. |

ILMInfillPredictionBatch

Bases: TypedDict

Input to the ILM for infilling.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| input_ids | Integer[Tensor, ' batch prefix_seq_len'] | The input ids to the model, with the tokens to be infilled dropped. |
| attention_mask | Integer[Tensor, ' batch prefix_seq_len'] | 1 for tokens that are not padding. |
| gap_positions | Integer[Tensor, ' batch max_gap_positions'] | The positions of the gaps in input_ids, which specify the locations to be filled in. Padded with the value -1. |
| target_ids | Integer[Tensor, ' total_gaps_in_batch max_infill_length'] | The target ids to be filled in. One can map the targets to the exact gap using gap_positions. |
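One way to map each row of target_ids back to its gap is to enumerate the valid (non -1) entries of gap_positions in row-major order. The row-major pairing and the concrete values below are assumptions for illustration:

```python
import torch

# Two sequences; the second has one real gap and one -1 padding slot.
gap_positions = torch.tensor([[2, 5], [1, -1]])               # (batch, max_gap_positions)
target_ids = torch.tensor([[7, 8, 0], [9, 0, 0], [4, 6, 2]])  # (total_gaps_in_batch, max_infill_length)

# Pair each valid gap with its targets, assuming row-major ordering.
pairs = []
gap_idx = 0
for b in range(gap_positions.shape[0]):
    for g in gap_positions[b]:
        if g.item() == -1:
            continue  # skip padding slots
        pairs.append((b, g.item(), target_ids[gap_idx].tolist()))
        gap_idx += 1
```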

ILMLossDict

Bases: TypedDict

Output of the LossFunction Callable.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| loss | Float[Tensor, ''] | The total loss value. |
| batch_loss | Float[Tensor, ' batch'] | The loss value for each example in the batch. |

ILMPredictionDict

Bases: TypedDict

Output of the Predictor for ILM.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| loss | Optional[Float[Tensor, ' batch']] | The loss value. Typically None. |
| text | List[str] | The batch of generated text without special tokens. |
| text_with_spl_tokens | List[str] | The batch of generated text with special tokens. |
| ids | Integer[Tensor, ' batch seq_len'] | The batch of generated token_ids. |
| attention_mask | Bool[Tensor, ' batch seq_len'] | The attention mask accompanying the generated ids. |
| positions | Integer[Tensor, ' batch seq_len'] | The positions of the generated tokens accompanying the ids. |
| history | List[List[Tuple[str, float, int]]] | The batch of histories. Each entry is a list of (current_string, time, step_number) tuples, one recorded whenever the generated string changes. |