ilm
ILM - Infilling Language Model for XLM Framework
This package implements the ILM model with all necessary components:

- Model architecture (`model_ilm.py`)
- Loss function (`loss_ilm.py`)
- Predictor for inference (`predictor_ilm.py`)
- Data module (`datamodule_ilm.py`)
- Metrics computation (`metrics_ilm.py`)
- Type definitions (`types_ilm.py`)
- Neural network utilities (`nn.py`)
This model was migrated from xlm.lm.ilm to be an external model.
RotaryTransformerILMModelWithClassification
Bases: BaseRotaryTransformerILMModel
Transformer decoder based on rotary position embeddings.
forward(x_t, attention_mask=None, positions=None, token_type_ids=None, cls_position=None)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x_t` | `Integer[Tensor, ' *batch seq_len']` | The input tokens of shape `(*batch, seq_len)`. | required |
| `attention_mask` | `Optional[Bool[Tensor, ' *batch seq_len']]` | The attention mask of shape `(*batch, seq_len)`; `True` for non-padding tokens. | `None` |
| `positions` | `Optional[Integer[Tensor, ' *batch seq_len']]` | The positions of the tokens of shape `(*batch, seq_len)`. | `None` |
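As a sketch of the expected input shapes and conventions (the model instance itself is assumed to come from this package and is not created here; the vocabulary size and concrete values are illustrative):

```python
import torch

# Illustrative construction of forward() inputs following the table above.
batch, seq_len = 2, 8
x_t = torch.randint(0, 1000, (batch, seq_len))            # input token ids, shape (*batch, seq_len)
attention_mask = torch.ones(batch, seq_len, dtype=torch.bool)
attention_mask[1, 6:] = False                             # last two tokens of example 1 are padding
positions = torch.arange(seq_len).expand(batch, seq_len)  # contiguous positions by default

# out = model(x_t, attention_mask=attention_mask, positions=positions)
```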
GPT2ILMModelWithClassification
Bases: BaseGPT2ILMModel
forward(x_t, attention_mask=None, positions=None, token_type_ids=None)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x_t` | `Integer[Tensor, ' *batch seq_len']` | The input tokens of shape `(*batch, seq_len)`. | required |
| `attention_mask` | `Optional[Bool[Tensor, ' *batch seq_len']]` | The attention mask of shape `(*batch, seq_len)`; `True` for non-padding tokens. | `None` |
| `positions` | `Optional[Integer[Tensor, ' *batch seq_len']]` | The positions of the tokens of shape `(*batch, seq_len)`. | `None` |
ILMPredictor
Bases: Module, ILMPredictorUtilitiesMixin, Predictor[ILMBatch, ILMPredictionDict]
__init__(max_steps, max_length, tokenizer=None, noise_schedule=None, tokens_to_suppress=None, return_history=False, sampling_method='sample', top=1000, p=0.9, second_sampling_method=None, second_top=1000, second_p=0.9, model=None, input_constraint=False)
Constructor for ILMPredictor.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `max_steps` | `int` | The maximum number of steps to take. | required |
| `max_length` | `int` | The maximum length (excluding special tokens like PAD and MASK) of the generated text. | required |
| `tokenizer` | `Tokenizer` | The tokenizer. Typically set after initialization but before calling `predict`. | `None` |
| `noise_schedule` | `NoiseSchedule` | The noise schedule. Typically set after initialization but before calling `predict`. | `None` |
| `tokens_to_suppress` | `List[str]` | The tokens to suppress during generation. | `None` |
| `return_history` | `bool` | Whether to return the history. | `False` |
| `sampling_method` | `Literal['sample', 'sample_top_k', 'sample_top_p']` | The sampling method. | `'sample'` |
| `top` | `int` | The top-k parameter for `'sample_top_k'`. | `1000` |
| `p` | `float` | The top-p parameter for `'sample_top_p'`. | `0.9` |
| `second_sampling_method` | `Optional[Literal['sample', 'sample_top_k', 'sample_top_p']]` | The second sampling method. | `None` |
| `second_top` | `int` | The top-k parameter for a second `'sample_top_k'` pass. | `1000` |
| `second_p` | `float` | The top-p parameter for a second `'sample_top_p'` pass. | `0.9` |
| `model` | `Optional[ILMModel]` | The model. Typically set after initialization but before calling `predict`. | `None` |
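The `top` and `p` parameters follow the usual top-k and top-p (nucleus) sampling schemes. A minimal, self-contained sketch of what these options typically control, independent of this class (the function names here are illustrative, not the package's API):

```python
import torch

def sample_top_k(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the k highest logits, then sample from the renormalized distribution."""
    topk = torch.topk(logits, k, dim=-1)
    filtered = torch.full_like(logits, float('-inf'))
    filtered.scatter_(-1, topk.indices, topk.values)
    return torch.multinomial(torch.softmax(filtered, dim=-1), 1)

def sample_top_p(logits: torch.Tensor, p: float) -> torch.Tensor:
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cum = torch.cumsum(sorted_probs, dim=-1)
    keep = cum - sorted_probs < p            # the top token is always kept
    sorted_probs = sorted_probs * keep
    choice = torch.multinomial(sorted_probs / sorted_probs.sum(-1, keepdim=True), 1)
    return sorted_idx.gather(-1, choice)

logits = torch.tensor([[2.0, 1.0, 0.5, -1.0]])
token = sample_top_k(logits, k=2)  # only the two highest-scoring tokens can be drawn
```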
ILMPredictorWithLengthClassification
Bases: Module, ILMPredictorUtilitiesMixin, Predictor[ILMBatch, ILMPredictionDict]
__init__(max_steps, max_length, stopping_threshold=0.5, tokenizer=None, noise_schedule=None, tokens_to_suppress=None, return_history=False, sampling_method='sample', top=1000, p=0.9, second_sampling_method=None, second_top=1000, second_p=0.9, model=None, force_predict_first_step=False, input_constraint=False, use_high_precision=False, stopping_temperature=1.0)
Constructor for ILMPredictorWithLengthClassification.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `max_steps` | `int` | The maximum number of steps to take. | required |
| `max_length` | `int` | The maximum length (excluding special tokens like PAD and MASK) of the generated text. | required |
| `stopping_threshold` | `float` | The stopping threshold applied to the length classification scores. | `0.5` |
| `tokenizer` | `Tokenizer` | The tokenizer. Typically set after initialization but before calling `predict`. | `None` |
| `noise_schedule` | `NoiseSchedule` | The noise schedule. Typically set after initialization but before calling `predict`. | `None` |
| `tokens_to_suppress` | `List[str]` | The tokens to suppress during generation. | `None` |
| `return_history` | `bool` | Whether to return the history. | `False` |
| `sampling_method` | `Literal['sample', 'sample_top_k', 'sample_top_p']` | The sampling method. | `'sample'` |
| `top` | `int` | The top-k parameter for `'sample_top_k'`. | `1000` |
| `p` | `float` | The top-p parameter for `'sample_top_p'`. | `0.9` |
| `second_sampling_method` | `Optional[Literal['sample', 'sample_top_k', 'sample_top_p']]` | The second sampling method. | `None` |
| `second_top` | `int` | The top-k parameter for a second `'sample_top_k'` pass. | `1000` |
| `second_p` | `float` | The top-p parameter for a second `'sample_top_p'` pass. | `0.9` |
| `model` | `Optional[ILMModel]` | The model. Typically set after initialization but before calling `predict`. | `None` |
DefaultILMCollator
Bases: Collator
Used for pre-training.
ILMSeq2SeqCollator
Drops tokens from the suffix only.
ILMSeq2SeqPredCollator
Bases: ILMSeq2SeqCollator
Drops all the suffix/target tokens and sends them in the `target_ids` of shape `(batch_size, target_seq_len)`.
ILMBatch
Bases: BaseBatch
Input to the ILM.
Attributes:

| Name | Type | Description |
|---|---|---|
| `input_ids` | `Integer[Tensor, ' batch post_seq_len']` | The input ids to the model. |
| `attention_mask` | `Integer[Tensor, ' batch post_seq_len']` | 1 for tokens that are not padding. |
| `token_type_ids` | `Integer[Tensor, ' batch post_seq_len']` | 0 for CLS, 1 for BOS and prefix, 2 for other tokens. |
| `n_drops` | `Bool[Tensor, ' batch post_seq_len']` | 1 for tokens that are dropped. |
| `target_ids` | `Integer[Tensor, ' batch post_seq_len vocab_size']` | The target ids to the model. |
| `constraint` | `Optional[Bool[Tensor, ' batch post_seq_len']]` | 1 for tokens that should not be predicted. Mostly used during prediction. |
| `cls_position` | `Optional[Integer[Tensor, ' batch']]` | The position of the CLS token. |
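A toy batch illustrating the field conventions above. The concrete ids (101 for CLS, 102 for BOS, 0 for PAD) and the token type assigned to padding are assumptions for illustration, not the package's real vocabulary:

```python
import torch

# Sequence layout: [CLS] [BOS] prefix prefix tok tok [PAD]
input_ids      = torch.tensor([[101, 102, 7, 8, 9, 10, 0]])
attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1, 0]])  # 1 = not padding
token_type_ids = torch.tensor([[0, 1, 1, 1, 2, 2, 2]])  # 0 CLS, 1 BOS/prefix, 2 other
n_drops        = torch.tensor([[0, 0, 0, 0, 1, 0, 0]], dtype=torch.bool)  # token at index 4 dropped
cls_position   = torch.tensor([0])                      # CLS sits at index 0 here
```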
ILMSeq2SeqPredictionBatch
Bases: TypedDict
Input to the ILM for predicting the suffix given the prefix. Note that `target_ids` here differs from the one in `ILMBatch`.
Attributes:

| Name | Type | Description |
|---|---|---|
| `input_ids` | `Integer[Tensor, ' batch prefix_seq_len']` | The input ids to the model. |
| `attention_mask` | `Integer[Tensor, ' batch prefix_seq_len']` | 1 for tokens that are not padding. |
| `token_type_ids` | `Integer[Tensor, ' batch prefix_seq_len']` | 0 for CLS, 1 for BOS and prefix, 2 for other tokens. |
| `target_ids` | `Integer[Tensor, ' batch suffix_seq_len']` | The target ids to the model. |
ILMUncondtionalPredictionBatch
Bases: TypedDict
Input to the ILM for unconditional generation.
Attributes:

| Name | Type | Description |
|---|---|---|
| `input_ids` | `Integer[Tensor, ' batch prefix_seq_len']` | The input ids to the model. |
| `attention_mask` | `Integer[Tensor, ' batch prefix_seq_len']` | 1 for tokens that are not padding. |
| `token_type_ids` | `Integer[Tensor, ' batch prefix_seq_len']` | 0 for CLS, 1 for BOS and prefix, 2 for other tokens. |
ILMInfillPredictionBatch
Bases: TypedDict
Input to the ILM for infilling.
Attributes:

| Name | Type | Description |
|---|---|---|
| `input_ids` | `Integer[Tensor, ' batch prefix_seq_len']` | The input ids to the model, with the tokens to be infilled dropped. |
| `attention_mask` | `Integer[Tensor, ' batch prefix_seq_len']` | 1 for tokens that are not padding. |
| `gap_positions` | `Integer[Tensor, ' batch max_gap_positions']` | The positions of the gaps in `input_ids`, specifying the locations to be filled in. Padded with -1. |
| `target_ids` | `Integer[Tensor, ' total_gaps_in_batch max_infill_length']` | The target ids to be filled in. Targets can be mapped to their exact gaps via `gap_positions`. |
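A hypothetical infill batch showing the -1 padding of `gap_positions` and one way targets might be mapped back to gaps. The assumption that target rows are ordered by example and then by gap is illustrative, inferred from the table above:

```python
import torch

# Example 0 has gaps at positions 2 and 5; example 1 has one gap at
# position 3 (its second slot is padded with -1).
gap_positions = torch.tensor([[2, 5],
                              [3, -1]])
# One row per gap, padded to max_infill_length = 3.
target_ids = torch.tensor([[11, 12,  0],   # gap (example 0, position 2)
                           [13,  0,  0],   # gap (example 0, position 5)
                           [14, 15, 16]])  # gap (example 1, position 3)

# Recover (example, gap_position, target_row) triples.
triples, row = [], 0
for b in range(gap_positions.size(0)):
    for pos in gap_positions[b].tolist():
        if pos == -1:
            continue  # skip padded gap slots
        triples.append((b, pos, row))
        row += 1
```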
ILMLossDict
Bases: TypedDict
Output of the LossFunction Callable.
Attributes:

| Name | Type | Description |
|---|---|---|
| `loss` | `Float[Tensor, '']` | The total loss value. |
| `batch_loss` | `Float[Tensor, ' batch']` | The loss value for each example in the batch. |
ILMPredictionDict
Bases: TypedDict
Output of the Predictor for ILM.
Attributes:

| Name | Type | Description |
|---|---|---|
| `loss` | `Optional[Float[Tensor, ' batch']]` | The loss value. Typically `None`. |
| `text` | `List[str]` | The batch of generated text without special tokens. |
| `text_with_spl_tokens` | `List[str]` | The batch of generated text with special tokens. |
| `ids` | `Integer[Tensor, ' batch seq_len']` | The batch of generated token ids. |
| `attention_mask` | `Bool[Tensor, ' batch seq_len']` | The attention mask accompanying the generated ids. |
| `positions` | `Integer[Tensor, ' batch seq_len']` | The positions of the generated tokens, accompanying `ids`. |
| `history` | `List[List[Tuple[str, float, int]]]` | The batch of histories. Each entry is a list of `(current_string, time, step_number)` tuples, recorded whenever the generated string changes. |
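To make the `history` layout concrete, here is a made-up history for a single example; the strings, times, and step numbers are purely illustrative:

```python
# One example's history: (current_string, time, step_number) tuples,
# appended each time the generated string changes.
history = [
    ("the", 0.9, 1),
    ("the cat", 0.5, 3),
    ("the cat sat", 0.1, 7),
]

# The last entry holds the final generated string.
final_text, final_time, final_step = history[-1]
```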