ilm.model_ilm

BaseRotaryTransformerILMModel

Bases: Module, Model

Rotary-embedding-based transformer decoder.

forward(x_t, attention_mask=None, positions=None, token_type_ids=None, cls_position=None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `x_t` | `Integer[Tensor, ' *batch seq_len']` | The input tokens of shape `(*batch, seq_len)`. | *required* |
| `t` | | The timesteps of shape `(*batch)`. | *required* |
| `attention_mask` | `Optional[Bool[Tensor, ' *batch seq_len']]` | The attention mask of shape `(*batch, seq_len)`; `True` for non-padding tokens. | `None` |
| `positions` | `Optional[Integer[Tensor, ' *batch seq_len']]` | The positions of the tokens, of shape `(*batch, seq_len)`. | `None` |
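A minimal sketch of how the documented inputs might be prepared. The batch size, sequence length, vocabulary size, padding lengths, and the `model` variable are all hypothetical placeholders; note that the parameter table above also lists a timestep argument `t`, so the commented call follows the table rather than the signature shown:

```python
import torch

# Hypothetical dimensions for illustration only.
batch, seq_len, vocab_size = 2, 16, 100

# Input tokens of shape (*batch, seq_len).
x_t = torch.randint(0, vocab_size, (batch, seq_len))

# Timesteps of shape (*batch).
t = torch.randint(0, 1000, (batch,))

# Attention mask: True for non-padding tokens. Here the second
# sequence is assumed to be padded after 10 real tokens.
lengths = torch.tensor([16, 10])
attention_mask = torch.arange(seq_len)[None, :] < lengths[:, None]

# Assuming a model instance constructed elsewhere, e.g.
# model = BaseRotaryTransformerILMModel(...):
# out = model(x_t, t, attention_mask=attention_mask)
```

The mask is built by broadcasting a position index against per-sequence lengths, which yields the documented `Bool[Tensor, ' *batch seq_len']` layout directly.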

RotaryTransformerILMModelWithClassification

Bases: BaseRotaryTransformerILMModel

Rotary-embedding-based transformer decoder.

forward(x_t, attention_mask=None, positions=None, token_type_ids=None, cls_position=None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `x_t` | `Integer[Tensor, ' *batch seq_len']` | The input tokens of shape `(*batch, seq_len)`. | *required* |
| `t` | | The timesteps of shape `(*batch)`. | *required* |
| `attention_mask` | `Optional[Bool[Tensor, ' *batch seq_len']]` | The attention mask of shape `(*batch, seq_len)`; `True` for non-padding tokens. | `None` |
| `positions` | `Optional[Integer[Tensor, ' *batch seq_len']]` | The positions of the tokens, of shape `(*batch, seq_len)`. | `None` |

BaseGPT2ILMModel

Bases: GPT, Model

forward(x_t, attention_mask=None, positions=None, token_type_ids=None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `x_t` | `Integer[Tensor, ' *batch seq_len']` | The input tokens of shape `(*batch, seq_len)`. | *required* |
| `t` | | The timesteps of shape `(*batch)`. | *required* |
| `attention_mask` | `Optional[Bool[Tensor, ' *batch seq_len']]` | The attention mask of shape `(*batch, seq_len)`; `True` for non-padding tokens. | `None` |
| `positions` | `Optional[Integer[Tensor, ' *batch seq_len']]` | The positions of the tokens, of shape `(*batch, seq_len)`. | `None` |

GPT2ILMModelWithClassification

Bases: BaseGPT2ILMModel

forward(x_t, attention_mask=None, positions=None, token_type_ids=None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `x_t` | `Integer[Tensor, ' *batch seq_len']` | The input tokens of shape `(*batch, seq_len)`. | *required* |
| `t` | | The timesteps of shape `(*batch)`. | *required* |
| `attention_mask` | `Optional[Bool[Tensor, ' *batch seq_len']]` | The attention mask of shape `(*batch, seq_len)`; `True` for non-padding tokens. | `None` |
| `positions` | `Optional[Integer[Tensor, ' *batch seq_len']]` | The positions of the tokens, of shape `(*batch, seq_len)`. | `None` |
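The GPT-2 variants additionally accept `token_type_ids` in their signature. A hedged sketch of how the optional `positions` and `token_type_ids` inputs could be assembled; the dimensions, the GPT-2 vocabulary size, the prompt/continuation segment split, and the `model` variable are assumptions for illustration, not part of the documented API:

```python
import torch

# Hypothetical dimensions; 50257 is the standard GPT-2 vocabulary size.
batch, seq_len = 2, 8
x_t = torch.randint(0, 50257, (batch, seq_len))

# Explicit positions of shape (*batch, seq_len); here simply 0..seq_len-1.
positions = torch.arange(seq_len).expand(batch, seq_len)

# token_type_ids: an assumed segment layout, e.g. 0 = prompt, 1 = continuation.
token_type_ids = torch.cat(
    [
        torch.zeros(batch, 4, dtype=torch.long),
        torch.ones(batch, 4, dtype=torch.long),
    ],
    dim=-1,
)

# Assuming a model instance constructed elsewhere, e.g.
# model = GPT2ILMModelWithClassification(...):
# out = model(x_t, positions=positions, token_type_ids=token_type_ids)
```

Passing `positions=None` would presumably fall back to default sequential positions; supplying them explicitly, as above, is only needed for non-standard position layouts.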