`ilm.datamodule_ilm`

`ILMEmptyDataset`

Bases: IterableDataset

Parameters:

Name	Type	Description	Default
`tokenizer_kwargs`		Keyword arguments for the tokenizer.	required
`empty_text`		For MLM, set `empty_text` to a sequence consisting entirely of mask tokens.	required
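
As a rough illustration of the `empty_text` note for MLM, the sketch below builds a string made only of mask tokens. The helper name `make_empty_text`, the mask token string `"[MASK]"`, and the space separator are assumptions for illustration and are not part of `ilm.datamodule_ilm`; use whatever mask token your tokenizer actually defines.

```python
def make_empty_text(mask_token: str, length: int) -> str:
    """Return `length` copies of `mask_token`, separated by single spaces.

    Hypothetical helper: the real mask token and separator depend on the
    tokenizer configured through `tokenizer_kwargs`.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return " ".join([mask_token] * length)


# Assumed mask token string; many BERT-style tokenizers use "[MASK]".
empty_text = make_empty_text("[MASK]", 4)
```

The resulting string could then be passed as `empty_text`, provided the tokenizer maps each repeated token to its mask id.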

Bases: Collator

Used for pre-training.

Drops tokens from the suffix only.

Drops all suffix/target tokens and returns them in `target_ids`, of shape (batch_size, target_seq_len).
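
The behavior described above can be sketched in plain Python, under stated assumptions. The function name `collate_drop_suffix`, the `prefix_len` field, and right-padding with `pad_id` are hypothetical choices for illustration; the actual collator may differ in field names, padding side, and tensor types.

```python
from typing import Dict, List


def collate_drop_suffix(
    examples: List[Dict], pad_id: int = 0
) -> Dict[str, List[List[int]]]:
    """Split each example at `prefix_len`; the suffix becomes the target.

    Hypothetical sketch: returns nested lists rather than tensors.
    """
    prefixes = [ex["input_ids"][: ex["prefix_len"]] for ex in examples]
    suffixes = [ex["input_ids"][ex["prefix_len"]:] for ex in examples]

    def pad(rows: List[List[int]]) -> List[List[int]]:
        # Right-pad every row to the length of the longest row.
        width = max((len(r) for r in rows), default=0)
        return [r + [pad_id] * (width - len(r)) for r in rows]

    return {
        "input_ids": pad(prefixes),   # (batch_size, prefix_seq_len)
        "target_ids": pad(suffixes),  # (batch_size, target_seq_len)
    }


batch = collate_drop_suffix(
    [
        {"input_ids": [5, 6, 7, 8], "prefix_len": 2},
        {"input_ids": [9, 10, 11], "prefix_len": 2},
    ],
    pad_id=0,
)
```

Here `target_seq_len` is the length of the longest dropped suffix in the batch, so shorter suffixes are padded.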

Drops tokens from a single segment of a single sequence. Adds bos. Adds cls as requested.
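
A minimal sketch of dropping one contiguous segment from a single sequence follows. The function name `drop_segment`, the half-open `[start, end)` span, and the ordering of the optional cls token before bos are all assumptions made for illustration, not the documented behavior of this collator.

```python
from typing import List, Optional, Tuple


def drop_segment(
    ids: List[int],
    start: int,
    end: int,
    bos_id: int,
    cls_id: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Remove ids[start:end] and return (kept, dropped).

    Hypothetical sketch: bos is always prepended; cls is prepended before
    bos only when `cls_id` is given (the ordering is an assumption).
    """
    if not 0 <= start <= end <= len(ids):
        raise ValueError("segment must lie within the sequence")
    dropped = ids[start:end]
    kept = [bos_id] + ids[:start] + ids[end:]
    if cls_id is not None:
        kept = [cls_id] + kept
    return kept, dropped


kept, dropped = drop_segment([1, 2, 3, 4, 5], start=1, end=3, bos_id=100)
kept_cls, _ = drop_segment([1, 2, 3, 4, 5], 1, 3, bos_id=100, cls_id=101)
```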