xlm.utils.checkpoint_paths

Helpers for validating and locating checkpoint paths when resuming Lightning training.

is_distcp_sharded_checkpoint_dir(path)

Return True if path looks like a Lightning sharded checkpoint directory (one saved with state_dict_type: sharded).

Such checkpoints contain one *.distcp file per rank (flat layout under e.g. last.ckpt/).
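
The check described above can be approximated with a plain directory scan. This is a minimal sketch, not the library's implementation: the name looks_like_distcp_dir is hypothetical, and it assumes that "looks like" means "is a directory containing at least one *.distcp file at the top level".

```python
from pathlib import Path


def looks_like_distcp_dir(path) -> bool:
    """Hypothetical stand-in: True if `path` is a directory that directly
    contains at least one *.distcp shard file (flat layout, no recursion)."""
    p = Path(path)
    # glob("*.distcp") only matches direct children, which mirrors the
    # flat layout under e.g. last.ckpt/ described above.
    return p.is_dir() and any(p.glob("*.distcp"))
```

A regular .ckpt file, an empty directory, and a missing path all yield False under this sketch.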

is_consolidatable_lightning_sharded_dir(path)

Return True if path can be passed to Lightning's distributed checkpoint consolidation.

Requires shard files plus meta.pt (rank-0 metadata written by Lightning).
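
The two requirements stated above (shard files plus meta.pt) can be sketched as follows. The function name looks_consolidatable is hypothetical, and the sketch assumes meta.pt sits next to the shards at the top level of the directory.

```python
from pathlib import Path


def looks_consolidatable(path) -> bool:
    """Hypothetical stand-in: True if `path` is a sharded checkpoint
    directory that also carries the rank-0 metadata file meta.pt."""
    p = Path(path)
    has_shards = p.is_dir() and any(p.glob("*.distcp"))
    # Assumption: meta.pt lives alongside the shards, not in a subdirectory.
    has_meta = (p / "meta.pt").is_file()
    return has_shards and has_meta
```

A directory with shards but no meta.pt would still count as a sharded checkpoint directory, but not as a consolidatable one.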

is_usable_lightning_train_checkpoint_path(path)

Return True if path can be passed to Trainer.fit(..., ckpt_path=...).

Accepts a regular .ckpt file or a directory with at least one *.distcp shard.
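
A rough sketch of the acceptance rule above, under two assumptions: "regular .ckpt file" is read as "any existing regular file" (whether the real helper also checks the .ckpt suffix is not stated here), and the name looks_usable_for_fit is hypothetical.

```python
from pathlib import Path


def looks_usable_for_fit(path) -> bool:
    """Hypothetical stand-in: True for an existing regular file, or for a
    directory holding at least one *.distcp shard."""
    p = Path(path)
    if p.is_file():
        # Assumption: no suffix check; any regular file is accepted.
        return True
    return p.is_dir() and any(p.glob("*.distcp"))
```

A path that passes this kind of check is what one would then hand to Trainer.fit(..., ckpt_path=...).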

resolve_explicit_resume_checkpoint_path(path)

Validate resume_checkpoint_path from config.

Raises:

ValueError: If the path does not exist, or is a directory without *.distcp shards.
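
The two failure cases can be illustrated with a small validator. This is a sketch only: resolve_resume_path_sketch is a hypothetical name, and returning a pathlib.Path on success is an assumption, since the return type of the real function is not documented in this section.

```python
from pathlib import Path


def resolve_resume_path_sketch(path) -> Path:
    """Hypothetical stand-in: validate an explicitly configured resume path.

    Raises ValueError for a missing path or a directory without shards.
    """
    p = Path(path)
    if not p.exists():
        raise ValueError(f"resume_checkpoint_path does not exist: {p}")
    if p.is_dir() and not any(p.glob("*.distcp")):
        raise ValueError(f"checkpoint directory has no *.distcp shards: {p}")
    # Assumption: the validated path is returned unchanged.
    return p
```

Failing early with ValueError keeps a misconfigured resume_checkpoint_path from surfacing later as a less readable error inside the trainer.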

find_auto_resume_checkpoint(checkpointing_dir)

Return on_exception.ckpt or last.ckpt under checkpointing_dir if usable.

Prefers on_exception.ckpt when present. Paths may be files or sharded directories.
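
The lookup order can be sketched as below. Several details are assumptions rather than documented behavior: the name find_auto_resume_sketch is hypothetical, the sketch returns None when neither candidate is usable, and it falls back to last.ckpt if on_exception.ckpt exists but is not usable.

```python
from pathlib import Path
from typing import Optional


def _usable(p: Path) -> bool:
    # Regular file, or directory with at least one *.distcp shard.
    return p.is_file() or (p.is_dir() and any(p.glob("*.distcp")))


def find_auto_resume_sketch(checkpointing_dir) -> Optional[Path]:
    """Hypothetical stand-in: first usable candidate, on_exception.ckpt first."""
    root = Path(checkpointing_dir)
    for name in ("on_exception.ckpt", "last.ckpt"):
        candidate = root / name
        if _usable(candidate):
            return candidate
    # Assumption: None signals "nothing to resume from".
    return None
```

Checking on_exception.ckpt first matches the stated preference: a checkpoint written on a crash is typically newer than the last periodic one.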