Commit 3987139

Restore optimizer weights from fp16 when ZeRO checkpoints are missing
When loading a checkpoint that doesn't have ZeRO checkpoint files, restore the optimizer weights from the fp16 parameters instead of leaving them uninitialized. Without this fix, DeepSpeed behaves incorrectly on checkpoints that were trained without DeepSpeed.
1 parent: 86cc636

1 file changed

Lines changed: 1 addition & 0 deletions

File tree

deepspeed/runtime/engine.py

@@ -1596,6 +1596,7 @@ def _load_checkpoint(self,

     def _load_zero_checkpoint(self, load_dir, tag, load_optimizer_states=True):
         zero_sd_list = self._get_all_zero_checkpoints(load_dir, tag)
         if zero_sd_list is None:
+            self.optimizer._restore_from_fp16_weights()
             return

         self.optimizer.load_state_dict(
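The idea behind the one-line fix can be sketched in plain Python: when no ZeRO optimizer state files exist, the fp32 "master" weights the optimizer steps on are overwritten with the fp16 model weights that were just loaded from the checkpoint, instead of being left stale. Only the method name `_restore_from_fp16_weights` comes from the patch; the class, attribute names, and use of plain floats below are illustrative assumptions, not DeepSpeed's real implementation.

```python
class Fp16OptimizerSketch:
    """Toy stand-in for an fp16 optimizer wrapper (hypothetical names).

    Real mixed-precision optimizers keep half-precision model weights
    plus full-precision master copies; here both are plain Python floats.
    """

    def __init__(self, fp16_params):
        # Model weights as saved in / loaded from the checkpoint.
        self.fp16_groups = [list(fp16_params)]
        # Master copies the inner optimizer actually updates.
        self.fp32_groups = [[float(p) for p in fp16_params]]

    def _restore_from_fp16_weights(self):
        # Copy the just-loaded model weights over the master copies so the
        # optimizer does not continue from stale/uninitialized state.
        for g16, g32 in zip(self.fp16_groups, self.fp32_groups):
            for i, p in enumerate(g16):
                g32[i] = float(p)


# Simulate loading a checkpoint that has no ZeRO optimizer state:
opt = Fp16OptimizerSketch([0.0, 0.0])
opt.fp16_groups[0][:] = [1.5, -2.25]  # weights loaded from the checkpoint
opt._restore_from_fp16_weights()      # without this, fp32 copies stay [0.0, 0.0]
print(opt.fp32_groups[0])             # [1.5, -2.25]
```

This mirrors the control flow in the diff: the restore runs only on the `zero_sd_list is None` path, i.e. when there are no partitioned optimizer checkpoints to load.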
