This repository was archived by the owner on Mar 14, 2024. It is now read-only.
Commit 53ec1b1
Fix race condition in writing config to checkpoint
Summary:
We used to have _all_ trainers write the config to the checkpoint at the same time. That is already problematic, but what's worse is that only trainer 0 created the checkpoint directory. Thus, if the directory didn't exist and a non-0 trainer was the first to reach that point, the write would fail.
I'm fixing it in the same way we fixed all other similar issues: have only the rank-0 trainer write this.
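A minimal sketch of the rank-0-only pattern described above, not the code from this commit. The function name `maybe_write_config`, the `config.json` filename, and the sample config are hypothetical and chosen only for illustration. It assumes each trainer knows its own integer rank.

```python
import json
import tempfile
from pathlib import Path


def maybe_write_config(checkpoint_dir: Path, config: dict, rank: int) -> bool:
    """Write the config only from the rank-0 trainer; return True if written.

    Hypothetical helper: name, signature, and file layout are assumptions.
    """
    if rank != 0:
        # Non-zero ranks skip the write entirely, so they never touch a
        # directory that rank 0 may not have created yet.
        return False
    # Rank 0 owns both directory creation and the write.
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    (checkpoint_dir / "config.json").write_text(json.dumps(config))
    return True


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint"
    # Simulate non-zero ranks reaching this point before rank 0.
    results = [maybe_write_config(ckpt, {"dimension": 100}, rank) for rank in (2, 1, 0)]
    written = json.loads((ckpt / "config.json").read_text())
```

Because a single rank performs both the directory creation and the write, the order in which trainers reach this point no longer matters.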
Reviewed By: adamlerer
Differential Revision: D17787303
fbshipit-source-id: c3464dd9929ff95d54865ed03f041388d85c6f0d
1 parent 3ee2838 commit 53ec1b1
1 file changed
Lines changed: 2 additions & 1 deletion
Diff hunk (line content not preserved): original lines 470-476 become lines 470-477. Original line 473 is removed, and new lines 473-474 are added.