# Torch Distributed Elastic

Makes distributed PyTorch fault-tolerant and elastic.

## Get Started

```{toctree}
:caption: Usage
:maxdepth: 1

elastic/quickstart
elastic/train_script
elastic/examples
```

## Documentation

```{toctree}
:caption: API
:maxdepth: 1

elastic/run
elastic/agent
elastic/multiprocessing
elastic/errors
elastic/rendezvous
elastic/timer
elastic/metrics
elastic/events
elastic/subprocess_handler
elastic/control_plane
```

```{toctree}
:caption: Advanced
:maxdepth: 1

elastic/customization
```

```{toctree}
:caption: Plugins
:maxdepth: 1

elastic/kubernetes
```