61 lines
1.6 KiB
ReStructuredText
61 lines
1.6 KiB
ReStructuredText
Expiration Timers
|
|
==================
|
|
|
|
.. automodule:: torch.distributed.elastic.timer
|
|
.. currentmodule:: torch.distributed.elastic.timer
|
|
|
|
Client Methods
|
|
---------------
|
|
.. autofunction:: torch.distributed.elastic.timer.configure
|
|
|
|
.. autofunction:: torch.distributed.elastic.timer.expires
|
|
|
|
Server/Client Implementations
|
|
------------------------------
|
|
Below are the timer server and client pairs that are provided by torchelastic.
|
|
|
|
.. note:: Timer server and clients always have to be implemented and used
|
|
in pairs since there is a messaging protocol between the server
|
|
and client.
|
|
|
|
Below is a pair of timer server and client that is implemented based on
|
|
a ``multiprocess.Queue``.
|
|
|
|
.. autoclass:: LocalTimerServer
|
|
|
|
.. autoclass:: LocalTimerClient
|
|
|
|
Below is another pair of timer server and client that is implemented
|
|
based on a named pipe.
|
|
|
|
.. autoclass:: FileTimerServer
|
|
|
|
.. autoclass:: FileTimerClient
|
|
|
|
|
|
Writing a custom timer server/client
|
|
--------------------------------------
|
|
|
|
To write your own timer server and client extend the
|
|
``torch.distributed.elastic.timer.TimerServer`` for the server and
|
|
``torch.distributed.elastic.timer.TimerClient`` for the client. The
|
|
``TimerRequest`` object is used to pass messages between
|
|
the server and client.
|
|
|
|
.. autoclass:: TimerRequest
|
|
:members:
|
|
|
|
.. autoclass:: TimerServer
|
|
:members:
|
|
|
|
.. autoclass:: TimerClient
|
|
:members:
|
|
|
|
|
|
Debug info logging
|
|
-------------------
|
|
|
|
.. automodule:: torch.distributed.elastic.timer.debug_info_logging
|
|
|
|
.. autofunction:: torch.distributed.elastic.timer.debug_info_logging.log_debug_info_for_expired_timers
|