# CHANGELOG

## torchx-0.2.0

* Milestone: https://github.com/pytorch/torchx/milestone/4

* `torchx.schedulers`
  * DeviceMounts
    * New mount type `DeviceMount` that allows mounting a host device into a container on the supported schedulers (Docker, AWS Batch, Kubernetes). Custom accelerators and network devices such as InfiniBand or Amazon EFA are now supported.
  * Slurm
    * Scheduler integration now supports `max_retries` the same way our other schedulers do. This only handles whole-job retries and does not support per-replica retries.
    * Autodetects the `nomem` setting by using `sinfo` to read the `Memory` setting of the specified partition
    * More robust slurmint script
  * Kubernetes
    * Support for k8s device plugins/resource limits
    * Added a `devices` list of `(str, int)` tuples to role/resource
    * Added `devices.py` to map from named devices to DeviceMounts
    * Added logic in `kubernetes_scheduler` to add devices from the resource to the resource limits
    * Added logic in `aws_batch_scheduler` and `docker_scheduler` to add DeviceMounts for any devices in the resource
    * Added a `priority_class` argument to the Kubernetes scheduler to set the `priorityClassName` of the Volcano job.
  * Ray
    * Fixes for distributed training, which is now supported in Beta

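To illustrate the named-device to `DeviceMount` mapping described above, here is a minimal pure-Python sketch. The class, registry, and helper names (`DEVICE_REGISTRY`, `mounts_for`, `efa_mounts`) are illustrative stand-ins, not torchx's actual API:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DeviceMount:
    src_path: str             # device path on the host
    dst_path: str             # device path inside the container
    permissions: str = "rwm"  # Docker-style read/write/mknod

def efa_mounts(num_devices: int) -> List[DeviceMount]:
    # One mount per EFA interface: /dev/infiniband/uverbs0, uverbs1, ...
    return [
        DeviceMount(
            src_path=f"/dev/infiniband/uverbs{i}",
            dst_path=f"/dev/infiniband/uverbs{i}",
        )
        for i in range(num_devices)
    ]

# Registry from a named device (as it would appear in a resource) to a
# factory producing the mounts a scheduler should attach.
DEVICE_REGISTRY: Dict[str, Callable[[int], List[DeviceMount]]] = {
    "vpc.amazonaws.com/efa": efa_mounts,
}

def mounts_for(devices: Dict[str, int]) -> List[DeviceMount]:
    mounts: List[DeviceMount] = []
    for name, count in devices.items():
        mounts.extend(DEVICE_REGISTRY[name](count))
    return mounts
```

A scheduler can then translate `mounts_for({"vpc.amazonaws.com/efa": 2})` into the container runtime's device flags.
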
* `torchx.specs`
  * Moved factory/builder methods from the datastruct-specific `specs.api` module to the `specs.factory` module

* `torchx.runner`
  * Renamed the `stop` method to `cancel` for consistency; `Runner.stop` is now deprecated
  * Added a warning message when the `name` parameter is specified. It is only used as part of the session name, which is deprecated, making `name` obsolete.
  * New `TORCHXCONFIG` env variable to point at a specific config file

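The `stop` to `cancel` rename follows the usual rename-with-deprecation pattern. A minimal sketch of that pattern, with an illustrative stand-in rather than the actual `torchx.runner.Runner` implementation:

```python
import warnings

class Runner:
    def cancel(self, app_handle: str) -> None:
        # A real runner would ask the underlying scheduler to cancel the job.
        print(f"cancelling {app_handle}")

    def stop(self, app_handle: str) -> None:
        # Kept as a thin shim so existing callers keep working but get warned.
        warnings.warn(
            "`Runner.stop` is deprecated, use `Runner.cancel` instead",
            DeprecationWarning,
            stacklevel=2,
        )
        self.cancel(app_handle)
```

Existing code calling `runner.stop(handle)` continues to work, surfacing a `DeprecationWarning` until callers migrate to `cancel`.
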
* `torchx.components`
  * Removed `base` + `torch_dist_role` since users should prefer the `dist.ddp` component instead
  * Removed custom components for example apps in favor of using builtins.
  * Added `env`, `max_retries` and `mounts` arguments to `utils.sh`

* `torchx.cli`
  * Better parsing of configs from a string literal
  * Added support to delimit kv-pairs and list values with "," and ";" interchangeably
  * Allow the default scheduler to be specified via `.torchxconfig`
  * Better messaging for invalid schedulers
  * Log message about how to disable workspaces
  * Job cancellation support via `torchx cancel <job>`

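The interchangeable-delimiter behavior can be sketched as follows. This is a simplified stand-in (`parse_kv_pairs` is a hypothetical helper, not the actual torchx parser) that treats both "," and ";" as pair delimiters; the real CLI also handles list-valued entries:

```python
import re
from typing import Dict

def parse_kv_pairs(literal: str) -> Dict[str, str]:
    # Split on either "," or ";" so both delimiters work interchangeably.
    pairs: Dict[str, str] = {}
    for item in re.split(r"[,;]", literal):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs
```

So `a=1,b=2` and `a=1;b=2` parse identically.
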
* `torchx.workspace`
  * Support for `.dockerignore` files used as include lists, fixing some behavioral differences between how `.dockerignore` files are interpreted by torchx and Docker

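As one reading of the entry above, an include list keeps only paths matching the listed patterns rather than excluding them. A minimal sketch with hypothetical helper names, not the actual torchx workspace code:

```python
import fnmatch
from typing import List

def filter_workspace(paths: List[str], include_patterns: List[str]) -> List[str]:
    # Keep a path if it matches at least one include pattern.
    # Note: fnmatch's "*" also matches "/" separators.
    return [
        p for p in paths
        if any(fnmatch.fnmatch(p, pat) for pat in include_patterns)
    ]
```
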
* Testing
  * Component tests now run sequentially
  * Components can be tested with a runner using the `components.components_test_base.ComponentTestCase#run_component()` method.

* Additional Changes
  * Updated the Pyre configuration to preemptively guard against upcoming semantic changes
  * Formatting changes from black 22.3.0
  * Now using pyfmt with usort 1.0 and the new import-merging behavior.
  * Added a script to automatically gather system diagnostics for reporting purposes


## torchx-0.1.2

Milestone: https://github.com/pytorch/torchx/milestones/3