v1.38.0: Slurm GCP v6 for a3-highgpu-8g and added ability to disable automatic updates
What's Changed
Key New Features 🎉
- Add Slurm-GCP v6 based solution for provisioning a3-highgpu-8g compute nodes by @tpdownes in #2859
- Add `allow_automatic_updates` flag by @rohitramu in #2778
- Update slurm-gcp module to use custom endpoints by @cdunbar13 in #2653
- Add local ssd RAID0 startup script by @alyssa-sm in #2720
New Modules 🧱
- Move GKE Modules to Core by @chengcongdu in #2758
Module Improvements 🔨
- Move `slurm_files` to the repo by @mr0re1 in #2739
- Fix cleanup compute for different versions of gcloud by @cdunbar13 in #2794
- Change default disk_type for GKE nodepool to null by @chengcongdu in #2818
- Add `instance_properties` var to `nodeset` by @mr0re1 in #2843
- Enable local SSD formatting solution to set POSIX permissions by @tpdownes in #2863
- Add support for min_cpu_platform usage on vm-instance by @RachaelSTamakloe in #2873
Improvements 🛠
- GKE optional accelerator by @ankitkinra in #2736
- Add test for GKE n2 pool with default driver by @chengcongdu in #2811
- Update local ssd examples to use local ssd startup solution by @alyssa-sm in #2870
- Update a3-megagpu-8g example to use local ssd solution by @alyssa-sm in #2871
Deprecations 💤
Version Updates ⏫
Bug fixes 🐞
- Fix construction of `cloud.conf` by @mr0re1 in #2810
- SlurmGCP. Fix broken `--trace-api` flag by @mr0re1 in #2817
- SlurmGCP6. Fix nodes stuck in `down*` state by @mr0re1 in #2856
- SlurmGCP. Fix bugs around nodeset zones by @mr0re1 in #2864
- Roll back changes in go.mod to release v1.37.2 by @nick-stroud in #2934
New Contributors
- @chengcongdu made their first contribution in #2758
- @ctk21 made their first contribution in #2761
- @arajmane-g made their first contribution in #2854
Full Changelog: v1.37.2...v1.38.0