v1.15.0: Improvements to Slurm and HTCondor solutions

@tpdownes tpdownes released this 21 Mar 16:13
· 3756 commits to main since this release
4787cde

Key New Features

  • Support for HTCondor pools with both On-demand and Spot VMs
  • Slurm solution updated to 5.6.0
    • Support for custom machine types
    • Label exclusive nodes with job ID for cost-tracking
    • New zone_target_shape parameter, corresponding to the bulkInsert targetShape parameter
    • FIX: lustre mounting regression introduced in 5.5.0

Improvements

  • filestore module added support for Shared VPCs via the use keyword and the pre-existing-vpc module
  • HTCondor modules now use minimally-scoped authentication for each daemon
  • HTCondor execute points disable benchmarks to decrease time to join pool
  • Improved type alignment across modules, e.g. var.labels aligned to map(string)

What's Changed

  • Rename filestore network_name to network_id to enable shared VPC via use by @nick-stroud in #962
  • Improve attribute tracking in HTCondor scheduler by @tpdownes in #965
  • Update fluent tutorial to use pre-existing-vpc module and other minor syntax updates by @nick-stroud in #963
  • Revert "Rename filestore network_name to network_id to enable shared VPC via use" by @nick-stroud in #967
  • Mask sleep/suspend targets on chrome-remote-desktop to prevent shutdown by @nick-stroud in #968
  • Update image building example to use Slurm V5 by @mr0re1 in #964
  • Improve HTCondor job matchmaking speed by @tpdownes in #971
  • Roll-forward: "Rename filestore network_name to network_id to enable shared VPC via use" by @nick-stroud in #969
  • Increase reliability of blueprints using DDN Exascaler by @tpdownes in #972
  • Further increase speed at which HTCondor daemons update their ClassAds by @tpdownes in #974
  • Initial support for Spot VMs within HTCondor pools by @tpdownes in #973
  • Convert HTCondor autoscaler to SystemD timer by @tpdownes in #975
  • Add validation to prevent usage of variables in backend block. by @mr0re1 in #970
  • Making OFE deploy.sh MacOS compatible. Fixes #978 by @ek-nag in #979
  • Improve Slurm log capturing by @tpdownes in #980
  • Support Spot VMs in HTCondor pools by @tpdownes in #981
  • Add utils for parsing and normalizing HCL dtype by @mr0re1 in #977
  • Enable depth-first filling of HTCondor pools by @tpdownes in #982
  • Escalate to root privileges to fetch Slurm logs by @mr0re1 in #987
  • Bump google.golang.org/api from 0.110.0 to 0.111.0 by @dependabot in #984
  • Bump github.com/spf13/afero from 1.9.4 to 1.9.5 by @dependabot in #985
  • Bump github.com/go-git/go-git/v5 from 5.4.2 to 5.6.0 by @dependabot in #986
  • Bump dill from 0.3.4 to 0.3.6 in /community/front-end/ofe by @dependabot in #990
  • Bump google-cloud-core from 2.2.2 to 2.3.2 in /community/front-end/ofe by @dependabot in #991
  • Bump astroid from 2.9.3 to 2.15.0 in /community/front-end/ofe by @dependabot in #992
  • Bump proto-plus from 1.20.1 to 1.22.2 in /community/front-end/ofe by @dependabot in #993
  • Bump isort from 5.10.1 to 5.12.0 in /community/front-end/ofe by @dependabot in #994
  • Merge main into develop after release v1.14.0 by @mr0re1 in #997
  • Bump terraform providers version 4.53.1 -> 4.56.0 by @mr0re1 in #998
  • Clean up Filestore regardless of instances presence by @mr0re1 in #999
  • Upgrade to slurm-gcp 5.6.0 by @SkylerMalinowski in #995
  • Fix nfs-server example to use local_mounts instead of local_mount by @nick-stroud in #1001
  • Add missing description for gcs_bucket_path by @nick-stroud in #1002
  • Doc fix by @issacg in #1010
  • Add mounting of cloud-storage-bucket to Slurm v5 test by @nick-stroud in #1007
  • Use DeploymentName getter instead of looking up Vars by @mr0re1 in #1005
  • Specify strict type for labels = map(string) by @mr0re1 in #1000
  • Pass empty string instead of null to avoid mounting failure in Slurm by @nick-stroud in #1003
  • Remove ghpc_role setting from nfs-server example by @nick-stroud in #1008
  • Actually check mount instead of just checking dir exists by @nick-stroud in #1004
  • Remove hostname test as it is not providing incremental value by @nick-stroud in #1006
  • Double length of time for HTCondor integration test to detect job queue by @tpdownes in #1020
  • Bump github.com/googleapis/gax-go/v2 from 2.7.0 to 2.7.1 by @dependabot in #1011
  • Bump github.com/hashicorp/hcl/v2 from 2.16.1 to 2.16.2 by @dependabot in #1012
  • Update slurm v5 readme about local-exec dependencies by @mr0re1 in #1023
  • Bump google.golang.org/api from 0.111.0 to 0.112.0 by @dependabot in #1013
  • Update OFE Dependabot configuration by @tpdownes in #1055
  • Release v1.15.0 by @tpdownes in #1065

New Contributors

Full Changelog: v1.14.1...v1.15.0