Releases: GoogleCloudPlatform/cluster-toolkit

v1.13.0: HCLS Example Blueprint, New Chrome Remote Desktop Module, & Image Building Improvements

16 Feb 20:42
dda341f

Key New Features

Other Improvements

Version updates

  • install-htcondor: Updated HTCondor release to the current 10.x series.
  • Slurm on GCP updated to version 5.4.1 (PR).
  • Google Terraform Provider pinned to version 4.51.0 and will be bumped with each future release.

What's Changed

New Contributors

Full Changelog: v1.12.0...v1.13.0

v1.12.0: Google Cloud Storage module and Fluent Tutorial

31 Jan 21:29
73fb63f

Key New Features

New Resources

Improvements

  • Improved documentation and module automation for GPU support.
  • Various improvements in the ghpc engine code.

Bug Fixes

  • Fixed error when ghpc was run outside of the HPC-Toolkit folder (PR).
  • Fixed category-field bug preventing some users from deploying HPC monitoring dashboards.

Version updates

  • DAOS examples updated to use google-cloud-daos v0.3.0.
  • Slurm on GCP updated to version 5.4.0 (PR).
  • Updated cloud.google.com/go/compute from 1.15.0 to 1.15.1.
  • Updated google.golang.org/api from 0.106.0 to 0.108.0.
  • Google Terraform Provider pinned to version 4.49.0 and will be bumped with each future release.

What's Changed

New Contributors

Full Changelog: v1.11.0...v1.12.0

v1.11.0: Usability Improvements for GPUs, Validation of `use` Field, & Miscellaneous Slurm Improvements

18 Jan 21:55
d706498

Key New Features

  • GPU type and count are automatically populated when using A2 series machines for vm-instance and Slurm v5 node-group, controller, & login-node.
  • ghpc now validates that modules linked using the use field have common outputs and settings.
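As a rough illustration of the `use` validation described above, the check can be thought of as a set intersection between the input settings of the consuming module and the outputs of each module it uses. This is only a sketch in Python, not the ghpc implementation (ghpc is written in Go); the function name `unused_use_links` and the sample module ids, settings, and outputs are hypothetical.

```python
def unused_use_links(module_inputs, used_modules):
    """Return ids of used modules whose outputs match none of the inputs.

    module_inputs: names of settings accepted by the consuming module.
    used_modules:  mapping of used module id -> list of its output names.
    """
    inputs = set(module_inputs)
    return [
        mod_id
        for mod_id, outputs in used_modules.items()
        if not inputs & set(outputs)  # nothing in common, so the link does nothing
    ]


# Hypothetical example data, for illustration only.
vm_inputs = ["network_self_link", "subnetwork_self_link", "network_storage"]
used = {
    "network1": ["network_self_link", "subnetwork_self_link"],
    "homefs": ["network_storage"],
    "dashboard": ["instructions"],
}

print(unused_use_links(vm_inputs, used))
```

A non-empty result would correspond to the case the validator is meant to catch: a module listed under `use` that contributes no settings to the module using it.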

Resource Improvements

Version updates

  • Google Terraform Provider pinned to version 4.48.0 and will be bumped with each future release.

What's Changed

  • Strip newlines chars before searching for startup failure by @nick-stroud in #783
  • Allow local disk labels, merged with var.labels by @heyealex in #764
  • Add PBS Pro integration test by @tpdownes in #782
  • Add a validator for unused modules in the "use" list by @heyealex in #760
  • Bump google.golang.org/api from 0.104.0 to 0.105.0 by @dependabot in #786
  • Fix Quantum AI example by pinning to g++ 10 instead of 11 by @tpdownes in #788
  • Make URL for deployment tarball cut-and-paste-able. by @tpdownes in #789
  • Add no_comma_params option in "cloud_parameters" by @heyealex in #765
  • Add partition level startup script variables by @heyealex in #785
  • Add slurm gcp v5 integration test with startup scripts by @heyealex in #790
  • Increase HTCondor installation timeout by @tpdownes in #792
  • Run dependabot at repeatable weekly interval by @tpdownes in #793
  • Add disable_public_ips option to node group module by @heyealex in #791
  • Sourcereader wrapfs workaround by @thiagosgobe in #748
  • Updating develop post-release of 1.10.1 by @nick-stroud in #801
  • Add option to set instance template for login and controller slurm-gcp v5 modules by @heyealex in #787
  • Remove singularity install to avoid failed package install by @heyealex in #804
  • Bump github.com/go-git/go-billy/v5 from 5.3.1 to 5.4.0 by @dependabot in #802
  • Automate gpu guest accelerator in vm-instance if not set by @heyealex in #805
  • Bump google.golang.org/api from 0.105.0 to 0.106.0 by @dependabot in #806
  • Apply gpu_definition to slurm modules by @heyealex in #807
  • Bump cloud.google.com/go/serviceusage from 1.4.0 to 1.5.0 by @dependabot in #809
  • Bump cloud.google.com/go/compute from 1.14.0 to 1.15.0 by @dependabot in #808
  • Modification to handling of django key to keep it local only to webse… by @mattstreet-nag in #755
  • Pass family through instead of looking up image to allow compute nodes to pick up new version within family by @nick-stroud in #810
  • Fix broken link to application tutorial diagram by @nick-stroud in #813
  • Enable Slurm v4 image to be specified by name by @nick-stroud in #814
  • Add slash to network storage output for pre-existing file systems by @heyealex in #812
  • Bump google provider max version to 4.47.0 by @cboneti in #818
  • Enable cleanup of active compute nodes on destroy for high io test by @nick-stroud in #819
  • Bump github.com/aws/aws-sdk-go from 1.15.78 to 1.33.0 by @dependabot in #821
  • Move directory check to after embedded checks by @heyealex in #822
  • Adding support for existing GCS bucket in startup script module. by @soumyapani in #820
  • Update TF google provider version to 4.48 by @heyealex in #823
  • Rolling version to 1.11.0 by @nick-stroud in #841
  • Release v1.11.0 by @nick-stroud in #837
  • Allow Dependabot YAML parser to read time value by @tpdownes in #842

New Contributors

Full Changelog: v1.10.1...v1.11.0

v1.10.1: Update to Slurm v5.3, Bug Fixes, Documentation Updates

22 Dec 00:02
3c03c9a

Key New Features

  • All Slurm v5 modules have been updated from v5.2.0 -> v5.3.0. For more information, see the changelog for Slurm on GCP.

Improvements

What's Changed

  • Cleanup examples README by @heyealex in #752
  • Update login and controller to use standard image setting format by @heyealex in #754
  • Remove duplicated module and example lists in community READMEs by @heyealex in #750
  • Bump cloud.google.com/go/compute from 1.12.1 to 1.14.0 by @dependabot in #759
  • Update guidance to use incremental placement to avoid deadlock by @nick-stroud in #766
  • Remove outdated warning in node_groups variable by @heyealex in #763
  • Always include a startup script with a pre-determined name even if script is empty string by @nick-stroud in #777
  • Always include a startup script with a pre-determined name even if script is empty string by @nick-stroud in #778
  • Merge main into develop after release 1.10.0 by @cboneti in #780
  • Configure dependency review by @cboneti in #781
  • Bump oauthlib from 3.2.0 to 3.2.1 in /community/front-end/ofe by @dependabot in #769
  • Bump pyjwt from 2.3.0 to 2.4.0 in /community/front-end/ofe by @dependabot in #770
  • Bump django from 3.2.12 to 3.2.16 in /community/front-end/ofe by @dependabot in #771
  • Bump protobuf from 3.19.4 to 3.19.5 in /community/front-end/ofe by @dependabot in #772
  • Bump google.golang.org/api from 0.103.0 to 0.104.0 by @dependabot in #774
  • Bump certifi from 2021.10.8 to 2022.12.7 in /community/front-end/ofe by @dependabot in #779
  • Update hybrid docs to conform to 5.3.0 by @heyealex in #794
  • Update slurm v5.3.0 by @heyealex in #795
  • Rolling google terraform provider version to 4.46.0 by @nick-stroud in #797
  • Fix Quantum AI example by pinning to g++ 10 instead of 11 (known failure) by @nick-stroud in #799
  • Rolling the Toolkit version to 1.10.1 by @nick-stroud in #798
  • Version 1.10.1 by @nick-stroud in #796

Full Changelog: v1.10.0...v1.10.1

v1.10.0: Open Front End and new Batch MPI example

07 Dec 20:49
5693e89

Key New Features

  • Open Front-End Web UI added in community/front-end/ofe
  • New Batch MPI Example running WRF

Version updates

  • spack-install: default spack version updated from v0.18.0 to v0.19.0.

Improvements

  • Fixed a bug where ghpc would exit with an error but with rc=0 instead of rc=1 when failing to overwrite a deployment folder.
  • New integration tests.
  • Improved documentation and documentation links.
  • The Google Cloud Terraform provider is now pinned to the latest stable version.

Bug Fixes

  • nfs-server: Fixed bug when deploying with multiple mount points that share the same destination filenames
  • wait-for-startup: Timeouts now properly reported (vs previous unknown errors)

What's Changed

New Contributors

Full Changelog: v1.9.0...v1.10.0

v1.9.0: Altair PBS Pro, Core Support for Batch, Simplified Network Storage

11 Nov 23:39
0f0f70e

Key New Features

New Resources

  • schedmd-slurm-gcp-v5-node-group: Support modules for defining one or more node groups used in defining a schedmd-slurm-gcp-v5-partition.
  • PBS Pro Modules:
    • pbspro-execution: Provisions one or more PBS execution hosts to run jobs in a PBS Professional cluster.
    • pbspro-client: Provisions one or more PBS Client hosts to submit jobs to a PBS Professional cluster.
    • pbspro-server: Provisions a PBS Server Host to operate and administer a PBS Professional cluster.
    • pbspro-install: Creates Toolkit runners that download PBS Pro RPM packages and install them with configuration settings as documented in the PBS Pro "Big Book".
    • pbspro-preinstall: Uploads PBS Pro RPM packages and, optionally, a license file to Google Cloud Storage.
    • pbspro-qmgr: Creates a Toolkit runner that performs administrative PBS configuration on a PBS server.

Resource Improvements

Version updates

Deprecations

v1.8.0: Improved startup-script automation, multiple network interfaces in vm-instance, escapes for variable characters

02 Nov 17:44
78bb2bd

Key New Features

  • Ansible install script is automatically installed if it's detected as a dependency of other runners.
  • Multiple network interfaces can be added in vm-instance.
  • Ability to escape variable characters in module settings.
  • Remote filesystems now supply client installation and mounting scripts
  • Remote filesystem mounting scripts no longer depend upon Ansible, significantly reducing time before filesystems are available

Resource Improvements

  • vm-instance: Support for multiple network interfaces.
  • startup-script: Ansible installation script automatically included when other runners depend on it.

Improvements

  • Escape variable characters: "\$(...)" evaluates to "$(...)"
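To make the escaping rule concrete, the sketch below expands unescaped `$(...)` references and turns an escaped `\$(` into a literal `$(`. It is a minimal Python illustration under stated assumptions, not ghpc's actual parser (ghpc is written in Go and may handle edge cases differently); the `expand` function and the `variables` mapping are hypothetical.

```python
import re

# Either an escaped opener "\$(" or a full variable reference "$(name)".
_TOKEN = re.compile(r"\\\$\(|\$\(([^)]*)\)")


def expand(value, variables):
    """Replace $(name) with its value; turn the escape \\$( into a literal $(."""

    def repl(match):
        if match.group(0) == r"\$(":
            return "$("  # escaped: drop the backslash, keep the text literal
        return str(variables[match.group(1)])  # unescaped: substitute the value

    return _TOKEN.sub(repl, value)


# Hypothetical variables, for illustration only.
variables = {"vars.zone": "us-central1-a"}

print(expand(r"echo $(vars.zone) \$(hostname)", variables))
```

The escaped form is useful when a setting contains shell command substitution that should reach the VM untouched rather than be interpreted as a blueprint variable.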

What's Changed

New Contributors

Full Changelog: v1.7.0...v1.8.0

v1.7.0: Improved blueprint validation, official support for Ubuntu, and bug fixes for Slurm v5

19 Oct 19:40
f4ed7c1

Key New Features

Improvements

  • Batch modules now support Shared VPC networks
  • VPC module enables jumbo frames by default
  • AMD-optimized blueprint includes the OpenFOAM application compiled for the Zen 3 microarchitecture
  • A new example blueprint demonstrates using local SSD disks with VM instances
  • A new example blueprint demonstrates installation of StarCCM+ CFD simulation application

Bug Fixes

  • Resolve Slurm v5 startup-script timeout errors by demonstrating the use of a build VM to install Spack and optimized applications
  • Fix incompatibility between DDN Exascaler (Lustre) and Slurm v5 modules

What's Changed

New Contributors

Full Changelog: v1.6.0...v1.7.0

v1.6.0: DDN-EXAScaler update and improved functionality, Source modules from generic git repos

04 Oct 23:32
54270c1

Key New Features

  • DDN-EXAScaler module version update and further support added for DDN-EXAScaler with other modules.
  • Import modules from gitlab and other generic git repositories.

Resource Improvements

Version updates

Improvements

  • git commit and branch information included when running ghpc --version.
  • Shell runners are run as an executable rather than sourced.
  • Documentation for a Slurm on GCP hybrid demo using a cloud-based Slurm controller.
  • hpc-cluster-amd-slurmv5.yaml example uses a builder VM for spack installation.
  • Ability to import modules from generic git repositories with the git:: prefix in source.
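One way to picture how a `git::` prefix in `source` could be told apart from other source forms is a simple prefix check. The Python sketch below is illustrative only: the category names, the `classify_source` function, and the example URL are hypothetical, and the real ghpc source resolution (written in Go) may recognize more forms or use different rules.

```python
def classify_source(source):
    """Return a (kind, location) pair for a module source string.

    Assumed rules, for illustration:
      "git::<url>"      -> generic git repository (prefix stripped)
      "github.com/..."  -> GitHub repository
      anything else     -> treated as a local path
    """
    prefix = "git::"
    if source.startswith(prefix):
        return ("git", source[len(prefix):])
    if source.startswith(("github.com", "git@github.com")):
        return ("github", source)
    return ("local", source)


# Hypothetical repository URL, for illustration only.
print(classify_source("git::https://gitlab.com/example/hpc-modules.git"))
print(classify_source("./modules/network/vpc"))
```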

Deprecations

  • In the DDN-EXAScaler image variable, name is no longer supported; use project and family instead.

Bug Fixes

  • Fixed startup script failure in HTCondor autoscaler configuration

What's Changed

New Contributors

Full Changelog: v1.5.0...v1.6.0

v1.5.0: Hybrid Slurm clusters, Blueprints optimized for NVIDIA GPUs and AMD CPUs, & Bug Fixes

20 Sep 17:16
6e8e1f7

Key New Features

  • Support for "hybrid" Slurm partitions (see New Resources below)
  • Example blueprints to provision
  • kind setting for modules defaults to "terraform", the most common value

New Resources

  • schedmd-slurm-gcp-v5-hybrid: Experimental module to create cloud-based partitions capable of extending on-premise clusters into Google Cloud.

Resource Improvements

What's Changed

New Contributors

Full Changelog: v1.4.1...v1.5.0