
Prune old develop snapshots #853

Draft
wants to merge 1 commit into
base: main

Conversation

scottwittenburg
Collaborator

Old develop snapshots have been accumulating for about a year now. This PR adds a cronjob that periodically cleans up all but the most recent few.

@scottwittenburg
Collaborator Author

@zackgalbreath @kwryankrattiger I know we discussed this several times and never really came to an agreement about how and when this should be done to be minimally disruptive. I'm just getting the ball rolling on it again with a concrete implementation we can poke at.

Collaborator

@kwryankrattiger left a comment


I added my thoughts from the other day here.

Thanks for starting on this!

parser.add_argument(
    "-m",
    "--mirror-root",
    default="s3://spack-binaries",
Collaborator


Based on previous comments related to pruning, I think we should avoid using production paths as defaults and instead require that they be specified explicitly.
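A minimal sketch of what that could look like, assuming the script keeps using argparse. The `build_parser` helper, the description string, and the help text are hypothetical; only the `-m`/`--mirror-root` option comes from the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: the real script may construct its parser inline.
    parser = argparse.ArgumentParser(description="Prune old develop snapshots")
    parser.add_argument(
        "-m",
        "--mirror-root",
        required=True,  # no default, so a production path is never used implicitly
        help="Root URL of the mirrors to prune (must be given explicitly)",
    )
    return parser


# Example: parsing an explicit, non-production root.
# args = build_parser().parse_args(["--mirror-root", "s3://example-bucket"])
# args.mirror_root is then "s3://example-bucket"
```

With `required=True`, running the script without `--mirror-root` exits with a usage error instead of silently operating on a production bucket.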


# First, try to delete the mirror associated with the snapshot
try:
    subprocess.run(["aws", "s3", "rm", "--recursive", url_to_prune], check=True)
Collaborator


An idea for a process to make sure cache.spack.io and the mirrors stay in sync:

check if mirror has index.json

  • if yes -> remove index.json
  • if no -> delete the entire prefix

This will give a buffer period between dropping the tag and deleting the mirror contents. If we run this weekly, that should give the cache.spack.io page time to be updated, since it uses the index.json files to create a global index file.
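The two-step check above could be sketched roughly as follows. This is only an illustration of the decision logic: `plan_prune` and the action names are hypothetical, and the caller is assumed to have already listed the object keys under the snapshot prefix (for example via the AWS CLI).

```python
from typing import Iterable, Tuple

# Location of the index inside a snapshot mirror, as in the timeline below.
INDEX_SUFFIX = "build_cache/index.json"


def plan_prune(prefix: str, keys: Iterable[str]) -> Tuple[str, str]:
    """Decide the next pruning step for one snapshot mirror.

    Returns ("remove_index", <index key>) if the mirror still has an
    index.json, otherwise ("delete_prefix", <prefix>).
    """
    prefix = prefix.rstrip("/") + "/"
    index_key = prefix + INDEX_SUFFIX
    if index_key in set(keys):
        # First pass: only hide the mirror from the global index.
        return ("remove_index", index_key)
    # Later pass: the index is already gone, so delete everything.
    return ("delete_prefix", prefix)
```

Keeping the decision separate from the actual deletion makes it easy to add a dry-run mode that only prints the planned action.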

The timeline for snapshot pruning could be:

-- Prune Cron Runs @ 2023/01/01 0100 UTC

  • Delete Tag develop-XXXX
  • Delete develop-XXXX/build_cache/index.json

-- Generate cache.spack.io @ 2023/01/02 0100 UTC

  • Create global index
  • Push the new website, omitting mirrors that have no index.json

-- Prune Cron Runs @ 2023/01/08 0100 UTC

  • Delete prefix develop-XXXX/

py_gh_repo = py_github.get_repo("spack/spack", lazy=True)

# Get a list of all the tags matching the develop snapshot pattern
snapshot_tags = py_gh_repo.get_git_matching_refs("tags/develop-")
Collaborator


The method described below would break this query for mirror names.

If we still want to use tags, we could move deleted tags to hidden refs, something like refs/archive/develop-*. The other option is, for each prefix listed in the root, to match the name against a pattern like <prefix>/develop-*/ and check for the index.json to see what to delete.

I am not sure which method to prefer for this. Maybe we use both: one for listing tags to remove, the other for deleting mirrors.
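For the prefix-matching option, a rough sketch might look like this. It assumes the snapshot names sort chronologically (for instance, a date suffix after `develop-`), which may not match the real naming scheme; `snapshots_to_prune` and the `keep` parameter are hypothetical.

```python
import re
from typing import Iterable, List

# Matches top-level names such as "develop-<something>/", but not "develop/".
SNAPSHOT_RE = re.compile(r"^develop-[^/]+/?$")


def snapshots_to_prune(prefixes: Iterable[str], keep: int) -> List[str]:
    """Return snapshot names to prune, keeping the `keep` newest ones.

    Assumption: lexicographic order equals chronological order.
    """
    snapshots = sorted(p.rstrip("/") for p in prefixes if SNAPSHOT_RE.match(p))
    if keep <= 0:
        return snapshots
    # Everything except the last `keep` entries is a pruning candidate.
    return snapshots[:-keep]
```

Each returned name could then be checked for an index.json to decide whether to drop the index or delete the whole prefix.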

@kwryankrattiger
Collaborator

A possible change that would affect how snapshot refs are queried:

#867


2 participants