Uploading/mirroring of test data #1409

Open
mvdbeek opened this issue Nov 17, 2023 · 1 comment

@mvdbeek
Member

mvdbeek commented Nov 17, 2023

I would like a command, essentially planemo upload_test_data, that takes local files or URLs from a tool or workflow test, uploads them to Zenodo if they don’t exist there yet, and rewrites the test files to point at the newly created (or already existing) test data.
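The rewrite step could look roughly like the sketch below. It assumes the test file has already been parsed into nested Python lists and dicts (for example from a workflow test YAML), and that inputs are described by entries with class: File and a local path that can be swapped for a remote location. The function name rewrite_locations, the mapping argument and the example.org URL are hypothetical and only serve as illustration.

```python
def rewrite_locations(node, uploaded):
    """Return a copy of `node` where local file paths are replaced by URLs.

    `uploaded` maps a local path, exactly as written in the test file,
    to the public URL of the uploaded copy (hypothetical mapping).
    Entries whose path is not in the mapping are left unchanged.
    """
    if isinstance(node, list):
        return [rewrite_locations(item, uploaded) for item in node]
    if isinstance(node, dict):
        rewritten = {key: rewrite_locations(value, uploaded) for key, value in node.items()}
        path = rewritten.get("path")
        # ASSUMPTION: file inputs look like {"class": "File", "path": "..."}.
        if rewritten.get("class") == "File" and isinstance(path, str) and path in uploaded:
            del rewritten["path"]
            rewritten["location"] = uploaded[path]
        return rewritten
    return node


if __name__ == "__main__":
    tests = [{"doc": "example", "job": {"input1": {"class": "File", "path": "test-data/input.fasta"}}}]
    mapping = {"test-data/input.fasta": "https://example.org/files/input.fasta"}  # placeholder URL
    print(rewrite_locations(tests, mapping))
```

The function returns a new structure instead of editing in place, so the original test definition stays available for comparison before anything is written back to disk.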

Complications are:

  • there’s no /api/whoami endpoint, so we can’t determine the user ID (unless we ask for it …)
  • there doesn’t seem to be a reliable way to list the records of a given user, only a free-form search, so users would also have to provide the record (which is OK?)
  • there is no search by file checksum, which would have let us easily check whether a file already exists (but we can iterate over the files of a record and check whether any of them has the same checksum)
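The per-record checksum check from the last point could be sketched as follows. This is a minimal illustration, not a tested integration: it assumes the record's file listing has already been fetched and is available as a list of dicts, each with a checksum value that is either a bare MD5 hex digest or prefixed with the algorithm (such as md5:...). The function names and the shape of the entries are assumptions.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_existing_file(local_path, record_files):
    """Return the first record file entry matching local_path's checksum, else None.

    ASSUMPTION: each entry is a dict whose "checksum" is a hex MD5 digest,
    optionally prefixed with the algorithm name, e.g. "md5:<hex>".
    """
    wanted = md5_of(local_path)
    for entry in record_files:
        checksum = entry.get("checksum", "")
        # Drop an optional "md5:" style prefix before comparing.
        if checksum.rsplit(":", 1)[-1].lower() == wanted:
            return entry
    return None
```

If find_existing_file returns an entry, the upload can be skipped and the test file pointed at the existing copy; otherwise the file is new to that record.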

Maybe those are not dealbreakers for uploads to Zenodo.

Finally, there is the question of whether Zenodo would frown upon every update that touches test data generating a new DOI (I’m a little unclear about the cost of updating a record; this might not count as minting a new DOI?).

Bjoern proposed that we could instead use depot.galaxyproject.org, though that of course rules out users doing the upload themselves prior to merging, and it’s questionable whether we can and want to provide this service to all communities as well.

Another option is to have different targets for upload_test_data, one of which could simply be any public Galaxy instance (the history or the datasets would need to be published in that case). Only when the PR is merged would we actually upload the data to Zenodo, mint a DOI and replace the location in the test file.

@nsoranzo
Member

> Then finally there is the question of whether or not it is something that would be frowned upon by zenodo if every update that touches test data generates a new DOI (I’m a little unclear about the cost of updating a record, this might not count as minting a new DOI ?).

Maybe we should have a threshold parameter, so that only big files are uploaded to Zenodo and the rest stays in the GitHub repo as it is now?
Also, I guess we should try to minimise the use of output test data (which is what usually changes with updates) in tests and use assertions whenever possible.
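A size threshold of this kind could be sketched as below. The function name split_by_size and the choice of a strict "larger than" comparison are assumptions for illustration; no such parameter exists yet.

```python
import os


def split_by_size(paths, threshold_bytes):
    """Partition paths into (to_upload, keep_in_repo) by file size.

    Files strictly larger than threshold_bytes are marked for upload;
    everything else stays in the repository.
    """
    to_upload, keep_in_repo = [], []
    for path in paths:
        if os.path.getsize(path) > threshold_bytes:
            to_upload.append(path)
        else:
            keep_in_repo.append(path)
    return to_upload, keep_in_repo
```

For example, split_by_size(test_files, 1024 * 1024) would mark only files above 1 MiB for upload and leave smaller ones where they are.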
