The releng team is seeing that when pivnet is intermittently slow (requests sometimes taking more than 1 minute), the put step for this resource fails consistently. The resource makes a lot of requests, so one of them usually times out before the whole operation can succeed.
It would be nice if we could specify a higher HTTP timeout for this resource so that put is more resilient to slowness on pivnet.
Hi, I'm just updating this issue with a code patch that Dennis & I paired on to temporarily work around the issues we were having in the releng delivery pipelines.
We didn't want to make a PR with these changes because they are a bit blunt, and we didn't test-drive anything, so the code quality may not be fully up to snuff. Also, for expediency, we just vendored in the dependent modules and changed them in place. You will probably want to unwind that change.
The gist of the changes was:
- we changed the default HTTP connection timeout from 1 minute to 20 minutes
- we made some changes to store the passed-in lager logger so that we could use it to add additional logging
- we added retry loops to most of the operations
- we changed some internal code to return an error instead of panicking, so that retry would be possible
We put these changes on a private fork in Dennis' org, so we figured the simplest way to deliver them would be in a patch file. Dennis has created that file, and I have attached it here. patch.txt
Let us know if having a pair would be helpful if/when you pick this up.
Thank you for the patch. We'd like the timeouts to be user-configurable, and might look at applying the retry logic as a wrapper around go-pivnet's HTTP clients to reduce the duplication. You can follow our progress at https://www.pivotaltracker.com/story/show/173929827