Poor handling of intermittent slow connections #129

Open
julian-hj opened this issue May 13, 2020 · 2 comments

@julian-hj
Member

The releng team is seeing that when pivnet is intermittently slow (requests sometimes taking more than 1 minute), the put step for this resource fails consistently. The resource makes a lot of requests, so eventually one of them fails before the whole operation can succeed.

It would be nice if we could specify a higher HTTP timeout for this resource so that put is more resilient to slowness on pivnet.

@julian-hj
Member Author

Hi, I'm just updating this issue with a code patch that Dennis & I paired on to temporarily work around the issues we were having in the releng delivery pipelines.

We didn't want to make a PR with these changes because they are a bit blunt, and we didn't test-drive anything, so the code quality may not be fully up to snuff. Also, for expediency, we just vendored in the dependent modules and made our changes directly in the vendored copies. You will probably want to unwind that change.

The gist of the changes was:

  1. we changed the default HTTP connection timeout from 1 minute to 20 minutes
  2. we made some changes to store the passed-in lager logger so that we could use it to add additional logging
  3. we added retry loops to most of the operations
  4. we changed some internal code to fail instead of panicking, so that retry would be possible

We put these changes on a private fork in Dennis' org, so we figured the simplest way to deliver them would be in a patch file. Dennis has created that file, and I have attached it here.
patch.txt

Let us know if having a pair would be helpful if/when you pick this up.

Thanks!

@cjnosal
Contributor

cjnosal commented Jul 21, 2020

Thank you for the patch. We'd like the timeouts to be user-configurable, and might look at applying the retry logic as a wrapper around go-pivnet's HTTP clients to reduce the duplication. You can follow our progress here: https://www.pivotaltracker.com/story/show/173929827


2 participants