storage controller: make proxying of GETs to pageservers more robust #9065

Open
wants to merge 2 commits into
base: main

@jcsp jcsp commented Sep 19, 2024

Problem

These commits are split off from https://github.com/neondatabase/neon/pull/8971/commits, where I was fixing this to make a better scale test pass. Vlad also independently recognized these issues with cloudbench in #9062.

  1. The storage controller proxies GET requests to pageservers based on each shard's intent (where it is scheduled to be attached), not the ground truth of where it is really attached.
  2. Proxied requests can race with the scheduling of tenants, resulting in 404 responses if the request hits the wrong pageserver.
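
The first problem can be illustrated with a small model. This is only a sketch, not the storage controller's code: `NodeId`, `ShardState`, `pageserver_get`, and `proxy_by_intent` are hypothetical names, and a pageserver is reduced to "answers 200 if the shard is attached there, 404 otherwise".

```rust
use std::collections::HashSet;

/// Hypothetical identifier for a pageserver node.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeId(pub u64);

/// Hypothetical, simplified view of one shard in the middle of a migration.
pub struct ShardState {
    /// Where the scheduler wants the shard to be attached (the intent).
    pub intent_attached: NodeId,
    /// Nodes on which the shard is actually attached right now (the ground truth).
    pub really_attached: HashSet<NodeId>,
}

/// Model of a pageserver answering a GET for the shard:
/// 200 if the shard is attached on that node, 404 otherwise.
pub fn pageserver_get(shard: &ShardState, node: NodeId) -> u16 {
    if shard.really_attached.contains(&node) {
        200
    } else {
        404
    }
}

/// Proxying by intent alone: the request goes to the intended node even if
/// the shard has not been attached there yet.
pub fn proxy_by_intent(shard: &ShardState) -> u16 {
    pageserver_get(shard, shard.intent_attached)
}
```

In this model, a shard that is still attached on node 1 while the scheduler already intends node 2 gets a 404 from `proxy_by_intent`, even though node 1 could have served the request.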

Closes: #9062

Summary of changes

  1. If a shard has a running reconciler, use the database generation_pageserver to decide which pageserver to proxy the request to.
  2. If such a request gets a 404 response and its scheduled node has changed since the request was dispatched, treat the 404 as a symptom of that race rather than a genuine not-found result.
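
The two changes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ShardView`, `proxy_target`, and `is_racy_404` are hypothetical names, and only `generation_pageserver` comes from the description. Falling back to the intent when the database has no pageserver recorded is also an assumption.

```rust
/// Hypothetical identifier for a pageserver node.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(pub u64);

/// Hypothetical snapshot of the state consulted when proxying a GET.
pub struct ShardView {
    /// Node the scheduler intends the shard to be attached to.
    pub intent_attached: Option<NodeId>,
    /// `generation_pageserver` as persisted in the database, if any.
    pub generation_pageserver: Option<NodeId>,
    /// True while a reconciler is running for this shard.
    pub reconciler_running: bool,
}

/// Change 1: pick the node to proxy to. While a reconciler is running, the
/// intent may be ahead of reality, so prefer the database's view.
pub fn proxy_target(view: &ShardView) -> Option<NodeId> {
    if view.reconciler_running {
        // Assumption: fall back to the intent if the database has no node recorded.
        view.generation_pageserver.or(view.intent_attached)
    } else {
        view.intent_attached
    }
}

/// Change 2: after the response arrives, look the target up again. A 404 from
/// a node that is no longer the target is taken to have raced with scheduling.
pub fn is_racy_404(status: u16, dispatched_to: NodeId, target_now: Option<NodeId>) -> bool {
    status == 404 && target_now != Some(dispatched_to)
}
```

What the caller does once `is_racy_404` returns true (for example retrying, or surfacing a retryable error to the client) is deliberately left out of this sketch.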

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp requested a review from a team as a code owner September 19, 2024 16:57
@jcsp jcsp added t/bug Issue Type: Bug c/storage/controller Component: Storage Controller labels Sep 19, 2024
@jcsp jcsp changed the title from "Jcsp/controller better proxying" to "storage controller: make proxying of GETs to pageservers more robust" Sep 19, 2024

4968 tests run: 4804 passed, 0 failed, 164 skipped (full report)


Flaky tests (8)

Postgres 17

Postgres 16

Postgres 14

Code coverage* (full report)

  • functions: 31.8% (7420 of 23301 functions)
  • lines: 49.8% (59726 of 119930 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
4323bdc at 2024-09-19T17:52:22.111Z :recycle:


Successfully merging this pull request may close these issues.

Storage controller proxies requests by intent leading to unavailability