storcon: update compute hook state on detach #9045
base: main
Conversation
Problem

Previously, the storage controller could send compute notifications containing stale pageservers (i.e. the pageserver serving the shard was detached). This happened because detaches did not update the compute hook state.

Summary of Changes

Update compute hook state on shard detach.

Fixes #8928
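As a rough sketch of the fix (hypothetical, simplified types; the real `ComputeHookTenant` in the storage controller tracks pageserver assignments and stripe size), detaching a shard now drops it from the in-memory compute hook state, so later notifications cannot reference a stale pageserver:

```rust
// Hypothetical stand-ins for the storage controller's types.
#[derive(Debug, PartialEq)]
struct ShardState {
    shard_number: u8,
    pageserver: u64, // node id currently serving the shard
}

struct ComputeHookTenant {
    shards: Vec<ShardState>,
}

impl ComputeHookTenant {
    /// On detach, drop the shard from the in-memory state so a later
    /// notification cannot be built from a stale pageserver assignment.
    fn handle_detach(&mut self, shard_number: u8) {
        if let Some(idx) = self
            .shards
            .iter()
            .position(|s| s.shard_number == shard_number)
        {
            self.shards.remove(idx);
        }
    }
}

fn main() {
    let mut tenant = ComputeHookTenant {
        shards: vec![
            ShardState { shard_number: 0, pageserver: 1 },
            ShardState { shard_number: 1, pageserver: 2 },
        ],
    };
    tenant.handle_detach(0);
    assert_eq!(tenant.shards.len(), 1);
    assert_eq!(tenant.shards[0].shard_number, 1);
}
```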
4994 tests run: 4830 passed, 0 failed, 164 skipped (full report)

Flaky tests (8): Postgres 17, Postgres 16, Postgres 15, Postgres 14

Code coverage* (full report)

* collected from Rust tests only. The comment gets automatically updated with the latest test results.

9e47dbb at 2024-09-19T13:35:27.005Z :recycle:
Not entirely sure why this PR would cause this test failure; it hasn't happened in a long while.
```rust
let mut state_locked = self.state.lock().unwrap();
match state_locked.entry(tenant_shard_id.tenant_id) {
    Entry::Vacant(_) => {
        tracing::warn!("Compute hook tenant not found for detach");
```
Is this warning actionable in some way, or just unexpected?
I thought it wouldn't happen under normal operating conditions, but the tests think otherwise. Looking into it.
Okay, fixed it. To answer your original question: it's unexpected.
```rust
if !sharded {
    e.remove();
} else {
    e.get_mut().handle_detach(tenant_shard_id, stripe_size);
```
I think we are now implementing support for detaching shard-by-shard, but I am unsure why we would ever support that vs. detaching all shards.

Regardless, isn't there an unhandled case here where the `e` becomes empty, or are we supposed to hold on to a husk of a sharded, shardless `ComputeHookTenant`?
> I think we are now implementing support for detaching shard-by-shard, but I am unsure why we would ever support that vs. detaching all shards.
Detach of single shards is the primitive that enables detaching the entire tenant. It's already implemented and required for cutting over to the secondary location. Manual detach of a tenant is implemented by handling each shard in turn here.
> Regardless, isn't there an unhandled case where the `e` becomes empty, or are we supposed to hold on to a husk of a sharded, shardless ComputeHookTenant?
Indeed, good spot. I checked the update path before writing the code. When removing the last shard we can either leave the state as is (empty) or remove it from the map altogether. Both cases are supported on the update path here.
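The point that both options (leaving an empty entry or removing it) are tolerated by the update path can be sketched like this (hypothetical, simplified types; the real state is keyed by `TenantId`):

```rust
use std::collections::HashMap;

// Hypothetical sketch: the update path (re-)creates a missing per-tenant
// entry before applying an update, so a detach that removed the entry
// entirely, or left an empty husk behind, is equally fine.
#[derive(Default, Debug)]
struct ComputeHookTenant {
    shards: HashMap<u8, u64>, // shard number -> pageserver node id
}

fn handle_update(
    state: &mut HashMap<String, ComputeHookTenant>,
    tenant_id: &str,
    shard_number: u8,
    pageserver: u64,
) {
    // A vacant entry is simply re-created on the next attach/update.
    let tenant = state.entry(tenant_id.to_string()).or_default();
    tenant.shards.insert(shard_number, pageserver);
}

fn main() {
    let mut state: HashMap<String, ComputeHookTenant> = HashMap::new();
    handle_update(&mut state, "tenant-a", 0, 7);
    assert_eq!(state["tenant-a"].shards[&0], 7);
    // A detach that removes the whole entry...
    state.remove("tenant-a");
    // ...does not break a later re-attach.
    handle_update(&mut state, "tenant-a", 0, 9);
    assert_eq!(state["tenant-a"].shards[&0], 9);
}
```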
```rust
if sharded.stripe_size != stripe_size
    || sharded.shard_count != tenant_shard_id.shard_count
{
    tracing::warn!("Shard split detected while handling detach")
```
So I guess in this case we never complete...? Though, I am quite unsure what is completion, referring to the previous thread.
Yes. In this case we do not update the in-memory compute hook state because we have raced with a shard split. Shard splits reset the compute hook state, but with the new shard count.
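The race guard being discussed can be sketched as a pure check (hypothetical names; the real code compares fields of the stored sharded state against the detaching shard's identity):

```rust
// Hypothetical sketch of the split-race guard: if the stored stripe size
// or shard count no longer matches the detaching shard's identity, a
// shard split has already reset the state with the new shard count, so
// the detach must leave the state untouched.
#[derive(Debug, PartialEq)]
struct ShardedState {
    stripe_size: u32,
    shard_count: u8,
}

fn detach_raced_with_split(state: &ShardedState, stripe_size: u32, shard_count: u8) -> bool {
    state.stripe_size != stripe_size || state.shard_count != shard_count
}

fn main() {
    let state = ShardedState { stripe_size: 2048, shard_count: 4 };
    // Same layout: safe to update the state for this detach.
    assert!(!detach_raced_with_split(&state, 2048, 4));
    // Shard count changed: a split reset the state; skip the update.
    assert!(detach_raced_with_split(&state, 2048, 8));
}
```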
```rust
if let Some(shard_idx) = shard_idx {
    sharded.shards.remove(shard_idx);
} else {
    tracing::warn!("Shard not found while handling detach")
```
Unsure how actionable this is, but let's include the shard number:

```rust
tracing::warn!(
    shard_number=%tenant_shard_id.shard_number,
    "Shard not found while handling detach"
);
```
It's already included as the shard id. Can add it, but feels a bit redundant.
As in included in some span? If so, sounds like an opportunity to assert that the span has shard_number.
The span doesn't have the shard number, but the shard number can be inferred from the shard_id which is present in the span. I can assert that's present if you like.
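For illustration only: assuming the tenant-shard id is rendered as `<tenant>-XXYY` with two hex digits of shard number followed by two of shard count (an assumption about the display format, not stated in this PR), the shard number is indeed recoverable from the id in the span:

```rust
// Hypothetical helper: infer the shard number from a rendered
// tenant-shard id of the assumed form "<tenant>-XXYY".
fn shard_number_from_id(shard_id: &str) -> Option<u8> {
    let suffix = shard_id.rsplit('-').next()?;
    if suffix.len() != 4 {
        return None; // unsharded or unexpected format
    }
    // First two hex digits are the shard number, last two the count.
    u8::from_str_radix(&suffix[..2], 16).ok()
}

fn main() {
    assert_eq!(shard_number_from_id("1f2e3d-0104"), Some(1));
    assert_eq!(shard_number_from_id("1f2e3d-0304"), Some(3));
    assert_eq!(shard_number_from_id("unsharded"), None);
}
```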
```rust
if let Some(shard_idx) = shard_idx {
    sharded.shards.remove(shard_idx);
```
Should this also invalidate the last sent notification?
It's an open question. This PR is intended to fix a specific case:
1. Tenant is detached manually by setting its scheduling policy
2. Tenant is re-attached manually by setting its scheduling policy
3. Stale compute notification is sent for the tenant because we did not clear the compute hook state in (1)
4. Compute gets stuck because it can't connect to the specified pageservers
The question of how to handle operator driven detaches needs some design work:
- Should we notify the control plane about detaches?
- How do we ensure that a stale notification is never sent out? Some cross reconcile coordination would be required.
storage_controller/src/reconciler.rs (outdated)

```rust
// TODO: Consider notifying control plane about detaches. This would avoid situations
// where the compute tries to start-up with a stale set of pageservers.
self.compute_hook
```
In this code path, we are usually detaching from this pageserver (it is referenced in observed but not intent) but the shard is still attached to some other pageserver.
We should only be clearing our compute notify state if the whole tenant is configured with a Detached policy. Could infer that in Reconciler by inspecting intent.attached?
> We should only be clearing our compute notify state if the whole tenant is configured with a Detached policy.
Hm, this wasn't my intention. In theory you can have two concurrent reconciles for shards of the same tenant, where one detaches and the other attaches. If the detach happens after the attach's compute notification, then we would have sent a compute notification containing a stale pageserver. This could happen when the tenant policy is updated to detach while there are one or more ongoing reconciles for its shards.
> but the shard is still attached to some other pageserver
This is a good point though. We should only clear when intent attached is None.
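The agreed direction could look roughly like this (hypothetical condition and types; the real Reconciler would consult its intent state):

```rust
// Hypothetical sketch: only clear the compute hook state for a shard
// when the reconciler's intent has no attached location at all, i.e.
// the shard is truly detached everywhere, not merely migrating away
// from this pageserver.
struct Intent {
    attached: Option<u64>, // intended attached pageserver node id, if any
}

fn should_clear_compute_state(intent: &Intent) -> bool {
    intent.attached.is_none()
}

fn main() {
    // Migrating between pageservers: still attached somewhere, keep state.
    assert!(!should_clear_compute_state(&Intent { attached: Some(3) }));
    // Fully detached: safe (and necessary) to clear.
    assert!(should_clear_compute_state(&Intent { attached: None }));
}
```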