storcon: update compute hook state on detach #9045
base: main
Commits: 5105d23, a00ab03, 2bf618e, 9e47dbb
@@ -71,6 +71,37 @@ impl ComputeHookTenant {
        }
    }

    fn is_sharded(&self) -> bool {
        matches!(self, ComputeHookTenant::Sharded(_))
    }

    /// Clear compute hook state for the specified shard.
    /// Only valid for [`ComputeHookTenant::Sharded`] instances.
    fn remove_shard(&mut self, tenant_shard_id: TenantShardId, stripe_size: ShardStripeSize) {
        match self {
            ComputeHookTenant::Sharded(sharded) => {
                if sharded.stripe_size != stripe_size
                    || sharded.shard_count != tenant_shard_id.shard_count
                {
                    tracing::warn!("Shard split detected while handling detach")
                }

                let shard_idx = sharded.shards.iter().position(|(shard_number, _node_id)| {
                    *shard_number == tenant_shard_id.shard_number
                });

                if let Some(shard_idx) = shard_idx {
                    sharded.shards.remove(shard_idx);
Review comment: Should this also invalidate the last sent notification?
Reply: It's an open question. This PR is intended to fix a specific case:
The question of how to handle operator-driven detaches needs some design work:
                } else {
                    tracing::warn!("Shard not found while handling detach")
Review comment: Unsure how actionable this is, but let's include the
Reply: It's already included as the shard id. I can add it, but it feels a bit redundant.
Review comment: As in, included in some span? If so, that sounds like an opportunity to assert that the span has shard_number.
Reply: The span doesn't have the shard number, but the shard number can be inferred from the shard_id, which is present in the span. I can assert that it's present if you like.
                }
            }
            ComputeHookTenant::Unsharded(_) => {
                unreachable!("Detach of unsharded tenants is handled externally");
            }
        }
    }

    /// Set one shard's location. If stripe size or shard count have changed, Self is reset
    /// and drops existing content.
    fn update(
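The position-then-remove pattern in `remove_shard` can be exercised in isolation. The following is a minimal sketch, not the storage controller's code: it assumes hypothetical stand-in types (`ShardId`, `ShardedState`, `RemoveOutcome`, with u8 shard numbers and counts, u32 stripe sizes, and u64 node ids) instead of the real `TenantShardId` and `ShardStripeSize`, and it returns the split-detected flag and the lookup outcome where the diff logs warnings.

```rust
// Hypothetical stand-ins for the storage controller's types; the real
// TenantShardId, ShardStripeSize and NodeId are richer than this.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ShardId {
    pub shard_number: u8,
    pub shard_count: u8,
}

#[derive(Debug)]
pub struct ShardedState {
    pub stripe_size: u32,
    pub shard_count: u8,
    // (shard_number, node_id) pairs, one per shard with a known location.
    pub shards: Vec<(u8, u64)>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RemoveOutcome {
    Removed,
    NotFound,
}

impl ShardedState {
    /// Drop the entry for `shard`, mirroring the position/remove pattern in the diff.
    /// Returns (split_detected, outcome) instead of logging.
    pub fn remove_shard(&mut self, shard: ShardId, stripe_size: u32) -> (bool, RemoveOutcome) {
        // A mismatch suggests the state was reset by a shard split.
        // Like the diff, this sketch only reports it and still looks the shard up.
        let split_detected =
            self.stripe_size != stripe_size || self.shard_count != shard.shard_count;

        // Find the index of the detached shard, if it is tracked at all.
        let idx = self
            .shards
            .iter()
            .position(|(shard_number, _node_id)| *shard_number == shard.shard_number);

        let outcome = match idx {
            Some(i) => {
                // Vec::remove shifts later elements left, preserving their order.
                self.shards.remove(i);
                RemoveOutcome::Removed
            }
            None => RemoveOutcome::NotFound,
        };
        (split_detected, outcome)
    }
}
```

Removing the same shard twice yields `NotFound` on the second call, which corresponds to the "Shard not found while handling detach" warning path.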
@@ -614,6 +645,36 @@ impl ComputeHook {
        self.notify_execute(maybe_send_result, tenant_shard_id, cancel)
            .await
    }

    /// Reflect a detach for a particular shard in the compute hook state.
    ///
    /// The goal is to avoid sending compute notifications with stale information (i.e.
    /// including detached pageservers).
    #[tracing::instrument(skip_all, fields(tenant_id=%tenant_shard_id.tenant_id, shard_id=%tenant_shard_id.shard_slug()))]
    pub(super) fn handle_detach(
        &self,
        tenant_shard_id: TenantShardId,
        stripe_size: ShardStripeSize,
    ) {
        use std::collections::hash_map::Entry;

        let mut state_locked = self.state.lock().unwrap();
        match state_locked.entry(tenant_shard_id.tenant_id) {
            Entry::Vacant(_) => {
                tracing::warn!("Compute hook tenant not found for detach");
Review comment: Is this warning actionable in some way, or just unexpected?
Reply: I thought it wouldn't happen under normal operating conditions, but the tests think otherwise. Looking into it.
Reply: Okay, fixed it. To answer your original question: it's unexpected.
            }
            Entry::Occupied(mut e) => {
                let sharded = e.get().is_sharded();
                if !sharded {
                    e.remove();
                } else {
                    e.get_mut().remove_shard(tenant_shard_id, stripe_size);
                }

                tracing::debug!("Compute hook handled shard detach");
            }
        }
    }
}

#[cfg(test)]
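The locking and `Entry` handling in `handle_detach` follow a common standard-library pattern: hold the mutex guard, look the tenant up with `HashMap::entry`, and either drop the whole entry (unsharded) or mutate it in place (sharded). This is a minimal sketch under stated assumptions: `Hook` and `TenantState` are hypothetical simplified types with integer ids, the function returns `false` where the diff logs "Compute hook tenant not found for detach", and it uses `Vec::retain` instead of `position` plus `remove` for brevity.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical simplified model of the compute hook's per-tenant state.
#[derive(Debug, PartialEq, Eq)]
pub enum TenantState {
    // Node id hosting the single shard.
    Unsharded(u64),
    // (shard_number, node_id) pairs.
    Sharded { shards: Vec<(u8, u64)> },
}

impl TenantState {
    fn is_sharded(&self) -> bool {
        matches!(self, TenantState::Sharded { .. })
    }
}

// Hypothetical container: tenant id -> state, behind a mutex like `self.state`.
#[derive(Default)]
pub struct Hook {
    state: Mutex<HashMap<u64, TenantState>>,
}

impl Hook {
    /// Returns false if the tenant was unknown (the diff logs a warning instead).
    pub fn handle_detach(&self, tenant_id: u64, shard_number: u8) -> bool {
        let mut state = self.state.lock().unwrap();
        match state.entry(tenant_id) {
            Entry::Vacant(_) => false,
            Entry::Occupied(mut e) => {
                if !e.get().is_sharded() {
                    // Unsharded: the whole tenant entry goes away.
                    e.remove();
                } else if let TenantState::Sharded { shards } = e.get_mut() {
                    // Sharded: drop only the detached shard and keep the others.
                    shards.retain(|(n, _)| *n != shard_number);
                }
                true
            }
        }
    }
}
```

Using the `Entry` API means the map is searched once while the lock is held, and the unsharded case can remove the entry without a second lookup.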
Review comment: So I guess in this case we never complete...? Though I am quite unsure what completion means here, referring to the previous thread.
Reply: Yes. In this case we do not update the in-memory compute hook state because we have raced with a shard split. Shard splits reset the compute hook state, but with the new shard count.
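The reply describes a race: a shard split has already reset the state with the new shard count, so a detach carrying the old shard count should leave the state alone. The following is a minimal sketch of that interaction. It uses a hypothetical `Sharded` type (u8 shard numbers and counts, u64 node ids), checks only the shard count and omits stripe size, and assumes a mismatched detach is simply skipped. That models the behavior described in the reply; in the diff as shown, `remove_shard` logs a warning on mismatch and then continues.

```rust
// Hypothetical simplified sharded state: shard count plus (shard_number, node_id) pairs.
#[derive(Debug, PartialEq, Eq)]
pub struct Sharded {
    pub shard_count: u8,
    pub shards: Vec<(u8, u64)>,
}

impl Sharded {
    /// Set one shard's location. A different shard count resets the state,
    /// following the doc comment on `update` in the diff (stripe size omitted here).
    pub fn update(&mut self, shard_count: u8, shard_number: u8, node_id: u64) {
        if self.shard_count != shard_count {
            self.shard_count = shard_count;
            self.shards.clear();
        }
        let idx = self.shards.iter().position(|entry| entry.0 == shard_number);
        match idx {
            Some(i) => self.shards[i].1 = node_id,
            None => self.shards.push((shard_number, node_id)),
        }
    }

    /// Apply a detach only if it was issued for the current shard count.
    /// Returns false when the detach raced with a split and was ignored.
    pub fn detach(&mut self, shard_count: u8, shard_number: u8) -> bool {
        if self.shard_count != shard_count {
            return false;
        }
        self.shards.retain(|(n, _)| *n != shard_number);
        true
    }
}
```

With this ordering, a late detach for shard 0 of a two-shard tenant cannot remove the entry that the split recorded for shard 0 of the new four-shard layout.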