
storcon: update compute hook state on detach #9045

Open
wants to merge 4 commits into base: main

Conversation

VladLazar
Contributor

Problem

Previously, the storage controller could send compute notifications containing stale pageservers (i.e. a pageserver serving the shard had since been detached). This happened because detaches did not update the compute hook state.

Summary of Changes

Update compute hook state on shard detach.

Fixes #8928

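Roughly, the detach path now tells the compute hook to forget the shard, so a later notification cannot reference its old pageserver. A minimal sketch of the idea, using simplified, hypothetical stand-in types (the real ComputeHook / ComputeHookTenant in the storage controller carry more state and are not exactly this shape):

use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical, simplified stand-ins for the storage controller types.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct TenantId(u64);

#[derive(Clone, Copy, PartialEq, Eq)]
struct TenantShardId {
    tenant_id: TenantId,
    shard_number: u8,
}

#[derive(Default)]
struct ComputeHookTenant {
    shards: Vec<TenantShardId>,
}

#[derive(Default)]
struct ComputeHook {
    state: Mutex<HashMap<TenantId, ComputeHookTenant>>,
}

impl ComputeHook {
    // Sketch of the new detach entry point: drop the detached shard from the
    // in-memory state so a later notification cannot list its old pageserver.
    fn handle_detach(&self, tenant_shard_id: TenantShardId) {
        let mut state = self.state.lock().unwrap();
        if let Some(tenant) = state.get_mut(&tenant_shard_id.tenant_id) {
            tenant
                .shards
                .retain(|s| s.shard_number != tenant_shard_id.shard_number);
        }
    }
}
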
@VladLazar VladLazar requested a review from a team as a code owner September 18, 2024 13:33
@VladLazar VladLazar requested review from yliang412, jcsp and koivunej and removed request for yliang412 September 18, 2024 13:33

github-actions bot commented Sep 18, 2024

4994 tests run: 4830 passed, 0 failed, 164 skipped (full report)


Flaky tests (8)

  • Postgres 17
  • Postgres 16
  • Postgres 15
  • Postgres 14

Code coverage* (full report)

  • functions: 31.8% (7419 of 23308 functions)
  • lines: 49.8% (59586 of 119702 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
9e47dbb at 2024-09-19T13:35:27.005Z :recycle:

@koivunej
Member

koivunej commented Sep 19, 2024

Not entirely sure why this PR would cause this test failure; it has not happened in a long while.

let mut state_locked = self.state.lock().unwrap();
match state_locked.entry(tenant_shard_id.tenant_id) {
    Entry::Vacant(_) => {
        tracing::warn!("Compute hook tenant not found for detach");

Member

Is this warning actionable in some way, or just unexpected?

Contributor Author

I thought it wouldn't happen under normal operating conditions, but the tests think otherwise. Looking into it.

Contributor Author

Okay, fixed it. To answer your original question: it's unexpected.

if !sharded {
    e.remove();
} else {
    e.get_mut().handle_detach(tenant_shard_id, stripe_size);

koivunej (Member) commented Sep 19, 2024

I think we are now implementing support for detaching shard-by-shard, but I am unsure why we would ever support that vs. detaching all shards.

Regardless, isn't there an unhandled case here where e becomes empty, or are we supposed to hold on to a husk of a sharded but shardless ComputeHookTenant?

Contributor Author

> I think we are now implementing support for detaching shard-by-shard, but I am unsure why we would ever support that vs. detaching all shards.

Detaching single shards is the primitive that enables detaching the entire tenant. It's already implemented and required for cutting over to the secondary location. Manual detach of a tenant is implemented by handling each shard in turn here.

> Regardless, isn't there an unhandled case here where e becomes empty, or are we supposed to hold on to a husk of a sharded but shardless ComputeHookTenant?

Indeed, good spot. I checked the update path before writing the code. When removing the last shard we can either leave the state as is (empty) or remove it from the map altogether. Both cases are supported on the update path here.
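
For illustration, one way to avoid keeping an empty ComputeHookTenant around is to drop the whole map entry once the last shard is gone. A sketch against simplified, hypothetical stand-in types, not the exact code in this PR:

use std::collections::{hash_map::Entry, HashMap};

// Simplified stand-in; the real ComputeHookTenant carries more state.
struct ComputeHookTenant {
    shards: Vec<u8>, // shard numbers, simplified
}

// Sketch: remove the detached shard and, if that leaves the tenant with no
// shards, remove the map entry entirely instead of keeping an empty husk.
fn detach_shard(state: &mut HashMap<u64, ComputeHookTenant>, tenant_id: u64, shard_number: u8) {
    match state.entry(tenant_id) {
        Entry::Occupied(mut e) => {
            e.get_mut().shards.retain(|s| *s != shard_number);
            if e.get().shards.is_empty() {
                e.remove();
            }
        }
        Entry::Vacant(_) => {
            // Nothing recorded for this tenant; nothing to clear.
        }
    }
}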

if sharded.stripe_size != stripe_size
    || sharded.shard_count != tenant_shard_id.shard_count
{
    tracing::warn!("Shard split detected while handling detach")

Member

So I guess in this case we never complete...? Though I am quite unsure what completion means here, referring to the previous thread.

Contributor Author

Yes. In this case we do not update the in-memory compute hook state because we have raced with a shard split. Shard splits reset the compute hook state, but with the new shard count.
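
To spell out the race (a sketch with simplified, hypothetical stand-ins, not the PR's code): the detach carries the shard identity it knew about, and if the recorded layout has since changed, the update is skipped because the split already rebuilt the compute hook state for the new shard count.

// Simplified stand-in; shard_count and stripe_size change when a tenant is split.
struct ShardedState {
    shard_count: u8,
    stripe_size: u32,
}

// Sketch: skip a detach that describes an older shard layout; the split that
// changed the layout has already reset the compute hook state.
fn apply_detach(sharded: &mut ShardedState, detach_shard_count: u8, detach_stripe_size: u32) -> bool {
    if sharded.shard_count != detach_shard_count || sharded.stripe_size != detach_stripe_size {
        // Raced with a shard split; nothing to update here.
        return false;
    }
    // ... remove the shard from `sharded` ...
    true
}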

if let Some(shard_idx) = shard_idx {
    sharded.shards.remove(shard_idx);
} else {
    tracing::warn!("Shard not found while handling detach")

Member

Unsure how actionable this is, but let's include the shard number:

Suggested change
- tracing::warn!("Shard not found while handling detach")
+ tracing::warn!(
+     shard_number=%tenant_shard_id.shard_number,
+     "Shard not found while handling detach"
+ );

Contributor Author

It's already included as part of the shard id. Can add it, but it feels a bit redundant.

Member

As in, included in some span? If so, sounds like an opportunity to assert that the span has shard_number.

Contributor Author

The span doesn't have the shard number, but the shard number can be inferred from the shard_id, which is present in the span. I can assert that it's present if you like.
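
If we wanted a guarantee rather than an assertion, one option (sketch only; the signature and field name here are assumed, not the PR's) is to put the full shard id on the span that wraps the detach handling, so any warning emitted inside it inherits the field:

use tracing::instrument;

// Assumed, simplified signature for illustration only.
#[instrument(skip_all, fields(tenant_shard_id = %tenant_shard_id))]
fn handle_detach(tenant_shard_id: String) {
    // Emitted inside the span, so it carries tenant_shard_id; the shard
    // number is part of the shard id's Display form.
    tracing::warn!("Shard not found while handling detach");
}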

});

if let Some(shard_idx) = shard_idx {
    sharded.shards.remove(shard_idx);

Member

Should this also invalidate the last sent notification?

Contributor Author

It's an open question. This PR is intended to fix a specific case:

  1. Tenant is detached manually by setting its scheduling policy
  2. Tenant is re-attached manually by setting its scheduling policy
  3. A stale compute notification is sent for the tenant because we did not clear the compute hook state in (1)
  4. Compute gets stuck because it can't connect to the specified pageservers

The question of how to handle operator-driven detaches needs some design work:

  • Should we notify the control plane about detaches?
  • How do we ensure that a stale notification is never sent out? Some cross-reconcile coordination would be required.


// TODO: Consider notifying control plane about detaches. This would avoid situations
// where the compute tries to start-up with a stale set of pageservers.
self.compute_hook

Contributor

In this code path, we are usually detaching from this pageserver (it is referenced in observed but not intent) but the shard is still attached to some other pageserver.

We should only be clearing our compute notify state if the whole tenant is configured with a Detached policy. Could infer that in Reconciler by inspecting intent.attached?

Contributor Author

> We should only be clearing our compute notify state if the whole tenant is configured with a Detached policy.

Hm, this wasn't my intention. In theory you can have two concurrent reconciles for shards of the same tenant where one detaches and another attaches. If the detach happens before the attach's compute notification, then we would send a compute notification containing a stale pageserver. This could happen when the tenant policy is updated to detach while there are one or more ongoing reconciles for its shards.

Contributor Author

> but the shard is still attached to some other pageserver

This is a good point though. We should only clear when the intent's attached location is None.
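
Concretely, the gate discussed here could look roughly like the following in the Reconciler (a sketch with assumed names; only the shape matters): clear the compute hook state only when the shard is not meant to be attached anywhere.

// Hypothetical, simplified stand-ins for the reconciler pieces involved.
struct NodeId(u64);

struct IntentState {
    attached: Option<NodeId>,
}

struct Reconciler {
    intent: IntentState,
}

impl Reconciler {
    // Sketch: a migration to another pageserver keeps the compute hook
    // state; only a true detach (no intended attachment) clears it.
    fn should_clear_compute_hook_state(&self) -> bool {
        self.intent.attached.is_none()
    }
}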

Development

Successfully merging this pull request may close these issues.

Failures in test_scrubber_physical_gc
3 participants