-
Notifications
You must be signed in to change notification settings - Fork 600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Final batch of changes for 8.0 GA release #1030
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add help wording for cluster SLOT-STATS. Signed-off-by: Kyle Kim <[email protected]> Signed-off-by: Ping Xie <[email protected]>
…valkey-io#822) The cluster configuration file is the metadata "database" for the cluster. It is best to trigger a save when shutdown the server, to avoid inconsistent content that is not refreshed. We save the nodes.conf whenever something that affects the nodes.conf has changed. But we are saving nodes.conf in clusterBeforeSleep, and some events may save it without a fsync, there is a time gap. And shutdown has its own save seems good to me, it doesn't need to care about the others. At the same time, a comment is added in unlock nodes.conf to explain why we actively unlock when shutdown. Signed-off-by: Binbin <[email protected]> Signed-off-by: Ping Xie <[email protected]>
Print the full client info by using catClientInfoString, the info is useful when we want to identify the source of request. Signed-off-by: Binbin <[email protected]> Signed-off-by: Ping Xie <[email protected]>
…updated in `clusterProcessGossipSection` (valkey-io#965) `clusterProcessGossipSection` currently doesn't trigger a check and call `replicationSetPrimary` when `myself`'s primary node’s IP/port is updated. This fix ensures that after every node address update, `replicationSetPrimary` is called if the updated node is `myself`'s primary. This prevents missed updates and ensures that replicas reconnect properly to maintain their replication link with the primary. Signed-off-by: Ping Xie <[email protected]>
Minor cleanup, introduced in valkey-io#877. Signed-off-by: Binbin <[email protected]> Signed-off-by: Ping Xie <[email protected]>
…ation (valkey-io#989) Prior to comparing the replica buffer against the configured limit, we need to ensure that the limit configuration is enabled. If the limit is set to zero, it indicates that there is no limit, and we should skip the buffer limit check. --------- Signed-off-by: naglera <[email protected]> Signed-off-by: Ping Xie <[email protected]>
valkey-io#886) If we modify aof-use-rdb-preamble in the middle of rewrite, we may get a wrong aof base suffix. This is because the suffix is concatenated by the main process afterwards, and it may be different from the beginning. We cache this value when we start the rewrite. Signed-off-by: Binbin <[email protected]> Signed-off-by: Ping Xie <[email protected]>
In RdbLoad, we disable AOF before emptyData and rdbLoad to prevent copy-on-write issues. After rdbLoad completes, AOF should be re-enabled, but the code incorrectly checks server.aof_state, which has been reset to AOF_OFF in stopAppendOnly. This leads to AOF not being re-enabled after being disabled. --------- Signed-off-by: Binbin <[email protected]> Signed-off-by: Ping Xie <[email protected]>
Signed-off-by: zhaozhao.zz <[email protected]> Signed-off-by: Ping Xie <[email protected]>
Fix for valkey-io#997 Root Cause Analysis: 1. Two different jobs (READ and WRITE) may be sent to the same IO thread. 2. When processing the read job in `processIOThreadsReadDone`, the IO thread may find that the write job has also been completed. 3. In this case, the IO thread calls `processClientIOWriteDone` to first process the completed write job and free the COBs https://github.com/valkey-io/valkey/blob/affbea5dc1ae1a0d80019c4f313d2bf1c3fcb7f9/src/networking.c#L4666 4. If there are pending writes (resulting from pipeline commands), a new async IO write job is sent before processing the completed read job https://github.com/valkey-io/valkey/blob/affbea5dc1ae1a0d80019c4f313d2bf1c3fcb7f9/src/networking.c#L2417 When sending the write job, the `TLS_CONN_FLAG_POSTPONE_UPDATE_STATE` flag is set to prevent the IO thread from updating the event loop, which is not thread-safe. 5. Upon resuming the read job processing, the flag is cleared, https://github.com/valkey-io/valkey/blob/affbea5dc1ae1a0d80019c4f313d2bf1c3fcb7f9/src/networking.c#L4685 causing the IO thread to update the event loop. Fix: Prevent sending async write job for pending writes when a read job is about to be processed. Testing: The issue could not be reproduced due to its rare occurrence, which requires multiple specific conditions to align simultaneously. Signed-off-by: Uri Yagelnik <[email protected]> Signed-off-by: Ping Xie <[email protected]>
Maybe partially resolves valkey-io#952. The hostnames test relies on an assumption that node zero and node six don't communicate with each other to test a bunch of behavior in the handshake stake. This was done by previously dropping all meet packets, however it seems like there was some case where node zero was sending a single pong message to node 6, which was partially initializing the state. I couldn't track down why this happened, but I adjusted the test to simply pause node zero which also correctly emulates the state we want to be in since we're just testing state on node 6, and removes the chance of errant messages. The test was failing about 5% of the time locally, and I wasn't able to reproduce a failure with this new configuration. --------- Signed-off-by: Madelyn Olson <[email protected]> Signed-off-by: Ping Xie <[email protected]>
…valkey-io#995) While doing some profiling, I noticed that getKeySlot() was a fairly large part (~0.7%) of samples doing perf with high pipeline during standalone. I think this is because we do a very late check for server.cluster_mode, we first call getKeySlot() and then call calculateKeySlot(). (calculateKeySlot was surprisingly not automatically inlined, we were doing a jump into it and then immediately returning zero). We then also do useless work in the form of caching zero in client->slot, which will further mess with cache lines. So, this PR tries to accomplish a few things things. 1) The usage of the `slot` name made a lot more sense before the introduction of the kvstore. Now with kvstore, we call this the database index, so all the references to slot in standalone are no longer really accurate. 2) Pull the cluster mode check all the way out of getKeySlot(), so hopefully a bit more performant. 3) Remove calculateKeySlot() as independent from getKeySlot(). calculateKeySlot used to have 3 call sites outside of db.c, which warranted it's own function. It's now only called in two places, pubsub.c and networking.c. I ran some profiling, and saw about ~0.3% improvement, but don't really trust it because you'll see a much higher (~2%) variance in test runs just by how the branch predictions will get changed with a new memory layout. Running perf again showed no samples in getKeySlot() and a reduction in samples in lookupKey(), so maybe this will help a little bit. --------- Signed-off-by: Madelyn Olson <[email protected]> Signed-off-by: Ping Xie <[email protected]>
…lkey-io#1025) Update client help output message for new command: client capa redirect --------- Signed-off-by: hwware <[email protected]> Signed-off-by: Binbin <[email protected]> Co-authored-by: Binbin <[email protected]> Co-authored-by: Madelyn Olson <[email protected]> Signed-off-by: Ping Xie <[email protected]>
The node may not be able to initiate an election in time due to problems with cluster communication. If an election is initiated, make sure its offset is 0. Closes valkey-io#967. Signed-off-by: Binbin <[email protected]> Signed-off-by: Ping Xie <[email protected]>
@valkey-io/core-team PTAL. Will prepare the release notes after merging this PR. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 8.0 #1030 +/- ##
==========================================
- Coverage 70.61% 70.58% -0.03%
==========================================
Files 114 114
Lines 61640 61665 +25
==========================================
+ Hits 43528 43529 +1
- Misses 18112 18136 +24
|
PingXie
added
the
run-extra-tests
Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)
label
Sep 13, 2024
… in advance (valkey-io#981) Fix timing issue in evaluating `cluster-allow-replica-migration` for replicas There is a timing bug where the primary and replica have different `cluster-allow-replica-migration` settings. In issue valkey-io#970, we found that if the replica receives `CLUSTER SETSLOT` before the gossip update, it remains in the original shard. This happens because we only process the `cluster-allow-replica-migration` flag for primaries during `CLUSTER SETSLOT`. This commit fixes the issue by also evaluating this flag for replicas in the `CLUSTER SETSLOT` path, ensuring correct replica migration behavior. Closes valkey-io#970 --------- Signed-off-by: Binbin <[email protected]> Co-authored-by: Ping Xie <[email protected]>
Merged
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
run-extra-tests
Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.