Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: deadlock when sending ping #193

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions test-harness/tests/poll_api.rs
Original file line number Diff line number Diff line change
Expand Up @@ -210,7 +210,7 @@ fn write_deadlock() {
// We make the message to transmit large enough s.t. the "server"
// is forced to start writing (i.e. echoing) the bytes before
// having read the entire payload.
let msg = vec![1u8; 1024 * 1024];
let msg = vec![1u8; 100024 * 100024];

// We choose a "connection capacity" that is artificially below
// the size of a receive window. If it were equal or greater,
Expand All @@ -219,7 +219,7 @@ fn write_deadlock() {
// fact that the sum of receive windows of all open streams can easily
// be larger than the send capacity of the connection at any point in time.
// Using such a low capacity here therefore yields a more reproducible test.
let capacity = 1024;
let capacity = 10024;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those changes seem to make the test flaky and the deadlock happens sometimes. The test output when it happens is:

running 1 test
[2024-06-11T08:40:15Z DEBUG yamux::connection] new connection: 6405a268 (Server)
[2024-06-11T08:40:15Z DEBUG yamux::connection] new connection: d58957af (Client)
[2024-06-11T08:40:15Z DEBUG yamux::connection] d58957af: new outbound (Stream d58957af/1) of (Connection d58957af Client (streams 0))
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] sending ping 3047298023
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] sending ping 3182731544
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] received pong 3047298023, estimated round-trip-time 322.416µs
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] received pong 3182731544, estimated round-trip-time 564.167µs
[2024-06-11T08:40:15Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.25 mb, new window_max: 0.5 mb
[2024-06-11T08:40:15Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.25 mb, new window_max: 0.5 mb
[2024-06-11T08:40:15Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.5 mb, new window_max: 1 mb
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] sending ping 3157641594
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] sending ping 1136718128
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] received pong 3157641594, estimated round-trip-time 459.708µs
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] received pong 1136718128, estimated round-trip-time 145.75µs
[2024-06-11T08:40:25Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.5 mb, new window_max: 1 mb
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] sending ping 1970012150
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] sending ping 2852812461
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] received pong 1970012150, estimated round-trip-time 180.5µs
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] received pong 2852812461, estimated round-trip-time 87.458µs
[2024-06-11T08:40:45Z DEBUG yamux::connection::rtt] sending ping 2531287629
[2024-06-11T08:40:45Z DEBUG yamux::connection::rtt] sending ping 1628284672
test write_deadlock has been running for over 60 seconds

Would it also be possible to achieve the same using smaller numbers?

Copy link
Author

@diegomrsantos diegomrsantos Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using

let msg = vec![1u8; 30 * 1024 * 1024];
let capacity = 102;
pub const DEFAULT_CREDIT: u32 = 25600;
const PING_INTERVAL: Duration = Duration::from_millis(500);

I can usually reproduce it at least once when running 5 times. But not when capacity is 100. Maybe it can give some idea about the cause.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can I override DEFAULT_CREDIT and PING_INTERVAL for this test? I changed the values where they are defined as I didn't know how to override them in the test.


// Create a bounded channel representing the underlying "connection".
// Each endpoint gets a name and a bounded capacity for its outbound
Expand Down
16 changes: 8 additions & 8 deletions yamux/src/connection.rs
Original file line number Diff line number Diff line change
Expand Up @@ -391,14 +391,6 @@ impl<T: AsyncRead + AsyncWrite + Unpin> Active<T> {
fn poll(&mut self, cx: &mut Context<'_>) -> Poll<Result<Stream>> {
loop {
if self.socket.poll_ready_unpin(cx).is_ready() {
// Note `next_ping` does not register a waker and thus if not called regularly (idle
// connection) no ping is sent. This is deliberate as an idle connection does not
// need RTT measurements to increase its stream receive window.
if let Some(frame) = self.rtt.next_ping() {
self.socket.start_send_unpin(frame.into())?;
continue;
}

// Privilege pending `Pong` and `GoAway` `Frame`s
// over `Frame`s from the receivers.
if let Some(frame) = self
Expand All @@ -409,6 +401,14 @@ impl<T: AsyncRead + AsyncWrite + Unpin> Active<T> {
self.socket.start_send_unpin(frame)?;
continue;
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this order seem to fix the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it deadlock on yamux test too? If not, maybe we could add a test for this too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate? I understand this test was designed specifically to address this type of deadlock.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually missed the test that was there so my comment can be ignored :)

// Note `next_ping` does not register a waker and thus if not called regularly (idle
// connection) no ping is sent. This is deliberate as an idle connection does not
// need RTT measurements to increase its stream receive window.
if let Some(frame) = self.rtt.next_ping() {
self.socket.start_send_unpin(frame.into())?;
continue;
}
}

match self.socket.poll_flush_unpin(cx)? {
Expand Down