Bugfix // Fix race condition in Pop & PopN operation of ring buffer #177

Open
wants to merge 3 commits into
base: master
@mapogolions mapogolions commented Jan 6, 2025

The current implementation of the ring buffer has a thread-safety issue in the Pop and PopN operations.

Attempting to remove an element may lead to a race condition, resulting in the ring buffer being left in an invalid state after calling Pop. Similarly, in the case of PopN, the method might incorrectly indicate that elements were removed, even though no elements were actually removed because the ring buffer was empty.

To illustrate the problem, I have added unit tests that can reproduce the issue. Feel free to delete these tests once you have verified the existence of the bug, as including such tests might be questionable, given that they expose a non-deterministic problem.
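The invariant such tests exercise can be sketched roughly as follows. This is a minimal illustration, not the tests from this PR: `miniRing` and `stressPopN` are hypothetical names, and the buffer is a plain slice rather than the project's `RingBuffer`. Several goroutines drain the buffer with `PopN`, and the sketch counts how many items were removed and how often a call reported success without returning anything.

```go
package main

import "sync"

// miniRing is a hypothetical, simplified buffer used only for this sketch.
// It is not the project's RingBuffer and has no wrap-around or growth logic.
type miniRing struct {
	mu    sync.Mutex
	items []int
}

// PopN removes up to n items. The emptiness check happens under the lock,
// so a call never reports success when nothing was removed (for n > 0).
func (r *miniRing) PopN(n int) ([]int, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.items) == 0 {
		return nil, false
	}
	if n > len(r.items) {
		n = len(r.items)
	}
	out := make([]int, n)
	copy(out, r.items[:n])
	r.items = r.items[n:]
	return out, true
}

// stressPopN fills a buffer with total items, drains it from several
// goroutines in batches (batch must be > 0), and returns the number of
// items removed plus the number of "successful" calls that returned nothing.
func stressPopN(total, workers, batch int) (removed, emptyOK int) {
	r := &miniRing{items: make([]int, total)}
	var wg sync.WaitGroup
	var mu sync.Mutex // guards removed and emptyOK
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				got, ok := r.PopN(batch)
				if !ok {
					return // buffer is empty
				}
				mu.Lock()
				removed += len(got)
				if len(got) == 0 {
					emptyOK++
				}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return removed, emptyOK
}
```

With the check done under the lock, `removed` should equal `total` and `emptyOK` should stay at zero regardless of scheduling, which is what makes the assertion deterministic even though the interleaving is not.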

@mapogolions mapogolions changed the title Bugfix // Fix race condition in pop operation of ring buffer Bugfix // Fix race condition in Pop & PopN operation of ring buffer Jan 6, 2025
@@ -61,6 +61,11 @@ func (rb *RingBuffer[T]) Pop() (T, bool) {
		return t, false
	}
	rb.mu.Lock()
	if rb.len == 0 {
Contributor

@tprifti tprifti Jan 6, 2025

We check the ring buffer length twice here: first with rb.Len(), and then again with rb.len.

func (rb *RingBuffer[T]) Pop() (T, bool) {
	rb.mu.Lock() // lock here to avoid reading length 2 times
	if rb.Len() == 0 {
		rb.mu.Unlock()
		var t T
		return t, false
	}
	rb.content.head = (rb.content.head + 1) % rb.content.mod
	item := rb.content.items[rb.content.head]
	var t T
	rb.content.items[rb.content.head] = t
	atomic.AddInt64(&rb.len, -1)
	rb.mu.Unlock()
	return item, true
}

Author

@mapogolions mapogolions Jan 6, 2025

Yes, it's called the double-checked locking pattern. There are usually two options we can use:

  1. Lock immediately:
rb.mu.Lock()
if rb.len == 0 {
	rb.mu.Unlock()
	var t T
	return t, false
}
  2. Use double-checked locking:
if rb.Len() == 0 { // As far as I understand, we read atomically to prevent a torn read of the int64.
	var t T
	return t, false
}
rb.mu.Lock()
if rb.len == 0 {
	rb.mu.Unlock()
	var t T
	return t, false
}
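A self-contained sketch of the two options, for comparison. It assumes a hypothetical `countBuf` type that tracks only an element count (it is not the project's `RingBuffer`), and `drain` is an illustrative helper. Option 2 is written as a lock-free fast path in front of option 1, which makes the relationship between them explicit.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// countBuf is a hypothetical stand-in that only tracks an element count.
type countBuf struct {
	mu  sync.Mutex
	len int64
}

// Len reads the length atomically so it is safe to call without the lock.
func (b *countBuf) Len() int64 { return atomic.LoadInt64(&b.len) }

// popLockFirst is option 1: take the lock, then check the length once.
func (b *countBuf) popLockFirst() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.len == 0 { // all writes happen under mu, so a plain read is safe here
		return false
	}
	atomic.AddInt64(&b.len, -1)
	return true
}

// popDoubleChecked is option 2: a lock-free fast path for the empty case,
// followed by the locked re-check from option 1.
func (b *countBuf) popDoubleChecked() bool {
	if b.Len() == 0 {
		return false
	}
	return b.popLockFirst()
}

// drain calls pop from several goroutines and returns the number of
// successful pops.
func drain(pop func() bool, workers, attempts int) int64 {
	var ok int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < attempts; i++ {
				if pop() {
					atomic.AddInt64(&ok, 1)
				}
			}
		}()
	}
	wg.Wait()
	return ok
}
```

Both variants should pop each element exactly once. The difference is only that option 2 avoids taking the lock when the buffer is already empty, at the cost of a second length check.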

Contributor

Personally, I would prefer the first option, since it's a bit more readable and we do not need to check the rb length twice.

Good job detecting the issue!

@@ -75,6 +80,10 @@ func (rb *RingBuffer[T]) PopN(n int64) ([]T, bool) {
		return nil, false
	}
	rb.mu.Lock()
	if rb.len == 0 {
Contributor

@tprifti tprifti Jan 6, 2025

Same thing here:

func (rb *RingBuffer[T]) PopN(n int64) ([]T, bool) {
	rb.mu.Lock()
	if rb.Len() == 0 {
		rb.mu.Unlock()
		return nil, false
	}
	content := rb.content
	if n >= rb.len {
		n = rb.len
	}
	atomic.AddInt64(&rb.len, -n)

	items := make([]T, n)
	for i := int64(0); i < n; i++ {
		pos := (content.head + 1 + i) % content.mod
		items[i] = content.items[pos]
		var t T
		content.items[pos] = t
	}
	content.head = (content.head + n) % content.mod

	rb.mu.Unlock()
	return items, true
}

Contributor

tprifti commented Jan 9, 2025

@mapogolions I just checked the updates. It would be better to read the length through the atomic .Len() method rather than accessing .len directly, because we need to make sure no other thread writes it while we read it.

@mapogolions
Author

@tprifti
rb.len is read after acquiring the lock, which ensures exclusive access.

Contributor

tprifti commented Jan 9, 2025

> @tprifti rb.len is read after acquiring the lock, which ensures exclusive access

You are technically correct. I suggested the atomic read so that we stay consistent across the repo. It would also help other contributors notice that reads of the ring buffer length need to be atomic.
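One way to make that requirement visible in the type itself, sketched under the assumption that Go 1.19 or later is available: the standard library's `atomic.Int64` can only be accessed through its `Load`, `Store`, and `Add` methods, so a plain read is not expressible. `lenTracker` is a hypothetical type for illustration, not the project's `RingBuffer`, and this is not something the PR proposes.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// lenTracker is a hypothetical type for this sketch.
type lenTracker struct {
	mu  sync.Mutex
	len atomic.Int64 // readable and writable only through atomic methods
}

// Len is safe to call without holding the lock.
func (lt *lenTracker) Len() int64 { return lt.len.Load() }

// push increments the length under the lock.
func (lt *lenTracker) push() {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	lt.len.Add(1)
}

// pop decrements the length under the lock and reports false when empty.
func (lt *lenTracker) pop() bool {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	if lt.len.Load() == 0 {
		return false
	}
	lt.len.Add(-1)
	return true
}
```

The trade-off is a small amount of redundant synchronization on reads that already hold the lock, in exchange for the compiler rejecting any direct, non-atomic access to the field.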
