
fix(libev): guard handle_read/handle_write against close() race condition #889

Open
vponomaryov wants to merge 1 commit into scylladb:master from vponomaryov:fix-race-issue-614

Conversation

@vponomaryov

When close() is called from one thread, it sets is_closed=True and closes the socket immediately. However, libev watchers are stopped asynchronously in _loop_will_run(), so handle_read()/handle_write() can still fire on the now-closed fd, causing EBADF errors that surface as ConnectionShutdown('Bad file descriptor') and prevent reconnection.
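To see the failure mode in isolation, here is a minimal sketch (not driver code) of what a late handle_read() runs into on a POSIX system: once the socket object has been closed, any further read raises OSError with errno EBADF. The helper name `recv_after_close` is made up for illustration.

```python
import errno
import socket


def recv_after_close():
    """Return the errno raised when reading from an already-closed socket."""
    a, b = socket.socketpair()
    try:
        a.close()  # stands in for close() running on another thread
        try:
            a.recv(1)  # stands in for a watcher callback that fires late
        except OSError as exc:
            return exc.errno
        return None
    finally:
        b.close()


if __name__ == "__main__":
    print(recv_after_close() == errno.EBADF)
```

This only reproduces the errno. It does not involve libev or the driver's event loop thread.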

This PR fixes the race with the following changes:

  • Early-return guards at the top of handle_read() and handle_write() that check is_closed/is_defunct before touching the socket
  • Secondary is_closed/is_defunct checks in error handlers to catch the race when close() happens between watcher dispatch and syscall
  • Peer disconnect detection (EBADF, ECONNRESET, ENOTCONN, etc.) that calls close() cleanly instead of defunct()
  • last_error preservation in close() when connected_event is unset, preventing factory() from returning dead connections
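The four changes listed above can be sketched roughly as follows. This is a simplified stand-in, not the code in this PR: `SketchConnection` is a hypothetical class, the builtin `ConnectionError` is a placeholder for the driver's ConnectionShutdown, and `DISCONNECT_ERRNOS` contains only the three errnos named in the description (the full set is not shown there).

```python
import errno
import socket
import threading

# Errnos treated here as "fd already closed or peer gone". The description
# lists EBADF, ECONNRESET, ENOTCONN "etc."; the complete set is an assumption.
DISCONNECT_ERRNOS = {errno.EBADF, errno.ECONNRESET, errno.ENOTCONN}


class SketchConnection:
    """Hypothetical stand-in for a reactor connection; not the driver's class."""

    def __init__(self, sock):
        self._socket = sock
        self.is_closed = False
        self.is_defunct = False
        self.last_error = None
        self.connected_event = threading.Event()
        self.received = b""

    def close(self):
        if self.is_closed:
            return
        self.is_closed = True
        self._socket.close()
        # Closed before the connection became ready: record an error so a
        # caller waiting on connected_event does not treat this as success.
        if not self.connected_event.is_set():
            if self.last_error is None:
                self.last_error = ConnectionError("closed before ready")
            self.connected_event.set()

    def defunct(self, exc):
        self.is_defunct = True
        self.last_error = exc
        self.close()

    def handle_read(self):
        # Guard 1: the watcher fired after close()/defunct(); leave the fd alone.
        if self.is_closed or self.is_defunct:
            return
        try:
            data = self._socket.recv(4096)
        except OSError as exc:
            # Guard 2: close() ran between watcher dispatch and recv().
            if self.is_closed or self.is_defunct:
                return
            if exc.errno in DISCONNECT_ERRNOS:
                self.close()  # clean shutdown rather than defunct()
            else:
                self.defunct(exc)
            return
        if data:
            self.received += data
        else:
            self.close()  # zero-byte read: peer closed the stream
```

A handle_write() guard would follow the same shape around send(). The point of the second check is that the flag test and the syscall are not atomic, so the error path has to re-check the flags before deciding the error is real.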

Fixes: #614

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

…tion

When close() is called from one thread, it sets is_closed=True and
closes the socket immediately. However, libev watchers are stopped
asynchronously in _loop_will_run(), so handle_read()/handle_write()
can still fire on the now-closed fd, causing EBADF errors that
surface as ConnectionShutdown('Bad file descriptor') and prevent
reconnection.

Fix the race with the following changes:
- Early-return guards at the top of handle_read() and handle_write()
  that check is_closed/is_defunct before touching the socket
- Secondary is_closed/is_defunct checks in error handlers to catch
  the race when close() happens between watcher dispatch and syscall
- Peer disconnect detection (EBADF, ECONNRESET, ENOTCONN, etc.)
  that calls close() cleanly instead of defunct()
- last_error preservation in close() when connected_event is unset,
  preventing factory() from returning dead connections

Fixes: scylladb#614

Development

Successfully merging this pull request may close these issues.

Driver reported "[Errno 9] Bad file descriptor"
