fix: host race condition on hci acl data packets#935
Conversation
2c3cfea to
4864e7e
Compare
| self.pairing_io_capability_provider = None # Classic only | ||
| self.snooper: Snooper | None = None | ||
| self._pending_acl: dict[int, list[tuple[float, hci.HCI_AclDataPacket]]] = {} | ||
| self._cleanup_task: asyncio.Task[None] | None = None |
There was a problem hiding this comment.
It's probably not necessary to have an async cleanup task here. It should be sufficient to queue the packets that arrive before the connection event, and remove "stale" packets in the queue at that time. The queue will get emptied when the connection event is handled (after delivering the packets).
| else acl_timed_out | ||
| ).append((ts, packet)) | ||
| for ts, _ in acl_timed_out: | ||
| logger.info( |
There was a problem hiding this comment.
Logging for each stale packet seems overly verbose (definitely too verbose at info level)
| for handle in list(self._pending_acl): | ||
| acl_to_keep: list[tuple[float, hci.HCI_AclDataPacket]] = [] | ||
| acl_timed_out: list[tuple[float, hci.HCI_AclDataPacket]] = [] | ||
| for ts, packet in self._pending_acl[handle]: |
There was a problem hiding this comment.
Keeping a list of stale packets seems unnecessary. A simple list comprehension with a condition, or a filter, should be enough.
|
|
||
| async def _drain_buffered_acl(self, handle: int) -> None: | ||
| """Replay all buffered ACL packet""" | ||
| await asyncio.sleep(0.1) |
There was a problem hiding this comment.
What is the purpose of this?
| queued = self._pending_acl.pop(handle, []) | ||
| if not queued: | ||
| return | ||
| for ts, packet in queued: |
There was a problem hiding this comment.
It might be a good idea to insert a sleep(0) to allow other tasks to do some work in between two packets (in a normal situation, there should be some task switching between ACL packet arrivals)
After some analysis, we found that we had some race condition on acl data packets which arrived before the connection complete
This PR add a fix for the race condition where ACL data can arrive before the corresponding Connection Complete event.
Use a buffer to store packets by handle and replaying them once the connection is established.
And buffered packets are removed after a delay of 100ms
Analysis:
When a BLE central connects to the nRF52840 dongle and immediately sends an SMP Pairing Request, the ACL data packet is delivered to Bumble before the LE Connection Complete event due to USB endpoint ordering, causing Bumble to drop the pairing request for an unknown connection handle and leaving the pairing stuck until the connection times out.