Add TLS session resumption via SSLSessionCache #789
Such claims would ideally be supported by benchmarks. Could you try to create some?
That's the goal, but you're right: I don't have any tests to prove it, so I removed this claim from the PR description. If I manage to create proper benchmarks, I will post an update.
We could, if it helps, only support this for TLS 1.3.
Branch updated from 7281340 to 4500773.
@dkropachev @Lorak-mmk I pushed changes incorporating improvements from Dmitry's older PR; I will update the PR description soon.
dkropachev left a comment:
I rechecked the TLS session-resumption path against the current branch. The ssl_options configuration still builds a fresh SSLContext per Connection, and a cached stdlib session from the previous connection is incompatible with that new context. I reproduced the failure locally on Python 3.10.12; the session restore path raises ValueError: Session refers to a different SSLContext. Since the new code only catches AttributeError and ssl.SSLError, reconnects fail instead of falling back to a full handshake, and the regression is enabled by default because Cluster auto-creates SSLSessionCache for ssl_options.
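A minimal sketch of a restore step that treats every restore failure, including the ValueError reproduced above, as "no resumption" and lets the connection continue with a full handshake. The helper name restore_cached_session and the dict-like cache are hypothetical stand-ins, not the driver's API.

```python
import ssl


def restore_cached_session(ssl_sock, cache, key):
    """Try to attach a cached TLS session to ssl_sock before the handshake.

    Hypothetical helper: `cache` is any object with a dict-like get(key).
    Returns True if a session was attached, False if the caller should
    simply proceed with a full handshake.
    """
    if cache is None:
        return False
    session = cache.get(key)
    if session is None:
        return False
    try:
        # The stdlib setter raises ValueError when the session was created
        # by a different SSLContext, which is the case when a fresh context
        # is built per connection (the ssl_options path).
        ssl_sock.session = session
    except (AttributeError, ValueError, ssl.SSLError):
        # Any failure here only means "no resumption"; never abort connect.
        return False
    return True
```

Catching ValueError alongside ssl.SSLError keeps the default-on cache from turning reconnects into hard failures.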
Branch updated from 4500773 to d12db4a.
dkropachev left a comment:
Two blocking issues from local validation:
- Twisted caches a TLS session even after hostname verification has already failed, which lets an untrusted peer populate the resumption cache.
- SSLSessionCache accepts max_size <= 0 and then crashes on the first insert (KeyError from popitem() on an empty OrderedDict).
```python
transport = connection.get_app_data()
transport.failVerification(Failure(ConnectionException("Hostname verification failed", self.endpoint)))
# Store TLS session after successful handshake (PyOpenSSL)
if self.ssl_session_cache is not None:
```
failVerification() should short-circuit this callback. As written, a hostname mismatch still falls through and caches the just-negotiated session, so an untrusted peer can seed the resumption cache. I reproduced this locally with a mocked _SSLCreator: failVerification was called and the session still landed in SSLSessionCache.
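The requested ordering can be sketched as a post-handshake hook that returns immediately after failing verification, so the caching branch is unreachable for an untrusted peer. This is a simplified, hypothetical stand-in for info_callback(): hostname_ok, fail_verification and the plain dict cache are assumptions, with fail_verification playing the role of transport.failVerification.

```python
def on_handshake_done(connection, hostname_ok, fail_verification, cache, key):
    """Hypothetical post-handshake hook (simplified stand-in for info_callback).

    hostname_ok: result of hostname verification for this connection.
    fail_verification: callable that aborts the connection.
    cache: plain dict standing in for SSLSessionCache.
    Returns True only if a session was stored.
    """
    if not hostname_ok:
        fail_verification()
        # Short-circuit: a peer that failed verification must never be
        # able to seed the resumption cache.
        return False
    if cache is not None:
        session = connection.get_session()
        if session is not None:
            cache[key] = session
            return True
    return False
```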
```python
    self._sessions.move_to_end(key)
    return

if len(self._sessions) >= self._max_size:
```
SSLSessionCache(max_size=0) currently crashes on the first insert: len(self._sessions) >= self._max_size is already true for an empty cache, so popitem(last=False) raises KeyError. Since this is now a public tuning knob, please validate max_size > 0 (and probably ttl > 0) or define zero as a disabled cache, and cover it with a unit test.
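One way to apply the suggested validation, sketched on a hypothetical stand-in class (BoundedSessionCache, not the PR's SSLSessionCache): reject non-positive max_size and ttl in the constructor so the eviction branch can assume a non-empty dict. Expiry itself is deliberately left out of this sketch.

```python
import threading
from collections import OrderedDict


class BoundedSessionCache:
    """Hypothetical stand-in for SSLSessionCache that validates its knobs.

    Only the size bound is modelled; ttl is validated but expiry is not
    implemented in this sketch.
    """

    def __init__(self, max_size=100, ttl=3600):
        # With max_size <= 0, "len(sessions) >= max_size" is already true
        # for an empty dict, and popitem(last=False) raises KeyError.
        if max_size <= 0:
            raise ValueError("max_size must be > 0, got %r" % (max_size,))
        if ttl <= 0:
            raise ValueError("ttl must be > 0, got %r" % (ttl,))
        self._max_size = max_size
        self._ttl = ttl
        self._sessions = OrderedDict()
        self._lock = threading.Lock()

    def set(self, key, session):
        with self._lock:
            if key in self._sessions:
                self._sessions[key] = session
                self._sessions.move_to_end(key)
                return
            if len(self._sessions) >= self._max_size:
                # Safe: max_size > 0 guarantees the dict is non-empty here.
                self._sessions.popitem(last=False)
            self._sessions[key] = session

    def __len__(self):
        return len(self._sessions)
```

A matching unit test would assert that max_size=0, a negative max_size, and ttl=0 each raise ValueError, and that max_size=1 keeps only the most recent key.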
Branch updated from d12db4a to f8eb94d.
dkropachev left a comment:
Two correctness issues need attention before this lands: the PyOpenSSL TLS 1.3 cache point is too early to capture the resumable session, and the cache can evict a live entry while expired ones remain resident.
```python
# Store TLS session after successful handshake (PyOpenSSL)
if self.ssl_session_cache is not None:
    try:
        session = connection.get_session()
        if session:
            self.ssl_session_cache.set(
                self.endpoint.tls_session_cache_key, session)
```
For the PyOpenSSL reactors this is still too early for TLS 1.3. At SSL_CB_HANDSHAKE_DONE, connection.get_session() is the pre-ticket session; OpenSSL swaps in the resumable one only after the first application read. A safer fix is to keep info_callback() for hostname verification only, then cache from the live OpenSSL.SSL.Connection after the first CQL response (ReadyMessage / AuthSuccessMessage), mirroring the late cache point already used in Connection._handle_startup_response() and _handle_auth_response(). The same timing change is needed in EventletConnection, where _cache_pyopenssl_session() currently runs immediately after do_handshake().
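The late cache point can be sketched as a small per-connection helper that reads the session once, when the first CQL response is handled, instead of at SSL_CB_HANDSHAKE_DONE. FirstResponseSessionCacher is hypothetical; it assumes only that the TLS connection object exposes get_session() (as OpenSSL.SSL.Connection does) and uses a plain dict in place of SSLSessionCache.

```python
class FirstResponseSessionCacher:
    """Hypothetical helper: store the PyOpenSSL session once, when the
    first CQL response (ReadyMessage / AuthSuccessMessage) is handled.

    tls_conn is assumed to expose get_session(); cache is a plain dict
    standing in for SSLSessionCache.
    """

    def __init__(self, tls_conn, cache, key):
        self._tls_conn = tls_conn
        self._cache = cache
        self._key = key
        self._done = False

    def on_first_response(self):
        """Call from both the ReadyMessage and AuthSuccessMessage paths."""
        if self._done or self._cache is None:
            return
        self._done = True
        # By now at least one application-data read has completed, so a
        # TLS 1.3 ticket sent after the handshake has typically been
        # processed and get_session() returns the resumable session.
        session = self._tls_conn.get_session()
        if session is not None:
            self._cache[self._key] = session
```

The once-only flag makes it harmless to call the hook from both response handlers.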
```python
if len(self._sessions) >= self._max_size:
    self._sessions.popitem(last=False)
```
This evicts the current LRU entry before giving expired entries a chance to fall out. If the cache fills up between periodic cleanups, an expired session can stay resident while a still-valid session gets dropped, which wastes capacity and reduces the resumption hit rate under small max_size / short ttl settings.
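A sketch of the suggested ordering: when the cache is full, purge expired entries first and evict the least recently used live entry only if that did not free a slot. TTLLRUSessionCache is a hypothetical name, the injectable clock exists only for testing, and the full scan is needed because LRU order reflects last use, not storage time, so expired entries can sit anywhere in the OrderedDict.

```python
import threading
import time
from collections import OrderedDict, namedtuple

_Entry = namedtuple("_Entry", ["session", "timestamp"])


class TTLLRUSessionCache:
    """Hypothetical LRU + TTL cache that drops expired entries before
    evicting a live one."""

    def __init__(self, max_size=100, ttl=3600.0, clock=time.monotonic):
        self._max_size = max_size
        self._ttl = ttl
        self._clock = clock
        self._sessions = OrderedDict()
        self._lock = threading.Lock()

    def _purge_expired(self, now):
        expired = [k for k, e in self._sessions.items()
                   if now - e.timestamp > self._ttl]
        for k in expired:
            del self._sessions[k]

    def get(self, key):
        with self._lock:
            entry = self._sessions.get(key)
            if entry is None:
                return None
            if self._clock() - entry.timestamp > self._ttl:
                del self._sessions[key]
                return None
            self._sessions.move_to_end(key)  # mark as most recently used
            return entry.session

    def set(self, key, session):
        with self._lock:
            now = self._clock()
            if key in self._sessions:
                self._sessions[key] = _Entry(session, now)
                self._sessions.move_to_end(key)
                return
            if len(self._sessions) >= self._max_size:
                # Reclaim space from expired entries first...
                self._purge_expired(now)
                # ...and evict a live LRU entry only if still full.
                if len(self._sessions) >= self._max_size:
                    self._sessions.popitem(last=False)
            self._sessions[key] = _Entry(session, now)
```

The purge only runs when an insert would otherwise evict, so the common path stays O(1).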
Introduce SSLSessionCache in connection.py: a thread-safe OrderedDict-based cache with LRU eviction (max_size, default 100) and TTL expiration (default 3600s), keyed by endpoint tls_session_cache_key.

Add tls_session_cache_key property to all EndPoint subclasses:
- DefaultEndPoint: (address, port)
- SniEndPoint: (address, port, server_name), which prevents proxy collisions
- UnixSocketEndPoint: (unix_socket_path,)
- ClientRoutesEndPoint: (host_id, address, port)

Includes unit tests for basic ops, key isolation, SNI keys, overwrite, thread safety, TTL expiration, LRU eviction, clear/clear_expired, automatic cleanup, custom parameters, and endpoint cache keys.
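The per-endpoint keys listed above can be illustrated with a hypothetical, stripped-down endpoint hierarchy; the constructors here are simplified assumptions and do not match the driver's real EndPoint classes.

```python
class EndPoint:
    """Hypothetical minimal base class; real driver endpoints carry more state."""

    def __init__(self, address, port):
        self.address = address
        self.port = port

    @property
    def tls_session_cache_key(self):
        # Default key: one cache slot per (address, port).
        return (self.address, self.port)


class DefaultEndPoint(EndPoint):
    pass  # inherits the (address, port) key


class SniEndPoint(EndPoint):
    def __init__(self, proxy_address, port, server_name):
        super().__init__(proxy_address, port)
        self.server_name = server_name

    @property
    def tls_session_cache_key(self):
        # Several nodes share one proxy address/port; server_name keeps
        # their sessions apart.
        return (self.address, self.port, self.server_name)


class UnixSocketEndPoint(EndPoint):
    def __init__(self, unix_socket_path):
        super().__init__(unix_socket_path, None)
        self.unix_socket_path = unix_socket_path

    @property
    def tls_session_cache_key(self):
        return (self.unix_socket_path,)


class ClientRoutesEndPoint(EndPoint):
    def __init__(self, host_id, address, port):
        super().__init__(address, port)
        self.host_id = host_id

    @property
    def tls_session_cache_key(self):
        return (self.host_id, self.address, self.port)
```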
Branch updated from f8eb94d to 08cabfd.
- Add _ssl_session_cache attribute on Connection, set via ssl_session_cache param
- Restore cached TLS sessions in _wrap_socket_from_context with error tolerance
- Add _cache_tls_session_if_needed helper (delegates to endpoint.tls_session_cache_key)
- Cache sessions at 3 points: after connect, ReadyMessage, AuthSuccessMessage (handles TLS 1.3 async ticket delivery)
- Add TestConnectionSSLSessionRestore and TestConnectionCacheTLSSession tests
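The guard conditions of such a helper can be sketched as a free function. cache_tls_session_if_needed is a hypothetical stand-in for Connection._cache_tls_session_if_needed; it assumes a cache exposing set(key, session) and an endpoint exposing tls_session_cache_key.

```python
def cache_tls_session_if_needed(endpoint, ssl_context, ssl_session_cache, sock):
    """Hypothetical free-function version of the Connection helper.

    Meant to be called after connect and again on ReadyMessage /
    AuthSuccessMessage; each call is a cheap no-op when there is nothing
    to store. Returns True only if a session was stored.
    """
    if ssl_session_cache is None or ssl_context is None:
        return False  # caching disabled, or not a TLS connection
    # ssl.SSLSocket.session can be None when no session is available yet;
    # getattr also covers sockets without the attribute.
    session = getattr(sock, "session", None)
    if session is None:
        return False
    # The key comes from the endpoint, so SNI and Unix-socket endpoints
    # get their own cache slots.
    ssl_session_cache.set(endpoint.tls_session_cache_key, session)
    return True
```

Calling it at all three points is safe: a later call simply overwrites the entry with the newer (for TLS 1.3, ticket-bearing) session.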
- Import SSLSessionCache in cluster.py
- Add ssl_session_cache attribute with comprehensive docstring
- Add ssl_session_cache parameter to Cluster.__init__ (default _NOT_SET)
- Auto-create SSLSessionCache when ssl_context or ssl_options are set
- Pass ssl_session_cache to connection factory via _make_connection_kwargs
- Add TestSSLSessionCacheAutoCreation tests (6 tests)
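The three-way _NOT_SET / None / custom-instance behaviour can be sketched as follows. resolve_ssl_session_cache, make_connection_kwargs and SessionCacheStub are hypothetical names; the stub stands in for SSLSessionCache, and treating ssl_options as "set" when it is not None is an assumption.

```python
_NOT_SET = object()  # sentinel meaning "argument was not passed"


class SessionCacheStub:
    """Placeholder for SSLSessionCache in this sketch."""


def resolve_ssl_session_cache(ssl_session_cache=_NOT_SET,
                              ssl_context=None, ssl_options=None):
    if ssl_session_cache is _NOT_SET:
        # Not passed: create a cache only when TLS is configured.
        if ssl_context is not None or ssl_options is not None:
            return SessionCacheStub()
        return None
    # Explicit None opts out; any other value is a user-supplied cache.
    return ssl_session_cache


def make_connection_kwargs(kwargs_dict, ssl_session_cache):
    # setdefault lets a per-call value win over the cluster-wide cache.
    kwargs_dict.setdefault("ssl_session_cache", ssl_session_cache)
    return kwargs_dict
```

Using a private sentinel rather than None as the default is what lets an explicit ssl_session_cache=None mean "disabled".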
- EventletConnection: restore cached session before handshake via set_session()
- TwistedConnection: pass ssl_session_cache to _SSLCreator, restore cached session in clientConnectionForTLS()
- Both reactors: defer session storage to _cache_tls_session_if_needed() override called at ReadyMessage / AuthSuccessMessage time, ensuring TLS 1.3 session tickets (which arrive after the first application-data exchange) are captured
- Skip caching when session_reused() is True (abbreviated handshake)
- All operations wrapped in try/except for error tolerance
- Debug logging for session reuse and restore/store failures
Tests TLS ticket resumption end-to-end using a dynamically generated CA + server certificate pair. The test spins up a single-node CCM cluster configured for TLS, opens multiple connections, and verifies that subsequent connections reuse the TLS session rather than performing a full handshake. Skips automatically when the Scylla CCM node does not support server-side TLS session resumption (i.e. does not echo the session ticket back on reconnect).
Branch updated from 08cabfd to 5a713f1.
Summary
This PR implements TLS session resumption for the Python driver. After the first
successful TLS handshake with a node, the negotiated session is stored in a
thread-safe cache and reused on subsequent connections, skipping the full
handshake.
Both TLS 1.2 (session IDs) and TLS 1.3 (session tickets / PSK) are supported.
Changes
cassandra/connection.py: SSLSessionCache class & endpoint keys
- _SessionCacheEntry namedtuple stores (session, timestamp) for TTL tracking.
- SSLSessionCache: a thread-safe OrderedDict-based cache with LRU eviction, TTL expiration, and periodic cleanup (every 100 set() calls), keyed by endpoint tls_session_cache_key.
- Configurable max_size (default 100) and ttl (default 3600 s).
- The EndPoint base class provides a default tls_session_cache_key property returning (address, port). Subclasses override it for context-specific keys:
  - DefaultEndPoint: (address, port), inherits the default
  - SniEndPoint: (address, port, server_name), prevents proxy collisions
  - UnixSocketEndPoint: (unix_socket_path,)
  - ClientRoutesEndPoint: (host_id, address, port)

cassandra/connection.py: Connection wiring
- Connection gains a _ssl_session_cache attribute, set via the ssl_session_cache kwarg in __init__.
- _wrap_socket_from_context() restores a cached session via ssl_sock.session = ... after wrap_socket(); gracefully handles ssl.SSLError / AttributeError if the server rejects the session.
- _ssl_session_cache_key() helper delegates to endpoint.tls_session_cache_key.
- _cache_tls_session_if_needed() stores socket.session in the cache when ssl_context is set and the session is non-None.
- Sessions are cached at three points:
  - After _initiate_connection() in _connect_socket(): TLS 1.2 sessions are available immediately after connect.
  - On ReadyMessage in _handle_startup_response(): TLS 1.3 tickets arrive asynchronously after the first application-data exchange.
  - On AuthSuccessMessage in _handle_auth_response(): same TLS 1.3 coverage for authenticated connections.
cassandra/cluster.py: Cluster integration
- Imports SSLSessionCache.
- ssl_session_cache class attribute with docstring.
- __init__ accepts an ssl_session_cache=_NOT_SET parameter.
- Auto-creates SSLSessionCache() when ssl_context or ssl_options are set; no configuration is required for the common case.
- Pass ssl_session_cache=None explicitly to opt out.
- A custom SSLSessionCache(max_size=…, ttl=…) can be supplied.
- _make_connection_kwargs() passes the cache to every Connection via kwargs_dict.setdefault('ssl_session_cache', self.ssl_session_cache).

cassandra/io/eventletreactor.py: Eventlet (PyOpenSSL) support
- _wrap_socket_from_context() restores cached PyOpenSSL sessions via set_session() before the handshake.
- _initiate_connection() calls _cache_pyopenssl_session() after do_handshake().
- _cache_pyopenssl_session() helper stores the session via get_session(), logs whether the session was reused (session_reused()), and catches all exceptions silently.

cassandra/io/twistedreactor.py: Twisted (PyOpenSSL) support
- _SSLCreator.__init__ accepts an optional ssl_session_cache parameter.
- clientConnectionForTLS() restores cached sessions via set_session().
- info_callback() stores sessions after SSL_CB_HANDSHAKE_DONE via get_session() and logs reuse status.
- TwistedConnection.add_connection() passes ssl_session_cache=self._ssl_session_cache to _SSLCreator.

Tests
tests/unit/test_connection.py
- TestSSLSessionCache: empty lookup, set/get, key isolation by address/port/SNI, overwrite, thread safety, TTL expiration, LRU eviction, max_size enforcement, clear(), clear_expired(), automatic periodic cleanup, None session handling.
- TestEndPointTLSSessionCacheKey: cache key correctness for DefaultEndPoint, SniEndPoint, UnixSocketEndPoint, ClientRoutesEndPoint, plus isolation between different paths/addresses.
- TestConnectionSSLSessionRestore: session restore from cache, tolerance when cache is None, ssl.SSLError on session setter, SNI-specific cached session lookup.
- TestConnectionCacheTLSSession: session stored after connect, no-op when session=None, no-op when cache=None, no-op when ssl_context=None, SNI-specific key used for storage.

tests/unit/test_cluster.py
- TestSSLSessionCacheAutoCreation: auto-create with ssl_context, auto-create with ssl_options, no cache without TLS, explicit None opt-out, custom cache injection, cache passed to connection_factory.

Fixes: https://scylladb.atlassian.net/browse/DRIVER-165