New client pool implementation #11736

Open · wants to merge 24 commits into 4.9.x

Conversation

yawkat (Member) commented Apr 10, 2025

#8100 overhauled the connection pooling, but its primary purpose was to improve the separation of concerns and to extend connection pooling to non-exchange methods. From an efficiency standpoint, however, it is not very good. In particular, it uses a central "doSomeWork" method to dispatch requests onto connections, which is shared between all threads and only dispatches in a serial fashion. Connections are also shared with no regard to which event loop they were created on.
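
To illustrate the shape of the problem, here is a minimal sketch of that serialized dispatch pattern. The class and field names are illustrative only, not the actual #8100 code:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

final class CentralDispatchPool {
    private final Queue<Runnable> pendingRequests = new ConcurrentLinkedQueue<>();
    // A single flag serializes dispatch: only one thread, from any event
    // loop, can be inside doSomeWork at a time.
    private final AtomicBoolean dispatching = new AtomicBoolean();

    void submit(Runnable request) {
        pendingRequests.add(request);
        doSomeWork();
    }

    private void doSomeWork() {
        // Whoever wins the CAS drains the whole queue; everyone else returns
        // and relies on the winner to dispatch their request too.
        while (dispatching.compareAndSet(false, true)) {
            try {
                Runnable request;
                while ((request = pendingRequests.poll()) != null) {
                    request.run(); // any connection, on any event loop
                }
            } finally {
                dispatching.set(false);
            }
            // Re-check: a request may have arrived after the drain finished
            // but before the flag was released.
            if (pendingRequests.isEmpty()) {
                return;
            }
        }
    }
}
```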

#11300 bolted on "client affinity" functionality: Requests arriving on an event loop would preferentially be dispatched to connections running on the same event loop, which can reduce context switches and improve latency. However, the actual dispatch still happens in the central doSomeWork method serially, now further weighed down by the client affinity algorithm.
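
Sketched out, the affinity preference looks roughly like this; the types and names here are my own shorthand, not the #11300 code:

```java
import io.netty.channel.EventLoop;
import java.util.List;

final class AffinityConnectionSelector {
    // Stand-in for a pooled connection pinned to the loop it was created on.
    record PooledConnection(EventLoop eventLoop) {}

    /**
     * Prefer a connection living on the caller's event loop, so the request
     * can be dispatched without a context switch; otherwise take any idle
     * connection. Note that this scan still runs inside the serial
     * doSomeWork dispatch, making each dispatch step heavier.
     */
    static PooledConnection select(EventLoop caller, List<PooledConnection> idle) {
        for (PooledConnection c : idle) {
            if (c.eventLoop() == caller) {
                return c;
            }
        }
        return idle.isEmpty() ? null : idle.get(0);
    }
}
```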

Benchmarks connected to #11704 show that this model breaks down in scenarios with many connections serving many requests on multiple event loops. doSomeWork becomes a bottleneck, since only one event loop can execute it at a time, and the connection selection algorithm for client affinity makes this even heavier.

This PR introduces a new connection pool built with scalability in mind from the start. Instead of a global pool of connections, the pool is split into "local pools", one for each event loop. Requests that are created on one event loop can be dispatched purely within that loop, without coordination with other loops. Contention between loops is kept to a minimum.
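
As a rough sketch of that split (class names are illustrative, not the PR's actual classes): each event loop owns a local pool, and requests created on that loop are served without cross-loop coordination.

```java
import io.netty.channel.EventLoop;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ShardedPool {
    private final Map<EventLoop, LocalPool> localPools = new ConcurrentHashMap<>();

    void dispatch(EventLoop loop, Runnable request) {
        LocalPool local = localPools.computeIfAbsent(loop, LocalPool::new);
        if (loop.inEventLoop()) {
            // Hot path: already on the owning loop, no hand-off needed.
            local.dispatchLocal(request);
        } else {
            loop.execute(() -> local.dispatchLocal(request));
        }
    }

    static final class LocalPool {
        private final EventLoop loop;

        LocalPool(EventLoop loop) {
            this.loop = loop;
        }

        void dispatchLocal(Runnable request) {
            assert loop.inEventLoop();
            // Single-threaded by construction: only the owning loop touches
            // this pool's connections, so no locking is required here.
            request.run();
        }
    }
}
```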

If load is uneven between event loops, requests can still spill into a global request queue as a fallback, and be picked up by less busy loops. This ensures progress, but should not be a hot path. The logic for moving between event loops or into the global queue is unfortunately quite complex.
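
The fallback could be pictured as a shared queue like the following (again a sketch under assumed names, not the real implementation):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class GlobalSpillQueue {
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();

    // Slow path: the owning loop had no free capacity, so the request is
    // parked where any loop may claim it.
    void spill(Runnable request) {
        queue.add(request);
    }

    // Called by each loop when it frees up capacity; draining with a budget
    // keeps a single loop from monopolizing the shared queue.
    void drain(int budget) {
        Runnable request;
        while (budget-- > 0 && (request = queue.poll()) != null) {
            request.run();
        }
    }
}
```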

Base automatically changed from netty-4.2 to 4.9.x April 11, 2025 17:19
yawkat added the type: improvement label Apr 14, 2025
yawkat added this to the 4.9.0 milestone Apr 14, 2025
yawkat marked this pull request as ready for review April 14, 2025 13:35
graemerocher (Contributor) left a comment

If we are getting a new design, can we use this opportunity to expose some APIs to collect HTTP client pool metrics?

yawkat (Member, Author) commented Apr 14, 2025

I don't really want to expose structured internals of the connection pool; it constrains the design too much. Unstructured data, sure, but I don't think we have anything in core for that?

Maybe we need a metrics API in core.
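
For the sake of illustration, such an unstructured hook could be as small as this (purely hypothetical, nothing like it exists in core today):

```java
interface PoolMetricsSink {
    // Free-form name/value samples: the pool can add, rename, or drop
    // metrics without committing to a structured API surface.
    void record(String name, long value);
}
```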

sonarqubecloud bot commented Apr 14, 2025
