ForceStopAsyncSend option for graceful stop in async mode #77

Merged: 4 commits into fluent:master on Feb 19, 2020

Conversation

@JamesJJ (Contributor) commented Dec 6, 2019

As I understand it, the existing behaviour in Async mode is:

  • Add events to the end of the buffer; return an error if the buffer is full
  • Take events from the front of the buffer; for each event, try to send it to fluentd up to MaxRetry times, and discard the event if it still has not been sent successfully
  • When Close() is called, the pending event channel is closed, and attempting to add any new events will return an error
  • When Close() is called, it blocks until every remaining event in the buffer has either been sent successfully or has run through its MaxRetry number of failures. This means that Close() can block for a very long time if the number of unsent events is large.

This PR introduces a new option, AsyncStop. When it is true in Async mode and Close() is called, the current write()/MaxRetry iteration will complete for the single in-progress event, and then any remaining events in the buffer will be discarded. This may significantly reduce the time that Close() blocks when fluentd is continuously unavailable.
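For context, here is a minimal usage sketch of the option as merged (it was later renamed ForceStopAsyncSend, see below). The Config field names assume the v1.5.0 API of fluent-logger-golang; treat this as an illustration rather than canonical documentation.

```go
package main

import (
	"log"

	"github.com/fluent/fluent-logger-golang/fluent"
)

func main() {
	// Async mode buffers events and sends them from a background goroutine.
	// ForceStopAsyncSend (the merged name of the AsyncStop option proposed
	// here) makes Close() discard any still-unsent events instead of blocking
	// until each one has been sent or has exhausted its retries.
	logger, err := fluent.New(fluent.Config{
		FluentHost:         "127.0.0.1",
		FluentPort:         24224,
		Async:              true,
		ForceStopAsyncSend: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	_ = logger.Post("myapp.access", map[string]string{"event": "shutdown-test"})

	// With ForceStopAsyncSend enabled, Close() should return promptly even if
	// the buffer still holds unsent events (e.g. because Fluentd is down).
	if err := logger.Close(); err != nil {
		log.Printf("close: %v", err)
	}
}
```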

JamesJJ added a commit to JamesJJ/moby that referenced this pull request Dec 6, 2019
@JamesJJ changed the title from "Possible Fix for #71. AsyncStop option for graceful stop in async mode" to "[WIP] AsyncStop option for graceful stop in async mode" on Dec 7, 2019
@JamesJJ changed the title from "[WIP] AsyncStop option for graceful stop in async mode" to "AsyncStop option for graceful stop in async mode" on Dec 7, 2019
@thaJeztah

ping @tagomoris @cpuguy83 ptal (looks like this relates to moby/moby#40063)

@cpuguy83

In theory this is nice, but I feel like everyone will just turn it on, because Close() just should not block in general.
The option also doesn't seem appropriately named, since it's not an async close but rather a discard+close.

Perhaps something like Reset() before Close() would be a possible alternative.
But again, I feel like everyone will probably just want to call Reset().

@tagomoris (Member) left a comment

Basically, I agree with adding this feature as a new option rather than changing the default behavior. IMO some people want to find processes that are stuck in their shutdown sequence and resume the Fluentd destination processes before forcibly terminating the Go processes.

But I found some points to be improved.
@JamesJJ I'm very sorry for reviewing this change so late, but could you check my review comments?

@tagomoris (Member)

LGTM. Thank you!

@tagomoris merged commit b8d8d8a into fluent:master on Feb 19, 2020
@tagomoris changed the title from "AsyncStop option for graceful stop in async mode" to "ForceStopAsyncSend option for graceful stop in async mode" on Feb 19, 2020
@tagomoris (Member)

Merged, and tagged as v1.5.0

akerouanton pushed a commit to akerouanton/fluent-logger-golang that referenced this pull request Apr 22, 2020
PR fluent#77 introduced a new parameter named ForceStopAsyncSend. It can be
used to tell the logger not to try to send all the log messages in its
buffer before closing. Without this parameter, the logger hangs
whenever the target Fluentd server is down.

The logger also currently initializes the connection lazily, when it
receives its first log. This is a problem when the Fluentd server has
never been available, as the connection initialization blocks the
select signaling that the log channel should be drained.
akerouanton pushed a commit to akerouanton/fluent-logger-golang that referenced this pull request Apr 22, 2020
PR fluent#77 introduced a new parameter named ForceStopAsyncSend. It can be
used to tell the logger not to try to send all the log messages in its
buffer before closing. Without this parameter, the logger hangs
whenever it has logs to write and the target Fluentd server is down.

But this new parameter is not enough: the connection is currently lazily
initialized when the logger receives its first message. This blocks the
select reading messages from the queue (until the connection is ready).
Moreover, the connection dialing uses an exponential back-off retry.

Because of that, the logger won't look for messages on the `stopRunning`
channel (the channel introduced by PR fluent#77), either because it's
blocked by the Sleep used for the retry or because the connection
dialing is waiting for the TCP timeout.

To fix these edge cases, the connection isn't initialized lazily
anymore. However, it's still initialized asynchronously and with the
exponential back-off retry. The Fluent.run() method has been adapted to
wait for either the connection to become ready or a stop signal on the
stopRunning channel before starting to dequeue logs.

This fix is motivated by the issue described in moby/moby#40063.

Signed-off-by: Albin Kerouanton <[email protected]>
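A minimal sketch of the run-loop gating described above (wait for either the connection to become ready or a stop signal before starting to dequeue). The channel names and the send callback are illustrative assumptions, not the library's actual internals.

```go
package fluentsketch

// run gates the dequeue loop on either a "connection ready" signal or a stop
// request, as the commit message above describes. Channel names and the send
// callback are illustrative, not the library's actual fields.
func run(ready, stopRunning <-chan struct{}, pending <-chan []byte, send func([]byte)) {
	select {
	case <-ready: // connection established; start draining the buffer
	case <-stopRunning: // Close() was called before the connection became ready
		return
	}
	for {
		select {
		case msg, ok := <-pending:
			if !ok {
				return // pending channel closed; nothing left to send
			}
			send(msg)
		case <-stopRunning:
			return // discard whatever is still buffered
		}
	}
}
```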
akerouanton pushed a commit to akerouanton/fluent-logger-golang that referenced this pull request May 21, 2020
PR fluent#77 introduced a new parameter named ForceStopAsyncSend. It can be
used to tell the logger not to try to send all the log messages in its
buffer before closing. Without this parameter, the logger hangs
whenever it has logs to write and the target Fluentd server is down.

But this new parameter is not enough: the connection is currently lazily
initialized when the logger receives its first message. This blocks the
select reading messages from the queue (until the connection is ready).
Moreover, the connection dialing uses an exponential back-off retry.

Because of that, the logger won't look for messages on the `stopRunning`
channel (the channel introduced by PR fluent#77), either because it's
blocked by the Sleep used for the retry or because the connection
dialing is waiting for the dialing timeout (e.g. the TCP timeout).

To fix these edge cases, the time.Sleep() used for the back-off retry
has been transformed into a time.After(), and a new stopAsyncConnect
channel has been introduced to stop it. Moreover, the dialer.Dial() call
used to connect to Fluentd has been replaced with dialer.DialContext(),
and a cancellable context is now used to stop the call to that method.

Finally, the Fluent.run() method has been adapted to wait for both new
messages on f.pending and a stop signal on the f.stopRunning channel.
Previously, both channels were selected independently, in a procedural
fashion.

This fix is motivated by the issue described in moby/moby#40063.

Signed-off-by: Albin Kerouanton <[email protected]>
akerouanton pushed a commit to akerouanton/fluent-logger-golang that referenced this pull request Nov 11, 2020
PR fluent#77 introduced a new parameter named ForceStopAsyncSend. It can be
used to tell the logger not to try to send all the log messages in its
buffer before closing. Without this parameter, the logger hangs
whenever it has logs to write and the target Fluentd server is down.

But this new parameter is not enough: the connection is currently lazily
initialized when the logger receives its first message. This blocks the
select reading messages from the queue (until the connection is ready).
Moreover, the connection dialing uses an exponential back-off retry.

Because of that, the logger won't look for messages on the `stopRunning`
channel (the channel introduced by PR fluent#77), either because it's
blocked by the Sleep used for the retry or because the connection
dialing is waiting for the dialing timeout (e.g. the TCP timeout).

To fix these edge cases, the time.Sleep() used for the back-off retry
has been transformed into a time.After(). Moreover, the dialer.Dial()
call used to connect to Fluentd has been replaced with
dialer.DialContext(), and a cancellable context is now used to stop the
call to that method. The associated cancel function is stored in Fluent
and is called by Close() when the ForceStopAsyncSend option is enabled.

Finally, the Fluent.run() method has been adapted to wait for both new
messages on f.pending and a stop signal on the f.stopRunning channel.
Previously, both channels were selected independently, in a procedural
fashion.

This fix is motivated by the issue described in moby/moby#40063.

Signed-off-by: Albin Kerouanton <[email protected]>
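A self-contained sketch of the interruptible dial pattern this commit message describes: exponential back-off using time.After inside a select, and dialer.DialContext with a cancellable context so Close() can abort an in-flight dial. Names and constants are illustrative assumptions, not the library's actual code.

```go
package main

import (
	"context"
	"log"
	"net"
	"time"
)

// connectLoop dials addr with exponential back-off. It can be interrupted
// either while waiting between attempts (time.After inside a select) or while
// a dial is in flight (context cancellation via DialContext).
func connectLoop(ctx context.Context, addr string, stop <-chan struct{}) (net.Conn, error) {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	backoff := 500 * time.Millisecond

	for {
		conn, err := dialer.DialContext(ctx, "tcp", addr)
		if err == nil {
			return conn, nil
		}
		log.Printf("dial %s failed: %v (retrying in %s)", addr, err, backoff)

		select {
		case <-time.After(backoff): // replaces a blocking time.Sleep
			backoff *= 2
		case <-stop: // stop signal, e.g. sent by Close()
			return nil, context.Canceled
		case <-ctx.Done(): // cancel function called by Close()
			return nil, ctx.Err()
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	stop := make(chan struct{})

	// Simulate Close() with ForceStopAsyncSend enabled: abort the dial loop
	// shortly after it starts.
	go func() {
		time.Sleep(1 * time.Second)
		cancel()
		close(stop)
	}()

	if _, err := connectLoop(ctx, "127.0.0.1:24224", stop); err != nil {
		log.Printf("connect aborted: %v", err)
	}
}
```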