Question: Can changes be presented multiple times or missed? #124

bartelink · 2019-01-16T21:24:57Z

From the documentation, I was unable to discern the answer to the following question - I hope I'm wrong, but I feel I've spent reasonable time trying to answer the question myself. Asking here as it might form a good documentation request and/or a place to put a canonical answer...

As one scales up and down to multiple partitions, what are the guarantees provided by the ChangeFeedProcessor as a whole wrt the following:

Not missing documents - i.e., can I trust that a split or merge will never result in me missing an insert/update?
Not seeing documents multiple times except when a checkpoint write is lost - i.e., if one were to checkpoint after every write, would the same documents ever be presented to the observer callback multiple times as a result of a merge or split?

(I'm thinking reading the integration tests and/or source will give me hints, and I'll do some testing, but it would really help me a lot if someone would be so kind as to give me an answer in advance of me doing the legwork!)

jsmithtx · 2019-01-18T14:36:27Z

@bartelink We ran into an issue where Cosmos had some problems and kept redelivering the same documents, so the change feed processor does appear to redeliver documents if the checkpoint doesn't write or something happens. Our solution is that we keep a running list of document ids processed in the last 5 minutes and ignore a document if it is in that list.
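
The workaround described above (remember recently processed ids and skip repeats) might look roughly like the sketch below. It is an illustration in Python rather than code from the thread or from the Change Feed Processor library; the class name `RecentIdFilter`, its parameters, and the injectable clock are all hypothetical.

```python
import time
from collections import OrderedDict


class RecentIdFilter:
    """Remembers document ids seen within a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds=300.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        self._seen = OrderedDict()  # id -> time first seen, oldest entry first

    def _evict_expired(self, now):
        # Drop entries from the front until the oldest one is still inside the window.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.window_seconds:
                break
            self._seen.popitem(last=False)

    def is_duplicate(self, doc_id):
        """Return True if doc_id was seen inside the window; otherwise record it."""
        now = self.clock()
        self._evict_expired(now)
        if doc_id in self._seen:
            return True
        self._seen[doc_id] = now
        return False
```

Note that keying on the id alone would also suppress a genuine second update to the same document inside the window, so a real implementation would probably need to key on the id plus some version marker.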

bartelink · 2019-01-18T14:46:02Z

Thanks @jiffypopjr I can imagine implementing that as a workaround (and emitting a warning from my projector when that's tripped).

Wrt my overall reason for asking this question, though:

I'd like to see the documentation include a description of the cases where such repeats can arise, and the underlying reasons (can merges/splits be a reason? When that happens, how many items are we talking about?)
I'm absolutely interested in whether there's any danger of ever missing an item

ealsur · 2019-01-18T16:46:20Z

Technically the idea is to provide an at-least-once delivery. @mkolt can probably give more details.

There are multiple reasons for a change to be sent twice:

You are using Automatic checkpointing (default) and your Observer throws an Exception during ProcessChangesAsync (which halts the Observer and restarts it from the last checkpoint, which is the last iteration)
You have a custom Checkpoint Interval (either by time or by document count) and you restart the Host: the last checkpoint is wherever your custom configuration last wrote one, so changes delivered after that point are sent again
You use Manual Checkpointing: you control the moment when Checkpoints happen, so anything delivered after your last Checkpoint is sent again on restart
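
To make the first case concrete, here is a toy Python model of at-least-once delivery. It is not the library's code: `run_processor`, `make_flaky_observer`, and the batch size are assumptions made purely for illustration. The only behavior taken from the explanation above is that a failing observer is restarted from the last checkpoint.

```python
def run_processor(feed, handle_batch, batch_size=2, max_restarts=5):
    """Deliver `feed` in batches, checkpointing only after a batch succeeds.

    Returns every item handed to `handle_batch`, including repeats.
    """
    checkpoint = 0  # index of the first item not yet checkpointed
    delivered = []
    restarts = 0
    while checkpoint < len(feed):
        batch = feed[checkpoint:checkpoint + batch_size]
        delivered.extend(batch)
        try:
            handle_batch(batch)
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise
            continue  # restart from the last checkpoint: the same batch again
        checkpoint += len(batch)  # checkpoint only after the observer succeeded
    return delivered


def make_flaky_observer(fail_once_on):
    """Build an observer that raises the first time it sees `fail_once_on`."""
    state = {"failed": False}
    processed = []

    def observer(batch):
        for item in batch:
            if item == fail_once_on and not state["failed"]:
                state["failed"] = True
                raise RuntimeError("transient failure")
            processed.append(item)

    return observer, processed


observer, processed = make_flaky_observer("d")
delivered = run_processor(["a", "b", "c", "d"], observer)
print(delivered)  # ['a', 'b', 'c', 'd', 'c', 'd']
print(processed)  # ['a', 'b', 'c', 'c', 'd']
```

In this model "c" is processed twice even though it never failed, because it shared a batch with the item that did. That is why an observer generally needs to be idempotent.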

Scenarios I'm aware of that might lead to missing items:

If you throw a DocumentClientException with status code 429 from inside ProcessChangesAsync (ref: Propagate DocumentClientException from ProcessChangesAsync #127)

Scenarios that make it seem like you are missing items:

You create 2 separate Hosts with different code implementations, both using the same monitored collection and lease collection: the changes will be picked up by one of them, not both, hence it seems changes were lost. Solution: use the LeasePrefix to have each particular Host maintain its own set of leases, or use a different lease collection for each (ref https://medium.com/microsoftazure/azure-cosmos-db-functions-cookbook-multi-trigger-f8938673de57)
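
A toy Python model of why shared leases look like lost changes. The function `assign_leases`, the host names, and the round-robin assignment are all hypothetical; the real library balances leases differently. The sketch only captures the point above: hosts that share a set of leases split the partitions between them.

```python
def assign_leases(partitions, hosts):
    """Map each host to the partitions it reads.

    `hosts` is a list of (host_name, lease_prefix) pairs. Hosts with the same
    prefix share one set of leases, so each partition goes to exactly one of them.
    """
    groups = {}
    for name, prefix in hosts:
        groups.setdefault(prefix, []).append(name)

    seen_by = {name: [] for name, _ in hosts}
    for members in groups.values():
        for i, partition in enumerate(partitions):
            owner = members[i % len(members)]  # round-robin stands in for real balancing
            seen_by[owner].append(partition)
    return seen_by


partitions = ["p0", "p1", "p2", "p3"]

# Same prefix: the two hosts compete for the same leases and each sees only half.
shared = assign_leases(partitions, [("projector", ""), ("mailer", "")])

# Distinct prefixes: each host keeps its own leases and sees every partition.
separate = assign_leases(partitions, [("projector", "proj-"), ("mailer", "mail-")])
```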

bartelink · 2019-01-18T17:56:24Z

Thanks, that's exactly the sort of response I was after.

You are using Automatic checkpointing (default) and your Observer throws an Exception during ProcessChangesAsync (which halts the Observer and restarts it from the last checkpoint, which is the last iteration)

This seems a reasonable default. I wasn't able to discern that this is the case based on navigating intellisense and reading docs - did I miss it or should it be added somewhere?

Ideally something like what you said would make it into readme.md or somewhere else prominent

My outstanding question (which reading the code will doubtless reveal, but that should not be my only way to find out) is whether interesting/dangerous/simple things happen when 2 partitions become 3 and vice versa. What's the algorithm, and does it intrinsically risk duplicating items, or is there an obvious simple answer as to why that's already catered for? I'm happy to raise that as a separate ticket if you feel that makes sense, and let this issue focus on reasons for >1 delivery and its interaction with how one checkpoints.

bartelink · 2019-01-18T18:11:53Z

@ealsur Regarding the Medium post - I like it and feel it needs to be linked to from the repo (I'll do the PR if you won't!). There's plenty of prior art for doing such a thing in other repos - your writing these is very helpful for people considering whether CFP is a fit for their needs. I guess I eventually might have bingled ChangeFeedProcessor and hit them, but I think it's safe to assume that I'm not the only one that'll look at the github repo readme, issues and source, in that order.
(typos: "multiple and independent" -> "multiple, independent"; "not incurring into" -> "not incurring")

ealsur · 2019-01-18T18:24:41Z

Thanks @bartelink! I think what you mention would be a great addition to the current README, I'll see if I can make some time and write it down 😄 (BTW, thanks for the corrections in the post! My English is still a work in progress hehe)

jkdey · 2019-03-27T00:30:46Z

When we write to Cosmos via the Table API and listen using the ChangeFeedProcessorLibrary to increment a counter, we consistently see missed changes. We have two applications, one creating a record and another updating it in quick succession. The number of changes we miss goes down if there is less happening in ProcessChangesAsync and if we decrease the polling interval, but it does not go away. This is on the scale of 5000 changes.

bartelink · 2019-03-27T03:05:40Z

@jkdey I'm asking a very different question - what you are describing is by design. The bottom line is that the changefeed always contains one instance of any one document. You'll see a document iff it has been updated to trigger it "moving beyond your cursor", which would make its position in the modification order get updated such that you 'see it again' if you happen to have traversed it already. Remember, there is nothing anywhere storing a copy of every change - you're just querying all the documents in order of when they were last touched.
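
The model described above can be sketched as a toy in Python. This is not how Cosmos DB is implemented; `ToyContainer`, its methods, and the integer LSN counter are assumptions for illustration. It only captures the two claims in the comment: each document appears once, at the position of its latest modification, and a reader holding a cursor sees it again only if it was modified after that cursor.

```python
class ToyContainer:
    """Keeps only the latest version of each document, tagged with an increasing LSN."""

    def __init__(self):
        self._lsn = 0
        self._docs = {}  # id -> (lsn, body); older versions are overwritten

    def upsert(self, doc_id, body):
        self._lsn += 1
        self._docs[doc_id] = (self._lsn, body)
        return self._lsn

    def read_changes(self, after_lsn):
        """Return (lsn, id, body) for documents last modified after `after_lsn`, oldest first."""
        return sorted(
            (lsn, doc_id, body)
            for doc_id, (lsn, body) in self._docs.items()
            if lsn > after_lsn
        )


container = ToyContainer()
container.upsert("order-1", "created")  # LSN 1
container.upsert("order-1", "shipped")  # LSN 2 replaces the LSN 1 version
container.upsert("order-2", "created")  # LSN 3

first_read = container.read_changes(after_lsn=0)
# The intermediate "created" version of order-1 is never observed by this reader.

container.upsert("order-1", "delivered")  # LSN 4 moves order-1 past the cursor
second_read = container.read_changes(after_lsn=3)
```

A reader that polls slowly therefore sees fewer intermediate versions, which fits the observation that shortening the polling interval reduces, but does not eliminate, the "missed" changes.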

jkdey · 2019-03-27T15:20:43Z

Hi @bartelink to clarify the situation, we appear to be consistently missing the last update to a record. Is this consistent with expected behavior?

bartelink · 2019-03-27T15:41:43Z

That would be surprising and concerning. We make considerable use of it and have not run into such issues. The bottom line is that it's just a query along the lines of WHERE LSN >= x, and each document update bears an LSN value that only moves up. A repro would definitely be required, but I'd tend to assume SELECT is not broken in this instance.
However, the documentation of the guarantees is definitely light around here, and I'd definitely back a request for this to be addressed as a FAQ in this context.

bartelink · 2019-03-27T15:47:27Z

@jkdey If you start a new subscription with a new leaseid, do you see everything you expect? Have you tried doing a basic validation to sanity check it outside the context of your larger system?
(Not sure how useful it is, assuming you don't know F#, but the eqx tool here uses https://github.com/jet/equinox/blob/master/src/Equinox.Projection/FeedValidator.fs, and I've not seen it report gaps under any test I've subjected it to. Having said that, this issue is me asking what to expect when I do more exotic things like splits etc.)
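
For readers who do not know F#, the general idea of a feed sanity check can be sketched in Python. This is not the Equinox FeedValidator and makes no claim about how it works. It assumes, hypothetically, that every item carries a stream name and a per-stream sequence number starting at 0, and it labels each item as in order, a repeat, or arriving after a gap.

```python
def classify(items):
    """Label each (stream, index) pair as "ok", "duplicate", or "gap".

    Assumes indices within a stream are expected to arrive as 0, 1, 2, ...
    """
    next_expected = {}
    results = []
    for stream, index in items:
        expected = next_expected.get(stream, 0)
        if index == expected:
            results.append((stream, index, "ok"))
            next_expected[stream] = index + 1
        elif index < expected:
            # Already seen: consistent with at-least-once redelivery.
            results.append((stream, index, "duplicate"))
        else:
            # Something between `expected` and `index` never arrived.
            results.append((stream, index, "gap"))
            next_expected[stream] = index + 1
    return results


observed = [("a", 0), ("a", 1), ("a", 1), ("b", 0), ("a", 3)]
report = classify(observed)
```

Running a check like this against a fresh lease, outside the larger system, would help separate redelivery (duplicates) from genuinely missing items (gaps).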

jkdey · 2019-03-27T16:02:08Z

Thank you for the suggestions @bartelink I will scrutinize what happens when a lease is started and/or try to figure out how to leverage that validator code and report back.

This was referenced Jan 16, 2019

Question: Change ordering - will the final version of a document always be presented in the case of an update? #125 (Open)

Add Projection library and eqx project tool-command jet/equinox#87 (Merged)

bartelink mentioned this issue Jan 18, 2019

Question/doc request/feature request: automatic checkpointing behavior + batch prefetch #128 (Open)

bartelink mentioned this issue Mar 15, 2019

CosmosDBChangeFeed occasionally pushes feed to multiple workers instead of distributing the feed #132 (Closed)
