
Question: Can changes be presented multiple times or missed? #124

bartelink opened this issue Jan 16, 2019 · 12 comments

Comments

@bartelink

From the documentation, I was unable to discern the answer to the following question - I hope I'm wrong, but I feel I've spent reasonable time trying to answer the question myself. Asking here as it might form a good documentation request and/or a place to put a canonical answer...

As one scales up and down to multiple partitions, what are the guarantees provided by the ChangeFeedProcessor as a whole wrt the following:

  • not missing documents - i.e., can I trust that a split or merge will never result in me missing an insert/update?
  • not seeing documents multiple times except when a checkpoint write is lost - i.e., if one were to checkpoint after every write, would one ever see the same documents presented to the observer callback multiple times as a result of a merge or split?

(I'm thinking reading the integration tests and/or source will give me hints, and I'll do some testing, but it would really help me a lot if someone would be so kind as to give me an answer in advance of me doing the legwork!)

@jsmithtx

@bartelink We ran into an issue where Cosmos had some problems and kept redelivering the same documents, so the change feed processor does appear to redeliver documents if the checkpoint doesn't write or something happens. Our solution is that we keep a running list of document ids processed in the last 5 minutes and ignore a document if it is in that list.
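
A minimal sketch of the kind of recent-id filter described above, written in Python for illustration rather than in the library's own .NET. The class name `RecentIdFilter`, its method names, and the injectable `clock` parameter are all hypothetical; only the 5-minute window comes from the comment.

```python
import time
from collections import OrderedDict


class RecentIdFilter:
    """Remembers document ids seen within a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds=300.0, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock
        # Maps document id -> time it was first accepted, oldest entry first.
        self._seen = OrderedDict()

    def _evict_expired(self, now):
        # Entries are in insertion (time) order, so stop at the first one still in the window.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self._window:
                break
            self._seen.popitem(last=False)

    def should_process(self, doc_id):
        """Return True the first time an id is seen within the window, False for repeats."""
        now = self._clock()
        self._evict_expired(now)
        if doc_id in self._seen:
            return False
        self._seen[doc_id] = now
        return True
```

One consequence of keying on the id alone: a genuine update to the same document inside the window is suppressed along with the redeliveries, so this only suits workloads where that is acceptable.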

@bartelink
Author

Thanks @jiffypopjr I can imagine implementing that as a workaround (and emitting a warning from my projector when that's tripped).

Wrt my overall reason for asking this question, though:

  • I'd like to see the documentation include a description of the cases where such repeats can arise, and the underlying reasons (can merges/splits be a reason? when that happens, how many items are we talking about?)
  • I'm absolutely interested in whether there's any danger of ever missing an item

@ealsur
Member

ealsur commented Jan 18, 2019

Technically the idea is to provide an at-least-once delivery. @mkolt can probably give more details.

There are multiple reasons for a change to be sent twice:

  • You are using Automatic checkpointing (default) and your Observer throws an Exception during ProcessChangesAsync (which halts the Observer and restarts it from the last checkpoint, which is the last iteration)
  • You have a custom Checkpoint Interval (either by time or by document count), and you restart the Host and the last checkpoint was somewhere back when your custom configuration applied
  • You use Manual Checkpointing, in which case you control the moment when checkpoints happen
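
The first case in the list can be illustrated with a toy model. This is a sketch in Python, not the library's code: `run_observer`, `flaky_handler`, and the list-index checkpoint are all assumptions made for the illustration. The checkpoint advances only after a batch is handled without error, so a failure part-way through a batch causes the whole batch to be delivered again on restart.

```python
def run_observer(feed, checkpoint, batch_size, handler):
    """Deliver feed[checkpoint:] in batches and return the new checkpoint.

    The checkpoint only advances after a batch is handled without error,
    so a failure leaves it pointing at the start of the failed batch.
    """
    position = checkpoint
    while position < len(feed):
        batch = feed[position:position + batch_size]
        try:
            handler(batch)
        except Exception:
            return position  # observer halts; checkpoint was not advanced
        position += len(batch)
    return position


delivered = []
failures_left = [1]


def flaky_handler(batch):
    for doc in batch:
        delivered.append(doc)
        # Fail once, part-way through the second batch.
        if doc == "d3" and failures_left[0] > 0:
            failures_left[0] -= 1
            raise RuntimeError("simulated failure in handler")


feed = ["d1", "d2", "d3", "d4", "d5"]
checkpoint = run_observer(feed, 0, 2, flaky_handler)           # halts in the second batch
checkpoint = run_observer(feed, checkpoint, 2, flaky_handler)  # restart from the checkpoint
print(delivered)  # ['d1', 'd2', 'd3', 'd3', 'd4', 'd5']
```

Here `d3` is seen twice and nothing is skipped, which is the at-least-once behaviour described above.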

Scenarios I'm aware of that might lead to missing items:

Scenarios that seem like you are missing items

@bartelink
Author

Thanks, that's exactly the sort of response I was after.

"You are using Automatic checkpointing (default) and your Observer throws an Exception during ProcessChangesAsync (which halts the Observer and restarts it from the last checkpoint, which is the last iteration)"

This seems a reasonable default. I wasn't able to discern that this is the case based on navigating intellisense and reading docs - did I miss it, or should it be added somewhere?

Ideally something like what you said would make it into readme.md or somewhere else prominent

My outstanding question (which reading the code will doubtless reveal, but that should not be my only way to find out) is whether interesting/dangerous/simple things happen when 2 partitions become 3 and vice versa. What's the algorithm, and does it intrinsically risk duplicating items, or is there an obvious simple answer as to why that's already catered for? I'm happy to raise that as a separate ticket if you feel that makes sense, and let this issue focus on reasons for >1 delivery and its interaction with how one checkpoints.

@bartelink
Author

bartelink commented Jan 18, 2019

@ealsur Regarding the Medium post - I like it and feel it needs to be linked to from the repo (I'll do the PR if you won't!). There's plenty of prior art for doing such a thing in other repos - your writing these is very helpful for people considering whether CFP is a fit for their needs. I guess I eventually might have bingled ChangeFeedProcessor and hit them, but I think it's safe to assume that I'm not the only one that'll look at the GitHub repo readme, issues and source, in that order.
(typos: "multiple and independent" -> "multiple, independent"; "not incurring into" -> "not incurring")

@ealsur
Member

ealsur commented Jan 18, 2019

Thanks @bartelink! I think what you mention would be a great addition to the current README, I'll see if I can make some time and write it down 😄 (BTW, thanks for the corrections in the post! My English is still a work in progress hehe)

@jkdey

jkdey commented Mar 27, 2019

When we write to Cosmos via the Table API and listen using the ChangeFeedProcessorLibrary to increment a counter, we consistently see missed changes. We have two applications, one creating a record, and another updating it in quick succession. The number of changes we miss goes down if there is less happening in ProcessChangesAsync and if we decrease the polling interval, but it does not go away. This is on the scale of 5000 changes.

@bartelink
Author

@jkdey I'm asking a very different question - what you are describing is by design. The bottom line is that the changefeed always contains one instance of any one document. You'll see a document iff it has been updated to trigger it "moving beyond your cursor", which would make its position in the modification order get updated such that you 'see it again' if you happen to have traversed it already. Remember, there is nothing anywhere storing a copy of every change - you're just querying all the documents in order of when they were last touched.
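
A toy model of that description, as a sketch in Python. `ToyContainer`, `upsert`, and `read_changes` are hypothetical names, and the strictly-greater-than cursor comparison is a modelling choice, not a statement about the service. The store keeps only the latest version of each document, and a read returns documents ordered by when they were last touched.

```python
class ToyContainer:
    """Holds only the latest version of each document, stamped with an LSN."""

    def __init__(self):
        self._docs = {}      # id -> (lsn, body)
        self._next_lsn = 1

    def upsert(self, doc_id, body):
        # Overwrites the previous version; no history is retained.
        self._docs[doc_id] = (self._next_lsn, body)
        self._next_lsn += 1

    def read_changes(self, after_lsn):
        """Return (lsn, id, body) for documents last modified after the cursor, in LSN order."""
        changed = [(lsn, doc_id, body)
                   for doc_id, (lsn, body) in self._docs.items()
                   if lsn > after_lsn]
        return sorted(changed, key=lambda change: change[0])


container = ToyContainer()
container.upsert("counter-1", {"state": "created"})   # LSN 1
container.upsert("counter-1", {"state": "updated"})   # LSN 2, replaces LSN 1
container.upsert("counter-2", {"state": "created"})   # LSN 3

# A reader starting from the beginning sees counter-1 once, in its latest state.
first_read = container.read_changes(after_lsn=0)

# Touching counter-1 again moves it beyond the reader's cursor, so it shows up again.
container.upsert("counter-1", {"state": "updated again"})  # LSN 4
second_read = container.read_changes(after_lsn=3)
```

In this model a create followed quickly by an update yields one item rather than two, so a counter incremented per delivered item would undercount writes even though no document is missed.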

@jkdey

jkdey commented Mar 27, 2019

Hi @bartelink, to clarify the situation: we appear to be consistently missing the last update to a record. Is this consistent with expected behavior?

@bartelink
Author

That would be surprising and concerning - we make considerable use of it and have not run into such issues. The bottom line is that it's just a query along the lines of WHERE LSN >= x, and each doc update bears an LSN value which moves up. A repro would definitely be required, but I'd tend to assume SELECT is not broken in this instance.
However, the documentation as to the guarantees is definitely light around here, and I'd definitely back a request for this to be addressed as a FAQ in this context.

@bartelink
Author

bartelink commented Mar 27, 2019

@jkdey If you start a new subscription with a new leaseid, do you see everything you expect? Have you tried doing a basic validation to sanity check it outside the context of your larger system?
(Not sure how useful it is (assuming you don't know F#) but the eqx tool here uses https://github.com/jet/equinox/blob/master/src/Equinox.Projection/FeedValidator.fs, and I've not seen it report gaps under any test I've subjected it to -- having said that, this issue is me asking what to expect when I do more exotic things like do splits etc)
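
One shape a basic validation could take is sketched below in Python. It is not the logic of the linked FeedValidator.fs; `GapDetector` and `observe` are hypothetical names. It assumes every item carries a stream name and a 0-based, consecutive index, and that each item is written as its own document rather than by overwriting an earlier one.

```python
from collections import defaultdict


class GapDetector:
    """Classifies each (stream, index) observation as ok, duplicate, or gap."""

    def __init__(self):
        # Next index expected per stream; streams are assumed to start at 0.
        self._expected = defaultdict(int)

    def observe(self, stream, index):
        expected = self._expected[stream]
        if index < expected:
            return "duplicate"   # already seen: consistent with at-least-once redelivery
        if index > expected:
            return "gap"         # one or more items were skipped; expectation is not advanced
        self._expected[stream] = index + 1
        return "ok"
```

Feeding every delivered item through `observe` and logging anything other than "ok" separates harmless redeliveries from genuinely missing items.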

@jkdey

jkdey commented Mar 27, 2019

Thank you for the suggestions, @bartelink. I will scrutinize what happens when a lease is started and/or try to figure out how to leverage that validator code, and report back.

