Troubleshooting GetEstimatedRemainingWorkPerPartitionAsync #126

Closed
bartelink opened this issue Jan 16, 2019 · 8 comments
@bartelink

I'm periodically using GetEstimatedRemainingWorkPerPartitionAsync to establish the lag my lease has (especially relevant to #123) versus the present write position.

I have read access to a closed-source implementation that uses this API successfully; it fits the bill and seems to work well.

Problem: Using very similar code (I'll add a link in due course; suffice to say the implementation is pretty boring, which absolutely does not rule out a very simple case of PEBKAC), I'm getting an empty array of responses, and I'm not seeing any clues in the LibLog output as to why.

Question (which I can and will solve by poring over the source if push comes to shove): how can this happen, and what needs to be in place? Is it possible that some accounts do not have the changefeed correctly configured? Is it reasonable to expect either an error message in the log or at least one entry in the response array each time I call?

Any advice wrt what to expect in terms of failure cases wrt this API would be appreciated.

@ealsur
Member

ealsur commented Jan 18, 2019

I'm having trouble understanding the issue. Do you mean that the Estimator call is returning an invalid value, or that when you implement similar code yourself, your code gets empty results?

Change Feed is enabled in all accounts.

@bartelink
Author

Can you confirm that there are no modifier switches that might alter the behavior I'd see on a given CosmosDb account? (The changefeed is working in all other respects; I'm not expecting anything as high level as a changefeed on/off switch. I'm just trying to rule out the possibility that it works in the other account due to some privilege or mode bestowed on that account by a specific flag, and/or that this account is missing a modifier.)

In my code, the call always returns [].

From looking in the impl, I see the body of the method can do that in a given set of states.

Rewording the question:
a) Can you see anything obviously wrong with my code which might lead to this? I copied it from somewhere else with minimal changes, and I know it works in that context.
b) OR can you explain (based on your deeper knowledge of how the overall codebase works) the things I might be doing wrong, and/or the assumptions implicit in how it's supposed to be used, which would lead this to happen?

The quickstart in the readme in my codebase is effectively a set of Steps To Reproduce, but I'm thinking I'm making some assumption that can likely be flushed out by you explaining the Dos and Don'ts of how to ensure the estimator works.

Wild guess: I'm always supplying a leasePrefix, could there be a bug due to that?

@ealsur
Member

ealsur commented Jan 18, 2019

@bartelink What's the status of the Leases collection when you call the estimator in your code? Was it already initialized? Does it have Leases with continuations (ContinuationToken) already set?
I'm not proficient enough in F# to debug whether there are any errors or code paths that might lead to different behavior.

@bartelink
Author

I run the estimator in a timed loop alongside the projector - I've definitely run it with and without there being leases in there and never seen it produce results. Let me dig in to verify for sure (might be a few days though). Wrt it being F#, you can just set a breakpoint ;P - the callback definitely gets triggered (but with an empty array) and there's no reason the language should have any effect (it is as if I got the name of the projection wrong though - my first thing to check will be whether that leg of the builder honors the leasePrefix or not - if the BuildEstimator did not honor it, that would explain it).

@ealsur
Member

ealsur commented Jan 18, 2019

I would check the status of the leases when the Estimator runs. For example:

  1. Let the Processor run a couple of times and stop it
  2. Make some changes in the collection
  3. Run the Estimator and check results

The Estimator checks the current state of the leases against the current state of the Feed. If there are no leases, or the leases have no ContinuationToken (they have never been checkpointed), then the comparison yields an empty result, because there is no starting point to compare against.

public async Task<IReadOnlyList<RemainingPartitionWork>> GetEstimatedRemainingWorkPerPartitionAsync()
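The behavior described above can be illustrated with a toy model. This is only a sketch in Python, not the library's implementation: `Lease`, `estimate_remaining_work`, and the integer positions are hypothetical stand-ins, and the continuation token is assumed to parse as a plain number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    # Hypothetical stand-in for a lease document; not the library's type.
    partition_id: str
    continuation_token: Optional[str]  # None means the lease was never checkpointed


def estimate_remaining_work(leases, latest_position):
    """Return (partition_id, remaining) pairs for checkpointed leases only.

    latest_position maps partition_id -> current write position (an int in this model).
    """
    results = []
    for lease in leases:
        if lease.continuation_token is None:
            # No checkpoint yet, so there is no starting point to compare against.
            continue
        behind = latest_position[lease.partition_id] - int(lease.continuation_token)
        results.append((lease.partition_id, max(behind, 0)))
    return results
```

In this model, an empty lease list and a list containing only never-checkpointed leases both produce `[]`, which matches the symptom reported in this thread. A lease checkpointed at position 100 on a partition whose write position is 120 produces `[("0", 20)]`.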

@bartelink
Author

Makes sense. If it's alright, I'll leave this issue open and report back what the problem was when I figure it out - I won't be long (please close the issue should I unexpectedly go dark for >1 week!)

@bartelink
Author

bartelink commented Feb 7, 2019

Don't have a full repro, but have observed that removing the LeaseId (I have a wrapper that makes it mandatory to supply one) makes it work. This implies there is some weakness or inconsistency in how progress is derived when a LeaseId is supplied.

I observed that establishing a single lease without a prefix results in the function starting to return values (but the values never move).

Example output (using a LeaseId where the aux collection has previously run without a LeaseId; note that a) no progress is registered for more than 1h despite operating with auto-acks, and b) no data is returned at all until I run once without a LeaseId):
[image: screenshot of example output]

ETA: spotting a missing WithProcessorOptions - incoming close when I verify my stupidity in full!

@bartelink
Author

Confirmed 💯 PEBKAC, thanks @ealsur for bearing with me. I was taking the code from elsewhere, which was not calling WithProcessorOptions, and it didn't seem logical to supply the bulk of the options there for a backlog estimator; hence it took me the longest time to see from the code that the issue was my failing to propagate the LeasePrefix in the BuildEstimatorAsync call chain.
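For readers hitting the same symptom, here is a toy model of why a missing prefix produces an empty array. It is a sketch in Python under stated assumptions, not the library's code: the lease id layout (user prefix followed by a container identifier), `CONTAINER_KEY`, `lease_id_prefix`, `find_leases`, and the prefix `projector-` are all hypothetical.

```python
CONTAINER_KEY = "account_db_coll"  # hypothetical identifier for the monitored container


def lease_id_prefix(lease_prefix):
    # Assumed layout: optional user-supplied prefix followed by the container identifier.
    return f"{lease_prefix or ''}{CONTAINER_KEY}"


def find_leases(stored_ids, lease_prefix):
    """Return the lease ids visible to a component configured with lease_prefix."""
    prefix = lease_id_prefix(lease_prefix)
    return [lease_id for lease_id in stored_ids if lease_id.startswith(prefix)]


# Leases written by a processor that was configured with the prefix "projector-".
stored = [f"{lease_id_prefix('projector-')}..{p}" for p in ("0", "1")]

processor_view = find_leases(stored, "projector-")  # sees both leases
estimator_view = find_leases(stored, None)          # prefix not propagated: sees none
```

Under these assumptions, the estimator built without the prefix looks for ids that no lease has, so it has nothing to report. It would also explain the earlier observation in this thread: a lease created once without a prefix becomes visible to the estimator, but its value never moves because the running processor checkpoints only the prefixed leases.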
