-
-
Notifications
You must be signed in to change notification settings - Fork 36
Make the use of YStore
configurable
#430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The YStore is here to flush updates to disk, allowing to free up some memory when documents are not changed for some time. Disabling the YStore means keeping all updates in memory, which is not good for a server in terms of scalability. |
As I proposed in yesterday's server call, I think we should do our best to implement a few hardening strategies:
|
I agree with the premise of the proposal that we should have a way to disable the YStore.
I don't think this is exactly precise. Closing the document in all places actually frees up the memory, because all in-memory references are removed/flushed. The YStore is a way to persist what has been flushed to recover that history when the document is re-opened. If not tuned properly (i.e. proper trimming of history), the YStore can actually create memory issues if it reloads a document's large history when it is re-opened. I've run into the exact issue myself. The presence of the YStore creates more work for me when deploying, because I need to tune how much history should be saved based on my resource constraints. The only "knob" I have today is the document's time-to-live, which is insufficient. I've had to fork in places to put bounds on memory usage. In my case, don't need my history to persist in a YStore (i.e. I don't really care/expect to be able to do undo-redo once the document is closed and re-opened). I'd rather not replay old changes and bloat the ydoc. |
I think this is what I said. In the backend we keep the data in memory for one minute after all clients have gone, then we flush it to the YStore and free up the memory.
Not only that, it also solves the duplication issue. Imagine a client who goes offline for some time and then reconnects. If we don't have a YStore (or the Y updates in general) the content of their document will be duplicated, because the backend will just read from a file and recreate the ydoc.
That, I agree. I suggested another strategy for trimming the history here. |
Well, closing the document to flush history from sitting in memory and writing that history to ystore are two separate concerns. That's what I needed to clarify here (for myself). 🙂 You could, in principle, not have a ystore; it's not strictly necessary to clear memory. I'll mention an alternative approach below.
I see. This is definitely something I missed, so I appreciate you mentioning it here. Just to make sure I understand, here is the scenario you're describing: If all clients disconnect (e.g. goes offline), the document stays in memory for 1 minute, then writes to disk. If the client continues to edit the document while offline, it's no problem, because when they reconnect, the previous state of the document is reloaded into memory and the offline changes are replayed. If we didn't have this, the whole document would be loaded and all changes would be replayed by the client, leading to a duplicate of everything. If this is correct, I think there are another way to achieve a reasonable UX without a ystore or duplication. Specifically, I think there is an argument to be made that we don't allow the user to edit the document offline. Or, at least, when they reconnect, it doesn't replay individual changes, but asks if the user want to overwrite the document in the server with the current state of the client-side document. In other words, we assume the two documents are out-of-sync and last write wins. This is how e.g. Google docs works today. In this case, all history is flushed from the server after e.g. 1 minute of disconnection and the history is lost. The client side document is also flushed at this moment. Reconnecting is a new start of the document; nothing is written to disk and no duplicate changes are made. And the user doesn't lose much (just history), nor are they surprised, since JLab would have already told them they were disconnected from the server with a modal. In the ystore case, how would it handle cases where two client are in a document, both go offline and continue editing, then reconnect? The changes weave into the document in a possibly strange (to the human) way (thank to CRDT). I would argue, the best UX in this case is not to weave changes, but offer each client the option to overwrite the document, since the two paths have diverged. I realize this breaks the purity of the CRDT ethos, where there is no technical conflict to the ydoc; however, there is a conflict for the humans working on the document. If two clients are working offline for awhile, then come back together, you could get really strange results. In this specific case, I'd rather use a git-liked, diff-based approach bringing the two versions together, rather than relying entirely on CRDT. This would decouple the flushing of memory from writing to disk. Curious what others think. |
I know of a large deployment where jupyter-collaboration had to be reverted in large part because of the issue with YStore. Of note, We also have
|
I think that |
Problem
By default,
jupyter_server_ydoc
persists every Yjs update made by every client connected to the server. These are stored in.jupyter_ystore.db
by theSQLiteYStore
class (defined inpycrdt-websocket
). This feature allows a document's edits to be "replayed" from its origin, and also allows undo/redo history to be persisted across server restarts.However, the use of a YStore has led to a number of stability & usability issues in
jupyter_collaboration
. Specifically:By default, there is no limit to the size of the YStore in
.jupyter_ystore.db
. Even in single-user scenarios, this file can grow large.The rate at which the DB file grows is proportional to the number of active users per server, as that determines the rate of incoming Yjs updates. This means that if a server has 10 users, the DB file will grow 10x as fast relative to the single-user case.
When RTC is enabled, it can take a long time to open a file for the first time after starting a server. This is because when a file is opened for the first time,
jupyter_server_ydoc
must query a ~10-100MB SQLite file, gather every single Yjs update for that file, then replay all those updates in-memory. This is documented in an upstream issue: [New Issue] When using pycrdt-websocket 0.13.1, file access in jupyterlab can hang y-crdt/pycrdt-websocket#41Finally, the YStore implementation adds a lot of complexity, which increases the risk of usability bugs. For example, @andrii-i helped identify that notebook and lab can't open in RTC #390 was being caused by the YStore database migration logic not working.
cc @krassowski @afshin @andrii-i @3coins @Zsailer
Proposed Solution
Add a configurable, boolean traitlet which enables/disables the use of
YStore
.(contentious) This configurable should default to
False
, i.e.YStore
should not be used by default.I proposed taking these actions at the JupyterLab Frontends call on 01/15. While we seemed to agree about point 1), we were not able to reach a consensus on point 2).
I argue that the risk & complexity introduced by YStore are not worth the features it enables. Others argue that those features are important and that the real solution is to make
YStore
more robust.SQLiteYStore
implementation. Several of these bugs have existed for years, and will likely take months/years to fix. Asking users to just wait for more reliability isn't good for users. We should also keep in mind that Jupyter AI v3 depends on Jupyter Collaboration, and that v3.0.0 official will be published in the next 2-4 months.Additional context
Please note: I will likely not have the time to respond to every comment on this issue. I need to focus full-time on Jupyter AI, and I am leaving for a family vacation from 01/24 to 02/02. @3coins and @andrii-i, please engage in this thread if I'm not available.
To limit the size of the DB file today, you can configure
SQLiteYStore.document_ttl
to some integerX
. This automatically squashes the Yjs updates for a file if no updates have been received in the lastX
minutes. However, this feature is undocumented and requires manual intervention. I'm also not sure how confident we can be in the correctness of its implementation given that it's used so infrequently.The text was updated successfully, but these errors were encountered: