Fix: SQLite "too many SQL variables" error in Importer #97
Conversation
…batches of 500 for improved performance during import operations.
src/Importer.php
foreach ($chunks as $chunk) {
    Bus::batch($chunk->map(fn (array $item) => new ImportItemJob($import, $item)))
        ->before(fn (Batch $batch) => $import->batchId($batch->id)->save())
        ->finally(function (Batch $batch) use ($import) {
            if ($import->get('destination.type') === 'entries') {
                UpdateCollectionTreeJob::dispatch($import);
            }
        });
}
Thanks for this pull request!
However, I think this might cause an issue with the UpdateCollectionTreeJob job, which needs to run after everything has been imported, not after each batch of 500 has been imported. 🤔
Good catch. Maybe we can do something like this then:
$items = match ($import->get('type')) {
    'csv' => (new Csv($import))->getItems($import->get('path')),
    'xml' => (new Xml($import))->getItems($import->get('path')),
};

$chunks = $items->chunk(500);

$pendingBatches = [];

foreach ($chunks as $chunk) {
    $pendingBatch = Bus::batch($chunk->map(fn (array $item) => new ImportItemJob($import, $item)))
        ->before(fn (Batch $batch) => $import->batchId($batch->id)->save());

    $pendingBatches[] = $pendingBatch;

    $pendingBatch->dispatch();
}

// Only dispatch the UpdateCollectionTreeJob after the last batch.
if (count($pendingBatches) > 0 && $import->get('destination.type') === 'entries') {
    $lastBatch = array_pop($pendingBatches);

    Bus::chain([
        function () use ($import) {
            UpdateCollectionTreeJob::dispatch($import);
        },
    ])->afterLastBatch($lastBatch)->dispatch();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to do something with lazy collections, but I'm not sure. Will need to have a think.
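For illustration, a rough sketch of what the lazy-collection idea might look like. This is untested, and readRows() is a hypothetical streaming method, not something the addon's Csv/Xml readers currently expose:

use Illuminate\Bus\Batch;
use Illuminate\Support\Facades\Bus;
use Illuminate\Support\LazyCollection;

// Hypothetical: readRows() is assumed to yield rows one at a time;
// it is not an existing method on the Csv/Xml classes.
$items = LazyCollection::make(function () use ($import) {
    yield from (new Csv($import))->readRows($import->get('path'));
});

// chunk() on a lazy collection hands out 500-item slices without ever
// materialising the whole dataset in memory.
$items->chunk(500)->each(function (LazyCollection $chunk) use ($import) {
    Bus::batch($chunk->map(fn (array $item) => new ImportItemJob($import, $item))->all())
        ->before(fn (Batch $batch) => $import->batchId($batch->id)->save())
        ->dispatch();
});

This would keep memory flat for huge files, but on its own it doesn't solve the "run UpdateCollectionTreeJob once at the very end" problem.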
Bus::chain([
    function () use ($import) {
        UpdateCollectionTreeJob::dispatch($import);
    },
])->afterLastBatch($lastBatch)->dispatch();
Unfortunately, it doesn't look like the afterLastBatch method exists (I can't find it in the Laravel codebase), otherwise this would have been a good solution.
I'll need to have a think about the best way to approach this.
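One avenue that does exist is Laravel's support for adding batches to a chain (available since Laravel 9). Batches in a chain run sequentially, so a trailing closure only fires after the final batch. A sketch, untested against this addon, with the per-batch batchId bookkeeping omitted for brevity:

use Illuminate\Support\Facades\Bus;

// Each chunk becomes a batch inside one chain; the chain guarantees
// the closure at the end only runs after every batch has finished.
$chain = $chunks
    ->map(fn ($chunk) => Bus::batch(
        $chunk->map(fn (array $item) => new ImportItemJob($import, $item))->all()
    ))
    ->all();

if ($import->get('destination.type') === 'entries') {
    $chain[] = fn () => UpdateCollectionTreeJob::dispatch($import);
}

Bus::chain($chain)->dispatch();

The trade-off is that chunks no longer run in parallel with each other, since each batch waits for the previous one to complete.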
Already discovered this while testing. I really thought this was a method.
See my latest commit 966b3ca, which I was already using in a dev branch. I think this fixes it and only dispatches the UpdateCollectionTreeJob once, right?
It's not that elegant, so it may need some tinkering, I guess.
Unfortunately, I think there might be some edge cases with dispatching it during the last batch versus dispatching it once all jobs have completed.
For example: the last batch could finish before earlier batches have completed, which would mean the UpdateCollectionTreeJob would run prematurely.
If it's easiest, I'm happy to look into this, but I'd need to test with a large enough import. Would you be able to send the file you're importing to [email protected]?
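One way around that race might be a countdown: record how many batches were dispatched and only trigger the tree update when the counter reaches zero. A hedged sketch; the cache key, $import->id(), and the assumption that the cache driver decrements atomically are all mine:

use Illuminate\Bus\Batch;
use Illuminate\Support\Facades\Bus;
use Illuminate\Support\Facades\Cache;

// Assumption: $import->id() uniquely identifies the import, and the
// configured cache driver supports atomic decrements.
$key = "import-{$import->id()}-pending-batches";
Cache::put($key, $chunks->count());

foreach ($chunks as $chunk) {
    Bus::batch($chunk->map(fn (array $item) => new ImportItemJob($import, $item))->all())
        ->before(fn (Batch $batch) => $import->batchId($batch->id)->save())
        ->finally(function (Batch $batch) use ($import, $key) {
            // Only the batch that finishes last (in completion order,
            // not dispatch order) dispatches the tree update.
            if (Cache::decrement($key) <= 0 && $import->get('destination.type') === 'entries') {
                UpdateCollectionTreeJob::dispatch($import);
            }
        })
        ->dispatch();
}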
…es, ensuring UpdateCollectionTreeJob is dispatched only after the final batch is processed.
When importing large datasets with SQLite, the process fails with a "too many SQL variables" error, because SQLite limits the number of variables allowed in a single query.
This PR chunks the items into smaller batches.
Tested with a 5,000+ row XML import, which now processes successfully without errors.
Fixes #96