
Fix: SQLite "too many SQL variables" error in Importer #97


Open · pixelastronauts wants to merge 3 commits into main

Conversation

pixelastronauts (Contributor)

When importing large datasets with SQLite, the import fails with a "too many SQL variables" error, because SQLite caps the number of bound parameters in a single query (999 by default; 32,766 since SQLite 3.32). Dispatching every item as one batch queues all of the jobs at once, which exceeds that cap.

This PR chunks the items into batches of 500 jobs each.

Tested with a 5,000+ row XML import, which now processes successfully.

Fixes #96

…batches of 500 for improved performance during import operations.
src/Importer.php Outdated

foreach ($chunks as $chunk) {
    Bus::batch($chunk->map(fn (array $item) => new ImportItemJob($import, $item)))
        ->before(fn (Batch $batch) => $import->batchId($batch->id)->save())
        ->finally(function (Batch $batch) use ($import) {
            if ($import->get('destination.type') === 'entries') {
                UpdateCollectionTreeJob::dispatch($import);
            }
        })
        ->dispatch();
}
duncanmcclean (Member) · Apr 2, 2025

Thanks for this pull request!

However, I think this might cause an issue with UpdateCollectionTreeJob, which needs to run after everything has been imported, not after every 500 items. 🤔

pixelastronauts (Contributor, Author)

Good catch. Maybe we can do something like this then:

        $items = match ($import->get('type')) {
            'csv' => (new Csv($import))->getItems($import->get('path')),
            'xml' => (new Xml($import))->getItems($import->get('path')),
        };

        $chunks = $items->chunk(500);
        $pendingBatches = [];

        foreach ($chunks as $chunk) {
            $pendingBatch = Bus::batch($chunk->map(fn (array $item) => new ImportItemJob($import, $item)))
                ->before(fn (Batch $batch) => $import->batchId($batch->id)->save());
            
            $pendingBatches[] = $pendingBatch;
            $pendingBatch->dispatch();
        }

        // Only dispatch the UpdateCollectionTreeJob after the last batch
        if (count($pendingBatches) > 0 && $import->get('destination.type') === 'entries') {
            $lastBatch = array_pop($pendingBatches);
            
            Bus::chain([
                function () use ($import) {
                    UpdateCollectionTreeJob::dispatch($import);
                }
            ])->afterLastBatch($lastBatch)->dispatch();
        }

duncanmcclean (Member)

We might be able to do something with lazy collections but I'm not sure. Will need to have a think.
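For illustration, a lazy-collection version might look roughly like the sketch below. This is only a sketch: it assumes getItems() could be backed by a generator (the current Csv/Xml readers may well load everything eagerly), and it does not by itself solve the UpdateCollectionTreeJob ordering problem.

use Illuminate\Support\Facades\Bus;
use Illuminate\Support\LazyCollection;

// Hypothetical generator-backed reader: rows are yielded one at a
// time instead of being loaded into memory all at once.
$items = LazyCollection::make(function () use ($import) {
    yield from (new Xml($import))->getItems($import->get('path'));
});

// chunk() on a LazyCollection is itself lazy, so each 500-job batch
// is only materialised as it is dispatched.
$items->chunk(500)->each(function (LazyCollection $chunk) use ($import) {
    Bus::batch($chunk->map(fn (array $item) => new ImportItemJob($import, $item))->all())
        ->dispatch();
});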

This comment was marked as resolved.

duncanmcclean (Member)

Bus::chain([
    function () use ($import) {
        UpdateCollectionTreeJob::dispatch($import);
    }
])->afterLastBatch($lastBatch)->dispatch();

Unfortunately, it doesn't look like the afterLastBatch method exists (I can't find it in the Laravel codebase), otherwise this would have been a good solution.

I'll need to have a think about the best way to approach this.
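For what it's worth, one avenue might be Laravel's Batch::add(), which appends jobs to an existing batch, so then()/finally() fire once for the whole import while each insert into the jobs table stays small. A rough, untested sketch, with a real caveat: if workers drain the first chunk before the remaining chunks are appended, finally() would still fire early.

use Illuminate\Bus\Batch;
use Illuminate\Support\Facades\Bus;

$chunks = $items->chunk(500);

// Seed the batch with the first chunk so it is never empty when
// dispatched (an empty batch would complete immediately).
$batch = Bus::batch($chunks->first()->map(fn (array $item) => new ImportItemJob($import, $item)))
    ->before(fn (Batch $batch) => $import->batchId($batch->id)->save())
    ->finally(function (Batch $batch) use ($import) {
        if ($import->get('destination.type') === 'entries') {
            UpdateCollectionTreeJob::dispatch($import);
        }
    })
    ->dispatch();

// Append the remaining chunks to the same batch, 500 jobs at a time,
// so each queued-job insert stays under SQLite's parameter limit.
foreach ($chunks->slice(1) as $chunk) {
    $batch->add($chunk->map(fn (array $item) => new ImportItemJob($import, $item)));
}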

pixelastronauts (Contributor, Author) · Apr 7, 2025

Already discovered this while testing. I really thought that method existed.

See my latest commit 966b3ca, which I was already using in a dev branch. I think this fixes it and only dispatches UpdateCollectionTreeJob once, right?

It's not that elegant, so it may need some tinkering.

duncanmcclean (Member)

Unfortunately, I think there might be some edge cases with dispatching it during the last batch versus dispatching it once all jobs have completed.

For example, it's possible that the last batch could finish before jobs from earlier batches have completed, which would mean UpdateCollectionTreeJob runs prematurely.
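If the chunked-batches approach stays, one way around that race would be to count completed batches atomically and let whichever batch finishes last dispatch the tree job. Purely a sketch: the cache key and the $import->id() accessor are assumptions, not part of the importer's current API.

use Illuminate\Bus\Batch;
use Illuminate\Support\Facades\Bus;
use Illuminate\Support\Facades\Cache;

$chunks = $items->chunk(500);
$total = $chunks->count();
$counterKey = 'import-completed-batches:'.$import->id(); // hypothetical key

Cache::put($counterKey, 0);

foreach ($chunks as $chunk) {
    Bus::batch($chunk->map(fn (array $item) => new ImportItemJob($import, $item)))
        ->finally(function (Batch $batch) use ($import, $counterKey, $total) {
            // increment() is atomic on stores like Redis, so exactly one
            // batch observes the counter reaching $total, however the
            // batches are interleaved by the workers.
            if (Cache::increment($counterKey) === $total
                && $import->get('destination.type') === 'entries') {
                UpdateCollectionTreeJob::dispatch($import);
            }
        })
        ->dispatch();
}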

duncanmcclean (Member)

If it's easiest, I'm happy to look into this, but I'd need to test with a large enough import. Would you be able to send the file you're importing from to [email protected]?

…es, ensuring UpdateCollectionTreeJob is dispatched only after the final batch is processed.
Merging this pull request may close issue #96: SQLite error: "too many SQL variables" when importing large datasets.