Skip to content

Support insert_into_select for Postgres table #688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

YuweiXiao
Copy link

@YuweiXiao YuweiXiao commented Mar 21, 2025

Partially fixes #30

@YuweiXiao
Copy link
Author

hi~ @JelteF The test cases aren't fully covered yet. would appreciate some feedback before finalizing the details. Thanks!

Comment on lines 38 to 41
ListCell *l;
RangeTblEntry *select_rte = NULL;
foreach(l, copied_query->rtable) {
RangeTblEntry *rte = (RangeTblEntry *)lfirst(l);
Copy link
Collaborator

@JelteF JelteF Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

foreach_node is nicer to use and requires less boilerplate

Suggested change
ListCell *l;
RangeTblEntry *select_rte = NULL;
foreach(l, copied_query->rtable) {
RangeTblEntry *rte = (RangeTblEntry *)lfirst(l);
RangeTblEntry *select_rte = NULL;
foreach_node(RangeTblEntry, rte, copied_query->rtable) {

Comment on lines 186 to 189
if (query->onConflict || query->returningList) {
elog(elevel, "DuckDB does not support INSERT ON CONFLICT or RETURNING on Postgres tables");
return false;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not allow these? Do they not work? I would expect that if we insert using postgres its functionality that thees things would automatically work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, not saying that this should be implemented in the first version of this feature. But I'd at least like a code comment explaining why we do not support it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, block them since I haven't tested them yet. I've done some initial testing, and RETURNING works out of box while onConflict doesn't. Will dig deeper into the WHY and add comments to clarify.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ON CONFLICT is also working

@@ -169,10 +182,16 @@ IsAllowedStatement(Query *query, bool throw_error) {
/* We don't support modifying statements on Postgres tables yet */
if (query->commandType != CMD_SELECT) {
RangeTblEntry *resultRte = list_nth_node(RangeTblEntry, query->rtable, query->resultRelation - 1);
if (!::IsDuckdbTable(resultRte->relid)) {
if (::IsHeapTable(resultRte->relid) && query->commandType == CMD_INSERT) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only heap tables? I'd expect this should logic would work fine for most table access method, because they implement the table access method interface.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES, you are right! This is becoming much more exciting if we get the whole thing functional.

Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is failing in many places, because duckdb.force_execution is true in our tests. Apparently something breaks when selecting from generate_series() in duckdb for a PG INSERT.

@JelteF
Copy link
Collaborator

JelteF commented Mar 21, 2025

My example code from #30:

create table t(a int, x text);
-- 1. easiest to make work (still not super easy)
insert into t select r['a']::int, r['x']::text from duckdb.query($$ SELECT 1 a, 'abc' x $$) r;

Fails with this error:

ERROR:  42804: table row type and query-specified row type do not match
DETAIL:  Table has type text at ordinal position 2, but query expects character varying.
LOCATION:  ExecCheckPlanOutput, nodeModifyTable.c:212

I think that would be worked around by #583, but then the opposite would hold (i.e. inserting into a varchar column woudl fail). So I think we need some logic to make sure that the types match against eachother, and maybe add some code to convert the type that we get back from DuckDB to the type that PG wants for the INSERT. I think that would also fix the generate_series() issue that all of the tests are failing on currently.

@YuweiXiao YuweiXiao marked this pull request as draft March 22, 2025 09:29
@YuweiXiao
Copy link
Author

I think that would be worked around by #583, but then the opposite would hold (i.e. inserting into a varchar column woudl fail). So I think we need some logic to make sure that the types match against eachother, and maybe add some code to convert the type that we get back from DuckDB to the type that PG wants for the INSERT. I think that would also fix the generate_series() issue that all of the tests are failing on currently.

YES, the type check and conversion are substantial. I manually added an Expr to handle the coerce and it seems working so far (e.g., varchar and generate_series issue). I'll add more test cases to ensure if I haven't missed (or broken) anything, and code re-org (& cleanup) are also in progress. Thanks for the feedback, @JelteF !

@YuweiXiao YuweiXiao changed the title [DRAFT] Support insert_into_select for Postgres(heap) table [DRAFT] Support insert_into_select for Postgres table Mar 22, 2025
@YuweiXiao YuweiXiao force-pushed the heap_insert_select branch from dabffc2 to e7f7aaf Compare March 24, 2025 09:19
@YuweiXiao
Copy link
Author

YuweiXiao commented Mar 24, 2025

The MR is ready for review~ @JelteF Please take a look when you have a chance.

Currently, the regression is failing for hugeint_numeric. The issue seems related to the following behavior, which is reproducible in the main branch. Do you have any insights on this?

select * from duckdb.query($$ SELECT a FROM (VALUES (32942348563242.111222333444555666777888::numeric(38,24)), (NULL::numeric), (123456789.000000000000000000000001::numeric(38,24))) t(a) $$);

                   a
----------------------------------------
 32942348563242.11122233344455566677789  (correct one should be `32942348563242.111222333444555666777888`

      123456789.00000000000000000000000 (correct one should be 123456789.000000000000000000000001)
(3 rows)

update: DuckDB cli has the exact same output.

@YuweiXiao YuweiXiao marked this pull request as ready for review March 24, 2025 12:38
@YuweiXiao YuweiXiao changed the title [DRAFT] Support insert_into_select for Postgres table Support insert_into_select for Postgres table Mar 24, 2025
@YuweiXiao YuweiXiao requested a review from JelteF April 2, 2025 09:56
@@ -14,3 +14,4 @@ extern bool duckdb_explain_ctas;
PlannedStmt *DuckdbPlanNode(Query *parse, const char *query_string, int cursor_options, ParamListInfo bound_params,
bool throw_error);
duckdb::unique_ptr<duckdb::PreparedStatement> DuckdbPrepare(const Query *query, const char *explain_prefix = NULL);
bool IsAllowedPostgresInsert(Query *query, bool throw_error = false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you don't have https://editorconfig.org/ installed in your editor. Could you do that and re-save this file to add a trailing newline.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

return false;
}

/* TODO: Checking supported target column types */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wanna implement this TODO still?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, the TODO comment can be removed. The follow-up code handles the checking through blacklist

@JelteF
Copy link
Collaborator

JelteF commented Apr 16, 2025

Currently, the regression is failing for hugeint_numeric. The issue seems related to the following behavior, which is reproducible in the main branch. Do you have any insights on this?

I think you can probably fix the test by changing the plain NULL to NULL::NUMERIC(38, 24). It seems at least that that solves the issue when running your reproducing example in the duckdb CLI.

continue;
}

if (attr->atttypid == BYTEAOID || attr->atttypid == XMLOID || pgduckdb::pg::IsDomainType(attr->atttypid) ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hardcoding some unsupported types, I think we should instead determine the allowed types. Also, why are these types unsupported? Because we do support BYTEA as a type for regular duckdb execution, so that seems surprising that it doesn't work here.

INSERT INTO tbl select r['a']::int, r['b'] from duckdb.query($$ SELECT 21 a, 'ghi' b $$) r;
ERROR: no partition of relation "tbl" found for row
DETAIL: Partition key of the failing row contains (a) = (21).
SELECT * FROM tbl ORDER BY a LIMIT 5;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to also check the contents of each partition, to double check that values go to the correct partition.

}

if (!select_rte) {
elog(elevel, "DuckDB does not support INSERT without a subquery");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no tests for this error case afaict

IsAllowedPostgresInsert(Query *query, bool throw_error) {
int elevel = throw_error ? ERROR : DEBUG4;
if (query->commandType != CMD_INSERT) {
elog(elevel, "DuckDB only supports INSERT/SELECT on Postgres tables");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no tests for this error case afaict


DROP TABLE tbl;
-- case: ON CONFLICT DO NOTHING
CREATE TABLE tbl (a int PRIMARY KEY, b text);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with a SERIAL column or a column with GENERATED ALWAYS AS IDENTITY.

@JelteF JelteF added this to the 1.1.0 milestone Apr 17, 2025
@JelteF
Copy link
Collaborator

JelteF commented Apr 17, 2025

To set expectations: I think this is a very cool feature and your implementation looks like the right direction. But it is relatively complex and impactful, so I would like to give it an appropriate round of testing and "baking" (aka testing by merging it and having it on the main branch for a while). Since we want to release 1.0 somewhat soon, I don't think it's realistic to still do that before the 1.0 release. So I added this to the new 1.1 milestone instead. I also won't spend time reviewing this PR in depth until then.

@YuweiXiao
Copy link
Author

To set expectations: I think this is a very cool feature and your implementation looks like the right direction. But it is relatively complex and impactful, so I would like to give it an appropriate round of testing and "baking" (aka testing by merging it and having it on the main branch for a while). Since we want to release 1.0 somewhat soon, I don't think it's realistic to still do that before the 1.0 release. So I added this to the new 1.1 milestone instead. I also won't spend time reviewing this PR in depth until then.

Totally agree, 1.1 makes sense. btw, any target date for 1.0 release?

@JelteF
Copy link
Collaborator

JelteF commented Apr 17, 2025

any target date for 1.0 release?

The current plan is to do it quickly after the DuckDB 1.3.0 release, so early May. But dates may obviously slip.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Writing into Postgres tables
2 participants