Describe the bug
dev=> create table t1 (v1 int, v2 int);
CREATE_TABLE
dev=> insert into t1 values (1, 2);
INSERT 0 1
dev=> select * from t1;
v1 | v2
----+----
1 | 2
(1 row)
dev=> insert into t1 (v1, v2) values (3, 4);
INSERT 0 1
dev=> insert into t1 (v2, v1) values (5, 6);
INSERT 0 1
dev=> select * from t1;
v1 | v2
----+----
3 | 4
5 | 6
1 | 2
(3 rows)
Specified columns is not handled in bind_insert
.
I would like to have a look at it if that is fine by you. Please let me know if you are already working on this @xiangjinwu. AFAIK this is not a time-critical issue. It does not have a milestone attached to it, right?
Just to reiterate. When running the following statements
create table t (v1 int, v2 int);
insert into t values (1, 2);
insert into t (v1, v2) values (3, 4);
insert into t (v2, v1) values (5, 6);
select * from t;
We expect
v1 | v2
----+----
1 | 2
3 | 4
6 | 5
but instead receive
v1 | v2
----+----
5 | 6 <- incorrect row
3 | 4
1 | 2
This is because we are currently ignoring the columns. This is also why we can run incorrect statements, like
insert into t (x, y) values (0, 0);
# Postgres error: column "x" of relation "t" does not exist
insert into t (v1, v1) values (0, 0);
# Postgres error: column "v1" specified more than once
TODOs:
Current PR is still work in progress, because I am working on a cloud related issue at the moment.
I talked about this issue with @nanderstabel. We would propose to split up this issue in two parts:
create table t1 (v1 int, v2 int);
insert into t1 (v1, v2) values (5, 6);
insert into t1 (v2, v1) values (5, 6);
insert into t1 (v1) values (5, 6); // error
insert into t1 (v1234, v2) values (5, 6); // error unknown column
...
NULL
values if the user did not define enough valuescreate table t1 (v1 int, v2 int);
insert into t1 (v1) values (5); // v2 should implicitly be NULL
IMHO 1.) and 2.) are two distinct problems and supporting them in one PR is quite a headache (at least for me). Would this approach be fine by you guys?
Maybe we should do some more discussions regarding 2.) and what the expected behaviour is
create table t1 (v1 int, v2 int);
insert into t1 (v1) values (5); // v2 should be NULL
insert into t1 values (5); // v2 should be NULL? Do we allow this even though the user did not explicitly define what is null?
If above approach is ok with you I would require in the current version that the user always needs to provide a value for each column in the table. Is that ok?
supporting them in one PR is quite a headache
In most cases it is actually encouraged to break problems into smaller ones. No need to worry about not solving it in one go.
what the expected behaviour is
Follow PostgreSQL's behavior or raise a reason not to do so (e.g. increased complexity with rare/deprecated usage). For this specific case, PostgreSQL does allow it and fills v2
with null
(ref):
The second form is a PostgreSQL extension. It fills the columns from the left with as many values as are given, and the rest will be defaulted.
Perfect. Thank you for the clarification. Opening https://github.com/risingwavelabs/risingwave/issues/6128. I will probably not work on it in the context of this bug.
Also opening related bug https://github.com/risingwavelabs/risingwave/issues/6156
First version of the PR https://github.com/risingwavelabs/risingwave/pull/5697 is ready for review. I still have a few minor questions which I point out in the PR. Please let me know what you think.