[Postgres-xl-general] COPY to partitioned table problem.

Krzysztof Nienartowicz krzysztof.nienartowicz at unige.ch
Mon Aug 17 09:40:28 PDT 2015


FYI, this bug also exists in head, which is based on 9.5. The
workaround seems to work (there were few errors for billions of rows
ingested in parallel).


On Thu, Jul 9, 2015 at 1:09 PM, Krzysztof Nienartowicz <
krzysztof.nienartowicz at unige.ch> wrote:

> I found the solution:
> I realized that the partitioning [insert] triggers increment the command
> ID with each trigger invocation while the transaction is in the COPY
> block. This seems to confuse coordinator-to-datanode communication, as
> command-ID messages get interleaved with COPY messages and whack the
> COPY state machine.
> XL has an additional call, not present in X2, to propagate the new
> command ID: snapmgr.c:UpdateActiveSnapshotCommandId switches on sending
> new command IDs back to the other nodes.
>
> Switching this off fixes the problem.
>
> Question: Is this command-ID propagation really needed except for
> serializable transactions? Is any other logic affected? Is there a
> cleaner solution?
>
> Cheers,
> Krzysztof
>
> /*
>  * UpdateActiveSnapshotCommandId
>  *
>  * Update the current CID of the active snapshot.  This can only be applied
>  * to a snapshot that is not referenced elsewhere.
>  */
> void
> UpdateActiveSnapshotCommandId(void)
> {
> 	Assert(ActiveSnapshot != NULL);
> 	Assert(ActiveSnapshot->as_snap->active_count == 1);
> 	Assert(ActiveSnapshot->as_snap->regd_count == 0);
>
> 	ActiveSnapshot->as_snap->curcid = GetCurrentCommandId(false);
> #ifdef XCP
> 	/*
> 	 * Set flag so that the updated command ID is sent to the datanodes
> 	 * before the next query. This ensures that the effects of previous
> 	 * statements are visible to the subsequent statements.
> 	 */
> 	SetSendCommandId(true);	/* <-- should be false to fix the COPY protocol bug */
> #endif
> }
>
> On Thu, Jul 2, 2015 at 4:12 PM, Krzysztof Nienartowicz <
> krzysztof.nienartowicz at unige.ch> wrote:
>
>> Hello,
>> While trying to ingest into the partitioned table via a trigger [we
>> patched the XL head source to support insert triggers], we got an error
>> after a [sometimes varying] number of lines ingested, e.g.:
>> <sql>
>> -- simplified example with 3 datanodes, 3 coords, same coord used
>>
>> create or replace function x_t() returns trigger
>> as
>> $BODY$
>>
>> declare
>> begin
>>      IF ( NEW.id between  0  and 10) THEN
>>          INSERT INTO x1 VALUES (NEW.*);
>>      elsif(NEW.id between 11 and 100) then
>>          INSERT INTO x2 VALUES (NEW.*);
>>      else
>>          INSERT INTO x3 VALUES (NEW.*);
>>      end if;
>>      return null;
>> end
>> $BODY$ language plpgsql;
>>
>>
>> create table x (id int primary key, ii text) distribute by hash(id);
>> create table x1(check (id between 0 and 10)) inherits (x);
>> create table x2(check (id between 11 and 100)) inherits (x);
>> create table x3(check (id > 100)) inherits (x);
>>
>> CREATE TRIGGER x_t BEFORE INSERT ON x FOR EACH ROW EXECUTE PROCEDURE
>> x_t();
>>
>> -- populate tables via trigger with insert
>>
>> insert into x select i,rpad('X',300,'y') from generate_series(1,1000) i;
>> explain analyze select count(*) from x;
>> -- all good:
>> --                                                 QUERY PLAN
>> -- ------------------------------------------------------------------------------------------------------
>> --  Aggregate  (cost=559.68..559.69 rows=1 width=0) (actual
>> time=8.808..8.809 rows=1 loops=1)
>> --   ->  Append  (cost=100.00..541.15 rows=7411 width=0) (actual
>> time=0.677..7.015 rows=1000 loops=1)
>> --         ->  Remote Subquery Scan on all
>> (datanode1,datanode2,datanode3)  (cost=100.00..100.00 rows=1 width=0)
>> (actual time=0.486..0.486 rows=0 loops=1)
>> --         ->  Remote Subquery Scan on all
>> (datanode1,datanode2,datanode3)  (cost=100.00..147.05 rows=2470 width=0)
>> (actual time=0.185..0.210 rows=10 loops=1)
>> --         ->  Remote Subquery Scan on all
>> (datanode1,datanode2,datanode3)  (cost=100.00..147.05 rows=2470 width=0)
>> (actual time=0.246..0.429 rows=90 loops=1)
>> --         ->  Remote Subquery Scan on all
>> (datanode1,datanode2,datanode3)  (cost=100.00..147.05 rows=2470 width=0)
>> (actual time=0.269..2.507 rows=900 loops=1)
>>
>>
>> </sql>
>> <bash>
>> # do dump
>> psql -p 20004 -h coord1 -c "COPY (select * from x) TO STDOUT DELIMITER AS
>> ';'" | tee  /tmp/x.dat | wc -l
>>
>> # all good, 1000 rows saved.
>> # now, truncate and try to reingest:
>> </bash>
>> <sql>
>> truncate table x;
>> </sql>
>>
>> <bash>
awk 'NR>0 && NR <= 1000 {print $0;}' /tmp/x.dat | psql -p 20004 -c "set
>> client_min_messages=DEBUG5; COPY dpac_or5_s2_epsl_dex.x FROM STDIN
>> DELIMITER AS ';'"
>> DEBUG:  ProcessUtility
>> DEBUG:  global snapshot info: gxmin: 63682, gxmax: 63682, gxcnt: 0
>> DEBUG:  conn_count = 3, need_tran_block = true
>> LOG:  Connection error invalid string in message
>> LOG:  Connection error invalid string in message
>> LOG:  Connection error invalid string in message
>> ERROR:  Copy failed on a data node
>> CONTEXT:  COPY x, line 142:
>> "251;Xyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy..."
>> DEBUG:  Record transaction abort 63682
>>
>> # repeat the cmd; it fails on a different line number
>> ERROR:  Copy failed on a data node
>> CONTEXT:  COPY x, line 241:
>> "655;Xyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy..."
>>
>> # repeat the cmd; it fails on the same line number
>> ERROR:  Copy failed on a data node
>> CONTEXT:  COPY x, line 142:
>> "251;Xyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy..."
>>
>> # repeat the cmd; it fails on the same line number
>> ERROR:  Copy failed on a data node
>> CONTEXT:  COPY x, line 142:
>> "251;Xyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy..."
>>
>> # but when limiting the number of rows, it works:
>>
>> awk 'NR>0 && NR <= 100 {print $0;}' /tmp/x.dat | psql -p20004  -c "set
>> client_min_messages=DEBUG5; COPY dpac_or5_s2_epsl_dex.x FROM STDIN
>> DELIMITER AS ';'"
>> DEBUG:  ProcessUtility
>> DEBUG:  global snapshot info: gxmin: 63723, gxmax: 63723, gxcnt: 0
>> DEBUG:  conn_count = 3, need_tran_block = true
>> DEBUG:  CommitTransactionCommand
>> DEBUG:  CommitTransaction
>> DEBUG:  name: unnamed; blockState:       STARTED; state: INPROGR,
>> xid/subid/cid: 63723/1/1 (used), nestlvl: 1, children:
>> DEBUG:  Record transaction commit 63723
>> DEBUG:  [re]setting xid = 0, old_value = 0
>>
>> # select gives back proper number of rows properly distributed by the
>> trigger
>> </bash>
>>
>> Clearly the messages passing rows from the coordinator to the datanodes
>> are truncated or corrupted somehow (copy.c:622). Of course we could work
>> around this by copying into the partitions directly, but we would really
>> prefer to avoid that.
>>
>> Any hints on how to fix this are welcome!
>>
>> Best regards,
>> Krzysztof
>>
>>
>>
>
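
For reference, the direct-to-partition fallback mentioned above could be sketched like this (untested against a live cluster; it assumes the dump in /tmp/x.dat and the CHECK ranges from the example, and only echoes the psql invocations so the split logic can be dry-run):

```shell
# Stand-in sample dump (a real run would use the /tmp/x.dat produced above).
printf '5;a\n50;b\n500;c\n' > /tmp/x.dat
# Split on the id column so each slice satisfies one child's CHECK constraint.
awk -F';' '$1 >= 0  && $1 <= 10'  /tmp/x.dat > /tmp/x1.dat
awk -F';' '$1 >= 11 && $1 <= 100' /tmp/x.dat > /tmp/x2.dat
awk -F';' '$1 > 100'              /tmp/x.dat > /tmp/x3.dat
# COPY each slice straight into its child table, bypassing the parent's
# trigger; drop the leading echo to actually run it.
for t in x1 x2 x3; do
  echo psql -p 20004 -h coord1 -c "COPY $t FROM STDIN DELIMITER AS ';'" "< /tmp/$t.dat"
done
```

Since the children inherit from x, selects against x still see all the rows; only the ingest path changes.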