[postgres-xl-general] Truncating Tables >1M records takes a long time

Matthew Tamayo-Rios matthew.t.rios at gmail.com
Mon Nov 27 19:36:25 PST 2017


I had to create a new table called entity_ids and that seems to be working
fine.

One thing we did change that seems to have reduced the problems was removing
the calls that toggle the autocommit setting around bulk inserts.

We had code that was doing .setAutoCommit(false) -> batch inserts ->
commit() -> .setAutoCommit(true).
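
For anyone hitting the same thing, the difference in SQL terms is roughly the
following (a sketch only -- it reuses the little test table from the exchange
quoted below, and the values are placeholders, not our real data):

    -- old client pattern: setAutoCommit(false) wraps the whole batch in one
    -- explicit transaction, so every INSERT's locks are held until the commit
    BEGIN;
    INSERT INTO test VALUES ('30095031-49e0-44eb-87a9-98b640773ef2', 21, 23);
    INSERT INTO test VALUES ('30095031-49e0-55eb-87a9-98b640773ef2', 21, 23);
    COMMIT;

    -- new client pattern: autocommit stays on, so each INSERT commits and
    -- releases its locks immediately
    INSERT INTO test VALUES ('30095031-49e0-66eb-87a9-98b640773ef2', 21, 23);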

The old table is still borked and locked, but I've been running for 5 days
without any issues since we made this change.

Regards,
Matthew

On Wed, Nov 22, 2017 at 8:16 PM, Matthew Tamayo-Rios <
matthew.t.rios at gmail.com> wrote:

> I also restarted and tried to rename the table-- unfortunately now I'm in
> this state:
>
> blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement                                       | current_statement_in_blocking_process
> ------------+--------------+--------------+---------------+---------------------------------------------------------+--------------------------------------------------------
>       15244 | X            |        15347 | Y             | alter table ids rename to ids2;                         | CREATE INDEX ON ids (entity_set_id,sync_id,entity_id);
>       15347 | Y            |        15244 | X             | CREATE INDEX ON ids (entity_set_id,sync_id,entity_id); | alter table ids rename to ids2;
>
> Isn't postgres supposed to detect deadlocks like this?
>
> -mtr
>
> On Wed, Nov 22, 2017 at 5:41 PM, Matthew Tamayo-Rios <
> matthew.t.rios at gmail.com> wrote:
>
>> It looks like you might be right -- there are a bunch of backends waiting
>> on locks and others waiting in pgxc_node_receive().
>>
>> blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement                        | current_statement_in_blocking_process
>> ------------+--------------+--------------+---------------+------------------------------------------+------------------------------------------
>>       95377 |              |        95434 |               | INSERT INTO ids VALUES ($1, $2, $3, $4)  | INSERT INTO ids VALUES ($1, $2, $3, $4)
>>       95377 |              |        95405 |               | INSERT INTO ids VALUES ($1, $2, $3, $4)  | INSERT INTO ids VALUES ($1, $2, $3, $4)
>>       95377 |              |        95378 |               | INSERT INTO ids VALUES ($1, $2, $3, $4)  | INSERT INTO ids VALUES ($1, $2, $3, $4)
>>
>> From pid 95377:
>>
>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>> ../sysdeps/unix/syscall-template.S:84
>> #1  0x00000000006977aa in WaitLatchOrSocket ()
>> #2  0x00000000006ef51b in ProcSleep ()
>> #3  0x00000000006ea9a3 in WaitOnLock ()
>> #4  0x00000000006ebe67 in LockAcquireExtendedXC ()
>> #5  0x00000000006e95cb in LockRelationOid ()
>> #6  0x00000000007dea48 in ScanQueryForLocks ()
>> #7  0x00000000007deaf7 in AcquirePlannerLocks ()
>> #8  0x00000000007df675 in RevalidateCachedQuery ()
>> #9  0x00000000007dfae0 in GetCachedPlan ()
>> #10 0x00000000006ff326 in exec_bind_message ()
>> #11 0x0000000000700591 in PostgresMain ()
>> #12 0x000000000047b953 in ServerLoop ()
>> #13 0x00000000006a756a in PostmasterMain ()
>> #14 0x000000000047c9ec in main ()
>>
>> From pid 95434:
>>
>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>> ../sysdeps/unix/syscall-template.S:84
>> #1  0x00000000006977aa in WaitLatchOrSocket ()
>> #2  0x00000000006ef51b in ProcSleep ()
>> #3  0x00000000006ea9a3 in WaitOnLock ()
>> #4  0x00000000006ebe67 in LockAcquireExtendedXC ()
>> #5  0x00000000006e95cb in LockRelationOid ()
>> #6  0x00000000007dea48 in ScanQueryForLocks ()
>> #7  0x00000000007deaf7 in AcquirePlannerLocks ()
>> #8  0x00000000007df675 in RevalidateCachedQuery ()
>> #9  0x00000000007dfae0 in GetCachedPlan ()
>> #10 0x00000000006ff326 in exec_bind_message ()
>> #11 0x0000000000700591 in PostgresMain ()
>> #12 0x000000000047b953 in ServerLoop ()
>> #13 0x00000000006a756a in PostmasterMain ()
>> #14 0x000000000047c9ec in main ()
>>
>> From pid 95405:
>>
>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>> ../sysdeps/unix/syscall-template.S:84
>> #1  0x000000000060af2d in pgxc_node_receive ()
>> #2  0x000000000060f57f in FetchTuple ()
>> #3  0x0000000000615258 in ExecRemoteQuery ()
>> #4  0x00000000005cf028 in ExecProcNode ()
>> #5  0x00000000005cc147 in standard_ExecutorRun ()
>> #6  0x0000000000702449 in ProcessQuery ()
>> #7  0x000000000070276c in PortalRunMulti ()
>> #8  0x0000000000703ff0 in PortalRun ()
>> #9  0x0000000000700c7a in PostgresMain ()
>> #10 0x000000000047b953 in ServerLoop ()
>> #11 0x00000000006a756a in PostmasterMain ()
>> #12 0x000000000047c9ec in main ()
>>
>> From pid 95378:
>>
>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>> ../sysdeps/unix/syscall-template.S:84
>> #1  0x000000000060af2d in pgxc_node_receive ()
>> #2  0x000000000060f57f in FetchTuple ()
>> #3  0x0000000000615258 in ExecRemoteQuery ()
>> #4  0x00000000005cf028 in ExecProcNode ()
>> #5  0x00000000005cc147 in standard_ExecutorRun ()
>> #6  0x0000000000702449 in ProcessQuery ()
>> #7  0x000000000070276c in PortalRunMulti ()
>> #8  0x0000000000703ff0 in PortalRun ()
>> #9  0x0000000000700c7a in PostgresMain ()
>> #10 0x000000000047b953 in ServerLoop ()
>> #11 0x00000000006a756a in PostmasterMain ()
>> #12 0x000000000047c9ec in main ()
>>
>> From pid 95394:
>>
>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>> ../sysdeps/unix/syscall-template.S:84
>> #1  0x000000000060af2d in pgxc_node_receive ()
>> #2  0x000000000060f57f in FetchTuple ()
>> #3  0x0000000000615258 in ExecRemoteQuery ()
>> #4  0x00000000005cf028 in ExecProcNode ()
>> #5  0x00000000005cc147 in standard_ExecutorRun ()
>> #6  0x0000000000702449 in ProcessQuery ()
>> #7  0x000000000070276c in PortalRunMulti ()
>> #8  0x0000000000703ff0 in PortalRun ()
>> #9  0x0000000000700c7a in PostgresMain ()
>> #10 0x000000000047b953 in ServerLoop ()
>> #11 0x00000000006a756a in PostmasterMain ()
>> #12 0x000000000047c9ec in main ()
>>
>> From pid 95381:
>>
>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>> ../sysdeps/unix/syscall-template.S:84
>> #1  0x00000000006977aa in WaitLatchOrSocket ()
>> #2  0x00000000006ef51b in ProcSleep ()
>> #3  0x00000000006ea9a3 in WaitOnLock ()
>> #4  0x00000000006ebe67 in LockAcquireExtendedXC ()
>> #5  0x00000000006e95cb in LockRelationOid ()
>> #6  0x00000000007dea48 in ScanQueryForLocks ()
>> #7  0x00000000007deaf7 in AcquirePlannerLocks ()
>> #8  0x00000000007df675 in RevalidateCachedQuery ()
>> #9  0x00000000007dfae0 in GetCachedPlan ()
>> #10 0x00000000006ff326 in exec_bind_message ()
>> #11 0x0000000000700591 in PostgresMain ()
>> #12 0x000000000047b953 in ServerLoop ()
>> #13 0x00000000006a756a in PostmasterMain ()
>> #14 0x000000000047c9ec in main ()
>>
>> From pid 95382:
>>
>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>> ../sysdeps/unix/syscall-template.S:84
>> #1  0x00000000006977aa in WaitLatchOrSocket ()
>> #2  0x00000000006ef51b in ProcSleep ()
>> #3  0x00000000006ea9a3 in WaitOnLock ()
>> #4  0x00000000006ebe67 in LockAcquireExtendedXC ()
>> #5  0x00000000006e95cb in LockRelationOid ()
>> #6  0x00000000007dea48 in ScanQueryForLocks ()
>> #7  0x00000000007deaf7 in AcquirePlannerLocks ()
>> #8  0x00000000007df675 in RevalidateCachedQuery ()
>> #9  0x00000000007dfae0 in GetCachedPlan ()
>> #10 0x00000000006ff326 in exec_bind_message ()
>> #11 0x0000000000700591 in PostgresMain ()
>> #12 0x000000000047b953 in ServerLoop ()
>> #13 0x00000000006a756a in PostmasterMain ()
>> #14 0x000000000047c9ec in main ()
>>
>> From pid 95433:
>>
>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>> ../sysdeps/unix/syscall-template.S:84
>> #1  0x000000000060af2d in pgxc_node_receive ()
>> #2  0x000000000060f57f in FetchTuple ()
>> #3  0x0000000000615258 in ExecRemoteQuery ()
>> #4  0x00000000005cf028 in ExecProcNode ()
>> #5  0x00000000005cc147 in standard_ExecutorRun ()
>> #6  0x0000000000702449 in ProcessQuery ()
>> #7  0x000000000070276c in PortalRunMulti ()
>> #8  0x0000000000703ff0 in PortalRun ()
>> #9  0x0000000000700c7a in PostgresMain ()
>> #10 0x000000000047b953 in ServerLoop ()
>> #11 0x00000000006a756a in PostmasterMain ()
>> #12 0x000000000047c9ec in main ()
>>
>> Regards,
>> Matthew
>>
>> On Wed, Nov 22, 2017 at 6:12 PM, Tomas Vondra <
>> tomas.vondra at 2ndquadrant.com> wrote:
>>
>>> Hi,
>>>
>>> On 11/22/2017 11:32 PM, Matthew Tamayo-Rios wrote:
>>> > I'm using PGXL 9.5r1.6, based on PG 9.5.8 (Postgres-XL 9.5r1.6)
>>> >
>>> > Also, it looks like something is broken with delete in general-- I tried
>>> > deleting a single row and it is hanging indefinitely. GTM seems to be
>>> > fine and all other operations (select, insert, update) seem to be
>>> > working fine. In fact, after doing some more troubleshooting (detailed
>>> > below) it seems that deleting even a single row from the very large
>>> > tables hangs indefinitely, so the only way to clear them is to drop the
>>> > entire database (which works).
>>> >
>>> > Delete hangs even on a single id for large tables.
>>> >
>>> >     delete from ids where id = '30095031-49e0-44eb-87a9-98b640773ef2';
>>> >
>>> >
>>> > Log from coordinator just has:
>>> >
>>> >     STATEMENT:  delete from ids where id = 30095031-49e0-44eb-87a9-98b640773ef2;
>>> >
>>> >
>>> > Data node has no log entries;
>>> >
>>> > Successfully created a test table and deleted a record from it:
>>> >
>>> >     openlattice=> create table test( id uuid, c1 int, c2 int, PRIMARY
>>> >     KEY (id) ) distribute by hash(id);
>>> >     CREATE TABLE
>>> >     => insert into test VALUES( '30095031-49e0-44eb-87a9-98b640773ef2', 21, 23 );
>>> >     INSERT 0 1
>>> >     => insert into test VALUES( '30095031-49e0-55eb-87a9-98b640773ef2', 21, 23 );
>>> >     INSERT 0 1
>>> >     => delete from test where id = '30095031-49e0-44eb-87a9-98b640773ef2';
>>> >     DELETE 1
>>> >
>>> >
>>> > I also checked locks and I am seeing a bunch of prepared statements
>>> > blocking each other, but no inserts have been done in 24 hours.
>>> >
>>> > blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement                        | current_statement_in_blocking_process
>>> > ------------+--------------+--------------+---------------+------------------------------------------+------------------------------------------
>>> >       95377 |              |        95434 |               | INSERT INTO ids VALUES ($1, $2, $3, $4)  | INSERT INTO ids VALUES ($1, $2, $3, $4)
>>> >       95377 |              |        95405 |               | INSERT INTO ids VALUES ($1, $2, $3, $4)  | INSERT INTO ids VALUES ($1, $2, $3, $4)
>>> >       95377 |              |        95378 |               | INSERT INTO ids VALUES ($1, $2, $3, $4)  | INSERT INTO ids VALUES ($1, $2, $3, $4)
>>> >
>>> > I omitted the user column above. So it looks like tables with heavy
>>> > write usage are getting blocked by prepared statements holding onto
>>> > locks for too long. Any ideas on what would cause that? These tables are
>>> > being copied from Cassandra via a short-lived ETL process in preparation
>>> > for a migration, so there shouldn't be any locks being held.
>>> >
>>>
>>> So you're doing the inserts using prepared statements? I wonder if that
>>> might be preventing restriction to a subset of nodes (if we don't know the
>>> value, we can't compute the hash and restrict the list of nodes).
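>>>
>>> For illustration (a sketch using the test table from your example above):
>>> at PREPARE time the value of the distribution key is not known yet, so the
>>> statement can't be targeted at a single datanode until it is executed.
>>>
>>>     PREPARE ins (uuid, int, int) AS INSERT INTO test VALUES ($1, $2, $3);
>>>     -- only here, at EXECUTE time, is hash(id) computable for this row
>>>     EXECUTE ins ('30095031-49e0-44eb-87a9-98b640773ef2', 21, 23);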
>>>
>>> Also, what do the backends holding the locks do? Are they waiting for
>>> some other lock (in that case, what does the backend holding that lock
>>> do), or are they stuck for some other reason?
>>>
>>> What does pg_stat_activity say?
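>>>
>>> For example, a query along the lines of the usual pg_locks /
>>> pg_stat_activity lock-monitoring recipe (just a sketch, adapt as needed):
>>>
>>>     -- who is waiting on whom, and what each side is running
>>>     SELECT blocked_locks.pid          AS blocked_pid,
>>>            blocked_activity.usename   AS blocked_user,
>>>            blocking_locks.pid         AS blocking_pid,
>>>            blocking_activity.usename  AS blocking_user,
>>>            blocked_activity.query     AS blocked_statement,
>>>            blocking_activity.query    AS current_statement_in_blocking_process
>>>       FROM pg_catalog.pg_locks blocked_locks
>>>       JOIN pg_catalog.pg_stat_activity blocked_activity
>>>            ON blocked_activity.pid = blocked_locks.pid
>>>       JOIN pg_catalog.pg_locks blocking_locks
>>>            ON blocking_locks.locktype = blocked_locks.locktype
>>>           AND blocking_locks.database      IS NOT DISTINCT FROM blocked_locks.database
>>>           AND blocking_locks.relation      IS NOT DISTINCT FROM blocked_locks.relation
>>>           AND blocking_locks.page          IS NOT DISTINCT FROM blocked_locks.page
>>>           AND blocking_locks.tuple         IS NOT DISTINCT FROM blocked_locks.tuple
>>>           AND blocking_locks.virtualxid    IS NOT DISTINCT FROM blocked_locks.virtualxid
>>>           AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
>>>           AND blocking_locks.classid       IS NOT DISTINCT FROM blocked_locks.classid
>>>           AND blocking_locks.objid         IS NOT DISTINCT FROM blocked_locks.objid
>>>           AND blocking_locks.objsubid      IS NOT DISTINCT FROM blocked_locks.objsubid
>>>           AND blocking_locks.pid != blocked_locks.pid
>>>       JOIN pg_catalog.pg_stat_activity blocking_activity
>>>            ON blocking_activity.pid = blocking_locks.pid
>>>      WHERE NOT blocked_locks.granted;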
>>>
>>> Can you attach gdb to the stuck backends and get the backtrace? That is,
>>> something like 'gdb -p $PID' and then 'bt'. I suspect it might be the
>>> GTM issue I reported a while back (see the stack at the beginning):
>>>
>>> http://lists.postgres-xl.org/pipermail/postgres-xl-developers-postgres-xl.org/2017-November/000571.html
>>>
>>> Another idea is that perhaps there are some old prepared transactions.
>>> Can you check pg_prepared_xacts on the nodes?
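>>>
>>> For example, on each node:
>>>
>>>     SELECT gid, prepared, owner, database FROM pg_prepared_xacts;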
>>>
>>> regards
>>>
>>> --
>>> Tomas Vondra                  http://www.2ndQuadrant.com
>>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>
>>
>>
>

