[postgres-xl-general] Truncating Tables >1M records takes a long time

Krzysztof Nienartowicz krzysztof.nienartowicz at unige.ch
Tue Nov 28 01:48:05 PST 2017


Have you tried running pgxc_clean?
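
For what it's worth, it is usually run against a coordinator along these
lines (a sketch -- check pgxc_clean --help for the exact options in your
build; the host, port, and database names below are placeholders):

    pgxc_clean -h coord_host -p 5432 -d mydb -v

It looks for transactions left prepared by failed two-phase commits and
commits or aborts them consistently across the nodes.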


On Tue, Nov 28, 2017 at 4:36 AM, Matthew Tamayo-Rios <
matthew.t.rios at gmail.com> wrote:

> I had to create a new table called entity_ids and that seems to be working
> fine.
>
> One thing we did change that seemed to reduce problems was removing the
> calls that toggle the autocommit setting around bulk inserts.
>
> We had code that called .setAutoCommit(false), ran the batch inserts,
> committed, and then called .setAutoCommit(true).
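>
> In SQL terms, what the driver was effectively sending looked roughly
> like this (a sketch of the wire traffic, not our actual code):
>
>     BEGIN;                                     -- setAutoCommit(false)
>     INSERT INTO ids VALUES ($1, $2, $3, $4);   -- repeated for each batch row
>     COMMIT;                                    -- commit(); setAutoCommit(true)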
>
> The old table is still borked and locked, but I've been running for 5 days
> without any issues since we made this change.
>
> Regards,
> Matthew
>
> On Wed, Nov 22, 2017 at 8:16 PM, Matthew Tamayo-Rios <
> matthew.t.rios at gmail.com> wrote:
>
>> I also restarted and tried to rename the table-- unfortunately now I'm in
>> this state:
>>
>> blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement                                       | current_statement_in_blocking_process
>> ------------+--------------+--------------+---------------+---------------------------------------------------------+---------------------------------------------------------
>>       15244 | X            |        15347 | Y             | alter table ids rename to ids2;                         | CREATE INDEX ON ids (entity_set_id, sync_id, entity_id)
>>       15347 | Y            |        15244 | X             | CREATE INDEX ON ids (entity_set_id, sync_id, entity_id) | alter table ids rename to ids2;
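>>
>> (For reference, the output above comes from the usual lock-monitoring
>> query -- essentially the one on the PostgreSQL wiki, abridged here to
>> the join conditions that matter for relation and transactionid locks,
>> so the full version is stricter:)
>>
>>     SELECT blocked_locks.pid         AS blocked_pid,
>>            blocked_activity.usename  AS blocked_user,
>>            blocking_locks.pid        AS blocking_pid,
>>            blocking_activity.usename AS blocking_user,
>>            blocked_activity.query    AS blocked_statement,
>>            blocking_activity.query   AS current_statement_in_blocking_process
>>       FROM pg_catalog.pg_locks blocked_locks
>>       JOIN pg_catalog.pg_stat_activity blocked_activity
>>         ON blocked_activity.pid = blocked_locks.pid
>>       JOIN pg_catalog.pg_locks blocking_locks
>>         ON blocking_locks.locktype = blocked_locks.locktype
>>        AND blocking_locks.database      IS NOT DISTINCT FROM blocked_locks.database
>>        AND blocking_locks.relation      IS NOT DISTINCT FROM blocked_locks.relation
>>        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
>>        AND blocking_locks.pid != blocked_locks.pid
>>       JOIN pg_catalog.pg_stat_activity blocking_activity
>>         ON blocking_activity.pid = blocking_locks.pid
>>      WHERE NOT blocked_locks.granted;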
>>
>> Isn't postgres supposed to detect deadlocks like this?
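>>
>> If it never gets detected, I suppose the only way out is to kill one
>> side by hand, e.g.:
>>
>>     select pg_cancel_backend(15347);   -- or pg_terminate_backend(15347)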
>>
>> -mtr
>>
>> On Wed, Nov 22, 2017 at 5:41 PM, Matthew Tamayo-Rios <
>> matthew.t.rios at gmail.com> wrote:
>>
>>> It looks like you might be right -- there are a bunch of backends
>>> waiting on locks and others waiting in pgxc_node_receive().
>>>
>>> blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement                        | current_statement_in_blocking_process
>>> ------------+--------------+--------------+---------------+------------------------------------------+------------------------------------------
>>>       95377 |              |        95434 |               | INSERT INTO ids VALUES ($1, $2, $3, $4)  | INSERT INTO ids VALUES ($1, $2, $3, $4)
>>>       95377 |              |        95405 |               | INSERT INTO ids VALUES ($1, $2, $3, $4)  | INSERT INTO ids VALUES ($1, $2, $3, $4)
>>>       95377 |              |        95378 |               | INSERT INTO ids VALUES ($1, $2, $3, $4)  | INSERT INTO ids VALUES ($1, $2, $3, $4)
>>>
>>>
>>> From pid 95377:
>>>
>>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>>> ../sysdeps/unix/syscall-template.S:84
>>> #1  0x00000000006977aa in WaitLatchOrSocket ()
>>> #2  0x00000000006ef51b in ProcSleep ()
>>> #3  0x00000000006ea9a3 in WaitOnLock ()
>>> #4  0x00000000006ebe67 in LockAcquireExtendedXC ()
>>> #5  0x00000000006e95cb in LockRelationOid ()
>>> #6  0x00000000007dea48 in ScanQueryForLocks ()
>>> #7  0x00000000007deaf7 in AcquirePlannerLocks ()
>>> #8  0x00000000007df675 in RevalidateCachedQuery ()
>>> #9  0x00000000007dfae0 in GetCachedPlan ()
>>> #10 0x00000000006ff326 in exec_bind_message ()
>>> #11 0x0000000000700591 in PostgresMain ()
>>> #12 0x000000000047b953 in ServerLoop ()
>>> #13 0x00000000006a756a in PostmasterMain ()
>>> #14 0x000000000047c9ec in main ()
>>>
>>> From pid 95434:
>>>
>>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>>> ../sysdeps/unix/syscall-template.S:84
>>> #1  0x00000000006977aa in WaitLatchOrSocket ()
>>> #2  0x00000000006ef51b in ProcSleep ()
>>> #3  0x00000000006ea9a3 in WaitOnLock ()
>>> #4  0x00000000006ebe67 in LockAcquireExtendedXC ()
>>> #5  0x00000000006e95cb in LockRelationOid ()
>>> #6  0x00000000007dea48 in ScanQueryForLocks ()
>>> #7  0x00000000007deaf7 in AcquirePlannerLocks ()
>>> #8  0x00000000007df675 in RevalidateCachedQuery ()
>>> #9  0x00000000007dfae0 in GetCachedPlan ()
>>> #10 0x00000000006ff326 in exec_bind_message ()
>>> #11 0x0000000000700591 in PostgresMain ()
>>> #12 0x000000000047b953 in ServerLoop ()
>>> #13 0x00000000006a756a in PostmasterMain ()
>>> #14 0x000000000047c9ec in main ()
>>>
>>> From pid 95405:
>>>
>>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>>> ../sysdeps/unix/syscall-template.S:84
>>> #1  0x000000000060af2d in pgxc_node_receive ()
>>> #2  0x000000000060f57f in FetchTuple ()
>>> #3  0x0000000000615258 in ExecRemoteQuery ()
>>> #4  0x00000000005cf028 in ExecProcNode ()
>>> #5  0x00000000005cc147 in standard_ExecutorRun ()
>>> #6  0x0000000000702449 in ProcessQuery ()
>>> #7  0x000000000070276c in PortalRunMulti ()
>>> #8  0x0000000000703ff0 in PortalRun ()
>>> #9  0x0000000000700c7a in PostgresMain ()
>>> #10 0x000000000047b953 in ServerLoop ()
>>> #11 0x00000000006a756a in PostmasterMain ()
>>> #12 0x000000000047c9ec in main ()
>>>
>>> From pid 95378:
>>>
>>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>>> ../sysdeps/unix/syscall-template.S:84
>>> #1  0x000000000060af2d in pgxc_node_receive ()
>>> #2  0x000000000060f57f in FetchTuple ()
>>> #3  0x0000000000615258 in ExecRemoteQuery ()
>>> #4  0x00000000005cf028 in ExecProcNode ()
>>> #5  0x00000000005cc147 in standard_ExecutorRun ()
>>> #6  0x0000000000702449 in ProcessQuery ()
>>> #7  0x000000000070276c in PortalRunMulti ()
>>> #8  0x0000000000703ff0 in PortalRun ()
>>> #9  0x0000000000700c7a in PostgresMain ()
>>> #10 0x000000000047b953 in ServerLoop ()
>>> #11 0x00000000006a756a in PostmasterMain ()
>>> #12 0x000000000047c9ec in main ()
>>>
>>> From pid 95394:
>>>
>>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>>> ../sysdeps/unix/syscall-template.S:84
>>> #1  0x000000000060af2d in pgxc_node_receive ()
>>> #2  0x000000000060f57f in FetchTuple ()
>>> #3  0x0000000000615258 in ExecRemoteQuery ()
>>> #4  0x00000000005cf028 in ExecProcNode ()
>>> #5  0x00000000005cc147 in standard_ExecutorRun ()
>>> #6  0x0000000000702449 in ProcessQuery ()
>>> #7  0x000000000070276c in PortalRunMulti ()
>>> #8  0x0000000000703ff0 in PortalRun ()
>>> #9  0x0000000000700c7a in PostgresMain ()
>>> #10 0x000000000047b953 in ServerLoop ()
>>> #11 0x00000000006a756a in PostmasterMain ()
>>> #12 0x000000000047c9ec in main ()
>>>
>>> From pid 95381:
>>>
>>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>>> ../sysdeps/unix/syscall-template.S:84
>>> #1  0x00000000006977aa in WaitLatchOrSocket ()
>>> #2  0x00000000006ef51b in ProcSleep ()
>>> #3  0x00000000006ea9a3 in WaitOnLock ()
>>> #4  0x00000000006ebe67 in LockAcquireExtendedXC ()
>>> #5  0x00000000006e95cb in LockRelationOid ()
>>> #6  0x00000000007dea48 in ScanQueryForLocks ()
>>> #7  0x00000000007deaf7 in AcquirePlannerLocks ()
>>> #8  0x00000000007df675 in RevalidateCachedQuery ()
>>> #9  0x00000000007dfae0 in GetCachedPlan ()
>>> #10 0x00000000006ff326 in exec_bind_message ()
>>> #11 0x0000000000700591 in PostgresMain ()
>>> #12 0x000000000047b953 in ServerLoop ()
>>> #13 0x00000000006a756a in PostmasterMain ()
>>> #14 0x000000000047c9ec in main ()
>>>
>>> From pid 95382:
>>>
>>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>>> ../sysdeps/unix/syscall-template.S:84
>>> #1  0x00000000006977aa in WaitLatchOrSocket ()
>>> #2  0x00000000006ef51b in ProcSleep ()
>>> #3  0x00000000006ea9a3 in WaitOnLock ()
>>> #4  0x00000000006ebe67 in LockAcquireExtendedXC ()
>>> #5  0x00000000006e95cb in LockRelationOid ()
>>> #6  0x00000000007dea48 in ScanQueryForLocks ()
>>> #7  0x00000000007deaf7 in AcquirePlannerLocks ()
>>> #8  0x00000000007df675 in RevalidateCachedQuery ()
>>> #9  0x00000000007dfae0 in GetCachedPlan ()
>>> #10 0x00000000006ff326 in exec_bind_message ()
>>> #11 0x0000000000700591 in PostgresMain ()
>>> #12 0x000000000047b953 in ServerLoop ()
>>> #13 0x00000000006a756a in PostmasterMain ()
>>> #14 0x000000000047c9ec in main ()
>>>
>>> From pid 95433:
>>>
>>> #0  0x00007f15384e16f0 in __poll_nocancel () at
>>> ../sysdeps/unix/syscall-template.S:84
>>> #1  0x000000000060af2d in pgxc_node_receive ()
>>> #2  0x000000000060f57f in FetchTuple ()
>>> #3  0x0000000000615258 in ExecRemoteQuery ()
>>> #4  0x00000000005cf028 in ExecProcNode ()
>>> #5  0x00000000005cc147 in standard_ExecutorRun ()
>>> #6  0x0000000000702449 in ProcessQuery ()
>>> #7  0x000000000070276c in PortalRunMulti ()
>>> #8  0x0000000000703ff0 in PortalRun ()
>>> #9  0x0000000000700c7a in PostgresMain ()
>>> #10 0x000000000047b953 in ServerLoop ()
>>> #11 0x00000000006a756a in PostmasterMain ()
>>> #12 0x000000000047c9ec in main ()
>>>
>>> Regards,
>>> Matthew
>>>
>>>
>>> On Wed, Nov 22, 2017 at 6:12 PM, Tomas Vondra <
>>> tomas.vondra at 2ndquadrant.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 11/22/2017 11:32 PM, Matthew Tamayo-Rios wrote:
>>>> > I'm using PGXL 9.5r1.6, based on PG 9.5.8 (Postgres-XL 9.5r1.6)
>>>> >
>>>> > Also, it looks like something is broken with delete in general -- I
>>>> > tried deleting a single row and it is hanging indefinitely. GTM seems
>>>> > to be fine and all other operations (select, insert, update) seem to
>>>> > be working fine. In fact, after doing some more troubleshooting
>>>> > (detailed below) it seems that deleting even a single row from the
>>>> > very large tables hangs indefinitely, making the only way to clear
>>>> > them dropping the entire database (which works).
>>>> >
>>>> > Delete hangs even on a single id for large tables.
>>>> >
>>>> >     delete from ids where id = '30095031-49e0-44eb-87a9-98b640773ef2';
>>>> >
>>>> >
>>>> > Log from coordinator just has:
>>>> >
>>>> >     STATEMENT:  delete from ids where id = 30095031-49e0-44eb-87a9-98b640773ef2;
>>>> >
>>>> >
>>>> > Data node has no log entries;
>>>> >
>>>> > Successfully created a test table and deleted a record from it:
>>>> >
>>>> >     openlattice=> create table test( id uuid, c1 int, c2 int, PRIMARY KEY (id) ) distribute by hash(id);
>>>> >     CREATE TABLE
>>>> >     => insert into test VALUES( '30095031-49e0-44eb-87a9-98b640773ef2', 21, 23 );
>>>> >     INSERT 0 1
>>>> >     => insert into test VALUES( '30095031-49e0-55eb-87a9-98b640773ef2', 21, 23 );
>>>> >     INSERT 0 1
>>>> >     => delete from test where id = '30095031-49e0-44eb-87a9-98b640773ef2';
>>>> >     DELETE 1
>>>> >
>>>> >
>>>> > I also checked locks and I am seeing a bunch of prepared statements
>>>> > blocking each other, but no inserts have been done in 24 hours.
>>>> >
>>>> > blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement                        | current_statement_in_blocking_process
>>>> > ------------+--------------+--------------+---------------+------------------------------------------+------------------------------------------
>>>> >       95377 |              |        95434 |               | INSERT INTO ids VALUES ($1, $2, $3, $4)  | INSERT INTO ids VALUES ($1, $2, $3, $4)
>>>> >       95377 |              |        95405 |               | INSERT INTO ids VALUES ($1, $2, $3, $4)  | INSERT INTO ids VALUES ($1, $2, $3, $4)
>>>> >       95377 |              |        95378 |               | INSERT INTO ids VALUES ($1, $2, $3, $4)  | INSERT INTO ids VALUES ($1, $2, $3, $4)
>>>> >
>>>> > I omitted user above. So it looks like tables with heavy write usage
>>>> > are getting blocked by prepared statements holding onto locks for too
>>>> > long. Any ideas on what would cause that? These tables are being
>>>> > copied from Cassandra via a short-lived ETL process in preparation
>>>> > for a migration, so there shouldn't be any locks being held.
>>>> >
>>>>
>>>> So you're doing the inserts using prepared statements? I wonder if
>>>> that might be preventing restriction to a subset of nodes (if we don't
>>>> know the value, we can't compute the hash and restrict the list of
>>>> nodes).
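>>>>
>>>> One quick way to check (a sketch -- I'm assuming all four columns of
>>>> "ids" are uuid, so adjust the parameter types to the real schema) is
>>>> to EXPLAIN an execution of the prepared statement and see whether the
>>>> plan is restricted to a single datanode or fans out to all of them:
>>>>
>>>>     PREPARE ins (uuid, uuid, uuid, uuid) AS
>>>>         INSERT INTO ids VALUES ($1, $2, $3, $4);
>>>>     EXPLAIN EXECUTE ins ('30095031-49e0-44eb-87a9-98b640773ef2',
>>>>                          '30095031-49e0-55eb-87a9-98b640773ef2',
>>>>                          '30095031-49e0-44eb-87a9-98b640773ef2',
>>>>                          '30095031-49e0-55eb-87a9-98b640773ef2');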
>>>>
>>>> Also, what do the backends holding the locks do? Are they waiting for
>>>> some other lock (in that case, what does the backend holding that lock
>>>> do), or are they stuck for some other reason?
>>>>
>>>> What does pg_stat_activity say?
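>>>>
>>>> For example (9.5 still has the boolean "waiting" column rather than
>>>> the wait_event columns added in 9.6):
>>>>
>>>>     SELECT pid, usename, state, waiting, query
>>>>       FROM pg_stat_activity
>>>>      WHERE state <> 'idle'
>>>>      ORDER BY pid;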
>>>>
>>>> Can you attach gdb to the stuck backends and get the backtrace? That is,
>>>> something like 'gdb -p $PID' and then 'bt'. I suspect it might be the
>>>> GTM issue I reported a while back (see the stack at the beginning):
>>>>
>>>> http://lists.postgres-xl.org/pipermail/postgres-xl-developers-postgres-xl.org/2017-November/000571.html
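>>>>
>>>> (To collect them non-interactively, something like
>>>> 'gdb -p $PID -batch -ex bt' should also work.)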
>>>>
>>>> Another idea is that perhaps there are some old prepared transactions.
>>>> Can you check pg_prepared_xacts on the nodes?
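>>>>
>>>> That is, on each coordinator and datanode, something like:
>>>>
>>>>     SELECT gid, prepared, owner, database FROM pg_prepared_xacts;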
>>>>
>>>> regards
>>>>
>>>> --
>>>> Tomas Vondra                  http://www.2ndQuadrant.com
>>>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>>

