[Postgres-xl-developers] [pgx2-steering] Re: X2/XL consistency violation

Koichi Suzuki koichi.dbms at gmail.com
Mon Aug 3 00:59:44 PDT 2015


Further report on this:

I ran the same test without fixing OID duplicate in src/include/catalog and
did not observe any in conflicts.  I ran about three million transactions
(read and write) in about 15 minutes.

--->8-------------------------->8---------------------

[koichi at everest:inconsisitent_update_test]$ go run runtest.go

1 : 0 -> 505812819 xid = 2135 count = 10000

^Cexit status 2

[koichi at everest:inconsisitent_update_test]$ date

2015年  8月  3日 月曜日 15:35:15 CST

[koichi at everest:inconsisitent_update_test]$ go run runtest.go

1 : 0 -> 505812819 xid = 629401 count = 10000

^Cexit status 2

[koichi at everest:inconsisitent_update_test]$ go run runtest.go

1 : 0 -> 505812819 xid = 3079041 count = 10000

^Cexit status 2

[koichi at everest:inconsisitent_update_test]$
----8<------------------------------------8<----------------

Compared with Konstantin's result where he observed a conflict in about
five thousand (write) transactions, I think the cause was not OID duplicate.

XC made so many fixes for stability this three months.   Some of these
fixes might have worked.

Configuration is as follows:

--->8-------------------------->8---------------------

PGXC show config all

========= Postgres-XC configuration Common Info ========================

=== Overall ===

Postgres-XC owner: koichi

Postgres-XC user: koichi

Postgres-XC install directory: /home/koichi/pgxc

pgxc_ctl home: /home/koichi/pgxc_ctl

pgxc_ctl configuration file: /home/koichi/pgxc_ctl/pgxc_ctl_local.conf

pgxc_ctl tmpDir: /tmp

pgxc_ctl localTempDir: /tmp

pgxc_ctl log file: /home/koichi/pgxc_ctl/pgxc_log/11298_pgxc_ctl.log

pgxc_ctl configBackup: n

pgxc_ctl configBackupHost: none

pgxc_ctl configBackupFile: none

========= Postgres-XC configuration End Common Info ===================

====== Server: localhost =======

GTM Master:

    Nodename: 'gtm', port: 20001, dir: '/home/koichi/pgxc/localnodes/gtm'
  ExtraConfig: 'none', Specific Extra Config: 'none'

Coordinator Master:

    Nodename: 'coord1', port: 20004, pooler port: 20010

    MaxWalSenders: 5, Dir: '/home/koichi/pgxc/localnodes/coord1'

    ExtraConfig: 'coordExtraConfig', Specific Extra Config: 'none'

    pg_hba entries ( '172.16.30.0/24' )

    Extra pg_hba: '172.16.30.0/24', Specific Extra pg_hba: 'none'

Coordinator Master:

    Nodename: 'coord2', port: 20005, pooler port: 20011

    MaxWalSenders: 5, Dir: '/home/koichi/pgxc/localnodes/coord2'

    ExtraConfig: 'coordExtraConfig', Specific Extra Config: 'none'

    pg_hba entries ( '172.16.30.0/24' )

    Extra pg_hba: '172.16.30.0/24', Specific Extra pg_hba: 'none'

Datanode Master:

    Nodename: 'datanode1', port: 20008

    MaxWALSenders: 5, Dir: '/home/koichi/pgxc/localnodes/dn1'

    ExtraConfig: 'none', Specific Extra Config: 'none'

    pg_hba entries ( '172.16.30.0/24' )

    Extra pg_hba: 'none', Specific Extra pg_hba: 'none'

Datanode Master:

    Nodename: 'datanode2', port: 20009

    MaxWALSenders: 5, Dir: '/home/koichi/pgxc/localnodes/dn2'

    ExtraConfig: 'none', Specific Extra Config: 'none'

    pg_hba entries ( '172.16.30.0/24' )

    Extra pg_hba: 'none', Specific Extra pg_hba: 'none'

PGXC
----8<------------------------------------8<----------------

Any input is welcome.

Best;

---
Koichi Suzuki
https://www.linkedin.com/in/koichidbms


2015-08-03 15:53 GMT+09:00 Koichi Suzuki <koichi.dbms at gmail.com>:

> Hi,
>
> I raun Konstaintin's test with XC (REL1_2_STABLE, without any dedicated
> fix) form more than an hour and didn't have any update conflict.   The
> following is the result.
>
> ---8<--------------------------------8<------------------------
> [koichi at everest:inconsisitent_update_test]$ date
>
> 2015年  8月  3日 月曜日 13:40:50 CST
>
> [koichi at everest:inconsisitent_update_test]$ go run runtest.go
>
> 1 : 0 -> 496880297 xid = 950188 count = 10000
>
> ^Cexit status 2
>
> [koichi at everest:inconsisitent_update_test]$ date
>
> 2015年  8月  3日 月曜日 14:41:32 CST
>
> [koichi at everest:inconsisitent_update_test]$
> --->8----------------------------------->8---------------------
>
> I used the following parameters:
>
>     TRANSFER_CONNECTIONS = 10
>
>     BALANCE_CONNECTIONS  = 10
>
> I was wondering if previous analysis is correct.   Pavan's analysis
> suggests that TransactionIdDidCommit() may return true although the
> transaction commit has not been reached to GTM and GXID is still in the
> snapshot so that it is visible to other read transactions.
>
> When I traced the code, I found that the target block has already been
> locked with EXCLUSIVE mode so that other read transaction cannot read it
> until the update has been done.
>
> Then what is the candidate of the cause?   I noticed the conflict in
> embedded function OIDs in src/include/catalog.   This conflict exists as
> early as XC 1.1.   I found Pavan fixed this conflict in 9.5 merge, but not
> sure if it has been fixed in the version Konstantin used.
>
> XC's REL1_2_STABLE has this fix.   This conflict may end up with calling
> wrong functions internally which can violate the lock and update wrong heap
> tuple.
>
> I think XL should review and test this.
>
> Regards;
>
> ---
> Koichi Suzuki
> https://www.linkedin.com/in/koichidbms
>
>
> 2015-07-21 13:40 GMT+09:00 Pavan Deolasee <pavan.deolasee at gmail.com>:
>
>> I have now checked in both these patches. I'm extremely sorry, but I
>> forgot to add appropriate credits in the final commit messages. I'd changed
>> the patches by hand to add those, but forgot to make changes in my local
>> repo from where I pushed the changes. Will be careful in future.
>>
>> Thanks,
>> Pavan
>>
>> On Mon, Jul 20, 2015 at 7:10 PM, Pavan Deolasee <pavan.deolasee at gmail.com
>> > wrote:
>>
>>> Here are couple of patches to fix the MVCC issues reported by
>>> Konstantin. The first patch just partially restores the old behaviour of
>>> obtaining an XID before getting a snapshot. I don't think this is a long
>>> term fix and I would like to work on a patch later which allows us to
>>> obtain a snapshot without a global XID and somehow track that to arrive at
>>> the correct RecentGlobalXmin calculation at the GTM. For now, it fixes the
>>> issue reported by Konstantin.
>>>
>>> The second patch adds the ability to enforce logical ordering of
>>> transaction commits at the GTM, when it matters. So if a transaction waits
>>> for some other transaction to finish at a node, then GTM enforces the same
>>> ordering by delaying commit of a transaction until it sees the other
>>> transaction still in-progress.
>>>
>>> Konstantin's test case passes for me with both patches applied. I tested
>>> with pgbench and no note-worthy drop in performance with both patches
>>> applied. But if someone could run tests on a better hardware, that will
>>> surely help.
>>>
>>> Barring objections, I'll check-in these fixes to XL.
>>>
>>> Thanks,
>>> Pavan
>>>
>>> On Thu, Jul 16, 2015 at 12:23 PM, Pavan Deolasee <
>>> pavan.deolasee at gmail.com> wrote:
>>>
>>>> Hello Suzuki-san,
>>>>
>>>> Can you please do some tests to verify the performance implications of
>>>> the patch? I'm still curious about the additional synchronization points
>>>> added at several place. For the record, I'm working on a patch based on the
>>>> transaction commit ordering idea I explained above and would run tests once
>>>> its ready and report.
>>>>
>>>> Thanks,
>>>> Pavan
>>>>
>>>>
>>>> On Thu, Jul 16, 2015 at 12:06 PM, Koichi Suzuki <koichi.dbms at gmail.com>
>>>> wrote:
>>>>
>>>>> Hello;
>>>>>
>>>>> Here's a candidate of the fix for inconsistencies as Konstantin
>>>>> reported.    Please note the following:
>>>>>
>>>>> 1.  This patch looks to work nicely with DML and query.   But reveals
>>>>> potential deadlock of DDL, I mean you will see commit synchronization
>>>>> failure only when DDLs are issued in parallel.   For the complete fix, we
>>>>> need DDL serialization, which will fix the issue reported as schema
>>>>> inconsistency.   So please do not blame that this patch makes regression
>>>>> fail.   I'm also working with this DDL fix (it's separate cause) and would
>>>>> like to share this.
>>>>>
>>>>> 2. The patch is tested with XC.   I believe this applies to XL as well
>>>>> without significant modification.
>>>>>
>>>>> 3. This patch obtains fresh snapshot from GTM using independent GXID.
>>>>>   This is okay for datanodes or child transaction in coordinators.
>>>>> Although it looks working so far, I'm planning to improve GTM so that GTM
>>>>> ignore such temporary GXID, or accept InvalidGlobalTransactionId and then
>>>>> give just a fresh snapshot and ignore everything else.  This patch will be
>>>>> posted as early as possible.
>>>>>
>>>>> 4. GUC name is not good so far.   I will change this.
>>>>>
>>>>> 5. Invisible tuple fix was not fixed.   It is related to XL-specific
>>>>> recentXmin, which I believe Pavan is working with.
>>>>>
>>>>> As long as DDL is issued in series, this patch should work.    It will
>>>>> be nice if Konstantin tries this.
>>>>>
>>>>> Best;
>>>>>
>>>>> ---
>>>>> Koichi Suzuki
>>>>> https://www.linkedin.com/in/koichidbms
>>>>>
>>>>>
>>>>> 2015-07-14 17:29 GMT+09:00 Koichi Suzuki <koichi.dbms at gmail.com>:
>>>>>
>>>>>> Attached is very rough preliminary design of the code.   Please allow
>>>>>> typo, incorrect names and so on, unless they are significant.  This is just
>>>>>> to present a design idea and does not compile successfully.
>>>>>>
>>>>>> Also, please note that we need more careful review where GXID
>>>>>> synchronization is needed in HeapTupleSatisfiesUpdate(), as indicated by
>>>>>> /* Sync here */
>>>>>>
>>>>>> Regards;
>>>>>>
>>>>>> ---
>>>>>> Koichi Suzuki
>>>>>> https://www.linkedin.com/in/koichidbms
>>>>>>
>>>>>>
>>>>>> 2015-07-14 14:39 GMT+09:00 Koichi Suzuki <koichi.dbms at gmail.com>:
>>>>>>
>>>>>>> Now I'm looking around the code where X2 wait until X1 reports its
>>>>>>> commit to GTM.
>>>>>>>
>>>>>>> HeapTupleSatisfiesUpdate() in tqual.c looks good.   We need to wait
>>>>>>> until X1 is not included in the fresh snapshot before we return
>>>>>>> HeapTupleMayBeUpdated and HeapTupleUpadted.   I need some more time to get
>>>>>>> more practical idea of the code.
>>>>>>>
>>>>>>> Regards;
>>>>>>>
>>>>>>> ---
>>>>>>> Koichi Suzuki
>>>>>>> https://www.linkedin.com/in/koichidbms
>>>>>>>
>>>>>>>
>>>>>>> 2015-07-13 14:37 GMT+09:00 Koichi Suzuki <koichi.dbms at gmail.com>:
>>>>>>>
>>>>>>>> 2015-07-13 14:16 GMT+09:00 Pavan Deolasee <pavan.deolasee at gmail.com
>>>>>>>> >:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jul 13, 2015 at 7:17 AM, Koichi Suzuki <
>>>>>>>>> koichi.dbms at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Considering very short duration until original transaction (X1)
>>>>>>>>>> reports GTM that it is committed,  if X2 finds that X1 is still in the
>>>>>>>>>> snapshot, X2 can ask GTM for new snapshot until X1 is not included in it.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> X2 can't really ask for a new snapshot. Its perfectly fine to work
>>>>>>>>> with a snapshot that has a transaction included, but already committed. Or
>>>>>>>>> are you suggesting that X2 waits for X1 to commit at the GTM instead of the
>>>>>>>>> local node before proceeding with an UPDATE? i.e. we modify
>>>>>>>>> XactLockTableWait() so that it waits for GTM events? I'm afraid that won't
>>>>>>>>> be cheap either.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, this is the basic idea.   Why X2 cannot ask for a fresh
>>>>>>>> snapshot?  GTM should accept such requirement and if we regard this
>>>>>>>> snapshot as "temporary" one, just to check if X1 is reported committed to
>>>>>>>> GTM,  and we still continue to use the old one for visibility check, I
>>>>>>>> suppose this is feasible.    Anyway, if X2 is updating a record updated by
>>>>>>>> X1 and if X1 is found in the snapshot, X2 should wait until X1 is reported
>>>>>>>> as committed to GTM.    I'm not sure if XactLockTableWait() is only one
>>>>>>>> candidate to modify.    Let me see the code more in detail.
>>>>>>>>
>>>>>>>> Thank you;
>>>>>>>> ---
>>>>>>>> Koichi
>>>>>>>> https://www.linkedin.com/in/koichidbms
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Pavan
>>>>>>>>> --
>>>>>>>>> Pavan Deolasee
>>>>>>>>> http://www.linkedin.com/in/pavandeolasee
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Postgres-x2-steering" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to postgres-x2-steering+unsubscribe at googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Pavan Deolasee
>>>> http://www.linkedin.com/in/pavandeolasee
>>>>
>>>
>>>
>>>
>>> --
>>> --
>>>  Pavan Deolasee                   http://www.2ndQuadrant.com/
>>>  PostgreSQL Development, 24x7 Support, Training & Services
>>>
>>
>>
>>
>> --
>> --
>>  Pavan Deolasee                   http://www.2ndQuadrant.com/
>>  PostgreSQL Development, 24x7 Support, Training & Services
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Postgres-x2-steering" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to postgres-x2-steering+unsubscribe at googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.postgres-xl.org/private.cgi/postgres-xl-developers-postgres-xl.org/attachments/20150803/9d6ad0bf/attachment.htm>


More information about the Postgres-xl-developers mailing list