[Postgres-xl-developers] [pgx2-steering] Re: X2/XL consistency violation

Koichi Suzuki koichi.dbms at gmail.com
Sun Aug 2 23:53:54 PDT 2015


Hi,

I raun Konstaintin's test with XC (REL1_2_STABLE, without any dedicated
fix) form more than an hour and didn't have any update conflict.   The
following is the result.

---8<--------------------------------8<------------------------
[koichi at everest:inconsisitent_update_test]$ date

2015年  8月  3日 月曜日 13:40:50 CST

[koichi at everest:inconsisitent_update_test]$ go run runtest.go

1 : 0 -> 496880297 xid = 950188 count = 10000

^Cexit status 2

[koichi at everest:inconsisitent_update_test]$ date

2015年  8月  3日 月曜日 14:41:32 CST

[koichi at everest:inconsisitent_update_test]$
--->8----------------------------------->8---------------------

I used the following parameters:

    TRANSFER_CONNECTIONS = 10

    BALANCE_CONNECTIONS  = 10

I was wondering if previous analysis is correct.   Pavan's analysis
suggests that TransactionIdDidCommit() may return true although the
transaction commit has not been reached to GTM and GXID is still in the
snapshot so that it is visible to other read transactions.

When I traced the code, I found that the target block has already been
locked with EXCLUSIVE mode so that other read transaction cannot read it
until the update has been done.

Then what is the candidate of the cause?   I noticed the conflict in
embedded function OIDs in src/include/catalog.   This conflict exists as
early as XC 1.1.   I found Pavan fixed this conflict in 9.5 merge, but not
sure if it has been fixed in the version Konstantin used.

XC's REL1_2_STABLE has this fix.   This conflict may end up with calling
wrong functions internally which can violate the lock and update wrong heap
tuple.

I think XL should review and test this.

Regards;

---
Koichi Suzuki
https://www.linkedin.com/in/koichidbms


2015-07-21 13:40 GMT+09:00 Pavan Deolasee <pavan.deolasee at gmail.com>:

> I have now checked in both these patches. I'm extremely sorry, but I
> forgot to add appropriate credits in the final commit messages. I'd changed
> the patches by hand to add those, but forgot to make changes in my local
> repo from where I pushed the changes. Will be careful in future.
>
> Thanks,
> Pavan
>
> On Mon, Jul 20, 2015 at 7:10 PM, Pavan Deolasee <pavan.deolasee at gmail.com>
> wrote:
>
>> Here are couple of patches to fix the MVCC issues reported by Konstantin.
>> The first patch just partially restores the old behaviour of obtaining an
>> XID before getting a snapshot. I don't think this is a long term fix and I
>> would like to work on a patch later which allows us to obtain a snapshot
>> without a global XID and somehow track that to arrive at the correct
>> RecentGlobalXmin calculation at the GTM. For now, it fixes the issue
>> reported by Konstantin.
>>
>> The second patch adds the ability to enforce logical ordering of
>> transaction commits at the GTM, when it matters. So if a transaction waits
>> for some other transaction to finish at a node, then GTM enforces the same
>> ordering by delaying commit of a transaction until it sees the other
>> transaction still in-progress.
>>
>> Konstantin's test case passes for me with both patches applied. I tested
>> with pgbench and no note-worthy drop in performance with both patches
>> applied. But if someone could run tests on a better hardware, that will
>> surely help.
>>
>> Barring objections, I'll check-in these fixes to XL.
>>
>> Thanks,
>> Pavan
>>
>> On Thu, Jul 16, 2015 at 12:23 PM, Pavan Deolasee <
>> pavan.deolasee at gmail.com> wrote:
>>
>>> Hello Suzuki-san,
>>>
>>> Can you please do some tests to verify the performance implications of
>>> the patch? I'm still curious about the additional synchronization points
>>> added at several place. For the record, I'm working on a patch based on the
>>> transaction commit ordering idea I explained above and would run tests once
>>> its ready and report.
>>>
>>> Thanks,
>>> Pavan
>>>
>>>
>>> On Thu, Jul 16, 2015 at 12:06 PM, Koichi Suzuki <koichi.dbms at gmail.com>
>>> wrote:
>>>
>>>> Hello;
>>>>
>>>> Here's a candidate of the fix for inconsistencies as Konstantin
>>>> reported.    Please note the following:
>>>>
>>>> 1.  This patch looks to work nicely with DML and query.   But reveals
>>>> potential deadlock of DDL, I mean you will see commit synchronization
>>>> failure only when DDLs are issued in parallel.   For the complete fix, we
>>>> need DDL serialization, which will fix the issue reported as schema
>>>> inconsistency.   So please do not blame that this patch makes regression
>>>> fail.   I'm also working with this DDL fix (it's separate cause) and would
>>>> like to share this.
>>>>
>>>> 2. The patch is tested with XC.   I believe this applies to XL as well
>>>> without significant modification.
>>>>
>>>> 3. This patch obtains fresh snapshot from GTM using independent GXID.
>>>> This is okay for datanodes or child transaction in coordinators.   Although
>>>> it looks working so far, I'm planning to improve GTM so that GTM ignore
>>>> such temporary GXID, or accept InvalidGlobalTransactionId and then give
>>>> just a fresh snapshot and ignore everything else.  This patch will be
>>>> posted as early as possible.
>>>>
>>>> 4. GUC name is not good so far.   I will change this.
>>>>
>>>> 5. Invisible tuple fix was not fixed.   It is related to XL-specific
>>>> recentXmin, which I believe Pavan is working with.
>>>>
>>>> As long as DDL is issued in series, this patch should work.    It will
>>>> be nice if Konstantin tries this.
>>>>
>>>> Best;
>>>>
>>>> ---
>>>> Koichi Suzuki
>>>> https://www.linkedin.com/in/koichidbms
>>>>
>>>>
>>>> 2015-07-14 17:29 GMT+09:00 Koichi Suzuki <koichi.dbms at gmail.com>:
>>>>
>>>>> Attached is very rough preliminary design of the code.   Please allow
>>>>> typo, incorrect names and so on, unless they are significant.  This is just
>>>>> to present a design idea and does not compile successfully.
>>>>>
>>>>> Also, please note that we need more careful review where GXID
>>>>> synchronization is needed in HeapTupleSatisfiesUpdate(), as indicated by
>>>>> /* Sync here */
>>>>>
>>>>> Regards;
>>>>>
>>>>> ---
>>>>> Koichi Suzuki
>>>>> https://www.linkedin.com/in/koichidbms
>>>>>
>>>>>
>>>>> 2015-07-14 14:39 GMT+09:00 Koichi Suzuki <koichi.dbms at gmail.com>:
>>>>>
>>>>>> Now I'm looking around the code where X2 wait until X1 reports its
>>>>>> commit to GTM.
>>>>>>
>>>>>> HeapTupleSatisfiesUpdate() in tqual.c looks good.   We need to wait
>>>>>> until X1 is not included in the fresh snapshot before we return
>>>>>> HeapTupleMayBeUpdated and HeapTupleUpadted.   I need some more time to get
>>>>>> more practical idea of the code.
>>>>>>
>>>>>> Regards;
>>>>>>
>>>>>> ---
>>>>>> Koichi Suzuki
>>>>>> https://www.linkedin.com/in/koichidbms
>>>>>>
>>>>>>
>>>>>> 2015-07-13 14:37 GMT+09:00 Koichi Suzuki <koichi.dbms at gmail.com>:
>>>>>>
>>>>>>> 2015-07-13 14:16 GMT+09:00 Pavan Deolasee <pavan.deolasee at gmail.com>
>>>>>>> :
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jul 13, 2015 at 7:17 AM, Koichi Suzuki <
>>>>>>>> koichi.dbms at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Considering very short duration until original transaction (X1)
>>>>>>>>> reports GTM that it is committed,  if X2 finds that X1 is still in the
>>>>>>>>> snapshot, X2 can ask GTM for new snapshot until X1 is not included in it.
>>>>>>>>>
>>>>>>>>
>>>>>>>> X2 can't really ask for a new snapshot. Its perfectly fine to work
>>>>>>>> with a snapshot that has a transaction included, but already committed. Or
>>>>>>>> are you suggesting that X2 waits for X1 to commit at the GTM instead of the
>>>>>>>> local node before proceeding with an UPDATE? i.e. we modify
>>>>>>>> XactLockTableWait() so that it waits for GTM events? I'm afraid that won't
>>>>>>>> be cheap either.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is the basic idea.   Why X2 cannot ask for a fresh
>>>>>>> snapshot?  GTM should accept such requirement and if we regard this
>>>>>>> snapshot as "temporary" one, just to check if X1 is reported committed to
>>>>>>> GTM,  and we still continue to use the old one for visibility check, I
>>>>>>> suppose this is feasible.    Anyway, if X2 is updating a record updated by
>>>>>>> X1 and if X1 is found in the snapshot, X2 should wait until X1 is reported
>>>>>>> as committed to GTM.    I'm not sure if XactLockTableWait() is only one
>>>>>>> candidate to modify.    Let me see the code more in detail.
>>>>>>>
>>>>>>> Thank you;
>>>>>>> ---
>>>>>>> Koichi
>>>>>>> https://www.linkedin.com/in/koichidbms
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Pavan
>>>>>>>> --
>>>>>>>> Pavan Deolasee
>>>>>>>> http://www.linkedin.com/in/pavandeolasee
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Postgres-x2-steering" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to postgres-x2-steering+unsubscribe at googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>>
>>> --
>>> Pavan Deolasee
>>> http://www.linkedin.com/in/pavandeolasee
>>>
>>
>>
>>
>> --
>> --
>>  Pavan Deolasee                   http://www.2ndQuadrant.com/
>>  PostgreSQL Development, 24x7 Support, Training & Services
>>
>
>
>
> --
> --
>  Pavan Deolasee                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
> --
> You received this message because you are subscribed to the Google Groups
> "Postgres-x2-steering" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to postgres-x2-steering+unsubscribe at googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.postgres-xl.org/private.cgi/postgres-xl-developers-postgres-xl.org/attachments/20150803/adeefcb2/attachment.htm>


More information about the Postgres-xl-developers mailing list