PostgreSQL AbortOperation - race conditions?

The offical for ZeosLib 7.3 Report problems, ask for help, post proposals for the new version of Zeoslib 7.3/v8
Quick Info:
-We made two new drivers: odbc(raw and unicode version) and oledb
-GUID domain/field-defined support for FB
-extended error infos of Firebird
-performance ups are still in queue
In future some more feature will arrive, so stay tuned and don't hassitate to help
Post Reply
Scalmax
Fresh Boarder
Fresh Boarder
Posts: 13
Joined: 04.05.2023, 15:16

PostgreSQL AbortOperation - race conditions?

Post by Scalmax »

This is continuation of viewtopic.php?f=10&p=245122. Short recap: if I did not missed something, variables: FLogMessage, FConn are subject to race condition when AbortOperation() is called from another thread. Even if presented analysis of source code is correct, such case will be extremely rare, and very difficult to detect, yet even harder to debug for eventual developer. There is also issue of HandleErrorOrWarning() call.
First of all, I need confirmation of drawn conclusions.
In the mean time, I searched for inspiration for possible fixes.

FConn:
PQcancel() do not mutate dereferenced content of pCancel. psycopg (Python wrapper around PostgreSQL) allocate PGCancel pointer right after FConn is created (note: I am paraphrasing, what C code of python package does in language of Zeos). We can do the same and surround PGCancel creation, potential usage and (its destruction with destruction of FConn) with mutex. FConn will not be used in AbortOperation() at all, and will not need mutex protection.

FLogMessage:
FLogMessage should be replaced with local variable.

HandleErrorOrWarning:
I do not deciphered HandleErrorOrWarning method yet, and it does a lot of things. I see 2 solutions:
1. surround it with reentrant mutex (same thread can enter it many times).
2. remove entirely, and let AbortOperation returns Integer with PG error message. <--- probably that option is unacceptable.

EDIT: Please tell me, that TZPostgreSQLConnection is not lazy-open. If it is, we have to tell the user to explicitly open db connection before any thread-related processing. Otherwise whole FConn section above looses its applicability.
marsupilami
Platinum Boarder
Platinum Boarder
Posts: 1918
Joined: 17.01.2011, 14:17

Re: PostgreSQL AbortOperation - race conditions?

Post by marsupilami »

Scalmax wrote: 10.07.2023, 11:28 This is continuation of viewtopic.php?f=10&p=245122. Short recap: if I did not missed something, variables: FLogMessage, FConn are subject to race condition when AbortOperation() is called from another thread. Even if presented analysis of source code is correct, such case will be extremely rare, and very difficult to detect, yet even harder to debug for eventual developer. There is also issue of HandleErrorOrWarning() call.
First of all, I need confirmation of drawn conclusions.
In the mean time, I searched for inspiration for possible fixes.
Yes - you are correct. These functions and variables get used from a second thread, which kinda violates the thread safety model of Zeos. In theory this could lead to problems - although these will be very rare ocassions.
Scalmax wrote: 10.07.2023, 11:28 FConn:
PQcancel() do not mutate dereferenced content of pCancel. psycopg (Python wrapper around PostgreSQL) allocate PGCancel pointer right after FConn is created (note: I am paraphrasing, what C code of python package does in language of Zeos). We can do the same and surround PGCancel creation, potential usage and (its destruction with destruction of FConn) with mutex. FConn will not be used in AbortOperation() at all, and will not need mutex protection.
I think that could be done.
Scalmax wrote: 10.07.2023, 11:28 FLogMessage:
FLogMessage should be replaced with local variable.
We could do that.
Scalmax wrote: 10.07.2023, 11:28 HandleErrorOrWarning:
I do not deciphered HandleErrorOrWarning method yet, and it does a lot of things. I see 2 solutions:
1. surround it with reentrant mutex (same thread can enter it many times).
2. remove entirely, and let AbortOperation returns Integer with PG error message. <--- probably that option is unacceptable.
I am not sure, if surreounding it with a critical section or things like that is worth it. The critical section could decrease performance if it gets called often. I did look up some places in ZDbcPostgreSqlStatement and there it seems that HandleErrorOrWarning only gets called if there is a problem. But if we do thgis change we really should check all places where it gets called and see if there already is something in place to keep it from getting called if everything is ok.
Scalmax wrote: 10.07.2023, 11:28 EDIT: Please tell me, that TZPostgreSQLConnection is not lazy-open. If it is, we have to tell the user to explicitly open db connection before any thread-related processing. Otherwise whole FConn section above looses its applicability.
As far as I know, there is nothing lazy about Zeos - only about Zeos programmers ;)
Scalmax
Fresh Boarder
Fresh Boarder
Posts: 13
Joined: 04.05.2023, 15:16

Re: PostgreSQL AbortOperation - race conditions?

Post by Scalmax »

In theory this could lead to problems - although these will be very rare ocassions.
The more rare the worse. Sorry, thats my opinion. If so it need to be fixed, because it can even lead to AV's, that can not be pinpointed with ease.
The critical section could decrease performance if it gets called often.
When I read your responce, it came to me that maybe (thats my guess) Zeos could be (a little bit) sloppy on error handling. Error is not normal operation, that should be fast. Do Zeos have even a benchmark targeting error reporting itself?
As far as I know, there is nothing lazy about Zeos
I saw those lines often:

Code: Select all

  if IsClosed then
     Open;
And hey, I am lazy too. Then maybe I can even help with something :).

EDIT: HandleErrorOrWarning() does use FConn. Everything that HandleErrorOrWarning() uses should be mutex protected - that means locking everything up. My solution no 1. was wrong. There is 3'rd option: put raw error byte code (variable named 'P') in some mutex protected queue, and read it in a proper thread. I hate multithreading.
marsupilami
Platinum Boarder
Platinum Boarder
Posts: 1918
Joined: 17.01.2011, 14:17

Re: PostgreSQL AbortOperation - race conditions?

Post by marsupilami »

Scalmax wrote: 17.07.2023, 13:20
In theory this could lead to problems - although these will be very rare ocassions.
The more rare the worse. Sorry, thats my opinion. If so it need to be fixed, because it can even lead to AV's, that can not be pinpointed with ease.
You are welcome to fix this :)
Scalmax wrote: 17.07.2023, 13:20
The critical section could decrease performance if it gets called often.
When I read your responce, it came to me that maybe (thats my guess) Zeos could be (a little bit) sloppy on error handling. Error is not normal operation, that should be fast. Do Zeos have even a benchmark targeting error reporting itself?
As far as I know, we don't have official benchmarks. We do have a test suite. Zeos isn't sloppy about error handling. One has to look how HandleErrorOrWarning is used throughout the codebase. The placed that I looked at only call HandleErrorOrWarning if there is an actual error situation. So using a TCriticalSection object there shouldn't be a (performance) problem there. But the HandleErrorOrWarning is written in a way where it can be called after each action without testing if there is an error situation. So there could be places (which I didn't find) that call HandleErrorOrWarning in all cases - even if there is no actual error to handle.
Scalmax wrote: 17.07.2023, 13:20
As far as I know, there is nothing lazy about Zeos
I saw those lines often:

Code: Select all

  if IsClosed then
     Open;
And hey, I am lazy too. Then maybe I can even help with something :).
You are welcome to help.
Scalmax wrote: 17.07.2023, 13:20 EDIT: HandleErrorOrWarning() does use FConn. Everything that HandleErrorOrWarning() uses should be mutex protected - that means locking everything up. My solution no 1. was wrong. There is 3'rd option: put raw error byte code (variable named 'P') in some mutex protected queue, and read it in a proper thread. I hate multithreading.
No - please don't do multi threading in Zeos if it isn't absolutely necessary. It should be enough if HandleErrorOrWarning internally uses a TCriticalSection. There is no legal way that any other Zeos code using any ressources from a single PG connection runs in a separate thread. That simply isn't allowed with Zeos.
Scalmax
Fresh Boarder
Fresh Boarder
Posts: 13
Joined: 04.05.2023, 15:16

Re: PostgreSQL AbortOperation - race conditions?

Post by Scalmax »

You are welcome to fix this :)
Will do, and consult it thoroughly with you to the best of my abilities.
No - please don't do multi threading in Zeos if it isn't absolutely necessary

It is difficult to find another way for doing this. Call to HandleErrorOrWarning() touches almost everything. It will take long hours to look at code and come up with something better. Maybe even a lock-free solution? Then, it will have to checked under lazarus and C++ Builder 6 (for locking and atomics primitives) and tested by someone who has access to test environments. I will try my best.

By the way: async Query operation means async error handling as well. HandleErrorOrWarning probably must become thread safe one way or another, and in every backend.
Post Reply