This is continuation of viewtopic.php?f=10&p=245122. Short recap: if I did not missed something, variables: FLogMessage, FConn are subject to race condition when AbortOperation() is called from another thread. Even if presented analysis of source code is correct, such case will be extremely rare, and very difficult to detect, yet even harder to debug for eventual developer. There is also issue of HandleErrorOrWarning() call.
First of all, I need confirmation of drawn conclusions.
In the mean time, I searched for inspiration for possible fixes.
FConn:
PQcancel() do not mutate dereferenced content of pCancel. psycopg (Python wrapper around PostgreSQL) allocate PGCancel pointer right after FConn is created (note: I am paraphrasing, what C code of python package does in language of Zeos). We can do the same and surround PGCancel creation, potential usage and (its destruction with destruction of FConn) with mutex. FConn will not be used in AbortOperation() at all, and will not need mutex protection.
FLogMessage:
FLogMessage should be replaced with local variable.
HandleErrorOrWarning:
I do not deciphered HandleErrorOrWarning method yet, and it does a lot of things. I see 2 solutions:
1. surround it with reentrant mutex (same thread can enter it many times).
2. remove entirely, and let AbortOperation returns Integer with PG error message. <--- probably that option is unacceptable.
EDIT: Please tell me, that TZPostgreSQLConnection is not lazy-open. If it is, we have to tell the user to explicitly open db connection before any thread-related processing. Otherwise whole FConn section above looses its applicability.
PostgreSQL AbortOperation - race conditions?
-
- Platinum Boarder
- Posts: 1939
- Joined: 17.01.2011, 14:17
Re: PostgreSQL AbortOperation - race conditions?
Yes - you are correct. These functions and variables get used from a second thread, which kinda violates the thread safety model of Zeos. In theory this could lead to problems - although these will be very rare ocassions.Scalmax wrote: ↑10.07.2023, 11:28 This is continuation of viewtopic.php?f=10&p=245122. Short recap: if I did not missed something, variables: FLogMessage, FConn are subject to race condition when AbortOperation() is called from another thread. Even if presented analysis of source code is correct, such case will be extremely rare, and very difficult to detect, yet even harder to debug for eventual developer. There is also issue of HandleErrorOrWarning() call.
First of all, I need confirmation of drawn conclusions.
In the mean time, I searched for inspiration for possible fixes.
I think that could be done.Scalmax wrote: ↑10.07.2023, 11:28 FConn:
PQcancel() do not mutate dereferenced content of pCancel. psycopg (Python wrapper around PostgreSQL) allocate PGCancel pointer right after FConn is created (note: I am paraphrasing, what C code of python package does in language of Zeos). We can do the same and surround PGCancel creation, potential usage and (its destruction with destruction of FConn) with mutex. FConn will not be used in AbortOperation() at all, and will not need mutex protection.
We could do that.
I am not sure, if surreounding it with a critical section or things like that is worth it. The critical section could decrease performance if it gets called often. I did look up some places in ZDbcPostgreSqlStatement and there it seems that HandleErrorOrWarning only gets called if there is a problem. But if we do thgis change we really should check all places where it gets called and see if there already is something in place to keep it from getting called if everything is ok.Scalmax wrote: ↑10.07.2023, 11:28 HandleErrorOrWarning:
I do not deciphered HandleErrorOrWarning method yet, and it does a lot of things. I see 2 solutions:
1. surround it with reentrant mutex (same thread can enter it many times).
2. remove entirely, and let AbortOperation returns Integer with PG error message. <--- probably that option is unacceptable.
As far as I know, there is nothing lazy about Zeos - only about Zeos programmers
Re: PostgreSQL AbortOperation - race conditions?
The more rare the worse. Sorry, thats my opinion. If so it need to be fixed, because it can even lead to AV's, that can not be pinpointed with ease.In theory this could lead to problems - although these will be very rare ocassions.
When I read your responce, it came to me that maybe (thats my guess) Zeos could be (a little bit) sloppy on error handling. Error is not normal operation, that should be fast. Do Zeos have even a benchmark targeting error reporting itself?The critical section could decrease performance if it gets called often.
I saw those lines often:As far as I know, there is nothing lazy about Zeos
Code: Select all
if IsClosed then
Open;
EDIT: HandleErrorOrWarning() does use FConn. Everything that HandleErrorOrWarning() uses should be mutex protected - that means locking everything up. My solution no 1. was wrong. There is 3'rd option: put raw error byte code (variable named 'P') in some mutex protected queue, and read it in a proper thread. I hate multithreading.
-
- Platinum Boarder
- Posts: 1939
- Joined: 17.01.2011, 14:17
Re: PostgreSQL AbortOperation - race conditions?
You are welcome to fix this
As far as I know, we don't have official benchmarks. We do have a test suite. Zeos isn't sloppy about error handling. One has to look how HandleErrorOrWarning is used throughout the codebase. The placed that I looked at only call HandleErrorOrWarning if there is an actual error situation. So using a TCriticalSection object there shouldn't be a (performance) problem there. But the HandleErrorOrWarning is written in a way where it can be called after each action without testing if there is an error situation. So there could be places (which I didn't find) that call HandleErrorOrWarning in all cases - even if there is no actual error to handle.Scalmax wrote: ↑17.07.2023, 13:20When I read your responce, it came to me that maybe (thats my guess) Zeos could be (a little bit) sloppy on error handling. Error is not normal operation, that should be fast. Do Zeos have even a benchmark targeting error reporting itself?The critical section could decrease performance if it gets called often.
You are welcome to help.Scalmax wrote: ↑17.07.2023, 13:20I saw those lines often:As far as I know, there is nothing lazy about ZeosAnd hey, I am lazy too. Then maybe I can even help with something .Code: Select all
if IsClosed then Open;
No - please don't do multi threading in Zeos if it isn't absolutely necessary. It should be enough if HandleErrorOrWarning internally uses a TCriticalSection. There is no legal way that any other Zeos code using any ressources from a single PG connection runs in a separate thread. That simply isn't allowed with Zeos.Scalmax wrote: ↑17.07.2023, 13:20 EDIT: HandleErrorOrWarning() does use FConn. Everything that HandleErrorOrWarning() uses should be mutex protected - that means locking everything up. My solution no 1. was wrong. There is 3'rd option: put raw error byte code (variable named 'P') in some mutex protected queue, and read it in a proper thread. I hate multithreading.
Re: PostgreSQL AbortOperation - race conditions?
Will do, and consult it thoroughly with you to the best of my abilities.You are welcome to fix this
No - please don't do multi threading in Zeos if it isn't absolutely necessary
It is difficult to find another way for doing this. Call to HandleErrorOrWarning() touches almost everything. It will take long hours to look at code and come up with something better. Maybe even a lock-free solution? Then, it will have to checked under lazarus and C++ Builder 6 (for locking and atomics primitives) and tested by someone who has access to test environments. I will try my best.
By the way: async Query operation means async error handling as well. HandleErrorOrWarning probably must become thread safe one way or another, and in every backend.