Page 1 of 2

Asynchronous working DataSets?

Posted: 10.07.2021, 11:26
by marsupilami
Hello,

this is thread meant to collect some ideas. When working on an app for Android using Zeos and the Webservice driver, I noticed that yes - the app works and it works fast enough but it isn't as nice as I would it like to be. So I started wondering what changes would be necessary to get Zeos to do its work of fetching data outside the main thread and send events when data is available. Something like this:

Image

The idea is to keep the GUI responsive while Zeos is waiting for data or processing large datasets. Also this could allow other background threads to use the same database connection to get their work done. Currently this kind of processing is not easily possible because Zeos as a whole is not thread safe. Creating the necessary thread safety is not easy for users and I would like for Zeos to make that kind of data processing possible without requiring each user to implement its own scheme for doing this. What are your opinions? Is this something that we should consider doing?

If so how could that be done? My current idea would be to wrap up most of the DBC layer into the worker thread because otherwise we would have to make all drivers and their components thread safe? Maybe we need something like the pooled driver that works a s a proxy but only with focus on thread safety? Or would we need to implement a new set of components (connection, datasets and the like)? Opinions?

Best regards,

Jan

Re: Asynchronous working DataSets?

Posted: 10.07.2021, 14:53
by aehimself
Jan,

My program was using ZQueries in BG threads for at least a year now. The initial version was ZMethodInThread which I replaced with full-fledged TThread descendants to open, refresh, post, connect, disconnect, etc.
If I keep DB aware components disconnected until this is done, it works like a charm with small fixes here and there (call .FetchAll after .Open and .Refresh in the thread as if you want to display .RecordCount for example it will fetch all data in the GUI thread).

This unfortunately does not apply if you have events (AfterOpen, OnScroll) or master-detail relations set up as those will be called from the worker's context. You can use TThread.Queue in event handlers but I think master-detail will likely crash.

I really would like to see an async Zeos so I can throw my hacky-whacky solution out. Especially because in some occasions these do not even apply (e.g. simply resetting the sorting to none calls an internal refresh - can't push that out to a thread).

Re: Asynchronous working DataSets?

Posted: 16.07.2021, 14:30
by marsupilami
aehimself wrote: 10.07.2021, 14:53 which I replaced with full-fledged TThread descendants to open, refresh, post, connect, disconnect, etc.
My idea is to only get open done asynchronously. Maybe connect and disconnect. All others most probably will be kept to be synchronous because I don't think it will work good with the UI. I think it clould be strange for a DBGrid or a DBNavigator to think a record is posted while in reality it isn't. Maybe ApplyUpdates could be done in the background too....

What do you think?

Re: Asynchronous working DataSets?

Posted: 16.07.2021, 16:51
by aehimself
marsupilami wrote: 16.07.2021, 14:30My idea is to only get open done asynchronously. Maybe connect and disconnect All others most probably will be kept to be synchronous because I don't think it will work good with the UI.
Unfortunately that's not always the case. In very large datasets (millions of records, 80-100-ish columns) posting can take significant time also. Leaving these synchronous will freeze the UI up again - which is not very good :)
marsupilami wrote: 16.07.2021, 14:30I think it clould be strange for a DBGrid or a DBNavigator to think a record is posted while in reality it isn't. Maybe ApplyUpdates could be done in the background too....

What do you think?
The DBGrid (nor any other DB-aware control) will notice a thing if we internally call .DisableCon0trols before and .EnableControls after. We even could override DoBeforeOpen and DoAfterOpen to call these automatically IF we are running async.
On the other hand, working with asynchronous components it's the responsibility of the developer to disable parts of the UI until a specific action is completed which can mess up the ongoing process. I'd say not to overcomplicate it: lay out the foundations (make sure all functions work async) and leave the rest to the consumer.

As for ApplyUpdates - I'm not using it for a while now (I just use .Post and use transactions) - does it take significant amount of time? As it is equal to .Post with .AutoCommit I suppose yes, it *might*; in this case it would make sense to push that to the background too.

Re: Asynchronous working DataSets?

Posted: 17.07.2021, 11:04
by marsupilami
Off topic but maybe a good question:
aehimself wrote: 10.07.2021, 14:53 My program was using ZQueries in BG threads for at least a year now. The initial version was ZMethodInThread which I replaced with full-fledged TThread descendants to open, refresh, post, connect, disconnect, etc.
If you don't use ZMethodInThread anymore - does it make sense to distribute it with Zeos?

Re: Asynchronous working DataSets?

Posted: 17.07.2021, 17:58
by aehimself
marsupilami wrote: 17.07.2021, 11:04If you don't use ZMethodInThread anymore - does it make sense to distribute it with Zeos?
Well, it shouldn't be bundled because of me but because of the other users :) But maybe it would be better to move it somewhere where samples reside.

The original ZMethodInThread works and shows that it's possible to push long-lasting operations to the background (therefore .AbortOperation was born) but it's more of an example still. It could be written better and with more care.

Re: Asynchronous working DataSets?

Posted: 27.07.2021, 17:22
by Fr0sT
AFAIU FireDAC from Delphi's std lib supports async, some ideas could be borrowed there. However, any semi-async implementation (as we know, there's no native language support of async operations) could be a problem source instead of a handy tool if a user is not doing it right. F.ex.:

Dataset.OpenAsync;
Dataset.FetchAll; // BADABOOM

If an internal result set could be hidden from such misuses - that would be fine. Otherwise it's better to populate things like RunOperationInThread where a user is supposed to know what he's doing

Re: Asynchronous working DataSets?

Posted: 27.07.2021, 18:15
by aehimself
Nice to see you back, Fr0st!
Fr0sT wrote: 27.07.2021, 17:22Otherwise it's better to populate things like RunOperationInThread where a user is supposed to know what he's doing
This is exactly how I am doing it with threads at the moment. I launch the thread which opens the dataset / posts changes / whatever and with the thread finish callback I can finalize the UI state depending if the execution was successful or not.
Fr0sT wrote: 27.07.2021, 17:22Dataset.OpenAsync;
Dataset.FetchAll; // BADABOOM
Here's the catch. My workers which .Open or .Refresh a dataset always call .FetchAll right after. The reason behind is internally Zeos fetches all records at multiple places (e.g. .RecordCount) so without it my worker finished in 2 sec but the application was frozen for 20 :D With this included, the thread runs for 19 sec and the UI is responsive.
So if we decide to introduce .Async, we must call .FetchAll too (with the exception of unidirectional datasets) or it won't make too much sense.

What we also have to redo is sorting. Setting .SortType to stIgnored will cause the dataset to refresh. This is why I still have this junk in my custom DBGrid implementation :D

Code: Select all

Procedure TDBGrid.TitleClick(Column: TColumn);
Var
  dataset: TZAbstractRODataset;
Begin
  Try
    dataset := GetDataSet;
    
    If Not Assigned(dataset) Or (_columnmoved) Then
      Exit;

    If (dataset.SortedFields = Column.FieldName) Then
      Case dataset.SortType Of
        stAscending: Self.SortBy(Column.FieldName, stDescending);
//        stDescending: Self.SortBy(Column, stIgnored); // stIgnored causes the dataset to refresh, freezing the application
        Else Self.SortBy(Column.FieldName, stAscending);
      End
    Else
      Self.SortBy(Column.FieldName, stAscending);
 Finally
   _columnmoved := False;
 End;
End;

Re: Asynchronous working DataSets?

Posted: 28.07.2021, 09:26
by Fr0sT
Glad to see you too :)
aehimself wrote: 27.07.2021, 18:15 Here's the catch. My workers which .Open or .Refresh a dataset always call .FetchAll right after. The reason behind is internally Zeos fetches all records at multiple places (e.g. .RecordCount) so without it my worker finished in 2 sec but the application was frozen for 20 :D With this included, the thread runs for 19 sec and the UI is responsive.
So if we decide to introduce .Async, we must call .FetchAll too (with the exception of unidirectional datasets) or it won't make too much sense.
Yes, here's the catch! Because a user could never wish to call .FetchAll (unidirectional DS with billions of records, or he could want to load records in background, or just when they're needed, or whatever). So we already found a point of controversy.
Anyway adding thread safety is huge PITA - all methods should either lock or check if there's some operation running currently. Some ideas could be borrowed from FireDAC which has this mode built-in but lots of work still required.

Btw, I couldn't find any SortBy method in DBGrid even in 10.3

Re: Asynchronous working DataSets?

Posted: 28.07.2021, 10:17
by marsupilami
Fr0sT wrote: 27.07.2021, 17:22 Dataset.OpenAsync;
Dataset.FetchAll; // BADABOOM
aehimself wrote: 27.07.2021, 18:15 Here's the catch. My workers which .Open or .Refresh a dataset always call .FetchAll right after.
My current idea is geard towards applications that don't fetch millions of records. It is meant to be used with applications that want to be able to keep processing their GUI events while waiting for (all) data to arrive. This can either be due to a high latency network connection or due to the fact that they want to open several queries at the same time. Like for loading lists from the database or things like that.
So the idea is more or less to
  1. push the SQL statement and its parameters to a worker thread. Keep processing GUI Events.
  2. The worker thread fetches all data and loads it into memory. Let's say TZCachedResultset.
  3. When the loading is finished the TZCachedResultSet and its metadata get sent to the main thread that can do all the .Open stuff like generating FieldDefs, Fields and the like.
So a FetchAll operation would become an empty operation. One possibly could raise an exception in Fr0sts example, because no operations on the dataset are allowed after OpenAsync. All operations on the DataSet would have to be moved to the AfterOpen event.
Note: My idea is not to make all drivers threadsafe. This would be a really huge task, that I am not ready to take on. My idea would be to have a kind of wrapper that wrap up the wholde dbc level stuff in a single worker thread and only exposes thread safe calls to interact with that thread. All data that this wrapper returns is meant to have no links to the internal data of the wrapper.
Fr0sT wrote: 28.07.2021, 09:26 Yes, here's the catch! Because a user could never wish to call .FetchAll (unidirectional DS with billions of records, or he could want to load records in background, or just when they're needed, or whatever). So we already found a point of controversy.
Hmm - this is simple. My idea simply doesn't support this because I have no clue on how to combine forward only datasets with asynchronous, thread safe operations. Also I am not sure if this is a use case that we should optimize for.
In a use case like this it might make more sense for the user to write his own thread that gets its own database conenction. One more point: As far as I remember from things Egonhugeist told me, most databases require us to finish one (select) operation before we do another operation when using their asynchronous APIs. Hiding all this seems quite complex right now.

Re: Asynchronous working DataSets?

Posted: 28.07.2021, 10:59
by aehimself
Fr0sT wrote: 28.07.2021, 09:26Because a user could never wish to call .FetchAll (unidirectional DS with billions of records, or he could want to load records in background, or just when they're needed, or whatever)
Ehm :)
aehimself wrote: 27.07.2021, 18:15(with the exception of unidirectional datasets)
Fr0sT wrote: 28.07.2021, 09:26Btw, I couldn't find any SortBy method in DBGrid even in 10.3
Yes, it's my own implementation. It simply sets the dataset's SortedFields and SortDirection properties and updates the field's DisplayLabel to include the arrow :)
marsupilami wrote: 28.07.2021, 10:17Hmm - this is simple. My idea simply doesn't support this because I have no clue on how to combine forward only datasets with asynchronous, thread safe operations.
I am using the same worker to open all datasets, regardless if it's unidirectional or not and it works. The only difference is that .FetchAll is not called as it positions the cursor to the end and .First is not allowed, see this topic.
The code is extremely simple:

Code: Select all

Procedure TOpenThread.InternalExecute;
Begin
 inherited;

 Self.SetStatus(_operation);

 Try
   Self.DataSet.Open;

   // We also have to call .FetchAll to make sure everything is downloaded. Not doing this will cause the application to fetch all data even
   // if we just want to access .RecordCount.
   If (Self.DataSet Is TZAbstractRODataSet) And Not (Self.DataSet As TZAbstractRODataSet).IsUniDirectional Then
     Self.DataSet.FetchAll;
 Except
   On E:Exception Do
   Begin
     If Self.DataSet.Active Then Self.DataSet.Close;
     Raise;
   End;
 End;
End;
So either I'm not understanding your concerns correctly (due to lack of insight, probably) or I don't see why the same way wouldn't work.

Re: Asynchronous working DataSets?

Posted: 28.07.2021, 15:56
by marsupilami
aehimself wrote: 28.07.2021, 10:59 So either I'm not understanding your concerns correctly (due to lack of insight, probably) or I don't see why the same way wouldn't work.
Maybe I imagine things here. The following might be true for one or more APIs - but it very well might not be true for all APIs. It is however a combination of charachteristics that I remember from different documentations. So here are my assumptions:
  • I assume for that kind of API to have a function to execute some SQL and return a handle for a forward only result set.
  • I assume that we can get one record at a time. The vendor of the database can decide wether to transfer result set records one by one or in batches.
  • I assume that the vendeor doesn't guarantee thread safety for that kind of operations. I do know that dblib doesn't:
    FreeTDS FAQ wrote:Is FreeTDS thread safe?

    Different threads may all use separate connections without interfering with each other. Threads may not share a DBPROCESS or CS_CONNECTION without controlling access via a mutex.
So - If we don't do a fetchall and leave the decison about when to load data to the dbc layer as it is, we may very well run into a situation where two threads try to access the same connection at the same time. Assume the following:
  • Thread 1 executes a compilcated SQL and waits for it to return a value.
  • During this time Thread 2 tries to access the next record of a result ser and the client library needs to fetch that record from the server.
In a situation like this the second thread might try to access internal client library ressources, that the first thread still needs. At the minimum this might be the TCP socket where Thread 1 waits for the server to return its result and maybe just that Socket isn't protected by a mutex because the client library explicitly isn't designed in a thread safe manner?
Side note: Oracle has an option to specify wether one intends to use a connection with multiple threads. So they also do have a mode of operation for their client library that isn't thread safe.

Re: Asynchronous working DataSets?

Posted: 29.07.2021, 08:45
by Fr0sT
marsupilami wrote: 28.07.2021, 15:56
FreeTDS FAQ wrote:Is FreeTDS thread safe?

Different threads may all use separate connections without interfering with each other. Threads may not share a DBPROCESS or CS_CONNECTION without controlling access via a mutex.
So - to support async operations reliably there must be a flag in connection object that rejects any actions until async operation is completed.

Re: Asynchronous working DataSets?

Posted: 30.07.2021, 09:11
by marsupilami
Fr0sT wrote: 29.07.2021, 08:45 So - to support async operations reliably there must be a flag in connection object that rejects any actions until async operation is completed.
Yes and no. I intend to implement a work queue in the thread where Datasets and TZConnection can add work that should be executed. Assume the following code:

Code: Select all

Query1.OpenAsync;
Query2.OpenAsync;
Query3.OpenAsync;
The idea is that each OpenAsync will add its SQL statement to the work queue. The internal worker thread works through this queue and notifies each of the queries when their result is available. Even the following code would be legal:

Code: Select all

Query1.Open;
Query2.OpenAsync;
Query3.OpenAsync;
Here Open also would use the work queue but the synchronization would be done differently so that Open effectively waits for its result - which kinda contradicts everything. But it would be possible and legal.

Re: Asynchronous working DataSets?

Posted: 07.07.2023, 18:15
by Scalmax
Any async code inside Zeos will add much of complicated code and create classes, that do many things. Clearly, it is not a good design.
Have a look at APL in TComponent. Or System.Net.Socket.TSocket. Begin*() all over the code. What for? ZMethodInThread is good enough for start.
In this context TZMemTable is a blessing: run query in background, as a result copy it in sync manner to TZMemTable.

(Warning: personal context)
I personally think there is no other way around. For 2 years I am developing an app with Delphi PPL. I want to control my own threads, and taking that control away makes library hard to use in context of what I am doing. Example: I had to dig & hack Indy (a nice library on its own) to use it, because it hides server threads.
(End of Warning)

In the end, people will use their own threads, just give them ability to thread-safely AbortOperation(). The real question is: can we call AbortOperation() from another thread using the same connection. In TZPostgreSQLConnection implementation it uses: FConn, FLogMessage, HandleErrorOrWarning(). Or asking another way: what operations of zeos classes that use Connection can affect those fields/method? My small knowlegle of Zeos and gut feeling tells me that those 2 questions are apt for another forum thread (with pun intended).

Async queue'ing is already done by external classes/libraries/frameworks. They are available for delphi.

EDIT:
FLogMessage is used all over the TZPostgreSQLConnection.
TZPostgreSQLConnection.InternalClose does call FPlainDriver.PQFinish(Fconn). Docustring for InternalClose says
`A Connection is automatically closed when it is garbage collected. Certain fatal errors also result in a closed Connection.`
https://www.postgresql.org/docs/current ... Q-PQFINISH says `Closes the connection to the server. Also frees memory used by the PGconn object`.