Thoughts about TZCachedResultSet implementation
Posted: 13.11.2015, 12:06
Why TZCachedResultSet exists
============================
The DB native client (library) receives data from the DB server in its own format and
places it in a contiguous memory block. The client application (Zeoslib) gets a pointer
to this memory block. The app can parse and show data from this memory area,
but cannot modify it in place, because it cannot manipulate a memory block allocated by the DB client's code.
The DB client may or may not provide functions to insert/update/delete rows directly in the received data.
For example, libpq has no such functions: all that can be done is to
free the memory block by calling libpq's function 'PQclear'.
This is where TZCachedResultSet comes in. It copies data from the memory block
into its own uniform format, and all manipulations on the data occur on this second copy.
So, once all rows have been copied by TZCachedResultSet, the native data is no longer needed.
How the current TZCachedResultSet works
=======================================
If the statement's concurrency is 'rcUpdatable', a 'TZCachedResultSet' is created; it copies rows from the native
resultset (NRS) into memory, and an 'IZResultSet' is returned to manage this copy of the data.
Copying happens automatically when we move to a row: all records that are not in 'TZCachedResultSet' yet are copied.
So, for example, if we move to the last row, all rows from the NRS will be copied.
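The copy-on-navigation behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not Zeoslib code: the class names 'NativeResultSet' and 'LazyCachedResultSet' and their methods are hypothetical stand-ins for the NRS and for 'TZCachedResultSet'.

```python
class NativeResultSet:
    """Hypothetical stand-in for a read-only native result set (NRS)."""

    def __init__(self, rows):
        self._rows = tuple(tuple(r) for r in rows)  # immutable, like the client's block

    def row_count(self):
        return len(self._rows)

    def get_row(self, index):
        return self._rows[index]


class LazyCachedResultSet:
    """Hypothetical sketch: rows are copied only when navigation reaches them."""

    def __init__(self, native):
        self._native = native
        self._cache = []  # mutable copies, filled on demand

    def move_to(self, index):
        """Copy every row up to `index` that is not cached yet, then return that row."""
        if not 0 <= index < self._native.row_count():
            raise IndexError(index)
        while len(self._cache) <= index:
            self._cache.append(list(self._native.get_row(len(self._cache))))
        return self._cache[index]

    def cached_count(self):
        return len(self._cache)


native = NativeResultSet([(1, "a"), (2, "b"), (3, "c"), (4, "d")])
crs = LazyCachedResultSet(native)
crs.move_to(1)
print(crs.cached_count())            # 2: only the first two rows were copied
crs.move_to(native.row_count() - 1)
print(crs.cached_count())            # 4: moving to the last row copied everything
```

Once the last row has been visited, every row exists twice: once in the native block and once in the cache.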
The copy occupies more memory than the NRS, since the native format (at least for Postgres)
is more compact and contiguous.
As a result, in the worst case (when all rows are copied) the total memory footprint at least doubles:
memory footprint of NRS + memory footprint of the cached resultset (CRS).
Alternative TZCachedResultSet implementation
============================================
An alternative implementation of TZCachedResultSet is possible.
It could combine the NRS (FNativeResultSet in the picture) with a slightly modified TZCachedResultSet implementation (FCache in the picture) via an additional indirection layer.
The new implementation maps rows to the NRS or FCache via its own list of rows (FRowList):
- unmodified rows are mapped directly to the corresponding NRS rows;
- a new row is added and mapped directly into FCache;
- before updating, an NRS row is copied into FCache and mapped to its new FCache location,
and the update occurs on this copy;
- before deleting, an NRS row is copied into FCache and mapped to its new FCache location,
then deleted from FCache and FRowList;
- rows in FCache are updated/deleted as in the original 'TZAbstractCachedResultSet'.
All operations on rows are forwarded to the corresponding NRS/FCache class methods.
Posting postponed updates to the server should work as is.
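The mapping rules listed above can be illustrated with a small sketch. It is written in Python purely for illustration and is not the actual Zeos code: 'MappedResultSet' and all of its methods are hypothetical, '_row_list' plays the role of FRowList and '_cache' the role of FCache. The sketch simply drops a deleted row; how the real class keeps it for postponed updates is not modelled here.

```python
NATIVE, CACHE = "native", "cache"


class MappedResultSet:
    """Hypothetical sketch of the indirection layer over NRS and FCache."""

    def __init__(self, native_rows):
        self._native = tuple(tuple(r) for r in native_rows)  # read-only NRS stand-in
        self._cache = {}        # slot id -> mutable row (FCache stand-in)
        self._next_slot = 0
        # One (location, index) entry per visible row (FRowList stand-in).
        self._row_list = [(NATIVE, i) for i in range(len(self._native))]

    def _store(self, row):
        slot = self._next_slot
        self._next_slot += 1
        self._cache[slot] = row
        return slot

    def _to_cache(self, pos):
        """Copy a native row into the cache on first modification and remap it."""
        where, idx = self._row_list[pos]
        if where == NATIVE:
            idx = self._store(list(self._native[idx]))
            self._row_list[pos] = (CACHE, idx)
        return idx

    def row_count(self):
        return len(self._row_list)

    def cached_rows(self):
        return len(self._cache)

    def get(self, pos):
        where, idx = self._row_list[pos]
        source = self._native if where == NATIVE else self._cache
        return tuple(source[idx])

    def insert(self, row):
        # New rows live only in the cache.
        self._row_list.append((CACHE, self._store(list(row))))

    def update(self, pos, column, value):
        self._cache[self._to_cache(pos)][column] = value

    def delete(self, pos):
        slot = self._to_cache(pos)  # copy first, as in the list above
        del self._cache[slot]
        del self._row_list[pos]


rs = MappedResultSet([(1, "a"), (2, "b"), (3, "c")])
rs.update(1, 1, "B")   # row 1 is copied into the cache, then changed
rs.insert((4, "d"))    # new row goes straight into the cache
rs.delete(0)           # row 0 is copied, then removed from cache and row list
print([rs.get(i) for i in range(rs.row_count())])
print(rs.cached_rows())  # the untouched row (3, "c") was never copied
```

Only rows that are modified or inserted occupy cache memory; reads of untouched rows go directly to the native data.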
There is one challenge: the 'CompareRows' function.
The original function compares rows in the same format: 'TZRowBuffer'.
In the alternative implementation there are three cases:
- TZRowBuffer <=> TZRowBuffer;
- NRS row <=> TZRowBuffer;
- NRS row <=> NRS row.
So for the compare function I converted the NRS row to TZRowBuffer on the fly, but this slowed
comparison down about 20 times (I compared the first row with the other 37,000 rows, 40 times).
I had no choice but to cache the converted rows, which effectively means permanently
copying them from the NRS to FCache.
And this breaks the initial assumptions about handling row copies...
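To make the trade-off concrete, here is a rough Python sketch of a comparer that handles the three cases by converting native rows on demand and keeping the converted copies. It is not the Zeos 'CompareRows' code: 'RowComparer', the ("native", index) / ("buffer", row) reference tuples, and the text-to-integer conversion are all assumptions made for illustration.

```python
class RowComparer:
    """Hypothetical sketch: compare rows, converting native rows at most once."""

    def __init__(self, native_rows):
        self._native = native_rows   # native rows hold text fields (assumption)
        self._converted = {}         # native index -> converted row, kept permanently
        self.conversions = 0         # counts how often the costly step runs

    def _convert(self, index):
        """Stand-in for the costly native -> TZRowBuffer conversion."""
        self.conversions += 1
        return [int(field) for field in self._native[index]]

    def _buffer(self, ref):
        kind, value = ref
        if kind == "buffer":         # already in the common format
            return value
        if value not in self._converted:
            self._converted[value] = self._convert(value)
        return self._converted[value]

    def compare(self, left, right):
        """Return -1, 0 or 1, whatever mix of native and buffer rows is passed."""
        a, b = self._buffer(left), self._buffer(right)
        return (a > b) - (a < b)


native = [("3",), ("1",), ("2",)]
comparer = RowComparer(native)
for _ in range(40):                  # same pattern as the test described above
    for i in range(1, len(native)):
        comparer.compare(("native", 0), ("native", i))
print(comparer.conversions)          # 3: each native row was converted only once
```

Without the '_converted' dictionary the loop would run the conversion on every call, which is the slowdown described above; with it, every compared row ends up stored a second time.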
Nevertheless, the new implementation has a significantly smaller memory footprint, with
one exception: sorting. When sorting occurs, the compared rows are copied into FCache
and the memory footprint grows.
Also (and this is expected behavior), each updated/deleted NRS row is copied into FCache and increases the memory footprint.
I now have a working implementation based on 7.1.4-stable, and I guess I can do it for 7.2.