When will the new ZEOS 7 version for Delphi 2010 be released?

The alpha/beta tester's forum for ZeosLib 7.0.x series

Report problems concerning our Delphi 2009+ version and new Zeoslib 7.0 features here.

This is a forum that will be removed once the 7.X version goes into stable!!

Moderators: gto, EgonHugeist, olehs

ab
Fresh Boarder
Posts: 16
Joined: 26.10.2010, 09:16
Location: France

Post by ab »

Still working on it, but on evenings/nights only, because I'm not paid for it!

I have initiated the ZDBC part.
Still some properties to add, in order to work around the paradox of having UTF-8 encoding at the Delphi level and both stUnicodeString and stString at the database driver level. We'll definitely need to set a CharSet/CodePage encoding for the stString fields, for every database driver, or perhaps at table level (some database engines allow this at table level... what do you think about that?).
I hope this will be the only major modification to the framework (all I've done up to now is "just" replacing every string with RawUTF8, getting rid of the AnsiString/UnicodeString concept, then cleaning up the code).

Then, at the component (TDataSet) level, we'll get back our good old "string" type, either AnsiString (for Delphi up to 2007) or UnicodeString (for Delphi 2009 up to XE).

I've also done some code refactoring in some places, for faster code, while trying to avoid any regression.

I will upload things before Friday evening (Europe time zone).

Every suggestion/remark/feedback is welcome!
gto
Zeos Dev Team
Posts: 278
Joined: 11.11.2005, 18:35
Location: Porto Alegre / Brasil

Post by gto »

That's perfect!

I'm planning to work on the code this weekend to make some speed and memory usage improvements at the DBC level, and I'll be really pleased to do it on top of your already excellent modifications.

About DBC encoding, there are some DBs which define the encoding at column level, so we get a default database encoding, a table encoding, a column encoding, a client connection encoding AND a resultset encoding override.

Why not create a TZEncoding set, fill it with some encoding/codepage data and persist it in column definitions and resultsets? It would seriously help in detecting and converting strings, as we could map Zeos encodings to DB-specific ones to convert and to detect which of them are supported, and even expose a property with it in TZConnection and TZQuery. Just an idea, perhaps!

And finally, don't feel pressured! We're all thankful for your contribution, but as you said, nobody earns money writing code for Zeos (we earn experience, friendship and visibility, though). Also, the code gets better and runs nicely when done with care and pleasure :D

Offtopic (Don't kill me, Mark!): On days like these I really wish I lived in Europe, so we could meet somewhere, get a bunch of good food and beers, and code all weekend long. I'll even buy the beer!
Use the FU!!!!!IN Google !

gto's Zeos Quick Start Guide

Te Amo Taís!
Zoran
Senior Boarder
Posts: 55
Joined: 07.05.2010, 22:32

Post by Zoran »

gto wrote: About DBC encoding, there are some DBs which define the encoding at column level, so we get a default database encoding, a table encoding, a column encoding, a client connection encoding AND a resultset encoding override.
I believe that most of those DBs will convert all data to the client encoding before sending it to the client (and, in the opposite direction, convert it from the client encoding before storing it). Could this make your job simpler? Then it does not matter if different tables and different columns in the database store data in different encodings. You can get everything in the same encoding.

Or is it not that simple? I admit that I'm not quite sure I understand what you are actually doing. :oops:
gto
Zeos Dev Team
Posts: 278
Joined: 11.11.2005, 18:35
Location: Porto Alegre / Brasil

Post by gto »

Zoran wrote: I believe that most of those DBs will convert all data to the client encoding before sending it to the client (and, in the opposite direction, convert it from the client encoding before storing it). Could this make your job simpler? Then it does not matter if different tables and different columns in the database store data in different encodings. You can get everything in the same encoding.

Or is it not that simple? I admit that I'm not quite sure I understand what you are actually doing. :oops:
Hmm, good point! This surely needs some testing and digging through the documentation.

[]'s
ab
Fresh Boarder
Posts: 16
Joined: 26.10.2010, 09:16
Location: France

Post by ab »

Thanks for your support.

Perhaps it's a bit early to make changes to my code.
I'll work on it this weekend too... and I may be able to merge changes after all.

I'm getting into a "delicate" part... that is, where the drivers and ZDBC come into contact...

About encoding at the database driver level, I'll try to make it as generic as possible.
But the fact is that there are two kinds of string in most databases: VARCHAR and NVARCHAR.
So we'll have to deal with NVARCHAR as Unicode (i.e. either UCS2 or UTF-8, both easy to reach from UTF-8), but we'll have to convert our internal UTF-8 data into the VARCHAR code page (which may be UTF-8 but could be any code page). For this second conversion, we'll need to send the right data in the right format to the external database driver... I guess it will depend on the database used, and on its level/features...
Zoran
Senior Boarder
Posts: 55
Joined: 07.05.2010, 22:32

Post by Zoran »

gto wrote: Hmm, good point! This surely needs some testing and digging through the documentation.
I found something in the documentation of MySQL and Firebird.

MySQL: the "SET NAMES xxx" command is sent to server just after connecting (look for "set names" on this page: http://dev.mysql.com/doc/refman/5.1/en/ ... ction.html).

Firebird or InterBase: to specify the client charset, the parameter "isc_dpb_lc_ctype" is given when connecting to the database (as documented in the IB API guide; the PDF can be downloaded from here: http://www.firebirdsql.org/index.php?op=doc#category_10).

HOWEVER,
I believe that it's already implemented somewhere in Zeos, so you might not need to look much in the documentation.
In ZConnection.Properties you can put the line "codepage=UTF8". Of course, other encodings can be specified instead of UTF8. When this line is there, then after ZConnection connects, it tells the DB what the client encoding is, and the DB server automatically converts to and from this encoding.

I tested it in Lazarus, with MySQL 5.0, Firebird 2.1 and Oracle. With MySQL and FB it worked well with Zeos 6.6.6. With Oracle it did not work well, but it has worked nicely since I installed the new Zeos from trunk (it seems that Oracle needed the patch from this topic, which is applied in the new Zeos).

I don't know where in Zeos this functionality is implemented (I tested the high-level components only), but I believe that the ZDBC layer probably has it, as what is actually sent to the server is certainly different for different DBs.
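
For illustration, the "codepage" setting described above could be applied in code roughly as follows. This is only a minimal sketch: it assumes the Zeos components are installed, and the protocol name, host, database path and credentials are placeholders, not values taken from this thread.

```pascal
program CodepageSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, ZConnection;

var
  Conn: TZConnection;
begin
  Conn := TZConnection.Create(nil);
  try
    // Placeholder connection settings (assumptions): adjust to your own server.
    Conn.Protocol := 'firebird-2.1';
    Conn.HostName := 'localhost';
    Conn.Database := 'C:\data\test.fdb';
    Conn.User := 'SYSDBA';
    Conn.Password := 'masterkey';

    // Declare the client encoding before connecting. As reported in the
    // post above, Zeos then tells the server which encoding the client uses.
    Conn.Properties.Add('codepage=UTF8');

    Conn.Connect;
    Writeln('Connected: ', Conn.Connected);
  finally
    Conn.Free;
  end;
end.
```

The same line can also be typed into the Properties editor of the component at design time, which is how it was tested in the post above.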
ab
Fresh Boarder
Posts: 16
Joined: 26.10.2010, 09:16
Location: France

Post by ab »

My concern today is about type encoding in the framework.

I don't get why there are so many overlapping methods, like GetByte/GetShort/GetInt/GetLong, when it's only about retrieving one integer value.

Since I've unified GetString and GetUnicodeString, for instance, into one GetUTF8 method which returns a RawUTF8, I'm just wondering why we shouldn't define just one GetInteger method (returning an Int64), letting the DB driver work out how to handle it according to the field definition.

My proposal is that we could rely on the same "basic" types as TVariant:
- boolean;
- Int64;
- Extended;
- RawUTF8;
- TDateTime;
- TByteDynArray (for BLOB).
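
To make the proposal concrete, a reduced getter set could look roughly like the sketch below. Only GetUTF8 and GetInteger are named in this post; the interface name, the GUID, the remaining method names and the exact RawUTF8 declaration are assumptions made for illustration, not existing Zeos declarations.

```pascal
unit ReducedGetterSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

interface

uses
  Types; // declares TByteDynArray

type
  // Assumption: RawUTF8 is an AnsiString tagged with the UTF-8 code page
  // (Delphi 2009+ syntax); older compilers fall back to a plain AnsiString.
  {$IFDEF UNICODE}
  RawUTF8 = type AnsiString(65001);
  {$ELSE}
  RawUTF8 = type AnsiString;
  {$ENDIF}

  // Hypothetical interface: one getter per basic type, replacing
  // GetByte/GetShort/GetInt/GetLong and GetString/GetUnicodeString.
  IReducedResultSet = interface
    ['{8A3B1C2D-4E5F-4A6B-9C7D-0E1F2A3B4C5D}']
    function GetBoolean(ColumnIndex: Integer): Boolean;
    function GetInteger(ColumnIndex: Integer): Int64;
    function GetFloat(ColumnIndex: Integer): Extended;
    function GetUTF8(ColumnIndex: Integer): RawUTF8;
    function GetDateTime(ColumnIndex: Integer): TDateTime;
    function GetBlob(ColumnIndex: Integer): TByteDynArray;
  end;

implementation

end.
```

With such an interface, each driver would widen its native integer column types to Int64 inside GetInteger, and callers would narrow the result themselves when they need a smaller type.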
Last edited by ab on 05.12.2010, 14:11, edited 3 times in total.
gto
Zeos Dev Team
Posts: 278
Joined: 11.11.2005, 18:35
Location: Porto Alegre / Brasil

Post by gto »

ab wrote: My concern today is about type encoding in the framework.

I don't get why there are so many methods, like GetByte/GetShort/GetInt/GetLong, when it's only about retrieving one integer value.

Since I've unified GetString and GetUnicodeString, for instance, into one GetUTF8 method which returns a RawUTF8, I'm just wondering why we shouldn't define just one GetInteger method (returning an Int64), letting the DB driver work out how to handle it according to the field definition.

My proposal is that we could rely on the same "basic" types as TVariant:
- boolean;
- Int64;
- Extended;
- RawUTF8;
- TDateTime;
- TByteDynArray (for BLOB).
Fully agreed! I'm making this easier with my changes to the ZDBCCache units, to be released sometime between tomorrow (Sunday) and Monday.
ab
Fresh Boarder
Posts: 16
Joined: 26.10.2010, 09:16
Location: France

Post by ab »

ab wrote: I don't get why there are so many overlapping methods, like GetByte/GetShort/GetInt/GetLong, when it's only about retrieving one integer value?
I guess it was meant to be compatible with JDBC... but this is not at all compatible with the KISS principle I always try to follow.

So, if there is no definite objection or RED light from this forum, I'll try to shorten the list of methods used to get, retrieve or update field content in the interface and class definitions.
cnliou
Zeos Dev Team
Posts: 31
Joined: 11.11.2005, 12:18

Post by cnliou »

Zoran wrote: I believe that most of those DBs will convert all data to the client encoding before sending it to the client (and, in the opposite direction, convert it from the client encoding before storing it). Could this make your job simpler? Then it does not matter if different tables and different columns in the database store data in different encodings. You can get everything in the same encoding.

Or is it not that simple? I admit that I'm not quite sure I understand what you are actually doing. :oops:
This is exactly the part I need input on, too.

Take PostgreSQL as an example, when the server side uses UTF8, it supports various client side character sets.

To tell PostgreSQL to use UTF8 for clients, we simply send the SQL text to it: "SET CLIENT_ENCODING TO utf8".

When "Codepage=UTF8" is set in the "Properties" property of TZConnection, it will trigger the same behavior, IIRC.

My concern is that problems may arise when clients use UNICODE for VCL:

VCL (UCS2) <--> PostgreSQL (UTF8)

since my understanding is that the byte streams of UCS2 and UTF8 are completely different. Thus, I am under the impression that character set conversion between UCS2 and UTF8 is essential for some, if not most, DBMSs when UNICODE is used for clients (i.e. VCL).

Corrections/enlightenment will be much appreciated.

Regards,
CN
Zoran
Senior Boarder
Posts: 55
Joined: 07.05.2010, 22:32

Post by Zoran »

cnliou wrote: When "Codepage=UTF8" is set in the "Properties" property of TZConnection, it will trigger the same behavior, IIRC.

My concern is that problems may arise when clients use UNICODE for VCL:

VCL (UCS2) <--> PostgreSQL (UTF8)
As far as I understand, when the line "codepage=XXX" is in the TZConnection.Properties property, Zeos tells the DB server that the client encoding is XXX.

Therefore, if your DB uses UTF8 encoding and the client uses UTF16, then you should put "codepage=UTF16" (or maybe "codepage=UCS2"?). Then Zeos will tell the server to convert data to and from UTF16 (UCS2) encoding.

Actually, as a DB client, your application should not care what the server encoding is; you should just tell the server what the client encoding is. That is what Zeos does with "codepage=XXX" in TZConnection.Properties.
cnliou wrote:
since my understanding is that the byte streams of UCS2 and UTF8 are completely different. Thus, I am under the impression that character set conversion between UCS2 and UTF8 is essential for some, if not most, DBMSs when UNICODE is used for clients (i.e. VCL).
These are two UNICODE encodings, one uses two bytes for one character, the other uses variable length for different characters.

Most importantly, they both represent the same thing: UNICODE code points. The same set of characters, actually. Therefore, conversion between these two is 1-1, straightforward and always possible. If your DB server uses either of these two to store data, you can be sure that every UNICODE character has its unique representation. So no need to worry!
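
The round trip can be checked with a few lines of RTL code. This is a small sketch using the standard UTF8Encode and UTF8Decode functions (recent Delphi versions prefer UTF8ToString and flag UTF8Decode as deprecated); the three sample characters are arbitrary choices.

```pascal
program Utf8RoundTrip;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

var
  Original, Back: WideString;
  Encoded: UTF8String;
begin
  // Three UTF-16 code units: 'Z' (U+005A), e-acute (U+00E9),
  // Cyrillic Zhe (U+0416).
  SetLength(Original, 3);
  Original[1] := WideChar($005A);
  Original[2] := WideChar($00E9);
  Original[3] := WideChar($0416);

  Encoded := UTF8Encode(Original); // UTF-16 -> UTF-8
  Back := UTF8Decode(Encoded);     // UTF-8 -> UTF-16

  Writeln('UTF-16 code units: ', Length(Original));
  Writeln('UTF-8 bytes:       ', Length(Encoded));
  Writeln('Round trip equal:  ', Back = Original);
end.
```

The three code units become five UTF-8 bytes (1 + 2 + 2) and decode back to the same string. Note that characters outside the Basic Multilingual Plane need surrogate pairs in UTF-16, which strict UCS2 cannot represent.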
cnliou
Zeos Dev Team
Posts: 31
Joined: 11.11.2005, 12:18

Post by cnliou »

Zoran wrote: Therefore, if your DB uses UTF8 encoding and the client uses UTF16, then you should put "codepage=UTF16" (or maybe "codepage=UCS2"?). Then Zeos will tell the server to convert data to and from UTF16 (UCS2) encoding.
The problem arises when a DBMS like PostgreSQL does not support UCS2 but only UTF8 as the client character set encoding:

http://www.postgresql.org/docs/9.0/stat ... ibyte.html
Zoran wrote: Actually, as a DB client, your application should not care what the server encoding is; you should just tell the server what the client encoding is. That is what Zeos does with "codepage=XXX" in TZConnection.Properties.

These are two UNICODE encodings, one uses two bytes for one character, the other uses variable length for different characters.

Most importantly, they both represent the same thing: UNICODE code points. The same set of characters, actually. Therefore, conversion between these two is 1-1, straightforward and always possible. If your DB server uses either of these two to store data, you can be sure that every UNICODE character has its unique representation. So no need to worry!
I believe your explanation is true if clients do not use UCS2. However, I am afraid ZEOSLIB must do the dirty job - bidirectional conversion between UCS2 and UTF8 - when clients want to use UCS2 while the DBMS's do not support UCS2 but only UTF8.

Regards,

CN
ab
Fresh Boarder
Posts: 16
Joined: 26.10.2010, 09:16
Location: France

Post by ab »

Here is how I intend to implement Unicode in Delphi 6 up to XE:

- all "internal" data is UTF-8 encoded, in a new type named RawUTF8 (AnsiString with code page for UTF-8);
- you have to call our custom UTF8ToString() or StringToUTF8() to convert to/from the VCL string type - this will work for Delphi 6 up to XE and is very fast (it avoids the Windows API most of the time);
- if you're working with Delphi 2009 and up, you can just use a RawUTF8 string, and let the compiler make an implicit conversion to UCS2 string.
cnliou
Zeos Dev Team
Posts: 31
Joined: 11.11.2005, 12:18

Post by cnliou »

ab wrote: - you have to call our custom UTF8ToString() or StringToUTF8() to convert to/from the VCL string type - this will work for Delphi 6 up to XE and is very fast (it avoids the Windows API most of the time);
Please help me get your point.

Zeoslib would not be very usable if you mean that higher-level applications are responsible for manually calling

- UTF8ToString() after retrieving UTF8 strings from the DBMS

- StringToUTF8() before sending UCS2 strings to the DBMS.

A usable library should be implemented so that the conversions are managed automatically by zeoslib, without intervention from higher levels.
ab wrote: - if you're working with Delphi 2009 and up, you can just use a RawUTF8 string, and let the compiler make an implicit conversion to UCS2 string.
It will be a great help if a scenario is provided. Please correct me when appropriate. We assume these:

- DBMS supports UTF8 as client side charset encoding.

- All the following selected database string columns are linked to VCL TDBEdit.

- Applications do _not_ call conversion functions.

Code: Select all

ZQuery1.SQL.Text := 'SELECT string_column FROM t';
ZQuery1.Open;
Question: TDBEdit.Text now automagically holds UCS2 string, doesn't it?

Now, I change the value in TDBEdit using my keyboard. Then I call this:

Code: Select all

ZQuery1.Post;
Will TDBEdit.Text be converted (automatically by zeoslib) to UTF8 before it is sent to DBMS?
ab
Fresh Boarder
Posts: 16
Joined: 26.10.2010, 09:16
Location: France

Post by ab »

At the VCL component level (e.g. the Zeos versions of TDataSet or TQuery), the type for strings will be the plain Delphi string type.
For instance, SQL.Text will expect string type, i.e. UCS2 since Delphi 2009.

The UTF-8 encoding will be used only at lower level, i.e. at ZDBC level.
All the conversions will take place when calling the ZDBC interfaces, automatically by the ZEOS units.

High-level users who only work with VCL components won't see any difference. You'll use Delphi string values, just as usual.