Hello Michael,
I am sorry that my answer is somewhat lengthy, but I see no other way to say everything and be precise at the same time.
EgonHugeist wrote:We need a WideToAnsi-String translation here or we lost some String-Data. On executing UTF8Encode(AString) the result is Ansi-compatible. But the AnsiCodepage turns from your OS-specific CodePage to 6500 or 6501 (UTF8-CodePage).
It seems that we are using somewhat different terminology. For me, the term character set conversion means a transformation for transferring a character from one representation to another, where the number assigned to the character may change or where the character may not be transferable at all. So changing from CP850 to WIN1252 is a character set conversion, because the ä changes from number $84 in CP850 to $E4 in WIN1252, and the "┴" from CP850 is not transferable at all. The same applies to a conversion WIN1252 <-> Unicode. Normally you need some kind of translation table to be able to do these.
When it comes to Unicode, the translation from UTF16/UCS-2 to UTF8 is a change in the encoding style and not a character set conversion, because it always happens without loss of data and the characters keep their positions in the character set tables. The ä will always be $E4; just the way the $E4 is encoded changes. For translating between UTF16 (Delphi, Windows) and UTF8 (Firebird) you don't need translation tables - you just need a fairly simple algorithm for re-encoding numbers, much like the one needed for converting numbers between big endian and little endian.
So for me the use of UTF8Encode is not a character set conversion but just a change in the encoding style of a Unicode string.
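To illustrate what I mean, here is a minimal sketch (my own demo program, assuming a Unicode-capable Delphi 2009+ or Free Pascal, not Zeos code):

program EncodingDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;
var
  W: UnicodeString;
  U: UTF8String;
begin
  W := WideChar($00E4);             // ä as a single UTF16 code unit: $00E4
  U := UTF8Encode(W);               // the same character re-encoded as UTF8: $C3 $A4
  WriteLn(IntToHex(Ord(W[1]), 4));  // prints 00E4 - the code point is unchanged
  WriteLn(IntToHex(Ord(U[1]), 2), ' ', IntToHex(Ord(U[2]), 2));  // prints C3 A4 - only the byte layout differs
end.

The character never moves in the Unicode table; only its byte representation changes.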
EgonHugeist wrote:Open Unit ZDbcInterbase6Utils.Pas.
Then goto:
procedure TZParamsSQLDA.EncodeString(Code: Smallint; const Index:
[...]
Look carefully what happen's here?
So, following this hint, I had a look at TZParamsSQLDA.EncodeString. This routine makes some assumptions that may or may not be true.
On a Delphi or Pascal that uses AnsiString for its String type, it just copies the incoming data into the Firebird XSQLDA structure and hopes that the encoding matches what the user specified in the connection settings via isc_dpb_lc_ctype.
And it seems that somebody has carried this way of assuming over to the Delphi versions that use UnicodeString for their String type: now it assumes that the user has set the isc_dpb_lc_ctype parameter to UTF8 and just uses UTF8Encode to convert from the UTF16 representation used by Delphi to the UTF8 representation that Firebird uses, no matter what the connection character set really is.
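Stripped of all details, the assumption boils down to something like this (my own sketch, not the actual Zeos routine):

// Sketch only - not the real TZParamsSQLDA.EncodeString.
// On a Unicode Delphi the parameter is always pushed through UTF8Encode,
// no matter which character set the connection was opened with.
procedure EncodeParamSketch(const Value: UnicodeString; out Target: UTF8String);
begin
  Target := UTF8Encode(Value);  // only correct if isc_dpb_lc_ctype is UTF8
end;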
So this all leads to some questions about what the Firebird DBC driver can expect from the user and what the user can expect from the DBC driver.
With Zeos 6 it was the user's responsibility to tell the DBC driver which encoding Delphi needs to work with (WIN1252/BIG5/...). UTF8 normally was not a valid choice for Delphi; it was a valid choice for Lazarus, as Lazarus uses UTF8. This can also be a codepage that differs from the database codepage, because Firebird will do the translation.
What is the user's responsibility with Zeos 7? With a Delphi that primarily uses AnsiStrings the responsibility stays the same, because Zeos cannot reliably detect which character set is to be used.
But what is the responsibility of a user with a Unicode-supporting Delphi?
-> Is the user supposed to use CHARSET=UTF8 to tell the DBC driver that Unicode support is required for correct functionality?
-> Or is the DBC driver required to detect which character set the user specified, and to use UTF8Encode/UTF8Decode when it is UTF8 and something like WideToAnsi/AnsiToWide when it is something else (see the sketch after this list)?
-> Or should the DBC driver enforce the use of Unicode (UTF8) when there is a Unicode Delphi and not allow anything else?
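For the second option, the check I have in mind would look roughly like this (a sketch only; the function and parameter names are mine, not Zeos names, and SameText needs SysUtils):

function EncodeForConnection(const Value: UnicodeString;
  const ConnectionCharSet: string): RawByteString;
begin
  if SameText(ConnectionCharSet, 'UTF8') then
    Result := UTF8Encode(Value)   // UTF16 -> UTF8, bytes passed on unchanged
  else
    Result := AnsiString(Value);  // UTF16 -> ANSI via the system codepage;
                                  // a real driver would convert to the codepage
                                  // named by the connection character set instead
end;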
EgonHugeist wrote:Would you be so kind and test this?
So, about your experiment: I did not run it, but I did something like it before. The outcome very much depends on what value you set for the CHARSET parameter.
With CHARSET=WIN1252, writing will not work correctly. I can only assume this is because TZParamsSQLDA.EncodeString silently encodes the target string as UTF8, whereas the database expects it to be in the selected codepage (WIN1252). Reading will work, because the strings are delivered in the correct codepage by the database server and converted to UTF16 by Delphi.
With CHARSET=UTF8, writing will work properly, because now TZParamsSQLDA.EncodeString delivers exactly the encoding that the database expects. But reading will not work, because the UTF8-encoded strings are just advertised as being ANSI and are not converted to the correct ANSI codepage.
When you apply my patch, reading works too, because then the strings are treated as Unicode. You get a situation where reading and writing work, regardless of what the database encoding for VARCHARs is.
As for who does which codepage conversion:
Right now there are one or two conversions happening when writing, if you need full Unicode support:
1) Delphi to Firebird client = Unicode (UTF16) <-> Unicode (UTF8) -> In my book this is no codepage conversion, because it is just a change in how the numbers are encoded; in your book it is one. This conversion is necessary when UTF8 is the only Unicode representation the database understands, as is the case with Firebird.
2) Firebird client to Firebird engine = Unicode (UTF8) <-> database encoding (UTF8/WIN1252/BIG5) -> This one is necessary for Firebird to be able to support clients that request different character set encodings.
For the use of UTF8Encode/UTF8Decode: you simply need them when the only Unicode encoding your database understands is UTF8. Otherwise you will never be able to use the database's Unicode functionality.
And I also think that the DBC driver is the right place to use UTF8Encode/UTF8Decode, as only the driver knows which flavours of Unicode the underlying database supports and which conversions are needed.
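In other words, the boundary I have in mind looks roughly like this (again only a sketch, assuming the connection character set is UTF8; the function names are mine):

// Driver boundary sketch for CHARSET=UTF8: the application keeps working
// with UTF16 strings, the driver re-encodes at its boundary.
function DelphiToFirebird(const Value: UnicodeString): UTF8String;
begin
  Result := UTF8Encode(Value);  // UTF16 (Delphi) -> UTF8 (Firebird client)
end;

function FirebirdToDelphi(const Value: UTF8String): UnicodeString;
begin
  Result := UTF8Decode(Value);  // UTF8 (Firebird client) -> UTF16 (Delphi)
end;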
EgonHugeist wrote:But what happens if you execute an non-UTF8-Prepared Query directly? The Proplem should be the same. Please correct me..
Hmmm - I don't know, but then I don't need to. I don't know what MySQL does. With Firebird you specify the character set you want to use when you connect, and you cannot change it afterwards.
So, again I am sorry for this lengthy text. But if you can answer my questions about what the expected behaviour of the DBC driver is, then maybe I can modify it to behave the way it is supposed to.
Best regards,
Jan