Bug in TZDefaultIdentifierConvertor.GetIdentifierCase?

Fr0sT · Post by **Fr0sT** » 15.02.2019, 23:16

From what you said, your "boolean" char(1) has no explicit codepage, hence it's UTF-8 too. One UTF-8 character can occupy up to 4 bytes. Even in FB itself the maximum length of a UTF8 VARCHAR is 8k chars, while an ANSI-encoded field can take 32k. So the field size is correct in terms of bytes. I haven't inspected how the RTL converts between the byte size and the character length of a field, though. Another question is whether this x4 calculation considers a field's codepage.
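
To make the byte-versus-character arithmetic concrete, here is a minimal Python sketch (not Zeos or Firebird code). The sample characters are arbitrary, and the 32765-byte VARCHAR payload limit is an assumption taken from Firebird's documented maximum, not something stated in this thread.

```python
# Sketch only: how many bytes a single character needs in UTF-8.
samples = ["A", "é", "€", "😀"]
utf8_widths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch, width in utf8_widths.items():
    print(f"{ch!r}: {width} byte(s) in UTF-8")

# Assumption: a VARCHAR may hold at most 32765 bytes of payload.
MAX_VARCHAR_BYTES = 32765
MAX_BYTES_PER_UTF8_CHAR = 4

# With a worst case of 4 bytes per character, the character limit shrinks
# to roughly "8k", which is where the 8k vs 32k difference comes from.
max_utf8_chars = MAX_VARCHAR_BYTES // MAX_BYTES_PER_UTF8_CHAR
print(max_utf8_chars)
```

The same worst-case factor of 4 is what turns a char(1) into a 4-byte buffer.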

duzenko wrote: The char(1) is one of the officially recommended alternatives for a boolean field in Firebird 2.5 http://www.firebirdfaq.org/faq12/

Nope. It is one of the available alternatives, while a smallint domain is the recommended one.

duzenko · Post by **duzenko** » 18.02.2019, 10:54

Fr0sT wrote: From what you said, your "boolean" char(1) has no explicit codepage, hence it's UTF-8 too. One UTF-8 character can occupy up to 4 bytes. Even in FB itself the maximum length of a UTF8 VARCHAR is 8k chars, while an ANSI-encoded field can take 32k. So the field size is correct in terms of bytes. I haven't inspected how the RTL converts between the byte size and the character length of a field, though. Another question is whether this x4 calculation considers a field's codepage.
duzenko wrote: The char(1) is one of the officially recommended alternatives for a boolean field in Firebird 2.5 http://www.firebirdfaq.org/faq12/
Nope. It is one of the available alternatives, while a smallint domain is the recommended one.

Again, this is not about booleans.
Consider a varchar(10) field. In Delphi I consume it with this chain: TZConnection - TZTable/TZQuery - TDataSource - TDBEdit.
The length of the field gets retrieved from the DB and ends up as TDBEdit.MaxLength (not a public field, but anyway). This way the user is not able to enter more than 10 characters in the edit.
It worked like that for decades in both the standard Delphi data components and Zeos up to 7.1.
7.2 broke this convention by returning 4x the column length as the field size (for UTF-8 databases). Now the user is able to enter 40 characters in the TDBEdit.
It does not matter how many bytes that text occupies in memory: Firebird just won't accept anything longer than 10 characters for that column.
I can only assume that you were trying to let users enter e.g. Japanese multi-byte characters through Windows non-Unicode edit controls. Sorry, but that's not how you should have approached it. You should have used a TWideStringField for that.
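
A minimal Python sketch of the point (not Delphi or Zeos code): the limit that matters for the edit control is counted in characters, not bytes. The helper name `fits_column` and the sample strings are hypothetical, and Python's `len` counts code points, which is only an approximation of how the server counts characters.

```python
def fits_column(text: str, max_chars: int) -> bool:
    """Return True if text has at most max_chars characters (code points)."""
    return len(text) <= max_chars

DECLARED_CHARS = 10                   # varchar(10) as declared in the database
INFLATED_CHARS = DECLARED_CHARS * 4   # what a 4x field size lets the edit accept

cyrillic = "Привет мир"               # 10 characters, more than 10 bytes in UTF-8
too_long = "x" * 25                   # 25 characters, 25 bytes

print(len(cyrillic), len(cyrillic.encode("utf-8")))

# A character-based limit accepts the Cyrillic text despite its byte size.
cyrillic_ok = fits_column(cyrillic, DECLARED_CHARS)

# The inflated limit lets 25 characters through client-side,
# although the declared column length would reject them.
passes_inflated = fits_column(too_long, INFLATED_CHARS)
passes_declared = fits_column(too_long, DECLARED_CHARS)
print(cyrillic_ok, passes_inflated, passes_declared)
```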

marsupilami · Post by **marsupilami** » 20.02.2019, 12:09

duzenko wrote:
Too bad - the way you write here, I already did take offense.
Why, was it your personal decision to quadruple the advertised field size?

No - it wasn't my decision, but that has nothing to do with it. You come to this forum and ask for help. First off, you just throw pieces of information at us that are by far not enough to diagnose what is happening on your side. When asked to provide the necessary information, you insult me by stating that this information is not necessary for the diagnosis. If you know that much, why ask for help here anyway? Then we point out that the design decisions in your project have to do with the problem. You jump in our faces telling us that this is all our fault, making demands, trying to tell us what we have to do in our project and finally calling me a liar: "[...]but also you defend it as a 'fix' while to me it looks you just found out about it.". You are free to ask questions, make suggestions and voice opinions. I also suggest you treat us like we know what we are doing and keep in mind that we probably have reasons for what we do, even if you don't like it. Also please keep in mind that we are not selling a product; we share our work with you and support you in our spare time.
I am not going to discuss this matter any further. If you want help, please provide details in the future on what you did and what your environment looks like. The minimum set of information is your compiler type and version, the version of Zeos you use and the database you use. All these things have a great impact on many technical details, even though you might not realize it, because on some occasions Zeos might just work differently from what you expect.

By the things you write I get the impression that you miss some serious points:

TStringField is totally broken for any character set that uses more than one byte per character - Big5 for example, or UTF8. This can lead to buffer overruns on systems that use that kind of character set on ANSI Delphi or Lazarus. Zeos tries to accommodate this by modifying TStringField.Size. You may not like this but that's the way it is.
TWideStringField also has a design flaw. It assumes that one character is two bytes in size. But this is only true for characters and symbols on the Basic Multilingual Plane (BMP). Because of this it is by no means a proper replacement for UTF8, because UTF8 (as well as UTF16) can encode the whole Unicode standard and not only the BMP.
The database character set has absolutely nothing to do with the connection character set. Firebird will happily convert data to and from the client character set and store it in UTF8 in the database. This is the recommended way to go in ANSI environments like Delphi 2007.
Just to add one more point to this character madness: databases treat Unicode data and MBCS very differently. Firebird always assumes VARCHARs to be specified in an amount of characters. Microsoft SQL Server specifies VARCHAR to always be in an amount of bytes. Oracle has options to specify VARCHARs as an amount of bytes to be stored as well as an amount of characters to be stored. Also, NVARCHARs are treated differently by MS SQL and Oracle - to make things even worse.

So getting things right for every database Zeos supports is not as simple as it may seem at first glance.
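
The BMP limitation can be checked with a small Python sketch (an illustration only, not Zeos code; the helper `utf16_code_units` and the sample characters are made up for this example). A character outside the BMP needs two 16-bit code units in UTF-16, so a buffer sized at two bytes per character is too small for it.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

bmp_char = "€"       # U+20AC, inside the Basic Multilingual Plane
astral_char = "😀"   # U+1F600, outside the BMP

for ch in (bmp_char, astral_char):
    print(
        f"U+{ord(ch):04X}: "
        f"{utf16_code_units(ch)} UTF-16 code unit(s), "
        f"{len(ch.encode('utf-8'))} UTF-8 byte(s)"
    )
```

Both characters count as one character each, yet the second one takes 4 bytes in UTF-16 as well as in UTF-8.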

I created a wiki entry on SourceForge to document the current behaviour of Zeos: Zeos and Character Sets. This behaviour will be kept for Zeos 7.2. We might change it again for Zeos 7.3.

duzenko wrote:Well, my problem is the above mentioned breaking change was not only mentioned NOWHERE in the docs or the forum but also you defend it as a 'fix' while to me it looks you just found out about it. Am I wrong and it's a common knowledge? Did anyone anywhere on this forum or elsewhere bring it up yet? Do you think that being upset about this is unreasonable?

Being upset is all right. But insulting people and making demands of people who do this job in their spare time, and who only ask you to be polite because they have their own burdens in life, is unreasonable.

duzenko wrote:
We made the decision to support users pushing UTF8 directly to the database using TStringField in ANSI Delphis.
ANSI Delphis have TWideStringField exactly for that purpose. That's where you were wrong at the start and everything else is consequences.

That might be your opinion. We have a different one. Live with it.

duzenko wrote:
In this use case the user is responsible for making sure that not more than the allowed amount of characters gets pushed into the TString Field.
And the field length checks in TStringField flush down the drain. It's only something Delphi programmers relied on for decades now - no big deal.

Yes - because we expect developers who request the database to send _anything_ that differs from the system code page on an ANSI Delphi to know exactly what they are doing.

duzenko wrote:
There is no such support as using bools as Firebird Char(1) fields. You relied on behaviour that worked but was not intentionally working. At least from a Zeos point of view.
No support for meaningful TField.Size values as well?

Not if you request the database to send UTF8.

duzenko wrote:
Even if Zeos would do some kind of check here, it wouldn't be a truncation but raising an exception that there are too many characters stored in the field.
You don't have to do any checks at all, just return the actual field length. And surely not to try to convert WideString to Utf-8 or vice versa.

The whole point was: if TStringField.Size is extended and Zeos did check whether the number of characters exceeds what the database can store, the result wouldn't be a silent truncation (as it was in Zeos 7.1) but an exception. Basically the same exception that Firebird raises for you.

duzenko wrote:
No - it wasn't. Do we need your approval before we make changes to the Zeos code? Do you want to step up and support or do Zeos development? If you don't want to do any of that then why do you demand that we discuss the development of Zeos with you before we do changes?
I want to read on the reasons why char(1) has Size=1 in 7.1 and 4 in 7.2. If there were any. The breaking changes need to be explained - am I asking too much?
I have done small contributions in the past and I don't see what it has to do with anything.

If you were a team member you could argue that design decisions for Zeos need to be discussed with you too. For the reason why CHAR(1) results in TField.Size = 4 see the above mentioned Wiki page. As for explaining breaking changes: we usually try to document that kind of thing. Sometimes changes that break functionality either are not seen to break functionality or simply get overlooked for documentation. People make errors. But that is no reason to be rude. If you want to read documentation I suggest you ask questions and document the answers. You may not realize it but writing documentation for a complex project like Zeos is time consuming. Getting the above mentioned Wiki page right took me about one to two hours.

duzenko wrote:
You still need to truncate to real field length in character, but do that on a WideString rather than AnsiString
TStringField already does that (did until you broke it in 7.2). No way I am running around the source code manually truncating each and every field I'm sending to the database. And surely it has nothing to do with WideStrings unless you confuse AnsiStrings with utf8's.

I am not going to comment on this because that text about conversions was from your own post: http://zeoslib.sourceforge.net/viewtopi ... 13#p110413

duzenko wrote:
Again: Zeos does no intentional truncation. Delphi is doing the truncation in this example because it happens on this line of your example code:
ZTable1['some_bool'] := Random(2) = 1;
During this assignment Delphi decides to truncate the string value of "False" to Fals or to F - depending on TStringField.Size.
God forbid you started truncating stuff in Zeos on top on what you already messed up with field Size.

I am not going to comment on this.

duzenko wrote:
Zeos (correctly) just sends the value that gets stored in the StringField to the database that in turn correctly raises an exception.
Oh yeah? And where does the TStringField.Size come from?

See the above mentioned Wiki page.

duzenko wrote:
marsupilami wrote: Soo - to bring this to an end:

To me it seems that your connection character set is UTF8 while you use an ANSI Delphi. Why? This simply doesn't make sense. You might want to consider using the correct ANSI character set for your database connection. ZConnection.ClientCodepage='WIN1252' for example.

Because the DB is used by other applications but mine and some of them require the DB to be UTF-8.

If that is your whole reason to use UTF8 as the client character set, you might want to read up on the difference between the database character set and the connection character set. Firebird will do all necessary conversions for you if you use it in the right way. This obviously requires a correct database design where character set NONE doesn't get used. But Firebird might hit you in the face with an error if it cannot convert the data you request to your code page. And that might be the reason why you really use UTF8 - because somebody didn't like these errors and decided that it is OK if Zeos does these conversions using Windows functions and loses some data on the way. To get that behaviour back, use the current SVN version of Zeos and set AutoEncodeStrings to true for your application. The current SVN version also has a fix that gets TStringField.Size back to the expected size if you use ControlsCodePage=cGET_ACP.
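
The trade-off between a strict conversion that raises an error and a lossy one that silently drops data can be sketched in Python. This is only an analogy: Python's `cp1252` codec stands in for a WIN1252 connection character set, and neither Firebird's transliteration nor the Windows conversion functions are called here. The sample string is made up.

```python
stored = "Ωmega café"   # text as it might be stored in a UTF8 database

# Strict conversion: fails as soon as one character has no equivalent
# in the target code page (the Greek capital omega in this sample).
try:
    stored.encode("cp1252")
    strict_ok = True
except UnicodeEncodeError as exc:
    strict_ok = False
    print(f"strict conversion failed: {exc.reason}")

# Lossy conversion: unconvertible characters are replaced,
# so the operation succeeds but the data changes.
lossy = stored.encode("cp1252", errors="replace").decode("cp1252")
print(lossy)
```

The strict path corresponds to getting an error from the server, the lossy path to a client-side conversion that loses data.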

duzenko wrote:
Another option might be to use ClientCodepage=UTF8 and ControlsCodePage=cCP_UTF16. On the SVN version of Zeos 7.2 this will make Zeos generate TWideStringField instead of TStringField with the correct number of characters. I am not 100% sure if the release version Zeos 7.2.4 will behave in the same way. Watch out for type casts like TStringField(DataSet.FieldByName('somefield')) in your code in that case, because they might be incorrect afterwards and cause access violations.
That changes exactly nothing. It's still Size=4 for char(1) with the inevitable runtime errors.

Honestly - please provide a test application and a database script. If you don't start supplying enough information to know exactly what you do, we will have to work like this. I still don't know whether you got the expected TWideStringField - which shouldn't get a Size of 4 under any circumstances in this case - or if you still get a TStringField. Also I still don't know whether you did the test on Zeos 7.2.4 or on the SVN version of Zeos 7.2.

duzenko wrote:
Soo - to bring this to an end:

Consider one of the following

Ban my forum account cause I'm not going to agree and call this bug an 'improvement'

Agree to optionally return the ACTUAL field size

Agree to optionally use TWideStringField's on unicode databases instead of TStringField

See - there are a lot of other options for this. One of them is to simply ignore future posts by you. Another option is that you finally accept that we have our reasons for doing what we do. I don't need to ban you and I don't need to agree with you on anything.

Best regards,

Jan

duzenko · Post by **duzenko** » 21.02.2019, 15:18

marsupilami wrote: Honestly - please provide a test application and a database script. If you don't start supplying enough information to know exactly what you do, we will have to work like this.

Source code and the DB script
https://drive.google.com/file/d/1EeMKTt ... sp=sharing

1. Form caption
Expected behavior:

TZConnection.CCP = ACP => TStringField size = 1
TZConnection.CCP = UTF8 => no opinion, happy with anything
TZConnection.CCP = UTF16 => TWideString size = 1

Observed behavior

TZConnection.CCP = ACP => TStringField size = 4 (very bad)
TZConnection.CCP = UTF8 => TStringField size = 4 (automatic conversion? don't really care in my case)
TZConnection.CCP = UTF16 => TWideString size = 1 (as expected, and I want ACP to return size=1 just as well)

2. Using the DB Navigator
Expected behavior: you can append a record, enter up to 10 characters to the edit, toggle the check box and save
Observed behavior: with UTF8 or ACP you can enter up to 40 characters, and if the text is longer than 10 characters or the check box was toggled, the attempt to save crashes

3. The test button
Expected behavior: a record gets inserted
Observed behavior (UTF8 or ACP): crash

1, 2, 3 come in from simple to more complicated. For now I'm only interested in (1) because the other two are just consequences. The other two are what I experience in my program.

I need to support both no-codepage and UTF8 firebird databases of the same structure. Generally I don't need Unicode support but I guess it's automatic anyway with TZConnection.CCP = UTF16. FWIW AutoEncode does nothing in my tests. SVN revision: 5431.

Question: I expect to see the same field size for both TZConnection.CCP ACP and UTF16 - what happens? The only difference should be string/widestring & TStringField/TWideStringField.

Again, I just want Zeos to have an option to return the actual size of the field. I can see that it already works like that with CCP=UTF16. Now I need to understand whether that is an oversight on your side that you are about to 'fix' the way you 'fixed' the ACP. And if not, why the difference?

marsupilami · Post by **marsupilami** » 21.02.2019, 21:45

Hello duzenko,

I will take a look at this next week. Unfortunately I will not have much time for Zeos this week or weekend.
Have a nice weekend,

Jan

duzenko · Post by **duzenko** » 02.03.2019, 09:15

EgonHugeist · Post by **EgonHugeist** » 07.03.2019, 07:46

@duzenko
Please keep that noise down.

We're trying to help you.

Banning is not an option. I do understand your point of view.

duzenko wrote:
21.02.2019, 14:18
SVN revision: 5431

duzenko wrote:Bump

Didn't I resolve this in SVN several weeks ago? The SVN merge to 7.2 (R5492) happened on 2019-02-19, so your report was already outdated at that time.
I don't have time for the forum at the moment, sorry.
Please update your revision and report your findings again.

duzenko · Post by **duzenko** » 11.03.2019, 16:43

EgonHugeist wrote:@duzenko
Please keep that noise down. We're trying to help you.

Sorry

Banning is not an option. I do understand your point of view.

Thank you for your time.

Didn't I resolve this in SVN several weeks ago? The SVN merge to 7.2 (R5492) happened on 2019-02-19, so your report was already outdated at that time.
I don't have time for the forum at the moment, sorry.
Please update your revision and report your findings again.

Works like a charm.

So, ummm. What's the plan for Zeos? Can I expect varchar(1) to keep the data size 1 in future versions for ACP/UTF16? Or it's going to be 'fixed'?

marsupilami · Post by **marsupilami** » 11.03.2019, 17:21

Hello Duzenko,

I am sorry for my late reply. Family issues and my job have kept me quite busy, so no time was left for Zeos.

duzenko wrote: So, ummm. What's the plan for Zeos? Can I expect varchar(1) to keep the data size 1 in future versions for ACP/UTF16? Or it's going to be 'fixed'?

It doesn't even work like that with the current implementation. The current implementation asks Windows what the maximum character size is on the current code page and uses that as a multiplier for the size. So if your code page has a maximum character size of 1 byte, varchar(1) will have a size of 1. If your code page has a maximum character size of 2 bytes, varchar(1) will have a size of 2. For Zeos 7.2 we will keep that implementation. Zeos 7.3 will most probably see some kind of change around these issues, but it is not yet decided which changes it will see.
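
As a rough sketch of that rule (Python, not the actual Zeos implementation): the table of maximum character sizes below is hard-coded and hypothetical. On Windows the value would come from the operating system, as far as I know via the MaxCharSize reported by GetCPInfo. The function name `reported_field_size` is made up for this illustration.

```python
# Hypothetical lookup table: maximum bytes per character for a code page.
# A real implementation would ask the operating system instead.
MAX_CHAR_SIZE = {
    "cp1252": 1,   # single-byte Western European code page
    "cp932": 2,    # Japanese Shift-JIS, double-byte
    "utf-8": 4,    # UTF-8, up to four bytes per character
}

def reported_field_size(declared_chars: int, codepage: str) -> int:
    """Field size = declared character count * max bytes per character."""
    return declared_chars * MAX_CHAR_SIZE[codepage]

for cp in MAX_CHAR_SIZE:
    print(f"varchar(1) on {cp}: size {reported_field_size(1, cp)}")
print(f"varchar(10) on utf-8: size {reported_field_size(10, 'utf-8')}")
```

With a single-byte code page the multiplier is 1, so the field size equals the declared length.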

Best regards,

Jan

duzenko · Post by **duzenko** » 19.03.2019, 12:56

marsupilami wrote:Hello Duzenko,

I am sorry for my late reply. Family issues and my job have kept me quite busy, so no time was left for Zeos.

duzenko wrote: So, ummm. What's the plan for Zeos? Can I expect varchar(1) to keep the data size 1 in future versions for ACP/UTF16? Or it's going to be 'fixed'?
It doesn't even work like that with the current implementation. The current implementation asks Windows what the maximum character size is on the current code page and uses that as a multiplier for the size. So if your code page has a maximum character size of 1 byte, varchar(1) will have a size of 1. If your code page has a maximum character size of 2 bytes, varchar(1) will have a size of 2. For Zeos 7.2 we will keep that implementation. Zeos 7.3 will most probably see some kind of change around these issues, but it is not yet decided which changes it will see.

Best regards,

Jan

There is a critical difference between current 7.2 svn and "stable 7.2"
Stable 7.2 returns 4 chars for varchar(1) in ACP mode while svn 7.2 returns 1.

marsupilami · Post by **marsupilami** » 21.03.2019, 20:52

duzenko wrote: There is a critical difference between current 7.2 svn and "stable 7.2"
Stable 7.2 returns 4 chars for varchar(1) in ACP mode while svn 7.2 returns 1.

Read my posts -> this is only true if you select a character set for your database connection that is not 1 byte per character. Zeos 7.2 will return a size of 1 if you select ACP mode and the connection character set is WIN1252 for example. Your usage of UTF8 as the connection character set simply doesn't make sense for most use cases in an ANSI Delphi world. Most users will use the correct ANSI character set for their country as the connection character set even if their database is UTF8.
Also - as you found out - this is - at least in part - a bug that got fixed.
