character transliteration for blobs

Forum related to Firebird

marsupilami
Platinum Boarder
Posts: 1918
Joined: 17.01.2011, 14:17

character transliteration for blobs

Post by marsupilami »

Hello,

I am trying to access a Firebird database from Lazarus where some fields contain umlauts. The default character set of the database is WIN1252. To have Firebird do the transliteration to UTF8, which is used by Lazarus, I add the line "CODEPAGE=UTF8" to ZConnection.Properties.
From the documentation I have found so far, I would expect Firebird to do the transliteration for VARCHARs and for BLOBs of sub type text.
When accessing the database, the transliteration to UTF8 works as expected for VARCHARs but does not seem to work for BLOBs.
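Roughly, my setup looks like this (just a sketch; the component name and database path are placeholders from my test project, and the protocol string may differ for your server):

  // assumed test setup - component name, path and protocol string are placeholders
  ZConnection1.Protocol := 'firebird-2.1';       // or whatever protocol matches your server
  ZConnection1.Database := 'C:\data\test.fdb';
  ZConnection1.Properties.Add('CODEPAGE=UTF8');  // ask Firebird to transliterate WIN1252 <-> UTF8
  ZConnection1.Connect;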
Does anybody have any ideas?

Best regards,

Jan
seawolf
Zeos Dev Team *
Posts: 385
Joined: 04.06.2008, 19:50

Post by seawolf »

Basically, a text blob is considered an ASCII stream rather than a Unicode stream
mdaems
Zeos Project Manager
Posts: 2766
Joined: 20.09.2005, 15:28
Location: Brussels, Belgium

Post by mdaems »

seawolf wrote:
Basically, a text blob is considered an ASCII stream rather than a Unicode stream
I'd rather say a binary stream (which means an uninterpreted series of data bytes). I suppose saying 'ASCII' already implies it's kind of text, which a blob is not.

Mark
marsupilami
Platinum Boarder
Posts: 1918
Joined: 17.01.2011, 14:17

Post by marsupilami »

Hello seawolf and mdaems,

Thanks for your answers. According to the Interbase 6 documentation, you can ask Firebird to transliterate blobs of subtype text when you open or create them, by passing the needed information to isc_open_blob2 or isc_create_blob2. The Zeos Interbase / Firebird driver doesn't do this (yet). Currently I am trying to make the necessary changes, but I have run into a problem I don't know how to solve in a good way:
I determine the client character set ID in TZInterbase6Connection but don't know how to access that information from TZResultSQLDA where the calls to ReadBlobBufer are done. Do you have any suggestions?
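Just to make clear what I am after: the blob parameter buffer that would have to be passed to isc_open_blob2 / isc_create_blob2 should look roughly like this (a sketch only; the constants are the usual ibase.h values, and the two charset ids are just examples):

  procedure BuildTransliterationBpb;  // sketch only, shows the BPB layout
  const
    isc_bpb_version1      = 1;
    isc_bpb_source_type   = 1;
    isc_bpb_target_type   = 2;
    isc_bpb_source_interp = 4;
    isc_bpb_target_interp = 5;
    isc_blob_text         = 1;
    BlobCharSetId         = 53;  // example: WIN1252, the charset declared for the blob column
    ClientCharSetId       = 4;   // example: UTF8, the connection character set
  var
    Bpb: array[0..12] of Byte;
  begin
    Bpb[0]  := isc_bpb_version1;
    Bpb[1]  := isc_bpb_source_type;   Bpb[2]  := 1; Bpb[3]  := isc_blob_text;
    Bpb[4]  := isc_bpb_target_type;   Bpb[5]  := 1; Bpb[6]  := isc_blob_text;
    Bpb[7]  := isc_bpb_source_interp; Bpb[8]  := 1; Bpb[9]  := BlobCharSetId;
    Bpb[10] := isc_bpb_target_interp; Bpb[11] := 1; Bpb[12] := ClientCharSetId;
    // pass SizeOf(Bpb) and @Bpb[0] to isc_open_blob2 / isc_create_blob2
    // instead of a zero length and a nil pointer
  end;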

Jan
seawolf
Zeos Dev Team *
Posts: 385
Joined: 04.06.2008, 19:50

Post by seawolf »

Reading the manual, there are 2 ways to do what you need. Which method have you chosen? Do you let Interbase/Firebird manipulate the text passed?
marsupilami
Platinum Boarder
Posts: 1918
Joined: 17.01.2011, 14:17

Post by marsupilami »

You read the manual more carefully than I did. I only found the way to create a Blob Parameter Buffer (BPB) and pass that to isc_open_blob2 and isc_create_blob2. So far I have managed to create a BPB for all cases where a blob is read, but I have trouble when it comes to writing because there seems to be no good way to get the needed data. Let me explain:

For creating the BPB, I need to know the character set that is defined for the blob in its metadata. The way I went here is to use isc_blob_lookup_desc, which needs the table name and the field name as parameters. In all cases where a blob is to be read, it is possible to determine the blob's field name and table name and thereby get the required information about the character set. I assume this is possible in all cases because isc_dsql_describe delivers this information to Zeos.
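The lookup itself would then be roughly (again only a sketch; TISC_BLOB_DESC mirrors the ISC_BLOB_DESC record from ibase.h, and the handle parameters stand for the usual ISC API handles of the active connection and transaction):

  procedure LookupBlobCharSet(DbHandle: PISC_DB_HANDLE; TrHandle: PISC_TR_HANDLE);  // sketch
  var
    StatusVector: array[0..19] of ISC_STATUS;
    BlobDesc: TISC_BLOB_DESC;          // mirrors ISC_BLOB_DESC: blob_desc_charset, blob_desc_subtype, ...
    Global: array[0..31] of AnsiChar;  // receives the global field name
    BlobCharSetId: Word;
  begin
    // table and field name are taken from the XSQLVAR of the select statement
    isc_blob_lookup_desc(@StatusVector[0], DbHandle, TrHandle,
      'THETABLE', 'BLOBFIELD', @BlobDesc, @Global[0]);
    BlobCharSetId := BlobDesc.blob_desc_charset;  // this is what goes into isc_bpb_source_interp
  end;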

But when the blob is to be written, this information is not accessible. For the parameters that go into a query, the function isc_dsql_describe_bind is called to get the type information for the parameters from Firebird. But this function leaves the field names and the table names empty and only returns type information. So all the caller knows is that there is a blob of subtype text. There is no character set information, which makes it impossible to generate the right BPB. The only solution I have found so far: have your own SQL parser that allows you to determine the field name and the table name and thereby make a call to isc_blob_lookup_desc possible. The question here: can the IZStatementAnalyser possibly do that?
But I am not sure if it makes sense to follow this path: Firebird, starting from version 2.1, seems to do the transliteration automatically if you use the most current ODS version (which I did not) and do not supply a BPB. So any solution would just be there to support older versions of Firebird, and only if a transliteration is needed there.

So the bottom line is: for now I am not able to generate a suitable BPB for writing to a blob. If there is a way to get the necessary information (table name and field name) that is easier than writing my own SQL parser, and if you think automatic transliteration for Firebird versions older than 2.1 is a good feature, I am willing to continue this.

In any case, I have attached a file that contains the changes I made so far. The changes in the plain folder simply enhance the plain drivers to be able to call isc_blob_lookup_desc. The changes to the dbc folder are the actual work done so far.
mdaems
Zeos Project Manager
Posts: 2766
Joined: 20.09.2005, 15:28
Location: Brussels, Belgium

Post by mdaems »

seawolf,

What's your opinion?
I have no idea what the expected results are.
I see the patch is made against the 6.6 code. I don't think that's a very good idea. Plaindriver code changed a lot in 7.X.

Concerning parsing the SQL statement to obtain table and column names: this is done in the dbc layer already, unless I'm terribly wrong. (See TZAbstractResultSetMetadata.LoadColumns.)

Mark
marsupilami
Platinum Boarder
Posts: 1918
Joined: 17.01.2011, 14:17

Post by marsupilami »

Hello Mark,

The code is against the 6.6 code because I wanted it to work in Delphi 6 and Lazarus. If a solution for the general problem can be found, I am willing to port the necessary changes to 7.0.

The aim was to enable Zeos to automatically request Firebird to do character set conversion for text blob fields from their encoding (for example WIN1252) to the connection encoding (for example UTF8) and vice versa. For this to work you need to tell Firebird which conversion to do (WIN1252 -> UTF8 for example) every time you create, write or read a text blob. That in turn means you have to find out which character set was declared in the blob metadata, which you can query if you know the table name and field name.

When reading blob data this is no problem because FB returns the field name and the table name when the statement is prepared.

Upon updating a blob field, Zeos generates a statement like
"update THETABLE set BLOBFIELD = ? where IDFIELD = ?"
When this statement is prepared, FB returns information about the data type of the parameters, but no table name and field name. So I get to know that the first parameter is supposed to be a blob ID and that it is of subtype text. But I have no idea how to find out which character set is declared in the metadata of the blob field, because I have no field name and table name to work with.
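In code, the situation after preparing that statement looks roughly like this (a sketch; BindSqlDa stands for the bind XSQLDA of the prepared statement, SQL_BLOB and isc_blob_text are the usual ibase.h constants):

  procedure InspectBlobParameter(BindSqlDa: PXSQLDA);  // sketch
  const
    SQL_BLOB      = 520;
    isc_blob_text = 1;
  var
    Param: PXSQLVAR;
  begin
    // after isc_dsql_describe_bind for
    //   update THETABLE set BLOBFIELD = ? where IDFIELD = ?
    Param := @BindSqlDa^.sqlvar[0];  // the parameter that will receive the blob id for BLOBFIELD
    if ((Param^.sqltype and not 1) = SQL_BLOB) and (Param^.sqlsubtype = isc_blob_text) then
    begin
      // All I learn here is "text blob". Param^.relname_length and Param^.sqlname_length
      // are 0, so there is nothing to feed into isc_blob_lookup_desc.
    end;
  end;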

Fortunately Firebird automatically does the conversion from version 2.1 on. So this code is not necessary anymore. It would only be necessary for Firebird 1.0, 1.5 and 2.0 and for the Interbase 6 and 5 drivers.
mdaems
Zeos Project Manager
Posts: 2766
Joined: 20.09.2005, 15:28
Location: Brussels, Belgium

Post by mdaems »

marsupilami wrote:
Fortunately Firebird automatically does the conversion from version 2.1 on. So this code is not necessary anymore. It would only be necessary for Firebird 1.0, 1.5 and 2.0 and for the Interbase 6 and 5 drivers.
Which makes it extra difficult to implement, because you'd have to test for the real server or DLL version and not the driver chosen in the connection.

Concerning the statement generated for the update: I believe it's generated in the ZDbcGenericResolver unit, which uses the result set metadata, if I remember correctly.
You could try to create an IB/FB resolver (check TZMySQLCachedResolver for an example) which knows the target table and field names.

Mark
marsupilami
Platinum Boarder
Posts: 1918
Joined: 17.01.2011, 14:17

Post by marsupilami »

I am sorry for the delay. I was really busy at work...

I think testing for the server version would not be that hard. I am sure there is some API call to get the version.
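Something along these lines should do it (a rough sketch using isc_database_info with the isc_info_isc_version item; the handle parameter stands for the usual API handle, and the buffer layout is the one described in the InterBase API Guide):

  procedure GetServerVersion(DbHandle: PISC_DB_HANDLE);  // sketch
  const
    isc_info_isc_version = 12;
  var
    StatusVector: array[0..19] of ISC_STATUS;
    Item: AnsiChar;
    Buffer: array[0..255] of AnsiChar;
    Version: AnsiString;
  begin
    Item := AnsiChar(isc_info_isc_version);
    isc_database_info(@StatusVector[0], DbHandle, 1, @Item, SizeOf(Buffer), @Buffer[0]);
    if Buffer[0] = AnsiChar(isc_info_isc_version) then
      // Buffer[1..2] = data length, Buffer[3] = number of strings, Buffer[4] = length of the first one
      SetString(Version, PAnsiChar(@Buffer[5]), Ord(Buffer[4]));
    // Version now contains the server version string, which includes the Firebird version number
  end;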

I am sure that it is possible to gather the required information when operating from a result set. But I think it is also valid for a user to set the SQL property of a TZQuery to something like

insert into sometable (IdCol, BlobCol) values (:ID, :BLOBTEXT)

And in that situation there is no place to get the information from, other than from the SQL string itself?
delphidreamer
Fresh Boarder
Posts: 6
Joined: 26.09.2011, 12:27

ZEOS in osFinancials/Turbocash (open source ledger software)

Post by delphidreamer »

I have also got this problem, but just with plain fields.
The database character set is UNICODE_FSS and the charset in the connection is also set right.

I have downloaded several sources, but the one that works for me is the latest test release with some minor adjustments. I get an error in that sort-after-post routine that uses a bookmark, and the field length was still the field size and not the character length.
I would love to support full Unicode and I have always enjoyed using ZEOS in my projects. With the current code I cannot use it that way.
I think writing a simple solution to get that column info would not be that hard, right? It's all simple update ... set and insert into, right?
If you cannot get the info, you could use the default collation, right?

Does anyone else have this problem too? (I can't enter ÄË, it makes it UTF8: Ä=Ä)
If I get this right, I could have the program hold English, Chinese and Arabic in one database, and that makes the potential user base a lot bigger.

I'm even willing to donate, as getting a commercial package would also bring costs, so if I can find the right person I'm sure we can come to an agreement. You can find me on Skype (delphidreamer from Amsterdam).
marsupilami
Platinum Boarder
Posts: 1918
Joined: 17.01.2011, 14:17

Post by marsupilami »

Hello delphidreamer,

Your problem seems to be different. For (VAR)CHAR fields Firebird does the transliteration automatically if you set the connection character set correctly. If you use Firebird 2.1 with its new ODS, you will also get automatic transliteration for BLOBs of sub type text (= sub type 1).

As an example: I use Delphi 6 on a German Windows, so I can only use ANSI character sets. In my projects I add the line "CODEPAGE=WIN1252" to the TZConnection object for the transliteration to work. If the project were used in China, I probably would have to change that to CODEPAGE=BIG5 or whatever Windows uses in China.

So maybe it is possible to help you if you tell us which version of Delphi you use and what the language of your Windows is.

For my problem: it is not sufficient to use the database default character set (or the default collation) to guess which transliteration I have to request when the server version is below 2.1. This is because you can specify a different character set for each BLOB column. So one may be BIG5 and another may be WIN1250 while the database default is UTF8...
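Just as a made-up example of what such a table could look like:

create table NOTES (
  ID integer not null primary key,
  NOTE_BIG5 blob sub_type text character set BIG5,
  NOTE_WIN1250 blob sub_type text character set WIN1250
)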

Jan
delphidreamer
Fresh Boarder
Posts: 6
Joined: 26.09.2011, 12:27

Post by delphidreamer »

I use Delphi XE for this. I am still on Delphi 7 for the project for now.

I have looked at the different codepages too, but using "CODEPAGE=UNICODE_FSS" and having the database in UNICODE_FSS gives this problem.

I make synchronizations with web shops and some of them have mixed languages (Russian, German, Spanish, French and Dutch).
To really serve them all in one database, UNICODE_FSS could work.
I tried a test project with the default IB components and that looked very good. It's just that they don't have column sorting, and the project is full of it.


I use that Firebird 2.1 version.

Do you think it can work with real Unicode, or just with one codepage per database?
marsupilami
Platinum Boarder
Posts: 1918
Joined: 17.01.2011, 14:17

Post by marsupilami »

Hello Delphidreamer,

I did some short tests using Delphi XE, Zeos 7 and a Firebird 2.1 database with WIN1252 as the default character set. For me it seems that there are problems in Zeos 7 that prevent all this from working correctly. In my setup, using CODEPAGE=WIN1252 leads to correctly displayed text when there are non-ASCII characters in the strings, but writing does not work correctly. If I use CODEPAGE=UTF8 the display does not work, but writing to the database works correctly. But I am not a Zeos developer, so maybe I did something wrong.
But maybe it is the best idea to stick to Delphi 7 and Zeos 6 until Zeos 7 is finished?

In that case remember that Delphi 7 is ANSI and this requires something like CODEPAGE=WIN1252 to make Zeos and Firebird work as expected. Also remember that you have to change that setting if you expect the code to work correctly on a computer that uses another codepage, like a Russian one. In all these cases the database can still be Unicode; only the connection character set changes.

And last but not least, some words about UNICODE_FSS: the Firebird team discourages the use of that character set because of a lot of problems (see the release notes for FB 2.1). If you use Firebird 2.1, it should be perfectly possible for you to use UTF8 as the default character set, provided you can recreate the database easily.

I hope I did not step on anybody's toes with my assumptions and suggestions,

Jan
delphidreamer
Fresh Boarder
Posts: 6
Joined: 26.09.2011, 12:27

Post by delphidreamer »

Mmmm... so you say it's better to use a codepage per database per country; then Russian or other languages would have to be encoded.

I was hoping for a full Unicode solution, but I guess that's a bit too much for now.


I will try some more variants, but the strange thing was that the default IB components in Delphi got all data correct if I used the UNICODE_FSS database and codepage:
put data in
check with Flame
reopen the dataset
and the data looks perfect.
Other charsets had some more difficulties.

That is why I thought it would also work for ZEOS with just some code changes.