Page 1 of 2
Working with UTF8 encoded dbs
Posted: 12.02.2012, 15:50
by pawelp
I have a postgres 8.3 Database which is encoded in UTF8.
I'm using Zeos 6.6.6
Before connecting, I'm setting the connection properties like that
Code: Select all
ZConn.Properties.Values['codepage']:='UTF8';
ZConn.Properties.Values['client_encoding']:='UTF8'; //
now, when I execute a SQL Query, I got the following error:
Code: Select all
exception message : SQL Error: ERROR: invalid byte sequence for encoding "UTF8": 0xe46c6c HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
can it be, that Zeos do not handle the UTF8 encodint?
ok, so I try to do it myself using the following pseudo code:
Code: Select all
Query.sql.text := UTF8Encode(sql);
This works. But then I faced an other problem.
For some internal reasons, I have to use the folowing sql statements looking like:
Code: Select all
SELECT field1 AS "Field Caption" FROM xxxx
now when the Field Caption has any non ASCI Charackters, the fields can not be accessed afterwords using:
Code: Select all
Query.Fieldbyname('Field caption')
I assume if I would access them using:
Code: Select all
Query.fieldbyname(UTF8Encode('field caption'))
it would work.
But is there any way to let Zeos handle all the UTF8 conversion?
Or do I really need to change all my code to do it ?
Posted: 13.02.2012, 13:59
by jeremicm
Why using zeos 6?
Can You move to new delphi (with native unicode support) and zeos 7?
Posted: 13.02.2012, 16:22
by pawelp
Is Zeus 7 stable enough
If so I could upgrade
But for some time I'm stuck with delphi 2007.
But maybe I will be able to move some parts of the project to delphi 2010 if that is the only solution
Posted: 14.02.2012, 10:43
by marsupilami
Hello pawelp,
it seems that you use an ANSI-Version of Delphi. Maybe you should ask Postgres to do the conversion between your codepage (1250?) and UTF8? Unfortunately I don't know how to do that. Maybe something like
ZConn.Properties.Values['codepage']:='WIN1250';
or
ZConn.Properties.Values['client_encoding']:='WIN1250';
or
ZConn.Properties.Values['codepage']:='CP1250';
or
ZConn.Properties.Values['client_encoding']:='CP1250';
The problem is that you ask the database to deliver all strings as UTF8, and ZEOS 6 does not do an automatic conversion for that. It just treats them as regular strings.
Best regards,
Jan
Posted: 14.02.2012, 13:56
by pawelp
I tried to set the codepage to WIN1252.
It works fine. But... sometimes I got the error, that a unicode charackter as no equivalent in WIN1252. That is a SQL Error reported by postgres.
Maybe if there were a way to change that behavior from raising an error to be just a warrning...
Posted: 20.02.2012, 15:02
by marsupilami
Hello pawelp,
is WIN1252 really the correct codepage for you to use? I just ask because your profile states that you live in poland and Wikipedia says that the codepage, which is used in poland, should be WIN1250.
I don't use postgres, so I don't know if it is possible to do anything about these errors.
If these things don't work out correctly, then moving to Lazarus with Zeos 6.6.6 or to a newer Delphi with Zeos 7 might be your only options.
Best regards,
Jan
Posted: 20.02.2012, 16:55
by pawelp
My customers are from germany, that why I use WIN1252. I think that is the proper one for germany, is it not?
I will try to move some parts of the code to a newer version of delphi.
But how stable is Zeos 7? It is still in alpha state, right?
Posted: 21.02.2012, 11:28
by marsupilami
You are right, WIN1252 is the correct codepage for a german windows. Is the contents of your database from your german customers? Or do they have different contents where the transliteration error would not occur?
Zeos 7 is still in an alpha state. So if you use Zeos 7, you would have to test your application very carefully. And you will have to test it very carefully until Zeos 7 can be considered stable.
If you want the stability of Zeos 6 and need Unicode, then Lazarus will be your only choice because Lazarus can use UTF8 natively, which Delphi can not.
Posted: 21.02.2012, 11:58
by pawelp
Some customers have legacy databases, either imported from xls, access or mysql. The problem do not occur on all of them. But maybe on 10%.
The program is old and large, the dB structure and the dB access changed during the lifetime of the application.
I assume, the characters that do not have a WIN1252 equivalent are anyway just junk... But I'm not sure how to get rid of them.
I will stay back from Zeos 7 for now. I can not afford to make my customers any more angry
I think I need to have a new look at Lazarus, seems to be grown since I last used it.
Posted: 21.02.2012, 18:17
by marsupilami
If you want to remove them from the database - that should be possible with Ansi-Delphi. But if removing the characters from the database is not an option then it will not be possible unless you use utf8-conversion functions or some kind of additional translating columns in your dataset or something like that...
Posted: 21.02.2012, 18:28
by pawelp
The problem is, I do not know what this charackters are and in which table and columd they appear.
It is like a small corruption inside the db
Like I said, this is most probably caused by the legacy data beeing imported by different means.
The only solution would be to advise the db somehow to eliminate any non win1252 charackters table by table.
but that would take a serious amount of time, as the data amount is quite large.
I searched for something like a compatibility switch inside the db.
Somehow it must be posible to force postgres to change the error message into a warrning or a hint
Posted: 22.02.2012, 18:19
by marsupilami
pawelp wrote:The only solution would be to advise the db somehow to eliminate any non win1252 charackters table by table.
I think, this can only be done by an external application - get the data in unicode, eliminate all non-WIN1252-characters and write it back into the database.
pawelp wrote:but that would take a serious amount of time, as the data amount is quite large.
What about a program, that does this in small chunks over time? Doing that in smaller chunks should keep the database usable?
Or you decide to go to Unicode completely, so your customers can enter data in chinese and russian and other languages too...
pawelp wrote:I searched for something like a compatibility switch inside the db.
Somehow it must be posible to force postgres to change the error message into a warrning or a hint
I would assume that this is not possible because data integrity is very high valued in RDBMSs. But that is only an assumption...
Posted: 01.03.2012, 00:48
by mdaems
pawelp,
You may be a candidate to use the testing-egonhugeist branch. This is a new effort to make zeos 7 work better with different codepages. It's written by a german, so you might even check with him directly if your case would be a good one to try his code on.
Mark
Posted: 01.03.2012, 15:01
by pawelp
ok, why not.
Can you put me to a place where I can find how to download that build and how to contact the author?
Posted: 12.03.2012, 23:51
by marsupilami
Hello pawelp,
the svn location of that code is
https://zeoslib.svn.sourceforge.net/svn ... gonhugeist
the user that is working on that branch is egonhugeist
Best regards,
Jan