Page 1 of 4

[patch_done] Delphi 2010 + MySQL + UTF8

Posted: 12.01.2010, 12:29
by KDeveloper2
Hi,

try to work with UTF8 strings like "кщиуке" in MySQL.
I tested with another Connection Component and it is working with UTF8, but i don´t want to change all my code.

After changing

Code: Select all

function TZAbstractRODataset.GetFieldData(Field: TField; Buffer: Pointer): Boolean;
...
        ftWideString:
          begin
            {$IFDEF BDS4_UP}
              if self.connection.UseUnicode // new property
                then WStrCopy(Buffer, PWideChar(UTF8Decode(RowAccessor.GetUnicodeString(ColumnIndex, Result))))
                else WStrCopy(Buffer, PWideChar(RowAccessor.GetUnicodeString(ColumnIndex, Result)));
            {$ELSE}
              PWideString(Buffer)^ := RowAccessor.GetUnicodeString(ColumnIndex, Result);
            {$ENDIF}
            Result := not Result;
          end;
...
i can read this in my TDBGrid correctly.

When i try to post such UTF8 - strings, then i see only '??????',
i think there is an problem in

Code: Select all

procedure TZRowAccessor.SetUnicodeString(ColumnIndex: Integer; Value: WideString);
....
               System.Move(PWideString(Value)^,
                  Pointer(@FBuffer.Columns[FColumnOffsets[ColumnIndex - 1] + 1])^,
                  Length(Value) * 2 + 2);
...
i´ve installed
  • Delphi 2010
    MySQL 5
    Zeos 7 alpha
Any idea how to solve this problem?

Thanks for help!

Posted: 26.01.2010, 12:13
by KDeveloper2
does nobody uses UTF8 in MySQL?

Posted: 26.01.2010, 22:57
by bravecobra
You may a point there. I personally think we need a compiler directive there as well setting it as a Unicode string instead of a widestring. You may want to get the latest SVN version, as we recently had an issue with unicode parameters as well.

Posted: 29.01.2010, 10:09
by KDeveloper2
Hi,
i used the latest REV. 779 from here
http://zeosdownloads.firmos.at/downloads/snapshots/

but i get кщиуке instead кщиуке
what is wrong? :?:

:(

Posted: 29.01.2010, 16:56
by KDeveloper2
finally i can use mysql with Delphi 2010 + UTF8

i changed some units (REV. 779) and now it works fine,
perhaps somebody can it integrate into the next versions.

Posted: 29.01.2010, 23:21
by mdaems
KDeveloper2,

This is good news, but I wonder how these changes will affect people using other charsets with mysql.
The more, did you test if this changed version still compiles with older compilers?

Mark

EDIT : I checked this patch and can't just accept it. Main reasons : this also handles unicode strings and strings equally in older compilers. Code changes that are not mysql specific. Were those tested? And most important : I have no experience whatsoever in this domain (dutch doesn't need more than the ascii characterset)

Can somebody else just pop in the descussion and work together with KDeveloper2 to work this out?

Posted: 01.02.2010, 10:19
by KDeveloper2
Hi mdaems,

you are right, i think my changes will ONLY work with Delphi 2010, MySQL 5 and UTF8.
Thats why i need some help to integrate this changes.

I think we need a compiler directive and/or a property in the connection depending on the charset.

I will work it out.

Posted: 02.02.2010, 22:48
by aperger
Hi "mdaems" and "KDeveloper2",

I had Delphi 7 (with Crosskylix - K3), and Delphi 2005, and Delphi 2010 installed in my PC. So.... If you have any solution what I can test or I can do some small fixes... maybe. Please call me or send your modification.

I am planning to move my Storage and Invoice system into D2010 and keep it alive under Kylix 3. :-)

The UTF8 is not working for me too: D2010 + SVN 799 + PostgreSQL. The problem is the same!

Tanks for it.

Posted: 02.02.2010, 23:22
by aperger
Hi Again.

I do some test and I use connection properties to solve the problem, please test it:

Befor the application starts:

Code: Select all

{$IFDEF LINUX}
  langenv:=GetEnvironmentVariable('LANG');
  pos:=AnsiPos('.',langenv);
  if pos>0 then begin
    codepage:=Copy(langenv,pos+1,Length(langenv)-pos);
  end else begin
    codepage:='iso-8859-2';
  end;
{$ENDIF}
{$IFDEF MSWINDOWS}
  {$IFDEF UNICODE}
  DefCodePageID:=DefaultSystemCodePage;
  if DefCodePageID = 65001 then begin
    codepage:='utf8';
  end else if DefCodePageID = 1250 then begin
    codepage:='WIN1250';
  end else begin
    codepage:=IntToStr(DefCodePageID);
  end;
  {$ELSE}
  codepage:='WIN1250';
  {$ENDIF}
{$ENDIF}
Before the connection:

Code: Select all

procedure TDM.SetPgSQLProperties;
begin
{$IFDEF LINUX}
  self.conDB.Properties.Values['codepage']:=codepage;
  self.conDB.Properties.Values['client_encoding']:=codepage;
{$ELSE}
  {$IFDEF UNICODE}
  self.conDB.Properties.Values['codepage']:='UTF8';
  self.conDB.Properties.Values['client_encoding']:=codepage;
  {$ELSE}
  self.conDB.Properties.Values['codepage']:=codepage;
  self.conDB.Properties.Values['client_encoding']:=codepage;
  {$ENDIF}
{$ENDIF}
end;

procedure TDM.conDBBeforeConnect(Sender: TObject);
begin
  if (conDB.Protocol = 'postgresql') or (conDB.Protocol = 'postgresql-7') or (conDB.Protocol = 'postgresql-8') then begin
    SetPgSQLProperties;
  end;
end;
These line can solve my PostgreSQL UTF8 problems I will test it with MySQL too.
I use Windows XP HUN ->>> CP1250 is the default.


Best regards

Attila

Posted: 03.02.2010, 00:49
by aperger
So... for MySQL ... the solution is very similar:



Function:

Code: Select all

procedure TDM.SetMySQLCharSetAndCollation;
begin

  (* Ez a megaoldás a ZeOS Buglistáján volt !!!
  // check version if 4.1
  if conDB.Protocol='mysql-4.1' then
    qGeneral.SQL.Text := 'SET NAMES ''latin2'' '
  else // if not 4.1
    qGeneral.SQL.Text := 'SET CHARACTER SET ''latin2'' ';
  qGeneral.ExecSQL;

  qGeneral.SQL.Text := 'SET collation_connection = latin2_hungarian_ci';
  qGeneral.ExecSQL;
  *)

  (*
  !!!!Ezeketmind le kéne futtatni AZ 5.0-ás OnLine help szerint!!!!
  SET NAMES latin2;
  SET CHARACTER_SET latin2;

  SET character_set_client = latin2;
  SET character_set_results = latin2;
  SET collation_connection = latin2_hungarian_ci;
  *)


{$IFDEF LINUX}
  if AnsiContainsText(codepage,'UTF') then begin
    qGeneral.SQL.Text := 'SET NAMES ''utf8''';
    qGeneral.ExecSQL;

    qGeneral.SQL.Text := 'SET CHARACTER SET utf8';
    qGeneral.ExecSQL;

    qGeneral.SQL.Text := 'SET character_set_client = utf8';
    qGeneral.ExecSQL;

    qGeneral.SQL.Text := 'SET character_set_results = utf8';
    qGeneral.ExecSQL;

    qGeneral.SQL.Text := 'SET collation_connection = utf8_hungarian_ci';
    qGeneral.ExecSQL;
  end else begin
    qGeneral.SQL.Text := 'SET NAMES ''latin2''';
    qGeneral.ExecSQL;

    qGeneral.SQL.Text := 'SET CHARACTER SET latin2';
    qGeneral.ExecSQL;

    qGeneral.SQL.Text := 'SET character_set_client = latin2';
    qGeneral.ExecSQL;

    qGeneral.SQL.Text := 'SET character_set_results = latin2';
    qGeneral.ExecSQL;

    qGeneral.SQL.Text := 'SET collation_connection = latin2_hungarian_ci';
    qGeneral.ExecSQL;
  end;
{$ELSE}
  {$IFDEF UNICODE}
  qGeneral.SQL.Text := 'SET NAMES ''utf8''';
  qGeneral.ExecSQL;

  qGeneral.SQL.Text := 'SET CHARACTER SET utf8';
  qGeneral.ExecSQL;

  qGeneral.SQL.Text := 'SET character_set_client = cp1250';
  qGeneral.ExecSQL;

  qGeneral.SQL.Text := 'SET character_set_results = cp1250';
  qGeneral.ExecSQL;

  qGeneral.SQL.Text := 'SET collation_connection = utf8_hungarian_ci';
  qGeneral.ExecSQL;
  {$ELSE}
  qGeneral.SQL.Text := 'SET NAMES ''latin2''';
  qGeneral.ExecSQL;

  qGeneral.SQL.Text := 'SET CHARACTER SET latin2';
  qGeneral.ExecSQL;

  qGeneral.SQL.Text := 'SET character_set_client = latin2';
  qGeneral.ExecSQL;

  qGeneral.SQL.Text := 'SET character_set_results = latin2';
  qGeneral.ExecSQL;

  qGeneral.SQL.Text := 'SET collation_connection = latin2_hungarian_ci';
  qGeneral.ExecSQL;
  {$ENDIF}
{$ENDIF}
end;

Call this function after connection:

Code: Select all

procedure TDM.conDBAfterConnect(Sender: TObject);
begin

  ...

  if AnsiContainsText(conDB.Protocol,'mysql') then begin
    conDB.TransactIsolationLevel:=tiReadCommitted;
    SetMySQLCharSetAndCollation;
  end;

end;

Posted: 03.02.2010, 21:47
by mdaems
aperger,
Is this something Zeoslib could do instead of the user in a safe way?

Mark

Posted: 04.02.2010, 11:49
by aperger
I don't understand your question exactly, but themeaning, it looks me do not need to modify any ZeOS source...
Before or After the connection the programmer could set the locales setting.

Posted: 04.02.2010, 21:29
by mdaems
Well,
Maybe we could add setMySQLCharSetAndCollation to the TZMySQLConnection.Open logic, taking the wanted charset/collation from the Connection.properties. Defaulting to utf8 when the property isn't set.

Actually, this code is already available:

Code: Select all

    { Sets a client codepage. }
    if FClientCodePage <> '' then
    begin
      {$IFDEF DELPHI12_UP}
      SQL := PAnsiChar(UTF8String(Format('SET CHARACTER SET %s', [FClientCodePage])));
      {$ELSE}
      SQL := PAnsiChar(Format('SET CHARACTER SET %s', [FClientCodePage]));
      {$ENDIF}
      FPlainDriver.ExecQuery(FHandle, SQL);
      CheckMySQLError(FPlainDriver, FHandle, lcExecute, SQL);
      DriverManager.LogMessage(lcExecute, FPlainDriver.GetProtocol, SQL);
    end;
Mark

Posted: 04.02.2010, 22:50
by aperger
You idea sounds good... :-) Thanks for it.


According to the MySQL OnLine help (version 5.0), we should set all of the following parameters:

Code: Select all

 SET NAMES 'utf8';
 SET CHARACTER SET utf8; 
 SET character_set_client = utf8; 
 SET character_set_results = utf8; 
 SET collation_connection = utf8_hungarian_ci; 
BUT I do not know the correct default values.

... AND why I need to set different values under Delphi 2010
- for NAMES and Client charset at MySQL (DEFAULT CHARACTER utf8):
'SET NAMES ''utf8''', but 'SET character_set_client = cp1250';
- for codepage and client encoding at PostgeSQL (DEFAULT CHARACTER utf8):
self.conDB.Properties.Values['codepage']:='UTF8';
but self.conDB.Properties.Values['client_encoding']:='WIN1250';


Best regards,

Attila

Posted: 06.02.2010, 14:24
by mdaems
character_set_client : the charset statements are written in by the client (= zquery.sql and parameters)
collation_connection : the collation the server uses when sorting our results
character_set_results : the charset of the data the client receives from the server. These are the resultset values. Seems logical this has to be the same charset as character_set_client

From the mysql 5.1 online manual if get this information:
SET NAMES 'charset_name' [COLLATE 'collation_name']

SET NAMES indicates what character set the client will use to send SQL statements to the server. Thus, SET NAMES 'cp1251' tells the server, “future incoming messages from this client are in character set cp1251.” It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use a SELECT statement.)

A SET NAMES 'x' statement is equivalent to these three statements:

SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;
Setting each of these character set variables also sets its corresponding collation variable to the default correlation for the character set. For example, setting character_set_connection to x also sets collation_connection to the default collation for x. It is not necessary to set that collation explicitly. To specify a particular collation for the character sets, use the optional COLLATE clause:

SET NAMES 'charset_name' COLLATE 'collation_name'
This means only set names (eventually including a collaton ) should be sufficient!
In the notes from the manual there are warnings about using set names and set charset together (as the last one sets the collation against to the database default)

So, can you please try behaviour when you change the SET CHARACTER SET in zdbcmysql.pas to set names and use the connection property? My impression is this would be the only logical solution. Then we could also add a collation property.

Mark