Page 1 of 1

SQLite Unicode Support

Posted: 23.03.2007, 17:28
by xtra
Hi, my first Zeos Hack.
This hack allows the use of UTF8 reading from queries and tables in SQLite.
There is a drawback nontheless as one will be forced to use "WIDECHAR(XXX)" as datatype in the SQLite DB.
This is possible, as SQLite always considers a fieldtype to be text whenever the Keyelement "TEXT" or "CHAR" is contained in the fielddef (see SQLite doc).
Now I use a custom format "WIDECHAR", which gets interpreted by ZEOS as UnicodeString and properly decoded from UTF8.
I tested this only with SQLite3 but it works nice.
Note that this is only a hack and not very elegant, I only post it here as it might be some starting point for someone.
It would be better to check the content of the field but fieldtype is made up before content is available.
Another solution would be to restrict all sqlite3 strings to be UTF8 as they are always stored as UTF8 in the SQLite.

Anyway, this approach below works for me just fine and now for the patch:

in ZDbcSqLiteResultSet.pas

Code: Select all

  TZSQLiteResultSet = class(TZAbstractResultSet)
[...]
    // BEGIN UNICODE HACK c.schiffler
    function GetUnicodeString(ColumnIndex: Integer): WideString; override;
    // END UNICODE HACK c.schiffler
[...]

// BEGIN UNICODE HACK c.schiffler
{**
  Gets the value of the designated column in the current row
  of this <code>ResultSet</code> object as
  a <code>WideString</code> in the Java programming language.

  @param columnIndex the first column is 1, the second is 2, ...
  @return the column value; if the value is SQL <code>NULL</code>, the
    value returned is <code>null</code>
}
function TZSQLiteResultSet.GetUnicodeString(ColumnIndex: Integer): WideString;
var
  Buffer                 : PChar;
begin
  Buffer := GetPChar(ColumnIndex);
  if Buffer <> nil then
    Result := UTF8Decode(Buffer)
  else
    Result := '';
end;
// END UNICODE HACK c.schiffler
in ZDbcSqLiteUtils.pas:

Code: Select all

function ConvertSQLiteTypeToSQLType(TypeName: string; var Precision: Integer;
  var Decimals: Integer): TZSQLType;
var
  P1, P2: Integer;
  Temp: string;
begin
  TypeName := UpperCase(TypeName);
  Result := stString;
  Precision := 0;
  Decimals := 0;

  P1 := Pos('(', TypeName);
  P2 := Pos(')', TypeName);
  if (P1 > 0) and (P2 > 0) then
  begin
    Temp := Copy(TypeName, P1 + 1, P2 - P1 - 1);
    TypeName := Copy(TypeName, 1, P1 - 1);
    P1 := Pos(',', Temp);
    if P1 > 0 then
    begin
      Precision := StrToIntDef(Copy(Temp, 1, P1 - 1), 0);
      Decimals := StrToIntDef(Copy(Temp, P1 + 1, Length(Temp) - P1), 0);
    end else
      Precision := StrToIntDef(Temp, 0);
  end;

  if StartsWith(TypeName, 'BOOL') then
    Result := stBoolean
[...]
// BEGIN UNICODE HACK c.schiffler
  else if TypeName = 'WIDECHAR' then
    Result := stUnicodeString
// END UNICODE HACK c.schiffler
[...]
end;

Posted: 16.04.2007, 15:53
by aducom
I'll look into it, but at first sight this is a UTF8 solution assigned to a widechar. The standard for widechar is UTF16 defined by nchar and nvarchar...

Posted: 11.05.2007, 13:51
by xtra
Well, to be honest, this is neither nor. :)
SQlite simply stores Data native as either UTF8 or UTF16 (defaulting to UTF8).

What I did was "inventing" new fieldtypes (WIDETEXT and WIDECHAR) for knowing what the real data in the field is.

So this is really an ugly hack breaking with SQL standards (as those fieldtypes are not defined in the standard) but it works for me for now.

I'd prefer to do some collation implementing in Zeos but I really have no idea where to start from.
IMHO one needs to fetch the collation info from the database and then, in the getFielddata must occur dynamic convertion between UTF16/UTF8/Ansi/... to Ansistring etc.
Problem hereby is, the fieldtype will then be stString again which will break the string then (not displaying in std controls, no matter if they could handle unicode or not).
The other way would be defaulting to fieldtype stUnicodeString which should (!) work with most DB aware components but is not guaranteed to.

IMHO the first step would be to implement a charset parameter that will convert the string data in both directions (and if charset is different from ansi then will use the fieldtype stUnicodestring).

The problem with current implementation for this attempt in current Zeos is that the
ConvertSQLiteTypeToSQLType() routine only has typename, precision and decimals, therefore one can not use a parameter in the db connection component.

Ideas of how to circumvent this are very appreciated I am willing to put time into implementing unicode into Zeos (not only sqlite) and to make the above hack obsolete but I really can't see how to attempt this with the current Zeos implementation (or maybe I simply can't find the needle in the haystack of code).
So if anyone of you has some idea of where to start, please go on and tell me.


PS: in the hack above the following was missing:
in ZDbcSqliteStatement.pas, around Line 348 add:

Code: Select all

      // unicode hack c.schiffler
      stUnicodeString:
        begin
          Result := EncodeString(Utf8Encode(SoftVarManager.GetAsUnicodeString(Value)));
        end;
      // end unicode hack c.schiffler

Posted: 29.05.2007, 14:43
by aducom
This does not seems to be a ZEOS general solution to me.

Posted: 22.03.2008, 21:02
by inetMark
Thanks for patch.
I have problem with readind data with Zeos from my existing database in SQLite. I will try your patch. I need reinstall component after that?

And how about sorting? In our language we have the names of the people which started witch special characters such as Šrobovák, Žaneta, ....
If i use this: 'SELECT * FROM people ORDER by people.name'
what may I excepted?

I need ignore accented characters and I need sort it correctly.
Example:
Simone
Šrobovák
Szuber
Žaneta
Zuza

And I don't need this:
Simone
Szuber
Zuza
Šrobovák
Žanete