[patch_done] Strings like E'\\' are incorrectly tokenized

The alpha/beta tester's forum for ZeosLib 7.0.x series

Report problems concerning our Delphi 2009+ version and new Zeoslib 7.0 features here.

This is a forum that will be removed once the 7.X version goes into stable!!

Moderators: gto, EgonHugeist, olehs

olehs
Zeos Dev Team
Zeos Dev Team
Posts: 118
Joined: 09.11.2009, 21:05

[patch_done] Strings like E'\\' are incorrectly tokenized

Post by olehs »

I guess error is in TZMySQLQuoteState.NextToken
User avatar
EgonHugeist
Zeos Project Manager
Zeos Project Manager
Posts: 1936
Joined: 31.03.2011, 22:38

Post by EgonHugeist »

olehs,

now you make me curious again.
In the description you wrote PostgreSQL. standard_conforming_strings=off which points me to the Postgre Tokenizer.

But here you're writing about the TZMySQLQuoteState.

In my mind your E'\\' example should be ttWord, ttQuoted.

Can you explain me better what exactly you mean?

Michael
Best regards, Michael

You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/

Image
olehs
Zeos Dev Team
Zeos Dev Team
Posts: 118
Joined: 09.11.2009, 21:05

Post by olehs »

EgonHugeist,
didn't want to confuse you.
Postgre tokenizer uses inherited (MySql) tokenizer.

Look at this part of TZMySQLQuoteState.NextToken

Code: Select all

        end;
      end;
    end else
      if (ReadChar = BackSlash) and (TempLastChar  = BackSlash) then
        Result.TokenType := ttEscapedQuoted
      end
      else
        if ReadChar = BackSlash then
          TempLastChar := ReadChar
        else
          TempLastChar := #0;
When (ReadChar = BackSlash) and (TempLastChar = BackSlash) condition is met,
one need to reset TempLastChar:

Code: Select all

        end;
      end;
    end else
      if (ReadChar = BackSlash) and (TempLastChar  = BackSlash) then
      begin
        Result.TokenType := ttEscapedQuoted;
        TempLastChar := #0;
      end
      else
        if ReadChar = BackSlash then
          TempLastChar := ReadChar
        else
          TempLastChar := #0;
User avatar
EgonHugeist
Zeos Project Manager
Zeos Project Manager
Posts: 1936
Joined: 31.03.2011, 22:38

Post by EgonHugeist »

olehs,

hum this is correct now again! I didn't expect or saw that th Postgre Tokenizer is descendant of the MySQL Tokenizer!

It was me who chaged this sequence. The reason was a simple one. I'm trying to explain:

MySQL and several other Drivers do need spezial escaped chars like ' or /.

As i introduced my PreprepareSQL functionallity i wanted not only have a proper encoding no i wanted also to escape the strings like the engiense stupidly do expect them. Which means for MySQL: you need only to use AnsiQuotedStr/QuotedStr. So i chaged the sequence in a kind to detect still escaped or unescaped chars.

This behavior i want to make possible for all Databases where it makes sence (MsSQL excluded -> no common charset).

Our testsuites do not report me any issues. So i need better tests.

Can you give me some additional examples for postgre which i can add as tests and what do we have to expect as token results? That would be nice.

Michael
Best regards, Michael

You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/

Image
olehs
Zeos Dev Team
Zeos Dev Team
Posts: 118
Joined: 09.11.2009, 21:05

Post by olehs »

Well, it's not that easy as I thought.
Many things in tokenizing PostreSQL strings depend on standard_conforming_strings parameter.
So I'll have to give samples for both cases.

standard_conforming_strings=on
In this case backslashes are treated just as they are:
'\\' is treated as \\
'\n' = \n
'\' =\

But one can explicitly use 'Escape syntax' by adding E, then
E'\\' is treated as \
E'\n' is a newline etc.

The full list is in the docs:

Code: Select all

\b backspace 
\f form feed 
\n newline 
\r carriage return 
\t tab 
\o, \oo, \ooo (o = 0 - 7)  octal byte value 
\xh, \xhh (h = 0 - 9, A - F)  hexadecimal byte value 
But when standard_conforming_strings=off you don't even have to use E-syntax - all backslashes are treated as escape characters.
'\\' is treated as \
'\n' is a newline,
E'\n' is also a newline,
'\z' is treated as z, because it's not valid escape sequence and escape char is skipped
Last edited by olehs on 23.07.2012, 23:34, edited 1 time in total.
olehs
Zeos Dev Team
Zeos Dev Team
Posts: 118
Joined: 09.11.2009, 21:05

Post by olehs »

One more thing:
valid quote escape syntax is ''(two quotes), but depending on
backslash_quote parameter, it is possible to use \' to escape quote
User avatar
EgonHugeist
Zeos Project Manager
Zeos Project Manager
Posts: 1936
Joined: 31.03.2011, 22:38

Post by EgonHugeist »

olehs,

Hmpff that doesn't make me happy. Zeos was never able to detect such stupid things. Terrible! Don't know why they make such stange things and actually i thought MySQL is the goal runner there: Try to insert 5 chars into a varchar(5) field,lol, it only accepts 4! Best and cleared API from developer side: Oracle and Firebird. Also does SQLite follow simply the AnsiQuotedStr rules which is safe.

However, with this stuff i can work and build some test cases. Not the ttQuoted detection need an upgrade NO the ttWord state has to check this and has to fall into ttEscapeQuoted in these cases.

Another thing: You do work with a fullunicode IDE. Now tell me what i've to do with still escaped tokens? Do i have to UTF8encode/cast to ansi(in case of the client encoding) or shold i start from the premise you aleady did this before?

Michael
Best regards, Michael

You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/

Image
olehs
Zeos Dev Team
Zeos Dev Team
Posts: 118
Joined: 09.11.2009, 21:05

Post by olehs »

EgonHugeist,

I didn't actually get your question about fullunicode IDEs. If you are talking about Preprepare property - it's probably not needed in there, because
Strings are always unicode. So you always have to cast/encode it to ClientEncoding without escaping. Such strings can even be mixed with escapes like '\134'. In this case behavior depends on standard_conforming_strings parameter but it's server's duty to deal with them.

Here is my vision for this sample code

Code: Select all

ZQuery1.SQL.Text := 'select description || '' - тест\134'' || E''\\t'' || ''\n'' from reports where title = :title;';
ZQuery1.ParamByName('title').AsString := 'Common\Отчет'#9'1';
if standard_conforming_strings=on the final query should be in client encoding and should look like

Code: Select all

select description || ' - тест\134' || E'\\t' || '\n' from reports where title = 'Common\Отчет	1';
if standard_conforming_strings=off the final query should also be in client encoding and should look like

Code: Select all

select description || ' - тест\134' || E'\\t' || '\n' from reports where title = E'Common\\Отчет	1';
So there is no need to guess if the string is already escaped, you just always have to escape quotes in String params and backslashes when SCS=off.

Maybe it would be better to have a query parameter, somehing like
escape_strings = {off, on, auto, always}
off - strings w/o E, backslashes - as they are
on - strings with E, backslashes - as they are
always - strings with E, backslashes are doubled
auto - when SCS=on - same as off, SCS=off - same as always

Default value could by auto
User avatar
EgonHugeist
Zeos Project Manager
Zeos Project Manager
Posts: 1936
Joined: 31.03.2011, 22:38

Post by EgonHugeist »

olehs,

slightly this is getting tooo complicated i think.

I changed only the ttQuoted Token detection for Postgre similar to your first proposal.
Also i wont have the time to implement the Token issues Postgre has. I'll add a test with your "vision" code later on.

Patch done Rev. 1576

Can you check the issue again?

Michael
Best regards, Michael

You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/

Image
olehs
Zeos Dev Team
Zeos Dev Team
Posts: 118
Joined: 09.11.2009, 21:05

Post by olehs »

Ok, I'll check it.
By the way, found a little bug.
You do not have the required permissions to view the files attached to this post.
Last edited by olehs on 24.07.2012, 12:56, edited 1 time in total.
olehs
Zeos Dev Team
Zeos Dev Team
Posts: 118
Joined: 09.11.2009, 21:05

Post by olehs »

One more thing. Seems like TZPostgreSQLConnection.GetEscapeString does exactly opposite to what I described about standard_conforming_strings.

When ON - it encodes it, but it shouldn't.
Oleg
olehs
Zeos Dev Team
Zeos Dev Team
Posts: 118
Joined: 09.11.2009, 21:05

Post by olehs »

EgonHugeist,

checked rev.1576. Tried it with testcase
select '\\' || :str

and got

Code: Select all

2012-07-24 14:55:01 cat: Execute, proto: postgresql-8, msg: select '\\' || ?, errcode: 0, error: ERROR:  syntax error at end of input
LINE 1: select '\\' || ?
                        ^
olehs
Zeos Dev Team
Zeos Dev Team
Posts: 118
Joined: 09.11.2009, 21:05

Post by olehs »

EgonHugeist,

How about reverting to old MySQL implementaion just for PG?

Code: Select all

{**
  Return a quoted string token from a reader. This method
  will collect characters until it sees a match to the
  character that the tokenizer used to switch to this state.

  @return a quoted string token from a reader
}
function TZPostgreSQLQuoteState.NextToken(Stream: TStream;
  FirstChar: Char; Tokenizer: TZTokenizer): TZToken;
const BackSlash = Char('\');
var
  ReadChar: Char;
  LastChar: Char;
  QuoteChar: Char;
  QuoteCount: Integer;
begin
  Result.Value := FirstChar;
  QuoteCount := 1;
  If FirstChar = '`' then
    Result.TokenType := ttQuotedIdentifier
  Else
    Result.TokenType := ttQuoted;

  QuoteChar := FirstChar;

  LastChar := #0;

  while Stream.Read(ReadChar, SizeOf(Char)) > 0 do
  begin
    if ReadChar = QuoteChar then Inc(QuoteCount);
    if (LastChar = FirstChar) and (ReadChar <> FirstChar) then
    begin
      if QuoteCount mod 2 = 0 then
      begin
        Stream.Seek(-SizeOf(Char), soFromCurrent);
        Break;
      end;
    end;
    Result.Value := Result.Value + ReadChar;
    if LastChar = BackSlash then
      LastChar := #0
    else if (LastChar = FirstChar) and (ReadChar = FirstChar) then
      LastChar := #0
    else LastChar := ReadChar;
  end;
end;
User avatar
EgonHugeist
Zeos Project Manager
Zeos Project Manager
Posts: 1936
Joined: 31.03.2011, 22:38

Post by EgonHugeist »

olehs,

id did apply your bug patch. Rev1578
How about reverting to old MySQL implementaion just for PG?
[s]No problem does it fix the issue?
[/s]

Both proposals tested. No issues to see. Patch applyed Rev, 1579
Michael
Best regards, Michael

You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/

Image
olehs
Zeos Dev Team
Zeos Dev Team
Posts: 118
Joined: 09.11.2009, 21:05

Post by olehs »

EgonHugeist,

Still testing, but seems to work fine.

And what about TZPostgreSQLConnection.GetEscapeString? A query like

Code: Select all

q.SQL.Text := 'select * from sys_values where sysval_name = :sn';
q.ParamByName('sn').AsString := 'QMenu\QUICKMENUROWCOUNT';
And I get final query like

Code: Select all

select * from sys_values where sysval_name = 'QMenu\QUICKMENUROWCOUNT'
when SCS=off
and

Code: Select all

select * from sys_values where sysval_name = 'QMenu\134QUICKMENUROWCOUNT'
when SCS=on
which is wrong in both cases
Locked