Page 1 of 4

[patch_done] Strings like E'\\' are incorrectly tokenized

Posted: 23.07.2012, 21:26
by olehs
I guess error is in TZMySQLQuoteState.NextToken

Posted: 23.07.2012, 22:04
by EgonHugeist
olehs,

now you make me curious again.
In the description you wrote PostgreSQL. standard_conforming_strings=off which points me to the Postgre Tokenizer.

But here you're writing about the TZMySQLQuoteState.

In my mind your E'\\' example should be ttWord, ttQuoted.

Can you explain me better what exactly you mean?

Michael

Posted: 23.07.2012, 22:15
by olehs
EgonHugeist,
didn't want to confuse you.
Postgre tokenizer uses inherited (MySql) tokenizer.

Look at this part of TZMySQLQuoteState.NextToken

Code: Select all

        end;
      end;
    end else
      if (ReadChar = BackSlash) and (TempLastChar  = BackSlash) then
        Result.TokenType := ttEscapedQuoted
      end
      else
        if ReadChar = BackSlash then
          TempLastChar := ReadChar
        else
          TempLastChar := #0;
When (ReadChar = BackSlash) and (TempLastChar = BackSlash) condition is met,
one need to reset TempLastChar:

Code: Select all

        end;
      end;
    end else
      if (ReadChar = BackSlash) and (TempLastChar  = BackSlash) then
      begin
        Result.TokenType := ttEscapedQuoted;
        TempLastChar := #0;
      end
      else
        if ReadChar = BackSlash then
          TempLastChar := ReadChar
        else
          TempLastChar := #0;

Posted: 23.07.2012, 22:44
by EgonHugeist
olehs,

hum this is correct now again! I didn't expect or saw that th Postgre Tokenizer is descendant of the MySQL Tokenizer!

It was me who chaged this sequence. The reason was a simple one. I'm trying to explain:

MySQL and several other Drivers do need spezial escaped chars like ' or /.

As i introduced my PreprepareSQL functionallity i wanted not only have a proper encoding no i wanted also to escape the strings like the engiense stupidly do expect them. Which means for MySQL: you need only to use AnsiQuotedStr/QuotedStr. So i chaged the sequence in a kind to detect still escaped or unescaped chars.

This behavior i want to make possible for all Databases where it makes sence (MsSQL excluded -> no common charset).

Our testsuites do not report me any issues. So i need better tests.

Can you give me some additional examples for postgre which i can add as tests and what do we have to expect as token results? That would be nice.

Michael

Posted: 23.07.2012, 23:24
by olehs
Well, it's not that easy as I thought.
Many things in tokenizing PostreSQL strings depend on standard_conforming_strings parameter.
So I'll have to give samples for both cases.

standard_conforming_strings=on
In this case backslashes are treated just as they are:
'\\' is treated as \\
'\n' = \n
'\' =\

But one can explicitly use 'Escape syntax' by adding E, then
E'\\' is treated as \
E'\n' is a newline etc.

The full list is in the docs:

Code: Select all

\b backspace 
\f form feed 
\n newline 
\r carriage return 
\t tab 
\o, \oo, \ooo (o = 0 - 7)  octal byte value 
\xh, \xhh (h = 0 - 9, A - F)  hexadecimal byte value 
But when standard_conforming_strings=off you don't even have to use E-syntax - all backslashes are treated as escape characters.
'\\' is treated as \
'\n' is a newline,
E'\n' is also a newline,
'\z' is treated as z, because it's not valid escape sequence and escape char is skipped

Posted: 23.07.2012, 23:32
by olehs
One more thing:
valid quote escape syntax is ''(two quotes), but depending on
backslash_quote parameter, it is possible to use \' to escape quote

Posted: 23.07.2012, 23:51
by EgonHugeist
olehs,

Hmpff that doesn't make me happy. Zeos was never able to detect such stupid things. Terrible! Don't know why they make such stange things and actually i thought MySQL is the goal runner there: Try to insert 5 chars into a varchar(5) field,lol, it only accepts 4! Best and cleared API from developer side: Oracle and Firebird. Also does SQLite follow simply the AnsiQuotedStr rules which is safe.

However, with this stuff i can work and build some test cases. Not the ttQuoted detection need an upgrade NO the ttWord state has to check this and has to fall into ttEscapeQuoted in these cases.

Another thing: You do work with a fullunicode IDE. Now tell me what i've to do with still escaped tokens? Do i have to UTF8encode/cast to ansi(in case of the client encoding) or shold i start from the premise you aleady did this before?

Michael

Posted: 24.07.2012, 08:56
by olehs
EgonHugeist,

I didn't actually get your question about fullunicode IDEs. If you are talking about Preprepare property - it's probably not needed in there, because
Strings are always unicode. So you always have to cast/encode it to ClientEncoding without escaping. Such strings can even be mixed with escapes like '\134'. In this case behavior depends on standard_conforming_strings parameter but it's server's duty to deal with them.

Here is my vision for this sample code

Code: Select all

ZQuery1.SQL.Text := 'select description || '' - тест\134'' || E''\\t'' || ''\n'' from reports where title = :title;';
ZQuery1.ParamByName('title').AsString := 'Common\Отчет'#9'1';
if standard_conforming_strings=on the final query should be in client encoding and should look like

Code: Select all

select description || ' - тест\134' || E'\\t' || '\n' from reports where title = 'Common\Отчет	1';
if standard_conforming_strings=off the final query should also be in client encoding and should look like

Code: Select all

select description || ' - тест\134' || E'\\t' || '\n' from reports where title = E'Common\\Отчет	1';
So there is no need to guess if the string is already escaped, you just always have to escape quotes in String params and backslashes when SCS=off.

Maybe it would be better to have a query parameter, somehing like
escape_strings = {off, on, auto, always}
off - strings w/o E, backslashes - as they are
on - strings with E, backslashes - as they are
always - strings with E, backslashes are doubled
auto - when SCS=on - same as off, SCS=off - same as always

Default value could by auto

Posted: 24.07.2012, 11:53
by EgonHugeist
olehs,

slightly this is getting tooo complicated i think.

I changed only the ttQuoted Token detection for Postgre similar to your first proposal.
Also i wont have the time to implement the Token issues Postgre has. I'll add a test with your "vision" code later on.

Patch done Rev. 1576

Can you check the issue again?

Michael

Posted: 24.07.2012, 12:40
by olehs
Ok, I'll check it.
By the way, found a little bug.

Posted: 24.07.2012, 12:44
by olehs
One more thing. Seems like TZPostgreSQLConnection.GetEscapeString does exactly opposite to what I described about standard_conforming_strings.

When ON - it encodes it, but it shouldn't.

Posted: 24.07.2012, 13:00
by olehs
EgonHugeist,

checked rev.1576. Tried it with testcase
select '\\' || :str

and got

Code: Select all

2012-07-24 14:55:01 cat: Execute, proto: postgresql-8, msg: select '\\' || ?, errcode: 0, error: ERROR:  syntax error at end of input
LINE 1: select '\\' || ?
                        ^

Posted: 24.07.2012, 13:15
by olehs
EgonHugeist,

How about reverting to old MySQL implementaion just for PG?

Code: Select all

{**
  Return a quoted string token from a reader. This method
  will collect characters until it sees a match to the
  character that the tokenizer used to switch to this state.

  @return a quoted string token from a reader
}
function TZPostgreSQLQuoteState.NextToken(Stream: TStream;
  FirstChar: Char; Tokenizer: TZTokenizer): TZToken;
const BackSlash = Char('\');
var
  ReadChar: Char;
  LastChar: Char;
  QuoteChar: Char;
  QuoteCount: Integer;
begin
  Result.Value := FirstChar;
  QuoteCount := 1;
  If FirstChar = '`' then
    Result.TokenType := ttQuotedIdentifier
  Else
    Result.TokenType := ttQuoted;

  QuoteChar := FirstChar;

  LastChar := #0;

  while Stream.Read(ReadChar, SizeOf(Char)) > 0 do
  begin
    if ReadChar = QuoteChar then Inc(QuoteCount);
    if (LastChar = FirstChar) and (ReadChar <> FirstChar) then
    begin
      if QuoteCount mod 2 = 0 then
      begin
        Stream.Seek(-SizeOf(Char), soFromCurrent);
        Break;
      end;
    end;
    Result.Value := Result.Value + ReadChar;
    if LastChar = BackSlash then
      LastChar := #0
    else if (LastChar = FirstChar) and (ReadChar = FirstChar) then
      LastChar := #0
    else LastChar := ReadChar;
  end;
end;

Posted: 24.07.2012, 15:05
by EgonHugeist
olehs,

id did apply your bug patch. Rev1578
How about reverting to old MySQL implementaion just for PG?
[s]No problem does it fix the issue?
[/s]

Both proposals tested. No issues to see. Patch applyed Rev, 1579
Michael

Posted: 24.07.2012, 15:24
by olehs
EgonHugeist,

Still testing, but seems to work fine.

And what about TZPostgreSQLConnection.GetEscapeString? A query like

Code: Select all

q.SQL.Text := 'select * from sys_values where sysval_name = :sn';
q.ParamByName('sn').AsString := 'QMenu\QUICKMENUROWCOUNT';
And I get final query like

Code: Select all

select * from sys_values where sysval_name = 'QMenu\QUICKMENUROWCOUNT'
when SCS=off
and

Code: Select all

select * from sys_values where sysval_name = 'QMenu\134QUICKMENUROWCOUNT'
when SCS=on
which is wrong in both cases