Page 1 of 4
[patch_done] Strings like E'\\' are incorrectly tokenized
Posted: 23.07.2012, 21:26
by olehs
I guess error is in TZMySQLQuoteState.NextToken
Posted: 23.07.2012, 22:04
by EgonHugeist
olehs,
now you make me curious again.
In the description you wrote PostgreSQL. standard_conforming_strings=off which points me to the Postgre Tokenizer.
But here you're writing about the TZMySQLQuoteState.
In my mind your E'\\' example should be ttWord, ttQuoted.
Can you explain me better what exactly you mean?
Michael
Posted: 23.07.2012, 22:15
by olehs
EgonHugeist,
didn't want to confuse you.
Postgre tokenizer uses inherited (MySql) tokenizer.
Look at this part of TZMySQLQuoteState.NextToken
Code: Select all
end;
end;
end else
if (ReadChar = BackSlash) and (TempLastChar = BackSlash) then
Result.TokenType := ttEscapedQuoted
end
else
if ReadChar = BackSlash then
TempLastChar := ReadChar
else
TempLastChar := #0;
When
(ReadChar = BackSlash) and (TempLastChar = BackSlash) condition is met,
one need to reset TempLastChar:
Code: Select all
end;
end;
end else
if (ReadChar = BackSlash) and (TempLastChar = BackSlash) then
begin
Result.TokenType := ttEscapedQuoted;
TempLastChar := #0;
end
else
if ReadChar = BackSlash then
TempLastChar := ReadChar
else
TempLastChar := #0;
Posted: 23.07.2012, 22:44
by EgonHugeist
olehs,
hum this is correct now again! I didn't expect or saw that th Postgre Tokenizer is descendant of the MySQL Tokenizer!
It was me who chaged this sequence. The reason was a simple one. I'm trying to explain:
MySQL and several other Drivers do need spezial escaped chars like ' or /.
As i introduced my PreprepareSQL functionallity i wanted not only have a proper encoding no i wanted also to escape the strings like the engiense stupidly do expect them. Which means for MySQL: you need only to use AnsiQuotedStr/QuotedStr. So i chaged the sequence in a kind to detect still escaped or unescaped chars.
This behavior i want to make possible for all Databases where it makes sence (MsSQL excluded -> no common charset).
Our testsuites do not report me any issues. So i need better tests.
Can you give me some additional examples for postgre which i can add as tests and what do we have to expect as token results? That would be nice.
Michael
Posted: 23.07.2012, 23:24
by olehs
Well, it's not that easy as I thought.
Many things in tokenizing PostreSQL strings depend on
standard_conforming_strings parameter.
So I'll have to give samples for both cases.
standard_conforming_strings=on
In this case backslashes are treated just as they are:
'\\' is treated as \\
'\n' = \n
'\' =\
But one can explicitly use 'Escape syntax' by adding E, then
E'\\' is treated as \
E'\n' is a newline etc.
The full list is in the docs:
Code: Select all
\b backspace
\f form feed
\n newline
\r carriage return
\t tab
\o, \oo, \ooo (o = 0 - 7) octal byte value
\xh, \xhh (h = 0 - 9, A - F) hexadecimal byte value
But when
standard_conforming_strings=off you don't even have to use E-syntax - all backslashes are treated as escape characters.
'\\' is treated as \
'\n' is a newline,
E'\n' is also a newline,
'\z' is treated as z, because it's not valid escape sequence and escape char is skipped
Posted: 23.07.2012, 23:32
by olehs
One more thing:
valid quote escape syntax is ''(two quotes), but depending on
backslash_quote parameter, it is possible to use \' to escape quote
Posted: 23.07.2012, 23:51
by EgonHugeist
olehs,
Hmpff that doesn't make me happy. Zeos was never able to detect such stupid things. Terrible! Don't know why they make such stange things and actually i thought MySQL is the goal runner there: Try to insert 5 chars into a varchar(5) field,lol, it only accepts 4! Best and cleared API from developer side: Oracle and Firebird. Also does SQLite follow simply the AnsiQuotedStr rules which is safe.
However, with this stuff i can work and build some test cases. Not the ttQuoted detection need an upgrade NO the ttWord state has to check this and has to fall into ttEscapeQuoted in these cases.
Another thing: You do work with a fullunicode IDE. Now tell me what i've to do with still escaped tokens? Do i have to UTF8encode/cast to ansi(in case of the client encoding) or shold i start from the premise you aleady did this before?
Michael
Posted: 24.07.2012, 08:56
by olehs
EgonHugeist,
I didn't actually get your question about fullunicode IDEs. If you are talking about Preprepare property - it's probably not needed in there, because
Strings are always unicode. So you always have to cast/encode it to ClientEncoding without escaping. Such strings can even be mixed with escapes like '\134'. In this case behavior depends on
standard_conforming_strings parameter but it's server's duty to deal with them.
Here is my vision for this sample code
Code: Select all
ZQuery1.SQL.Text := 'select description || '' - тест\134'' || E''\\t'' || ''\n'' from reports where title = :title;';
ZQuery1.ParamByName('title').AsString := 'Common\Отчет'#9'1';
if
standard_conforming_strings=on the final query should be in client encoding and should look like
Code: Select all
select description || ' - тест\134' || E'\\t' || '\n' from reports where title = 'Common\Отчет 1';
if
standard_conforming_strings=off the final query should also be in client encoding and should look like
Code: Select all
select description || ' - тест\134' || E'\\t' || '\n' from reports where title = E'Common\\Отчет 1';
So there is no need to guess if the string is already escaped, you just always have to escape quotes in String params and backslashes when SCS=off.
Maybe it would be better to have a query parameter, somehing like
escape_strings = {off, on, auto, always}
off - strings w/o E, backslashes - as they are
on - strings with E, backslashes - as they are
always - strings with E, backslashes are doubled
auto - when SCS=on - same as
off, SCS=off - same as
always
Default value could by
auto
Posted: 24.07.2012, 11:53
by EgonHugeist
olehs,
slightly this is getting tooo complicated i think.
I changed only the ttQuoted Token detection for Postgre similar to your first proposal.
Also i wont have the time to implement the Token issues Postgre has. I'll add a test with your "vision" code later on.
Patch done Rev. 1576
Can you check the issue again?
Michael
Posted: 24.07.2012, 12:40
by olehs
Ok, I'll check it.
By the way, found a little bug.
Posted: 24.07.2012, 12:44
by olehs
One more thing. Seems like TZPostgreSQLConnection.GetEscapeString does exactly opposite to what I described about standard_conforming_strings.
When ON - it encodes it, but it shouldn't.
Posted: 24.07.2012, 13:00
by olehs
EgonHugeist,
checked rev.1576. Tried it with testcase
select '\\' || :str
and got
Code: Select all
2012-07-24 14:55:01 cat: Execute, proto: postgresql-8, msg: select '\\' || ?, errcode: 0, error: ERROR: syntax error at end of input
LINE 1: select '\\' || ?
^
Posted: 24.07.2012, 13:15
by olehs
EgonHugeist,
How about reverting to old MySQL implementaion just for PG?
Code: Select all
{**
Return a quoted string token from a reader. This method
will collect characters until it sees a match to the
character that the tokenizer used to switch to this state.
@return a quoted string token from a reader
}
function TZPostgreSQLQuoteState.NextToken(Stream: TStream;
FirstChar: Char; Tokenizer: TZTokenizer): TZToken;
const BackSlash = Char('\');
var
ReadChar: Char;
LastChar: Char;
QuoteChar: Char;
QuoteCount: Integer;
begin
Result.Value := FirstChar;
QuoteCount := 1;
If FirstChar = '`' then
Result.TokenType := ttQuotedIdentifier
Else
Result.TokenType := ttQuoted;
QuoteChar := FirstChar;
LastChar := #0;
while Stream.Read(ReadChar, SizeOf(Char)) > 0 do
begin
if ReadChar = QuoteChar then Inc(QuoteCount);
if (LastChar = FirstChar) and (ReadChar <> FirstChar) then
begin
if QuoteCount mod 2 = 0 then
begin
Stream.Seek(-SizeOf(Char), soFromCurrent);
Break;
end;
end;
Result.Value := Result.Value + ReadChar;
if LastChar = BackSlash then
LastChar := #0
else if (LastChar = FirstChar) and (ReadChar = FirstChar) then
LastChar := #0
else LastChar := ReadChar;
end;
end;
Posted: 24.07.2012, 15:05
by EgonHugeist
olehs,
id did apply your bug patch. Rev1578
How about reverting to old MySQL implementaion just for PG?
[s]No problem does it fix the issue?
[/s]
Both proposals tested. No issues to see. Patch applyed Rev, 1579
Michael
Posted: 24.07.2012, 15:24
by olehs
EgonHugeist,
Still testing, but seems to work fine.
And what about TZPostgreSQLConnection.GetEscapeString? A query like
Code: Select all
q.SQL.Text := 'select * from sys_values where sysval_name = :sn';
q.ParamByName('sn').AsString := 'QMenu\QUICKMENUROWCOUNT';
And I get final query like
Code: Select all
select * from sys_values where sysval_name = 'QMenu\QUICKMENUROWCOUNT'
when SCS=off
and
Code: Select all
select * from sys_values where sysval_name = 'QMenu\134QUICKMENUROWCOUNT'
when SCS=on
which is wrong in both cases