[patch_done] Strings like E'\\' are incorrectly tokenized
Moderators: gto, EgonHugeist, olehs
[patch_done] Strings like E'\\' are incorrectly tokenized
I guess error is in TZMySQLQuoteState.NextToken
- EgonHugeist
- Zeos Project Manager
- Posts: 1936
- Joined: 31.03.2011, 22:38
olehs,
now you make me curious again.
In the description you wrote PostgreSQL. standard_conforming_strings=off which points me to the Postgre Tokenizer.
But here you're writing about the TZMySQLQuoteState.
In my mind your E'\\' example should be ttWord, ttQuoted.
Can you explain me better what exactly you mean?
Michael
now you make me curious again.
In the description you wrote PostgreSQL. standard_conforming_strings=off which points me to the Postgre Tokenizer.
But here you're writing about the TZMySQLQuoteState.
In my mind your E'\\' example should be ttWord, ttQuoted.
Can you explain me better what exactly you mean?
Michael
Best regards, Michael
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
EgonHugeist,
didn't want to confuse you.
Postgre tokenizer uses inherited (MySql) tokenizer.
Look at this part of TZMySQLQuoteState.NextToken
When (ReadChar = BackSlash) and (TempLastChar = BackSlash) condition is met,
one need to reset TempLastChar:
didn't want to confuse you.
Postgre tokenizer uses inherited (MySql) tokenizer.
Look at this part of TZMySQLQuoteState.NextToken
Code: Select all
end;
end;
end else
if (ReadChar = BackSlash) and (TempLastChar = BackSlash) then
Result.TokenType := ttEscapedQuoted
end
else
if ReadChar = BackSlash then
TempLastChar := ReadChar
else
TempLastChar := #0;
one need to reset TempLastChar:
Code: Select all
end;
end;
end else
if (ReadChar = BackSlash) and (TempLastChar = BackSlash) then
begin
Result.TokenType := ttEscapedQuoted;
TempLastChar := #0;
end
else
if ReadChar = BackSlash then
TempLastChar := ReadChar
else
TempLastChar := #0;
- EgonHugeist
- Zeos Project Manager
- Posts: 1936
- Joined: 31.03.2011, 22:38
olehs,
hum this is correct now again! I didn't expect or saw that th Postgre Tokenizer is descendant of the MySQL Tokenizer!
It was me who chaged this sequence. The reason was a simple one. I'm trying to explain:
MySQL and several other Drivers do need spezial escaped chars like ' or /.
As i introduced my PreprepareSQL functionallity i wanted not only have a proper encoding no i wanted also to escape the strings like the engiense stupidly do expect them. Which means for MySQL: you need only to use AnsiQuotedStr/QuotedStr. So i chaged the sequence in a kind to detect still escaped or unescaped chars.
This behavior i want to make possible for all Databases where it makes sence (MsSQL excluded -> no common charset).
Our testsuites do not report me any issues. So i need better tests.
Can you give me some additional examples for postgre which i can add as tests and what do we have to expect as token results? That would be nice.
Michael
hum this is correct now again! I didn't expect or saw that th Postgre Tokenizer is descendant of the MySQL Tokenizer!
It was me who chaged this sequence. The reason was a simple one. I'm trying to explain:
MySQL and several other Drivers do need spezial escaped chars like ' or /.
As i introduced my PreprepareSQL functionallity i wanted not only have a proper encoding no i wanted also to escape the strings like the engiense stupidly do expect them. Which means for MySQL: you need only to use AnsiQuotedStr/QuotedStr. So i chaged the sequence in a kind to detect still escaped or unescaped chars.
This behavior i want to make possible for all Databases where it makes sence (MsSQL excluded -> no common charset).
Our testsuites do not report me any issues. So i need better tests.
Can you give me some additional examples for postgre which i can add as tests and what do we have to expect as token results? That would be nice.
Michael
Best regards, Michael
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
Well, it's not that easy as I thought.
Many things in tokenizing PostreSQL strings depend on standard_conforming_strings parameter.
So I'll have to give samples for both cases.
standard_conforming_strings=on
In this case backslashes are treated just as they are:
'\\' is treated as \\
'\n' = \n
'\' =\
But one can explicitly use 'Escape syntax' by adding E, then
E'\\' is treated as \
E'\n' is a newline etc.
The full list is in the docs:
But when standard_conforming_strings=off you don't even have to use E-syntax - all backslashes are treated as escape characters.
'\\' is treated as \
'\n' is a newline,
E'\n' is also a newline,
'\z' is treated as z, because it's not valid escape sequence and escape char is skipped
Many things in tokenizing PostreSQL strings depend on standard_conforming_strings parameter.
So I'll have to give samples for both cases.
standard_conforming_strings=on
In this case backslashes are treated just as they are:
'\\' is treated as \\
'\n' = \n
'\' =\
But one can explicitly use 'Escape syntax' by adding E, then
E'\\' is treated as \
E'\n' is a newline etc.
The full list is in the docs:
Code: Select all
\b backspace
\f form feed
\n newline
\r carriage return
\t tab
\o, \oo, \ooo (o = 0 - 7) octal byte value
\xh, \xhh (h = 0 - 9, A - F) hexadecimal byte value
'\\' is treated as \
'\n' is a newline,
E'\n' is also a newline,
'\z' is treated as z, because it's not valid escape sequence and escape char is skipped
Last edited by olehs on 23.07.2012, 23:34, edited 1 time in total.
- EgonHugeist
- Zeos Project Manager
- Posts: 1936
- Joined: 31.03.2011, 22:38
olehs,
Hmpff that doesn't make me happy. Zeos was never able to detect such stupid things. Terrible! Don't know why they make such stange things and actually i thought MySQL is the goal runner there: Try to insert 5 chars into a varchar(5) field,lol, it only accepts 4! Best and cleared API from developer side: Oracle and Firebird. Also does SQLite follow simply the AnsiQuotedStr rules which is safe.
However, with this stuff i can work and build some test cases. Not the ttQuoted detection need an upgrade NO the ttWord state has to check this and has to fall into ttEscapeQuoted in these cases.
Another thing: You do work with a fullunicode IDE. Now tell me what i've to do with still escaped tokens? Do i have to UTF8encode/cast to ansi(in case of the client encoding) or shold i start from the premise you aleady did this before?
Michael
Hmpff that doesn't make me happy. Zeos was never able to detect such stupid things. Terrible! Don't know why they make such stange things and actually i thought MySQL is the goal runner there: Try to insert 5 chars into a varchar(5) field,lol, it only accepts 4! Best and cleared API from developer side: Oracle and Firebird. Also does SQLite follow simply the AnsiQuotedStr rules which is safe.
However, with this stuff i can work and build some test cases. Not the ttQuoted detection need an upgrade NO the ttWord state has to check this and has to fall into ttEscapeQuoted in these cases.
Another thing: You do work with a fullunicode IDE. Now tell me what i've to do with still escaped tokens? Do i have to UTF8encode/cast to ansi(in case of the client encoding) or shold i start from the premise you aleady did this before?
Michael
Best regards, Michael
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
EgonHugeist,
I didn't actually get your question about fullunicode IDEs. If you are talking about Preprepare property - it's probably not needed in there, because
Strings are always unicode. So you always have to cast/encode it to ClientEncoding without escaping. Such strings can even be mixed with escapes like '\134'. In this case behavior depends on standard_conforming_strings parameter but it's server's duty to deal with them.
Here is my vision for this sample code
if standard_conforming_strings=on the final query should be in client encoding and should look like
if standard_conforming_strings=off the final query should also be in client encoding and should look like
So there is no need to guess if the string is already escaped, you just always have to escape quotes in String params and backslashes when SCS=off.
Maybe it would be better to have a query parameter, somehing like
escape_strings = {off, on, auto, always}
off - strings w/o E, backslashes - as they are
on - strings with E, backslashes - as they are
always - strings with E, backslashes are doubled
auto - when SCS=on - same as off, SCS=off - same as always
Default value could by auto
I didn't actually get your question about fullunicode IDEs. If you are talking about Preprepare property - it's probably not needed in there, because
Strings are always unicode. So you always have to cast/encode it to ClientEncoding without escaping. Such strings can even be mixed with escapes like '\134'. In this case behavior depends on standard_conforming_strings parameter but it's server's duty to deal with them.
Here is my vision for this sample code
Code: Select all
ZQuery1.SQL.Text := 'select description || '' - тест\134'' || E''\\t'' || ''\n'' from reports where title = :title;';
ZQuery1.ParamByName('title').AsString := 'Common\Отчет'#9'1';
Code: Select all
select description || ' - тест\134' || E'\\t' || '\n' from reports where title = 'Common\Отчет 1';
Code: Select all
select description || ' - тест\134' || E'\\t' || '\n' from reports where title = E'Common\\Отчет 1';
Maybe it would be better to have a query parameter, somehing like
escape_strings = {off, on, auto, always}
off - strings w/o E, backslashes - as they are
on - strings with E, backslashes - as they are
always - strings with E, backslashes are doubled
auto - when SCS=on - same as off, SCS=off - same as always
Default value could by auto
- EgonHugeist
- Zeos Project Manager
- Posts: 1936
- Joined: 31.03.2011, 22:38
olehs,
slightly this is getting tooo complicated i think.
I changed only the ttQuoted Token detection for Postgre similar to your first proposal.
Also i wont have the time to implement the Token issues Postgre has. I'll add a test with your "vision" code later on.
Patch done Rev. 1576
Can you check the issue again?
Michael
slightly this is getting tooo complicated i think.
I changed only the ttQuoted Token detection for Postgre similar to your first proposal.
Also i wont have the time to implement the Token issues Postgre has. I'll add a test with your "vision" code later on.
Patch done Rev. 1576
Can you check the issue again?
Michael
Best regards, Michael
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
EgonHugeist,
checked rev.1576. Tried it with testcase
select '\\' || :str
and got
checked rev.1576. Tried it with testcase
select '\\' || :str
and got
Code: Select all
2012-07-24 14:55:01 cat: Execute, proto: postgresql-8, msg: select '\\' || ?, errcode: 0, error: ERROR: syntax error at end of input
LINE 1: select '\\' || ?
^
EgonHugeist,
How about reverting to old MySQL implementaion just for PG?
How about reverting to old MySQL implementaion just for PG?
Code: Select all
{**
Return a quoted string token from a reader. This method
will collect characters until it sees a match to the
character that the tokenizer used to switch to this state.
@return a quoted string token from a reader
}
function TZPostgreSQLQuoteState.NextToken(Stream: TStream;
FirstChar: Char; Tokenizer: TZTokenizer): TZToken;
const BackSlash = Char('\');
var
ReadChar: Char;
LastChar: Char;
QuoteChar: Char;
QuoteCount: Integer;
begin
Result.Value := FirstChar;
QuoteCount := 1;
If FirstChar = '`' then
Result.TokenType := ttQuotedIdentifier
Else
Result.TokenType := ttQuoted;
QuoteChar := FirstChar;
LastChar := #0;
while Stream.Read(ReadChar, SizeOf(Char)) > 0 do
begin
if ReadChar = QuoteChar then Inc(QuoteCount);
if (LastChar = FirstChar) and (ReadChar <> FirstChar) then
begin
if QuoteCount mod 2 = 0 then
begin
Stream.Seek(-SizeOf(Char), soFromCurrent);
Break;
end;
end;
Result.Value := Result.Value + ReadChar;
if LastChar = BackSlash then
LastChar := #0
else if (LastChar = FirstChar) and (ReadChar = FirstChar) then
LastChar := #0
else LastChar := ReadChar;
end;
end;
- EgonHugeist
- Zeos Project Manager
- Posts: 1936
- Joined: 31.03.2011, 22:38
olehs,
id did apply your bug patch. Rev1578
[/s]
Both proposals tested. No issues to see. Patch applyed Rev, 1579
Michael
id did apply your bug patch. Rev1578
[s]No problem does it fix the issue?How about reverting to old MySQL implementaion just for PG?
[/s]
Both proposals tested. No issues to see. Patch applyed Rev, 1579
Michael
Best regards, Michael
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
EgonHugeist,
Still testing, but seems to work fine.
And what about TZPostgreSQLConnection.GetEscapeString? A query like
And I get final query like
when SCS=off
and when SCS=on
which is wrong in both cases
Still testing, but seems to work fine.
And what about TZPostgreSQLConnection.GetEscapeString? A query like
Code: Select all
q.SQL.Text := 'select * from sys_values where sysval_name = :sn';
q.ParamByName('sn').AsString := 'QMenu\QUICKMENUROWCOUNT';
Code: Select all
select * from sys_values where sysval_name = 'QMenu\QUICKMENUROWCOUNT'
and
Code: Select all
select * from sys_values where sysval_name = 'QMenu\134QUICKMENUROWCOUNT'
which is wrong in both cases