[patch_done] Strings like E'\\' are incorrectly tokenized
Moderators: gto, EgonHugeist, olehs
- EgonHugeist
- Zeos Project Manager
- Posts: 1936
- Joined: 31.03.2011, 22:38
olehs,
did now check your patch. Great work again. I like that ZUrl.OnPropertiesChange idea.
Question: Now you added CheckEscapeSyntax(). As far as i can see do you check now only for 'E'. If i did understand you right in the past then a complete String like E'abcd\\' should be detectable or not? A quoted state will only be detactable if '/"/´ is used...
Can you explain the purpose to a postgre dummy a little bit more detailed?
Patch done Rev1648.
Michael
did now check your patch. Great work again. I like that ZUrl.OnPropertiesChange idea.
Question: Now you added CheckEscapeSyntax(). As far as i can see do you check now only for 'E'. If i did understand you right in the past then a complete String like E'abcd\\' should be detectable or not? A quoted state will only be detactable if '/"/´ is used...
Can you explain the purpose to a postgre dummy a little bit more detailed?
Patch done Rev1648.
Michael
Best regards, Michael
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
- EgonHugeist
- Zeos Project Manager
- Posts: 1936
- Joined: 31.03.2011, 22:38
olehs,
this is right and i completely understand this. Where i have my problems to understand is that the detection of E'\sf\' still returns ttWord, ttQuoted (or am i wrong?). Thought a complete ttQuoted should be the better result ?? Anyway you did a great job.
Michael
this is right and i completely understand this. Where i have my problems to understand is that the detection of E'\sf\' still returns ttWord, ttQuoted (or am i wrong?). Thought a complete ttQuoted should be the better result ?? Anyway you did a great job.
Michael
Best regards, Michael
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
- EgonHugeist
- Zeos Project Manager
- Posts: 1936
- Joined: 31.03.2011, 22:38
olehs,
that's what i try to propose in the past. The one and only possibility i see is to upgrade the ttWord detection. E is a ttWordState. But the ttWord detection breaks if CharInSet[A..Z, a..z, 0..9] is invalid. What if we add a test for Quotes before leaving and making a TProcedure like TttQuotedNextToken = procedure(): TZToken; then we can add the same procedure to the ttWord AND ttQuoted detection...
Ohlehs the E'asfsf' is just a simple start you proposed to have. But there are some more like these: 0x'de23ef...'(ODBC-binary) or x'af0d....'(sql-standart binary) as binary (for all plains who do understand this syntax).
If such spezials are detectable then we can think about some extra states like ttBinary, if that makes sence...
What do yo think?
Michael
that's what i try to propose in the past. The one and only possibility i see is to upgrade the ttWord detection. E is a ttWordState. But the ttWord detection breaks if CharInSet[A..Z, a..z, 0..9] is invalid. What if we add a test for Quotes before leaving and making a TProcedure like TttQuotedNextToken = procedure(): TZToken; then we can add the same procedure to the ttWord AND ttQuoted detection...
Ohlehs the E'asfsf' is just a simple start you proposed to have. But there are some more like these: 0x'de23ef...'(ODBC-binary) or x'af0d....'(sql-standart binary) as binary (for all plains who do understand this syntax).
If such spezials are detectable then we can think about some extra states like ttBinary, if that makes sence...
What do yo think?
Michael
Best regards, Michael
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
- EgonHugeist
- Zeos Project Manager
- Posts: 1936
- Joined: 31.03.2011, 22:38
olehs,
seems like you confirm my thoughts. The point why i think it can be helpfull is to avoid Encoding checks or something like this. (Ideas are welcome)
i propose you start with postgre for the bytea \x.... and so on (sysmbol-state upgrade?)
and i'll start with the x''/0x'' for the other Drivers. Also do i propose to make this detection optional. (Nobody missed it be for ): Only me!)
Much more interesting for me would be _UTF8() string cast detections...
What do you think, olehs?
Michael
seems like you confirm my thoughts. The point why i think it can be helpfull is to avoid Encoding checks or something like this. (Ideas are welcome)
i propose you start with postgre for the bytea \x.... and so on (sysmbol-state upgrade?)
and i'll start with the x''/0x'' for the other Drivers. Also do i propose to make this detection optional. (Nobody missed it be for ): Only me!)
Much more interesting for me would be _UTF8() string cast detections...
What do you think, olehs?
Michael
Best regards, Michael
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
EgonHugeist,
I guess nobody missed that detection before because it doesn't affect resulting statement, so there is no need to make it optional (at least someone uses tokinizer directly).Nobody missed it be for
How do you think will be the best? Do we have to allow one state to replace itself with another (SymbolState->QuotedState)?i'll start with the x''/0x'' for the other Drivers
- EgonHugeist
- Zeos Project Manager
- Posts: 1936
- Joined: 31.03.2011, 22:38
olehs,
- keeping the generic CharacterStates like they are (and after you latest patches).
- creating child detections of a state which the tokens do use first: \x0123af (PostgreSQL Binary) should be a ttSymbolState first. Also the octal and all others you've proposed before. So i'm not thinking about creating new TZToken objects. I'm only thinking about how to implement such child detections IF the TZPostgreSQLSymbolState(and all others).NextToken would be prepared for this.
Then the result of the detection could be a new one which we decide like:
Token.Value='_UTF8'/Token.Type=ttCharacterSet or Token.Value='x'12afed45'/Token.Type = ttBinary.
Just have a look to the ttKeyWord detection or the number-states(for integers/Hex... etc.)
Here i see generally no disadvantages if we make a fine tuning if the sub-detections. But this is now only my thoughts. Other/better porposals, olehs?
Michael
Agree. And i can't see disadvantages or issues to support such detections. A token detection E as ttWord + '\\ee\'as ttQuoted or x as ttWord '15af2b' as ttQuoted is (in my mind) useless except the function knows about issues like these and reassambles these Tokens again (IZSelectSchema, ZExpression for example).I guess nobody missed that detection before because it doesn't affect resulting statement, so there is no need to make it optional (at least someone uses tokinizer directly).
Hmm i think the best would be:How do you think will be the best? Do we have to allow one state to replace itself with another (SymbolState->QuotedState)?
- keeping the generic CharacterStates like they are (and after you latest patches).
- creating child detections of a state which the tokens do use first: \x0123af (PostgreSQL Binary) should be a ttSymbolState first. Also the octal and all others you've proposed before. So i'm not thinking about creating new TZToken objects. I'm only thinking about how to implement such child detections IF the TZPostgreSQLSymbolState(and all others).NextToken would be prepared for this.
Then the result of the detection could be a new one which we decide like:
Token.Value='_UTF8'/Token.Type=ttCharacterSet or Token.Value='x'12afed45'/Token.Type = ttBinary.
Just have a look to the ttKeyWord detection or the number-states(for integers/Hex... etc.)
Here i see generally no disadvantages if we make a fine tuning if the sub-detections. But this is now only my thoughts. Other/better porposals, olehs?
Michael
Best regards, Michael
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
EgonHugeist,
as far as I know, there is only 4 types of string "modifiers" in PG:
E'abcd\n' - escaped strings
B'1001' - bit-strings
X'1FF' - hex-stings
U&'abc\0422' - strings with Unicode escapes
and all of them are just quoted strings.
So if I understand you right, TZPostgreSQLWordState will have to detect these sequences and call TZPostgreSQLQuoteState.NextToken? But in this case TZPostgreSQLQuoteState.NextToken will have to look back anyway to check the E-syntax.
And what do you think of string like
U&'d!0061t!+000061' UESCAPE '!'
Does it have to be just a single ttQuote too?
as far as I know, there is only 4 types of string "modifiers" in PG:
E'abcd\n' - escaped strings
B'1001' - bit-strings
X'1FF' - hex-stings
U&'abc\0422' - strings with Unicode escapes
and all of them are just quoted strings.
So if I understand you right, TZPostgreSQLWordState will have to detect these sequences and call TZPostgreSQLQuoteState.NextToken? But in this case TZPostgreSQLQuoteState.NextToken will have to look back anyway to check the E-syntax.
And what do you think of string like
U&'d!0061t!+000061' UESCAPE '!'
Does it have to be just a single ttQuote too?
- EgonHugeist
- Zeos Project Manager
- Posts: 1936
- Joined: 31.03.2011, 22:38
olehs,
'E' was detected as ttWord. Then the next char is the #39(') and TZPostgreSQLQuoteState.NextToken breaks here. But if TZPostgreSQLQuoteState.NextToken knows about that 'E','B','X','U&' can be followed from a ttQuoted then we go on reading and change the result-state to an optional type...
The qestion for me is for what do you think we can use such detections?
I'm trying to explain what i want to do:
My PreprepareSQL was an idea to avoid double check's of still right encoded and escaped String from our Parameter-Queries.
Now we start from the premise we use an Ansi-Compiler.
TZQuery.SQL.Text := 'select * from Table1 where Field1 = :F1 and Field2 = 'Müller'.
If the connection was opened in UTF8 then Zeos takes care the the Parameter F1 is allway right encoded as UTF8String. But the Query will fail because the 'ü' of 'Müller' is not UTF8Encoded. Simply that was the reason for splitting the queries into tokens to avoid partial wrong encoded values. For the UnicodeIDE's this is not the case here, this has onother planned purpose. So i set the escapeMarks around the F1 Parameter-Value to avoid speed decrease of the function ZPlainString which calls DetectUTF8Encoding('Müller') first and decides what happens to the partial string value.
Now we send also binary data as strings to the Server and checking this token first. Which means we read the whole token once and then again to check the encoding. IF we know the next token is ttBinary or an Character-State then we do not need to check this again..
On the other hand it could be helpfull for everybody to have such a detection, or not? You are one of the guys who know obout the possibilities we have and others too.
Btw. the EncodeBytea function returns a string like '\x34dfa2' as binary encoded strings... With a starting backslash and no quotes. Does postgre understand the x'edfaee2455' syntax too?
Michael
Yes this is what i've tryed to say in the past too. In my mind should we export the TZPostgreSQLQuoteState.NextToken function to be able to call the same from different states like TZPostgreSQLQuoteState IF (example)So if I understand you right, TZPostgreSQLWordState will have to detect these sequences and call TZPostgreSQLQuoteState.NextToken?
'E' was detected as ttWord. Then the next char is the #39(') and TZPostgreSQLQuoteState.NextToken breaks here. But if TZPostgreSQLQuoteState.NextToken knows about that 'E','B','X','U&' can be followed from a ttQuoted then we go on reading and change the result-state to an optional type...
Yes this is the issue we have now. I think this look back to the 'E' is not really needed IF you agree with my proposal. The result of my proposal is either a complete ttQuoted/ttEscapeQuoted(which we can introduce here) or ttBinary or ttBit and so on..But in this case TZPostgreSQLQuoteState.NextToken will have to look back anyway to check the E-syntax.
That's a good point i didn't know about before. I don't think a complete ttQuoted would be helpfull here or becomes to complicatate. You are the PostgreSQL hero here, not me.And what do you think of string like
U&'d!0061t!+000061' UESCAPE '!'
Does it have to be just a single ttQuote too?
The qestion for me is for what do you think we can use such detections?
I'm trying to explain what i want to do:
My PreprepareSQL was an idea to avoid double check's of still right encoded and escaped String from our Parameter-Queries.
Now we start from the premise we use an Ansi-Compiler.
TZQuery.SQL.Text := 'select * from Table1 where Field1 = :F1 and Field2 = 'Müller'.
If the connection was opened in UTF8 then Zeos takes care the the Parameter F1 is allway right encoded as UTF8String. But the Query will fail because the 'ü' of 'Müller' is not UTF8Encoded. Simply that was the reason for splitting the queries into tokens to avoid partial wrong encoded values. For the UnicodeIDE's this is not the case here, this has onother planned purpose. So i set the escapeMarks around the F1 Parameter-Value to avoid speed decrease of the function ZPlainString which calls DetectUTF8Encoding('Müller') first and decides what happens to the partial string value.
Now we send also binary data as strings to the Server and checking this token first. Which means we read the whole token once and then again to check the encoding. IF we know the next token is ttBinary or an Character-State then we do not need to check this again..
On the other hand it could be helpfull for everybody to have such a detection, or not? You are one of the guys who know obout the possibilities we have and others too.
Btw. the EncodeBytea function returns a string like '\x34dfa2' as binary encoded strings... With a starting backslash and no quotes. Does postgre understand the x'edfaee2455' syntax too?
Michael
Best regards, Michael
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
EgonHugeist,
I don't really know what to say... This seems like a big challenge. There can be so many issues like the one I patched at rev.1651. There was a query like and TWordState couldn't detect non-latin alias. And now according to what you said, ttWord has to be checked for Encoding. And there may be surprises also in DDL statements and so on.
I think as a developer that uses Zeos, for me is better to manually control that encoding of SQL and in-parameters were the same.
select x'edfaee2455'
returns 1110110111111010111011100010010001010101
and
select x'edfaee2455'::int8
returns 1022117160021
Ok. I think this looks good and can be done.'E' was detected as ttWord. Then the next char is the #39(') and TZPostgreSQLQuoteState.NextToken breaks here. But if TZPostgreSQLQuoteState.NextToken knows about that 'E','B','X','U&' can be followed from a ttQuoted then we go on reading and change the result-state to an optional type...
Oh... now I understood the whole idea you put into PreprepareSQL processing.I'm trying to explain what i want to do:
My PreprepareSQL was an idea to avoid double check's of still right encoded and escaped String from our Parameter-Queries.
I don't really know what to say... This seems like a big challenge. There can be so many issues like the one I patched at rev.1651. There was a query like
Code: Select all
select id as Код from orders
I think as a developer that uses Zeos, for me is better to manually control that encoding of SQL and in-parameters were the same.
PostgreSQL understands x'edfaee2455' as a bit string constant (and has nothing to BYTEA as I know). For exampleBtw. the EncodeBytea function returns a string like '\x34dfa2' as binary encoded strings... With a starting backslash and no quotes. Does postgre understand the x'edfaee2455' syntax too?
select x'edfaee2455'
returns 1110110111111010111011100010010001010101
and
select x'edfaee2455'::int8
returns 1022117160021
- EgonHugeist
- Zeos Project Manager
- Posts: 1936
- Joined: 31.03.2011, 22:38
olehs,
had a look now. Not my proposal but it seems to handle really everything we've talked about... Great job again. Any suggestions about different token types than ttQuoted?
We can add an option enumeration type like toAcceptModifiers and on TZTokenizer.TokenizeStreamToList we can check for this option. If this option is not set we reset the Token.Type(ttBinary) back to ttQuoted..
What do you think?
Commit your patch, olehs.
Michael
Btw. Actually i'm trying to complete the PostgreSQL RealPrepared-Statements (not ready yet, have some issues left. If you want to test them then exchange the TZPostgreSQLEmulatedPreparedStatement with TZPostgreSQLPreparedStatement on line 576 ZDbcPostgreSql.pas)
had a look now. Not my proposal but it seems to handle really everything we've talked about... Great job again. Any suggestions about different token types than ttQuoted?
We can add an option enumeration type like toAcceptModifiers and on TZTokenizer.TokenizeStreamToList we can check for this option. If this option is not set we reset the Token.Type(ttBinary) back to ttQuoted..
What do you think?
Commit your patch, olehs.
Michael
Btw. Actually i'm trying to complete the PostgreSQL RealPrepared-Statements (not ready yet, have some issues left. If you want to test them then exchange the TZPostgreSQLEmulatedPreparedStatement with TZPostgreSQLPreparedStatement on line 576 ZDbcPostgreSql.pas)
Best regards, Michael
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
You want to help? http://zeoslib.sourceforge.net/viewtopic.php?f=4&t=3671
You found a (possible) bug? Use the new bugtracker dude! http://sourceforge.net/p/zeoslib/tickets/
EgonHugeist,
I also think there is no need for extra token type in PostgreSQL. ttQuote (and maybe ttEscapedQuoted) is quite enough.
I absolutely don't mind . Couldn't do it myself because I'm not familiar with other databases' specific syntax.We can add an option enumeration type like toAcceptModifiers and on TZTokenizer.TokenizeStreamToList we can check for this option. If this option is not set we reset the Token.Type(ttBinary) back to ttQuoted..
What do you think?
I also think there is no need for extra token type in PostgreSQL. ttQuote (and maybe ttEscapedQuoted) is quite enough.
Yes I saw it and think this is just a great job you are doing. If you need any help I will be glad to help you.Btw. Actually i'm trying to complete the PostgreSQL RealPrepared-Statements (not ready yet, have some issues left. If you want to test them then exchange the TZPostgreSQLEmulatedPreparedStatement with TZPostgreSQLPreparedStatement on line 576 ZDbcPostgreSql.pas)