Page 1 of 1

MySQl Metadata consistency

Posted: 06.12.2020, 22:02
by MJFShark
In the Metadata calls for tables and procedures there's an inconsistency in how catalog and schema are handled:

Currently for Tables, Columns, TablePrivs, ColumnPrivs, Pks, Indexes:
TABLE_CAT = Database
TABLE_SCHEMA = Empty

Currently for Procedures, ProcedureColumns:
PROCEDURE_CAT = 'def' (the value in MySQL's catalog info schema tables)
PROCEDURE_SCHEMA = database

Currently for CollationsAndCharSets:
COLLATION_CATALOG = database
COLLATION_SCHEMA = database

Most of the newer (I think) changes to the metadata calls allow passing the database in either the Catalog or Schema parameters. So my question in all of this is, should this be changed at all? Will MySQL add real catalogs at some point? I must admit that I think of the MySQL databases as schemas, but I know that some of this is from older code and also that jdbc has it's own definitions/standards. I obviously don't want to break any backwards compatibility. Perhaps returning database in both catalog and schema is a good compromise for now? Thanks for any feedback!

-Mark

Re: MySQl Metadata consistency

Posted: 06.12.2020, 22:25
by aehimself
I am no expert in SQL. The phrases like "schema", "catalog" and "database" mentioned in the same sentence scares me.

I HAVE NO DAMN CLUE WHAT IS THE DIFFERENCE!!!! :D

Re: MySQl Metadata consistency

Posted: 07.12.2020, 04:53
by MJFShark
The terms are a bit different depending on the database you use, however the Zeos metadata layer works with the terms (as parameters and object fields) "Catalog" and "Schema", so it's just a matter of deciding how these map to the underlying data dictionary. I usually think of SCHEMA as the direct table owner so a "full" table name for me is often schema.tablename. To me, the current MySQL metadata layer is a bit odd in that it uses Catalog to map to "database" (mostly) whereas I would think that Zeos "schema" would map to MySQL "database".

But that's really an aside, I don't think it matters what gets mapped to what, as long as it's consistent, at least within a protocol. If backwards compatibility wasn't a concern, I'd map MySQL database to ZeosLib schema, and have catalog be "def" or blank, BUT there may be very good reasons for it to work the way it does that I don't know about. It might make sense to return database for both in the metadata results.

Btw, a good summary of the terms of the various databases is:
https://stackoverflow.com/a/7944489/113128
And it also gives an example of how to identify an object in each one, which is very interesting! I'm most used to Oracle (which uses schema.objectname.)

-Mark

Re: MySQl Metadata consistency

Posted: 07.12.2020, 09:57
by aehimself
Jesus. I remember we slightly touched this topic with Jan once, but I see it now that the confusion is bigger than I expected.
So nice to have terms if they are used inconsistently.

Re: MySQl Metadata consistency

Posted: 07.12.2020, 14:22
by MJFShark
The catalog vs schema thing isn't too bad I think. The real nightmare is case sensitivity in MySQL since it depends on the OS filename case sensitivity and the lower_case_table_names variable setting. It's just brutal.

-Mark

Re: MySQl Metadata consistency

Posted: 08.12.2020, 11:21
by marsupilami
aehimself wrote: 07.12.2020, 09:57 Jesus. I remember we slightly touched this topic with Jan once, but I see it now that the confusion is bigger than I expected.
So nice to have terms if they are used inconsistently.
I think you are right ;o) I like to use that SO answer for reference pourposes too. For Zeos I would like to go the following way:
Catalog: A single database, that is independent from other databases on the same system. Cross datbase Queries don't need to be possible.
Schema: A namesapace within a database, that can be used to distinguish objects that otherwise have the same name. Cross Schema queries usually should be possible.

So when it comes to Oracle, I think they usually have only one database and in that database they have different schemas. I am not sure, how pluggable databases on Oracle fit in here. I assume that MySQL and MariaDB go the same way. A MySQL database could / should be considered to be a schema. I think when querying their INFORMATION_SCHEMA objects they do it that way?

On PostgreSQL the Zeos catalog is the database. A Zeos schemas would be the Schema objects they have - as well as their catalogs (information_schema and pg_catalog).

On Firebird the database file would be the catalog. Schemas are not existant on Firebird (yet).

Does that make sense?

Unfortunately Zeos itself isn't too strict on these terms because when we connect to a database the TZConnection object has a property "Catalog" - which essentially would be the Schema in the above definition. Maybe we should change that in a future version?

Re: MySQl Metadata consistency

Posted: 08.12.2020, 14:41
by MJFShark
Agreed on all points! I have an update ready to go for MySQLMetaData which has several fixes and improvements, but I'm just wondering about what to return in the catalog and schema fields of tables. The current one returns database in CATALOG and nothing in Schema. It's inconsistent with the other metadata types. So the options are:

#1: Return "database" in CATALOG and nothing in SCHEMA
#2: Return "database" in SCHEMA and nothing in CATALOG
#3: Return "database" in both.

I don't love #1 but it keeps the behavior exactly the same as existing. #2 is more logical and it makes sense to convert the whole MySQL Metadata to use it at some point in the future. #3 is a compromise that works all around, and is used in some of the MySQL metadata calls now.

Being new to the project I'm not exactly sure how decisions like this are made, so feedback is appreciated!

-Mark

Re: MySQl Metadata consistency

Posted: 08.12.2020, 15:52
by aehimself
So far I pushed my changes and if Michael didn't like something he changed it :D

Re: MySQl Metadata consistency

Posted: 08.12.2020, 17:41
by EgonHugeist
@Mark,

do you know this: https://dev.mysql.com/worklog/task/?id=942 means it's technicaly possible MySQL will change the "SCHEMA"/"CATALOG" purpose. There are more than 1 ticket for the MySQL interpretion of what a schema is...
MJFShark wrote: 08.12.2020, 14:41 #1: Return "database" in CATALOG and nothing in SCHEMA
#2: Return "database" in SCHEMA and nothing in CATALOG
#3: Return "database" in both.
I vote for #1:
Even if it's not what MySQL documents, it technicaly correct as Jan did explain already.
MJFShark wrote: 06.12.2020, 22:02 Currently for Procedures, ProcedureColumns:
PROCEDURE_CAT = 'def' (the value in MySQL's catalog info schema tables)
PROCEDURE_SCHEMA = database

Currently for CollationsAndCharSets:
COLLATION_CATALOG = database
COLLATION_SCHEMA = database
Bugs -> let's fix it Felix jr.!
Don't hassitate to provide patches. If i know you're woking on ... i won't touch the keyboard.

Re: MySQl Metadata consistency

Posted: 08.12.2020, 19:31
by MJFShark
Hi Michael,

If I read that document correctly they'll be adding a "catalog" layer above the existing MySQL "database" layer (which they've already done in the information_schema.) So you'll be able to refer to a table using catalog.database.tablename whereas now it's database.tablename. Note that in the current MySQL Information_schema they map catalog to 'def' and schema to "database". In my estimation that makes my option #2 or #3 the correct choice for future proofing. In my mind the zeoslib notion should always be catalog.schema.tablename. Am I correct that only the MySQL protocol currently goes catalog.tablename? As mentioned in Jan's post catalog is always the "top" layer and is generally optional (I don't handle it at all in any of my code, but for MySQL I need to remember that "catalog" is what's used for what I would normal map to "schema".)

-Mark
P.S.: The terms, especially the term "database" which has a number of meanings can be a problem in general. Above when I say database I'm talking about the MySQL concept of database, which they are current mapping to the SCHEMA field in INFORMATION_SCHEMA. I tend to just think about layers. Firebird had no layers, MySQL has one layer (currently), Oracle and PostgreSQL have two or even more layers. It's just a matter of the layer we map to Zeos's catalog and schema terms per protocol.

Re: MySQl Metadata consistency

Posted: 09.12.2020, 11:41
by marsupilami
Sorry, I also support option number two. MySQL / MariaDB "databases" behave more like schemas in my above definition. Also we already did a breaking change that went into the direction of option #2. This is from the Zeos 7.2.6 release notes:
Until Zeos 7.2.4 when querying for procedure columns (parameters) the database name was returned in the PROCEDURE_CAT column. PROCEDURE_CAT now always returns ”def” while the database is returned in the PROCEDURE_SCHEM column.
I seem to remember that I changed this for supporting MySQL 8.0...

MJFShark wrote: 08.12.2020, 19:31 As mentioned in Jan's post catalog is always the "top" layer and is generally optional (I don't handle it at all in any of my code, but for MySQL I need to remember that "catalog" is what's used for what I would normal map to "schema".)
It might seem a bit picky but for me there always has to be a catalog. In PostgreSQL the catalog is the database name. The same goes for Firebird. For me also Oracle and MySQL do have a catalog. But its name is irrelevant because they have exacly one catalog and cannot have more than that one catalog. So having a catalog is not optional. But having a name for that catalog is optional.
MJFShark wrote: 08.12.2020, 19:31Firebird had no layers, MySQL has one layer (currently), Oracle and PostgreSQL have two or even more layers. It's just a matter of the layer we map to Zeos's catalog and schema terms per protocol.
For me Firebird also has layers - the database name. It is just not possible to do cross queries between them ;) How does Oracle have more than one layer? Do they support something else besides having schemas in their databases? I don't know Oracle that much, so I am eager to learn more there :)

Re: MySQl Metadata consistency

Posted: 09.12.2020, 14:23
by MJFShark
Onions! Onions have layers! Yes, just one layer for Oracle! I'm not sure what I was thinking there. I don't think of Firebird database as a layer because I don't use them in dot notation when referring to objects (the dot notation the database uses for referring to a table is my (possibly goofy) "layer" analogy.) And I absolutely was not suggesting removing the catalog concept from Zeoslib, just that I don't use it for any of my stuff.. I assume in the db it will "default" to something that makes sense if it's needed.

Really this particular catalog and schema return choice is just two lines of code, so it's quite easy to go with any of the options. The Metadata calls already check both catalog and schema arguments on input and use whichever one was entered, so it's not really a big deal except that the current inconsistency bothers me a little (but I can get over it lol!)

-Mark

Re: MySQl Metadata consistency

Posted: 09.12.2020, 17:02
by EgonHugeist
@Mark,

do it -> #2. Why i was voting for #1 because of keeping backward compatibility for users who are using the TZSQLMetadata components. Now we're on Beta 8, so i've no objection if we go this route!
@Jan please document it..

Re: MySQl Metadata consistency

Posted: 10.12.2020, 11:08
by marsupilami
EgonHugeist wrote: 09.12.2020, 17:02 @Jan please document it..
I will do that :)