Metadata cache key retrieval API

technobot · Post by **technobot** » 15.06.2008, 11:55

I've extended the set of functions used to retrieve metadata cache keys, and adjusted the various db-specific Metadata classes to use the new fucntions. This has several advantages:

1. The cache key retrieval API is now complete, rather than being one lonely function that only handles one type of key.
2. There is better encapsulation, because all the functions are part of TZAbstractDatabaseMetadata, instead of being global.
3. There is less duplicated code, since all descendants of TZAbstractDatabaseMetadata use these functions now to get the keys they need.
4. As I was doing this, I stumbled across a few minor bugs, which I've corrected.

I marked the old GetTablesMetaDataCacheKey function as deprecated. This avoids breaking any existing code that might be using it, but lets people know they should switch to the new API.

Post by **mdaems** » 16.06.2008, 12:32

technobot,

I had a look at your patch.
Generally : the idea looks good.
But : Isn't it too limited?
I mean : why only generalize the key-generation? All these TZXXXDatabaseMetadata.GetXxx( ) functions (should) have exactly the same structure. Wouldn't it be possible to just make an abstract GetXxx() function that does all cache handling and a db dependent RetrieveXxxMetaData function.

Code: Select all

function TZAbstractDatabaseMetadata.GetProcedures(const Catalog, SchemaPattern,
  ProcedureNamePattern: string): IZResultSet;
var
  Key: string;
begin
  Key := Format('get-procedures:%s:%s:%s',
    [Catalog, SchemaPattern, ProcedureNamePattern]);

  Result := GetResultSetFromCache(Key);
  if Result = nil then
  begin
    Result := RetrieveProceduresMetaData(Catalog, SchemaPattern,ProcedureNamePattern);
    AddResultSetToCache(Key, Result);
  end;
end;

function TZAbstractDatabaseMetadata.RetrieveProceduresMetaData(const Catalog, SchemaPattern,
  ProcedureNamePattern: string): IZResultSet;
begin
    Result := ConstructVirtualResultSet(ProceduresColumnsDynArray);
end;

function TZAdoDatabaseMetadata.RetrieveProceduresMetaData(const Catalog, SchemaPattern,
  ProcedureNamePattern: string): IZResultSet;
var
  AdoRecordSet: ZPlainAdo.RecordSet;
begin
    Result := ConstructVirtualResultSet(ProceduresColumnsDynArray);

    AdoRecordSet := AdoOpenSchema(adSchemaProcedures,
      [Catalog, SchemaPattern, ProcedureNamePattern, '']);
    if Assigned(AdoRecordSet) then
    begin
      with TZAdoResultSet.Create(GetStatement, '', AdoRecordSet) do
      begin
        while Next do
        begin
          Result.MoveToInsertRow;
          Result.UpdateStringByName('PROCEDURE_CAT',
            GetStringByName('PROCEDURE_CATALOG'));
          Result.UpdateStringByName('PROCEDURE_SCHEM',
            GetStringByName('PROCEDURE_SCHEMA'));
          Result.UpdateStringByName('PROCEDURE_NAME',
            GetStringByName('PROCEDURE_NAME'));
          Result.UpdateStringByName('REMARKS',
            GetStringByName('DESCRIPTION'));
          Result.UpdateShortByName('PROCEDURE_TYPE',
            GetShortByName('PROCEDURE_TYPE') - 1);
          Result.InsertRow;
        end;
        Close;
        Free;
      end;
    end;
end;

What do you think?

Mark

technobot · Post by **technobot** » 16.06.2008, 15:37

I think first you need separate functions for the key generation, so that you could use them with ClearCache(Key), instead of guessing what the key should be. Beyond that - it might be possible to generalize the matadata functions further, but I think that's a separate issue. Though it is worth a look.

EDIT: Or, if one could put all the key handling in the abstract ancestor, then maybe one could define an enum to use with ClearCache instead of the string keys, and have the keys be difined internally in a an array that is indexed by this enum. Something like:

Code: Select all

// public code%u3a
type
  TDBMetadataEntry = %u28dbmeTables, dbmeColumns, ....%u29;
  TDBMetadataEntries = set of TDBMetadataEntry;

// private code%u3a
const
  DBMetadataKey = array %u5bTDBMetadataEntry%u5d of ShortString = %u28
    'get-tables',     // dbmeTables
    'get-columns',  // dbmeColumns
    ....
  %u29;

// TZAbstractDatabaseMetadata code%u3a

procedure TZAbstractDatabaseMetadata.ClearCache%u28const Keys%u3a TDBMetadataEntries%u29
begin
  ....
end;

function TZAbstractDatabaseMetadata.GetProcedures%u28const Catalog, SchemaPattern,
  ProcedureNamePattern%u3a string%u29%u3a IZResultSet;
var
  Key%u3a string;
begin
  Key %u3a= DBMetadataKey%u5bdbmeProcedures%u5d;

  ...
end;

Problem is that then you loose all the parameters for the keys (catalog name, schema name, etc).

EDIT2: And if you loose the parameters for the keys, then you have two problems:
1. You have metadata entries that hold large volumes of data. E.g. the procedures entry would hold the data for all the procedures. This can reduce efficiency, since you have a large overhead when you want to find data about just one or two procedures.
2. For example, GetProcedures('MyCatalog', ...) and GetProcedures('YourCatalog', ...) would have to share a key, which is problematic a best.

EDIT3: Generally speaking, the approach you're suggesting does look good, and makes sense. But I still think there should be some way to get the keys from outside the Getxxx methods, and that's what this patch is for.

Post by **mdaems** » 16.06.2008, 20:35

Why don't we do both... Accept this patch and then try to do the second part?
My sample was easiest to copy/paste in a small space. But of course, your extra helper functions make my code even cleaner.

I'll see when I have some time to test the patch.

Let's forget about your 'Enum proposal'. The disadvantages are clearly greater than te advantages.

Mark

technobot · Post by **technobot** » 16.06.2008, 22:28

> Why don't we do both... Accept this patch and then try to do the second part?
> Let's forget about your 'Enum proposal'.

Ok.

I took another look at the code. I seems that with the exception of a couple special cases, ALL the Getxxxx functions have this general structure:

Code: Select all

function <class>.Getxxxx(<params>): IZResultSet;
var
  Key: string;
begin
  Key := GetxxxxCacheKey(<params>);

  Result := GetResultSetFromCache(Key);
  if Result = nil then
  begin
    Result := ConstructVirtualResultSet(xxxxColumnsDynArray);
    <fill Result, optionally using the params>
    AddResultSetToCache(Key, Result);
  end;
end;

If the <fill result> part could be moved to after the AddResultSetToCache call, then all of that code can be moved to one single procedure, like so:

Code: Select all

function TZAbstractDatabaseMetadata.GetCachedMetadata(const Key: string; const ColumnsDefs: TZMetadataColumnDefs; out ResultSet: IZResultSet): Boolean;
begin
  ResultSet := GetResultSetFromCache(Key);
  Result := Assigned(ResultSet);
  if (not Result) then
  begin
    ResultSet := ConstructVirtualResultSet(ColumnsDefs);
    AddResultSetToCache(Key, ResultSet);
  end;
end;


function TZAbstractDatabaseMetadata.GetProcedures(const Catalog: string;
  const SchemaPattern: string; const ProcedureNamePattern: string): IZResultSet;
var
  Key: string;
begin
  Key := GetProceduresCacheKey(Catalog, SchemaPattern, ProcedureNamePattern);

  if (not GetCahcedMetadata(Key, ProceduresColumnsDynArray, Result)) then
    GetProceduresMetadata(Catalog, SchemaPattern, ProcedureNamePattern, Result);
end;


procedure <descendant-class>.GetProceduresMetadata(const Catalog: string;
  const SchemaPattern: string; const ProcedureNamePattern: string; Result: IZResultSet);
begin
  // fill the result set
end;

But the problem is that AddResultSetToCache clones the result set for some reason, and caches only the copy of the result set. So if the result set is filled after it's added to the cache, then it won't affect the cached data. Any particular reason why it's done this way?

EDIT: Just noticed that GetResultSetFromCache also clones the result set. Looks like a lot of copying being done all the time... Is it really necessary?

Post by **mdaems** » 16.06.2008, 23:11

No idea about being really necessary... Maybe it makes cleaning up old memory easier? (Because once copied the original isn't longer necessary when it's owner doesn't need it anymore.
Anyway, let's not jump too far at once. Let's keep it with the AddResultSetToCache in the TZAbstractDatabaseMeta.GetXXXX() function. This cleans the code, I believe. Then we can move on if that proves reasonable.

I'm trying to test your first patch now.

Mark

Post by **mdaems** » 17.06.2008, 00:00

Added your get_metadata_cache_key_api.patch to SVN testing branch. Revision 380. (Ran test suite for mysql and FB)

Waiting for next stage...

technobot · Post by **technobot** » 17.06.2008, 20:41

I've given it some more thought. I don't really like the idea of adding a large group of methods, which don't add any new functionality, to an already very large class... While this does remove some duplicate code, it makes TZAbstractDatabaseMetadata more cumbersome. I think doing it this way would just be switching one problem for another.

I think it would be better to use the inheritance of the Getxxx functions themselves, rather than delegate to a new group of functions and then inherit them. But for that the base Getxxx functions (in TZAbstractDatabaseMetadata) need to be restructured somehow, so that descendants can take advantage of that code, rather than reimplement it from scratch every time. And this brings me back to the issue of copying the result set inside AddResultSetToCache. If this could be eliminated, then the descendant Getxxx could simply be:

Code: Select all

function <descendant-class>.Getxxxx(<params>): IZResultSet;
begin
  Result := inherited;
  <fill Result>
end;

with the TZAbstractDatabaseMetadata.Getxxx handling the usual GetResultSetFromCache etc.

But I'm a bit nervous about just removing the copying, because we don't want the users modifying the cache entries, or freeing them without removing them, or stuff like that... On the other hand, the copying also adds some overhead to any operation that involves the cache. Any suggestions/thoughts?

Post by **mdaems** » 17.06.2008, 23:02

I still prefer delegation. But the delegation functions should not be part of the interface. And be protected. So they should not be usable for end users.

The reason : using your method you would also need to check if the base class method found a cached result and if not you would have to add it to the cache (if you get the a handle to the cache from the base class or not is just a detail) in every descendant. The 'Inheritance' you call it would at least be a cripple one...

BTW : you wouldn't need to implement the GetXXX() functions in de descendants. So you don't make the implementation of the different drivers more difficult. It's just the function name that changes and all cache handling isn't necessary any more, making implementing the driver specific stuff even easier.

Mark

technobot · Post by **technobot** » 17.06.2008, 23:45

mdaems wrote:I still prefer delegation. But the delegation functions should not be part of the interface. And be protected. So they should not be usable for end users.

Of course.

The reason : using your method you would also need to check if the base class method found a cached result and if not you would have to add it to the cache (if you get the a handle to the cache from the base class or not is just a detail) in every descendant. The 'Inheritance' you call it would at least be a cripple one...

I was hoping to leave the adding part in the base class (along with all the rest of the caching behavior). But you're right about the checking, since you only want to fill the result-set if it wasn't taken from the cache..

BTW : you wouldn't need to implement the GetXXX() functions in de descendants. So you don't make the implementation of the different drivers more difficult. It's just the function name that changes and all cache handling isn't necessary any more, making implementing the driver specific stuff even easier.

I understand all that. But I think you're going about it the wrong way. Whether you want to use inheritance or delegation, you should be taking out the common part, not the varying part. Instead of having 20 delegate functions that do the data filling, have one that does the caching. The problem is how to make it work - right now the copying of the cached data is getting in the way...

Post by **mdaems** » 18.06.2008, 00:48

Not an OO guy, here...

I read your opinion in an other way, maybe.
Taking out the common part : in the attached patch you'll see the common part is taken away from the DB specific files... (Only did it for GetTables)

I agree this only saves on the caching code, at the cost of one extra (partially hidden) function per GetXxx() Interface function at the abstract classlevel.

When thinking the way you want, I think about a cache handling function you should pass a callback function "for the case it doesn't find the cached record". However, I'm not sure how one handles that kind if callbacks when the parameter list of these functions vary. I'm afraid Object Pascal can't handle that.

Please, think a little more about it, while I continue writing the patch my way. In case you find a solution that's more optimal, no doubt we'll follow that one. Otherwise we'll have to use the sub-optimal solution until some smarter guy shows up.

technobot · Post by **technobot** » 18.06.2008, 07:14

I think the key to the solution lies in the data duplication inside AddResultSetToCache.

IF we can remove the data duplication, we gain two things:
1. We reduce the overhead on cache operations.
2. We can separate the caching behavior from the rest of the code inside the Getxxx functions, and then we can easily move the cache handling to a separate function, like so:

Code: Select all


// current structure:
function <class>.Getxxxx(<params>): IZResultSet;
var
  Key: string;
begin
  Key := GetxxxxCacheKey(<params>);

  Result := GetResultSetFromCache(Key);
  if Result = nil then
  begin
    Result := ConstructVirtualResultSet(xxxxColumnsDynArray);
    <fill Result, optionally using the params>
    AddResultSetToCache(Key, Result);
  end;
end;


// rearanged structure:
function <class>.Getxxxx(<params>): IZResultSet;
var
  Key: string;
begin
  Key := GetxxxxCacheKey(<params>);

  Result := GetResultSetFromCache(Key);
  if Result = nil then
  begin
    Result := ConstructVirtualResultSet(xxxxColumnsDynArray);
    AddResultSetToCache(Key, Result);
    <fill Result, optionally using the params> // notice that this moved to the end
  end;
end;


// final structure:
function <class>.Getxxxx(<params>): IZResultSet;
var
  Key: string;
begin
  Key := GetxxxxCacheKey(<params>);

  <call delegate to handle the cache. delegate returns a result-set and a boolean flag stating if the result set is new or not>

  <if the result-set is new, fill it, optionally using the params>
end;

The question is, can the data duplication be removed safely? I've been thinking about it (as you may have guessed from all my repeated nagging about the data duplication

), and I think it can. I think it is rather unlikely that someone uses the metadata in a read-write manner. There's just not much point to modifying the metadata. It's also not likely that anyone frees the metadata object they get, since they only get an interface. And as long as no-one (or almost no-one) modifies the metadata or attempts to free it, then the data duplication is not necessary as far as I can tell. But just to be on the safer side, if the person who wrote the AddResultSetToCache method is still around, I'd like to ask them why they copy the data...

In the mean time, I think you should wait with that patch a bit - don't be in such a rush. Because if you write it now, once you finish you will find that you have 20 or so Getxxxx functions that all have almost the same caching code. Ideally, you'd want to move that code to a separate function. And then once you do that, then you might as well leave the data filling inside the Getxxx functions, since they won't be doing much else anyway.

technobot · Post by **technobot** » 20.06.2008, 19:41

Just wanted to add a concluding remark for this thread:
I've been thinking about all of this after our talk. Unfortunately, it seems my idea just isn't practical. And while yours isn't perfect, it certainly has its advantages. If you need help implementing it, let me know. Though I expect I may be busy the next 2-4 days or so.

Btw, if you applied my little demo patch, you should probably roll it back. Or at least put back the call to "Clone..." in AddResultSetToCache.

As for the other cleanup we mentioned, I'll have a look when I get the time...

Post by **mdaems** » 23.06.2008, 23:23

Committed my solution. SVN Testing branch rev. 382

Mark

technobot · Post by **technobot** » 26.06.2008, 18:46

As I'm working on the other cleanup, I've noticed a couple problems with your changes. The first one is that you forgot about the GetUDTs method. The second problem is that there are a few places where you call a cached method from an uncached method, for example TZMsSqlDatabaseMetadata.UncachedGetImportedKeys and TZInterbase6DatabaseMetadata.UncachedGetCrossReference. You should only be calling the uncached versions from the uncached methods.

EDIT: Another small issue: there are a few methods that were left "virtual" instead of "override". This hides the parent method, which is wrong. If you don't see it, I recommend to turn on all the compiler warnings (maybe except the platform-specific ones), and pay attention to what the compiler says.