Page 1 of 2

Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 05.10.2020, 13:26
by aehimself
Hello,

The case is easy to reproduce. Have a TZConnection connected to an Oracle host, while having .AutoCommit on false. Now, unplug your network cable (or disconnect from VPN, just make sure there is no connection between the client and the server anymore) and call ZConnection.Disconnect.

It will fail as there is no connection anymore, which is fine. The problem is, it is leaking a TZOracleTransaction object, created in ZDbcOracle.pas, TZOracleConnection.Open, line 595:

Code: Select all

  fLocalTransaction := TZOracleTransaction.CreateLocal(Self);
During the Disconnect call, this is attempted to be freed in ZDbcOracle.pas, TZOracleConnection.InternalClose, line 831:

Code: Select all

    fLocalTransaction := nil;
As this is an interface, assigning nil on it attempts to destroy the object. TZOracleTransaction.BeforeDestruction calls .Rollback:

Code: Select all

    Status := FOwner.FPlainDriver.OCITransRollback(FOwner.FContextHandle,
      FOwner.FErrorHandle, OCI_DEFAULT);
    if Status <> OCI_SUCCESS then
      FOwner.HandleErrorOrWarning(FOwner.FErrorHandle, Status, lcTransaction, 'TRANSACTION ROLLBACK', Self);
And, HandleErrorOrWarning is throwing an exception (during the destruction of TZOracleTransaction), effectively leaving the transaction object alive.

I might be wrong, so I'll not create a patch, but my idea would be to check for destruction before throwing an exception or include a new parameter for
HandleErrorOrWarning to force it NOT to raise an exception.

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 07.10.2020, 09:11
by marsupilami
Hello aehimself :)

I created a ticket for this: #452. Not sure, where Egonhugeist will pick up on this.
Best regards,

Jan

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 07.10.2020, 19:01
by aehimself
Jan,

No worries; I'm close to 1000 kms away from home at the moment, I'm planning the travel back tomorrow. No stops, no Internet, no PC. After that I'll need a good long sleep, so I'm not going to do any development-related things for 2 days for sure :)
In theory I was on vacation for a week, but a bug report found me... this is when I started to investigate...

In the mean time I saw that @Egonhugeist checked in a fix which MIGHT be related, so I'll try that. For sure, it will be already visible on Git when I will have time :)

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 08.10.2020, 17:45
by EgonHugeist
Hi ae,

joarrrr such things are hard to test. Yes you assumption it's related is correct. For the connection lost issue i introduced IImmediatelyReleasable interface. Please test the fix. Could you consider to build a test which could interrupt a connection and we finally can test the behavior?

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 08.10.2020, 18:56
by aehimself
@Egonhugeist,

You mean this patch? https://sourceforge.net/p/zeoslib/code-0/6933

The result is the same. Exception is raised during the destruction of TZOracleTransaction, and therefore the object is not freed.

It's a golden rule that exceptions MUST not be raised during destructors in Delphi, or it will lead to memory leaks. Wither rollback should not be called, Rollback should not raise an exception when the transaction is being destroyed OR the exception should be swallowed in the destructor.

Test case? Forget about it. You need to control the network / Oracle server to be able to imitate a connection loss afaik...
Or...... what happens if you kill the connection from a separate connection...? I'll test that out later on.

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 08.10.2020, 19:16
by EgonHugeist
aehimself wrote: 08.10.2020, 18:56 Or...... what happens if you kill the connection from a separate connection...? I'll test that out later on.
You got me. I never ever had time to make tests for connection loss. Honestly i did some tests while running the test-suite, set a break-point, kill the server process, and test what happens. I know we can achive such test's using a second connection/thread and kill the current one. So i would be happy if you could simply write tests. Cherry on cake would be if the would be written for our suites. OTOH i need some time(which is rare) to port them. If provided once fixing the culprit is easy for me(to deep in zeos code, Jan always says). Anyway everything is better than doing nothing... Waiting.... :P

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 08.10.2020, 19:51
by aehimself

Code: Select all

program Project2;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils, ZConnection, ZDataSet;

Var
 conn, killer: TZConnection;
 q: TZQuery;

Procedure SetConn(Const inConnection: TZConnection);
Begin
 inConnection.Protocol := 'oracle';
 inConnection.User := 'username';
 inCOnnection.Password := 'password';
 inConnection.Database := 'tnsnamesora';
End;

begin
  ReportMemoryLeaksOnShutdown := True;

  try
  try
   conn := TZConnection.Create(nil);
   q := TZQuery.Create(nil);
   Try
    SetConn(conn);
    conn.AutoCommit := False;
    q.Connection := conn;
    conn.Connect;
    q.SQL.Text := 'SELECT SID, SERIAL# FROM V$SESSION WHERE AUDSID = Sys_Context(''USERENV'', ''SESSIONID'')';
    q.Open;
    killer := TZConnection.Create(nil);
    Try
     SetConn(killer);
     killer.Connect;
     killer.ExecuteDirect('ALTER SYSTEM DISCONNECT SESSION ''' + q.Fields[0].AsString + ',' + q.Fields[1].AsString + ''' IMMEDIATE');
    Finally
     FreeAndNil(killer);
    End;

    conn.Disconnect;
   Finally
    FreeAndNil(conn);
    FreeAndNil(q);
   End;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  finally
    readln;
  end;
end.
The funny thing is, that in this case ReportMemoryLeaksOnShutdown does not show the leak (probably, because I'm calling this from my frame destructor...) but DeLeaker does. I'll try to make a valid test case where ReportMemoryLeaksOnShutdown also shows the leak.

Edit:
The destructor exception applies to me as well. I changed my prod code from

Code: Select all

Destructor TSQLConnectionFrame.Destroy;
Begin
 ForceSQLDisconnect;
 If Assigned(_schemas) Then FreeAndNil(_schemas);
 If Assigned(_queries) Then FreeAndNil(_queries);
 If Assigned(_li) Then FreeAndNil(_li);
 inherited;
End;
to

Code: Select all

Destructor TSQLConnectionFrame.Destroy;
Begin
 Try
  ForceSQLDisconnect;
 Finally
  If Assigned(_schemas) Then FreeAndNil(_schemas);
  If Assigned(_queries) Then FreeAndNil(_queries);
  If Assigned(_li) Then FreeAndNil(_li);
  inherited;
 End;
End;
And it is ALMOST perfect. The bad thing is that ForceSQLDisconnect will raise an exception (which is handled) but it calls a code like this in the exception handler...

Code: Select all

 If inConnection.Connected Then Begin
                                If inConnection.ServerVersion > 0 Then Result := Result + ' '  + inConnection.ServerVersionStr;
                                If inConnection.ClientVersion > 0 Then Result := Result + ', client version: ' + inConnection.ClientVersionStr;
                                End;
Even if the connection is lost, TZConnection shows as connected, and ServerVersion will throw an exception too, as it can not be queried anymore. Is there a viable way to check if the connection is still alive? Other that ping, which also can raise an exception.

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 09.10.2020, 11:15
by aehimself
I went ahead with an ugly solution, but it works :D

Code: Select all

 Try
  If inConnection.Ping And (inConnection.ServerVersion > 0) Then Result := Result + ' '  + inConnection.ServerVersionStr;
 Except
  // Swallow all exceptions. If Ping fails, there's no connection -> avoid memory leak in oci.dll caused by .ServerVersion
 End;
 If inConnection.ClientVersion > 0 Then Result := Result + ', client version: ' + inConnection.ClientVersionStr;
This combined with the Try ... Finally destructor, there are no memory leaks in my program.

However - I think - the original TZOracleTransaction leak should be fixed, too.

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 16.10.2020, 17:41
by EgonHugeist
Hi AE yes himself,

nice example. Reason for the failing logic was a unknown OCI error code.
However: We german say: "Eine Hand spuckt in die Andere", means if you help us we help you.
Patch done https://sourceforge.net/p/zeoslib/code-0/6966/. Iv'e been doing deeper teast than just having a ConnLoss on a connection object. I just testet: Statements, ResultSets and Lobs. That than said from my POV we should be perfectly prepared for Connenction Loss on Oracle now.
Summary:
1. report your findings!
2. Could you please consider write comparable scenarios for other DBRM's you use? That would be great and save my rare time.

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 17.10.2020, 10:17
by aehimself
Hello,

I'm a bit confused here. DeLeaker still shows the leak for me, FastMM (ReportMemoryLeaksOnShutdown) does not. As I'm getting the "Sorry, the board attachment quota has been reached." message I can not upload the screenshot, but it says 2 objects are leaked using the exact same console test program:

TZStringList
Project3.exe!System.TObject.NewInstance Line 17836 00409240
Project3.exe!System.ClassCreate Line 19227 00409973
Project3.exe!System.Classes.TStringList.Create Line 7822 004be5a7
Project3.exe!Zdbcoracle.TZOracleTransaction.CreateLocal Line 1561 0076c138
Project3.exe!Zdbcoracle.TZOracleTransaction.CreateLocal Line 1562 0076c145
Project3.exe!Zdbcoracle.TZOracleConnection.Open Line 692 00769c99
Project3.exe!Zabstractconnection.TZAbstractConnection.Connect Line 979 00807951
Project3.exe!Project3.initialization Line 33 00818604

TZOracleTransaction
Project3.exe!System.TObject.NewInstance Line 17836 00409240
Project3.exe!System.TInterfacedObject.NewInstance Line 39917 00410e20
Project3.exe!Zdbcoracle.TZOracleConnection.Open Line 692 00769c99
Project3.exe!Zabstractconnection.TZAbstractConnection.Connect Line 979 00807951
Project3.exe!Project3.initialization Line 33 00818604

I'm unfamiliar with interfaces, but I suspect the same rules apply to them as normal Delphi classes: you have an exception in a destructor, said object will leak; and this still happens with TZOracleTransaction - so it leaking sounds just logical to me.

Now, this MIGHT be a bug in DeLeaker. We have MadExcept licensed at work, I'll try to run the test project with it on Monday to see if it shows any leaks.

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 17.10.2020, 11:02
by EgonHugeist
Hmm, what's the ErrorCode returned by zeos?
In my case i get an 3113 error. Let me explain:
To get the ImmediatelyReleasable logic running the "connection lost" error-code need to be known.
That than means if TZOracleConnection.HandleErrorOrWarning get's as first error 3113 or 3114(found in code, might be obsolate) then all objects supporting the IImmediatelyReleasable are clearing it's resources in a huge recursive method quue and waiting for destruction. But if there is another ErrorCode ... Just the Exception is raised.. On my side it's working perfectly.

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 17.10.2020, 15:32
by aehimself
There are two exceptions raised while calling conn.Disconnect.

First exception TZOracleConnection.InternalClose:953 -> TZOracleTransaction.BeforeDestruction:1507 -> TZOracleTransaction.Rollback:1631 -> TZOracleConnection.HandleErrorOrWarning:1416

Project Project3.exe raised exception class EZSQLConnectionLost with message 'SQL Error: ORA-03113: end-of-file on communication channel
Process ID: 4256
Session ID: 406 Serial number: 45913


Code: 3113 SQL: TRANSACTION ROLLBACK'.

Second exception TZOracleConnection.InternalClose:961 -> TZOracleConnection.HandleErrorOrWarning:1416

Project Project3.exe raised exception class EZSQLException with message 'SQL Error: OCI_INVALID_HANDLE
Code: -2 Message: DISCONNECT FROM "DEVELOPER"
'.

But still, I think you are checking the wrong end of the code. If I modify TZOracleConnection.BeforeDestruction:

Code: Select all

procedure TZOracleTransaction.BeforeDestruction;
var Status: sword;
begin
  inherited BeforeDestruction;
  if FOCITrans <> nil then begin
    try
      fSavepoints.Clear;
      try
        if FStarted then
          RollBack; // <- Exception can be raised here, normally NOTHING below this line gets executed
        Status := FOwner.FPlainDriver.OCIHandleFree(FOCITrans, OCI_HTYPE_TRANS);
        if Status <> OCI_SUCCESS then
          FOwner.HandleErrorOrWarning(FOwner.FErrorHandle, Status, lcTransaction, 'OCIHandleFree', Self); // <- Exception can be raised here, so after swallowing the first exception NOTHING below this line gets executed
      except
       // Swallow exception, so destructor can finish in peace
      end;
    finally
      FOCITrans := nil;
    end;
  end;
  fSavepoints.Free; // <- With the original design, the code will NEVER reach this line, thus leaking this object too with TZOracleTransaction
end;
This way no exception is raised when Delphi is trying to dispose the transaction object, there are no leaks at all; not even the fSavePoints TStringList. The leak is not due to server resources, it's due to the destructor design of the TZTransaction object.

Edit: It seems that the transaction is trying to rollback and free the handle of a non existing connection, therefore raising exceptions. These must only be called if we are sure that the connection is still alive.

To try to demonstrate what is happening, consider the below code:

Code: Select all

 TMyClass = Class
 strict private
  _sl: TStringList;
 public
  Constructor Create; ReIntroduce;
  Destructor Destroy; Override;
 End;

constructor TMyClass.Create;
begin
  inherited;
  _sl := TStringList.Create;
end;

destructor TMyClass.Destroy;
begin
 Raise Exception.Create('Leak');
 _sl.Free;
 inherited;
end;

TMyClass.Create.Free;
According to FastMM, this will leak:

---------------------------
Unexpected Memory Leak
---------------------------
An unexpected memory leak has occurred. The unexpected small block leaks are:

1 - 12 bytes: TMyClass x 1
13 - 20 bytes: Unknown x 1
69 - 76 bytes: TStringList x 1

This is the exact thing happening to TZOracleTransaction. But instead of MyClass, it's the transaction object, the stringlist is fSavePoints.


P.s.: It seems ReportMemoryLeaksOnShutdown does not work on console applications on 10.4.1, this is why I had to rely on DeLeaker.

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 17.10.2020, 16:11
by aehimself
Patch created on GitHub. This fixes the leak for good.

I introduced a new boolean within TZOracleTransaction and set it to true in the BeforeDestruction. If this variable is True, .RollBack does not raise an exception. I also got rid of the verification of OCIHandleFree(FOCITrans, OCI_HTYPE_TRANS) as
1, it can raise an exception too
2, it really does not matter if it fails or not.

This way the test application disconnects gracefully, no exceptions are raised.

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 18.10.2020, 08:51
by EgonHugeist
Thanks for the afford and the memory-leak lessons :P.

Well i was to many steps forward. My test case was a bit more verbose and did include Open[Connection,Transaction,ResultSet,Lob's]. But yes, you are right, it did not hit your example, which is is the "abnormal case", i think, but needs to be resolved too. I've added your testcase to our testsuites.
According your pullrequest:
There is no need for and FDestroying param dealing with reference counted objects. If FRefcount is zero we're in destruction. So i just added that, to suppress the exception while destruction.
Your DeLeaker shouln't return the leak anymore. According FastMM, did you try to use the FastMM4 unit in first place of the project? Then you can setup the unit in the FastMM4Options.inc and have much more details like stacktraces to find root of leaking memory.

Re: Zeos 7.3-200a1058, Oracle ungraceful disconnect leaks a TZOracleTransaction object

Posted: 18.10.2020, 11:30
by aehimself
EgonHugeist wrote: 18.10.2020, 08:51Well i was to many steps forward. My test case was a bit more verbose and did include Open[Connection,Transaction,ResultSet,Lob's].
You usually are :) I can not see more than the actual error I'm facing and this is expected - this is why you are the dev of Zeos and I'm just a loud user! ;) :P
EgonHugeist wrote: 18.10.2020, 08:51According your pullrequest:
There is no need for and FDestroying param dealing with reference counted objects. If FRefcount is zero we're in destruction. So i just added that, to suppress the exception while destruction.
I always seen those variables when debugging components but - as I already mentioned - I'm not familiar with Interfaces. I like your idea better though - it does what I wanted to do but without the need of an external indicator.
EgonHugeist wrote: 18.10.2020, 08:51Your DeLeaker shouln't return the leak anymore. According FastMM, did you try to use the FastMM4 unit in first place of the project? Then you can setup the unit in the FastMM4Options.inc and have much more details like stacktraces to find root of leaking memory.
I'm using the built-in version of FastMM, which can not pinpoint source locations. Since I have a valid license for DeLeaker there was absolutelly no need to update... let's face it, FastMM is mainly a memory manager, the other is "only" a leak detection tool. It has one role only so it is doing that one thing better.
EgonHugeist wrote: 18.10.2020, 08:51Thanks for the afford and the memory-leak lessons :P.
I'm sorry if I started to sound condescending (?) I just wanted to point out the issue which I failed to do so up until yesterday. English is not our native language but we all speak Delphi and there is no misunderstanding in code :)
Next time, instead of just reporting I will try to fix the issue with the spaghetti logic of mine and create a patch (which I already should have started doing a long time ago, duh). You always can check what it does and why and correct it to be more sophisticated.

Long story short: I cancelled my pull request and discarded those changes so I could resync from the official repository. Disconnect now behaves as it did with my version, so I guess we can call it a success :) Team Zeos saves the day again! :cheers: