TFA error after GI upgrade to 19c

Recently I upgraded an Exadata stack to the latest 19.2 release (19.2.7.0.0.191012) and upgraded the GI from 18c to 19c (the latest 19c release, 19.5.0.0.191015), and after that TFA stopped working.
Since I did not want to do a complete TFA clean-up and reinstallation, I tried to find the cause of the error and a way out. Here I want to share the workaround (since there is no official solution yet) that I discovered and used to fix the error.

 

The environment

 

The environment is:
  • Old Grid Infrastructure: Version 18.6.0.0.190416
  • New Grid Infrastructure: Version 19.5.0.0.191015
  • Exadata domU: Version 19.2.7.0.0.191012 running kernel 4.1.12-124.30.1.el7uek.x86_64
 

TFA error

 

After upgrading the GI from 18c to 19c, TFA does not work: if you try to start it or to collect logs with it, you receive errors. In the environment described here, TFA was running fine with the 18c version, and the rootupgrade.sh script from 18c to 19c did not report any error.
To be more precise, the TFA upgrade from 18c to 19c called by rootupgrade.sh finished OK (according to the log, which I show later). But even so, the error occurs.
The solution provided as usual by MOS support: download the latest TFA and reinstall over the current one. Unfortunately, I do not like this approach, because it can lead to errors during the GI upgrade to the next releases (20, for example) and updates (19.6, for example).
So, this is what I got when I tried to run a TFA collection:

 

[root@exsite1c1 ~]# /u01/app/19.0.0.0/grid/tfa/bin/tfactl diagcollect

WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.

TFA-00002 Oracle Trace File Analyzer (TFA) is not running

Please start TFA before running collections

[root@exsite1c1 ~]#

 

So, checking for a running TFA with ps -ef, I saw no TFAMain process running:

 

[root@exsite1c1 ~]# ps -ef |grep tfa

root      10665      1  0 Nov21 ?        00:00:06 /bin/sh /etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null

root      40285  37137  0 11:05 pts/0    00:00:00 grep --color=auto tfa

[root@exsite1c1 ~]#

 

And when I tried to start TFA (as root), it reported nothing, neither an error nor success:

 

[root@exsite1c1 ~]# /etc/init.d/init.tfa start

Starting TFA..

Waiting up to 100 seconds for TFA to be started..

. . . . .

. . . . .

. . . . .

...

[root@exsite1c1 ~]#

[root@exsite1c1 ~]# ps -ef |grep tfa

root      10665      1  0 Nov21 ?        00:00:06 /bin/sh /etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null

root      46031  37137  0 11:07 pts/0    00:00:00 grep --color=auto tfa

[root@exsite1c1 ~]#

 

Checking MOS, I saw related problems caused by a wrong Perl version; this TFA release needs Perl 5.10 at least. But that was not the case here:

 

[root@exsite1c1 ~]# perl -v

This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

(with 39 registered patches, see perl -V for more detail)

Copyright 1987-2012, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the

GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on

this system using "man perl" or "perldoc perl".  If you have access to the

Internet, point your browser at http://www.perl.org/, the Perl Home Page.

[root@exsite1c1 ~]#
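If you want to script this check instead of reading perl -v by eye, a one-liner is enough (a small sketch; "use 5.010" makes Perl itself fail when the interpreter is older than 5.10):

# Exits non-zero and prints nothing if Perl is older than 5.10.
perl -e 'use 5.010; print "Perl $] is new enough\n"' || echo "Perl is older than 5.10"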

 

Searching the problem

 

Digging for the source of the problem, I checked the rootupgrade.sh log, but the report was good. The TFA upgrade step completed successfully:
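If you do not want to scroll through the whole rootcrs log, grepping for the TFA-related messages is enough (a quick check; the log name below is the one from this environment):

grep -E 'UpgradeTFA|CLSRSC-40' /u01/app/grid/crsdata/exsite1c2/crsconfig/rootcrs_exsite1c2_2019-11-15_12-12-21AM.log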

 

[root@exsite1c1 ~]# vi /u01/app/grid/crsdata/exsite1c2/crsconfig/rootcrs_exsite1c2_2019-11-15_12-12-21AM.log

...

...

2019-11-14 14:18:40: Executing the [UpgradeTFA] step with checkpoint [null] ...

2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/bin/clsecho -p has -f clsrsc -m 595 '1' '18' 'UpgradeTFA'

2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/bin/clsecho -p has -f clsrsc -m 595 '1' '18' 'UpgradeTFA'

2019-11-14 14:18:40: Command output:

>  CLSRSC-595: Executing upgrade step 1 of 18: 'UpgradeTFA'.

>End Command output

2019-11-14 14:18:40: CLSRSC-595: Executing upgrade step 1 of 18: 'UpgradeTFA'.

2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/bin/clsecho -p has -f clsrsc -m 4015

2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/bin/clsecho -p has -f clsrsc -m 4015

2019-11-14 14:18:40: Command output:

>  CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.

>End Command output

2019-11-14 14:18:40: CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.

2019-11-14 14:18:40: Executing the [ValidateEnv] step with checkpoint [null] ...

2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/crs/install/tfa_setup -silent -crshome /u01/app/19.0.0.0/grid

2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/bin/clsecho -p has -f clsrsc -m 595 '2' '18' 'ValidateEnv'

2019-11-14 14:18:40: Executing cmd: /u01/app/19.0.0.0/grid/bin/clsecho -p has -f clsrsc -m 595 '2' '18' 'ValidateEnv'

2019-11-14 14:18:40: Command output:

>  CLSRSC-595: Executing upgrade step 2 of 18: 'ValidateEnv'.

...

...

2019-11-14 14:23:45: Command output:

>  TFA Installation Log will be written to File : /tmp/tfa_install_293046_2019_11_14-14_18_40.log

...

...

2019-11-14 14:23:45: Command output:

>  CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.

>End Command output

 

And the other related logs also reported complete success:

 

[root@exsite1c1 ~]# cat /tmp/tfa_install_293046_2019_11_14-14_18_40.log

[2019-11-14 14:18:40] Log File written to : /tmp/tfa_install_293046_2019_11_14-14_18_40.log

[2019-11-14 14:18:40]

[2019-11-14 14:18:40] Starting TFA installation

[2019-11-14 14:18:40]

[2019-11-14 14:18:40] TFA Version: 192000 Build Date: 201904260414

[2019-11-14 14:18:40]

[2019-11-14 14:18:40] About to check previous TFA installations ...

[2019-11-14 14:18:40] TFA HOME : /u01/app/18.0.0/grid/tfa/exsite1c1/tfa_home

[2019-11-14 14:18:40]

[2019-11-14 14:18:40] Installed Build Version: 184100 Build Date: 201902260236

[2019-11-14 14:18:40]

[2019-11-14 14:18:40] INSTALL_TYPE GI

[2019-11-14 14:18:40] Shutting down TFA for Migration...

[2019-11-14 14:20:24]

[2019-11-14 14:20:24] Removing /etc/init.d/init.tfa...

[2019-11-14 14:20:24]

[2019-11-14 14:20:24] Migrating TFA to /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home...

[2019-11-14 14:20:50]

[2019-11-14 14:20:50] Starting TFA on exsite1c1...

[2019-11-14 14:20:50]

[2019-11-14 14:21:05]

[2019-11-14 14:21:05] TFA_INSTALLER /u01/app/19.0.0.0/grid/crs/install/tfa_setup

[2019-11-14 14:21:05] TFA is already installed. Upgrading TFA

[2019-11-14 14:21:05]

[2019-11-14 14:21:05] TFA patching CRS or DB from zipfile extracted to /tmp/.293046.tfa

[2019-11-14 14:21:06] TFA Upgrade Log : /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfapatch.log

[2019-11-14 14:23:31] Patch Status : 0

[2019-11-14 14:23:31] Patching OK : Running install_ext

[2019-11-14 14:23:32] Installing oratop extension..

[2019-11-14 14:23:32]

.-----------------------------------------------------------------.

| Host      | TFA Version | TFA Build ID         | Upgrade Status |

+-----------+-------------+----------------------+----------------+

| exsite1c1 |  19.2.0.0.0 | 19200020190426041420 | UPGRADED       |

| exsite1c2 |  18.4.1.0.0 | 18410020190226023629 | NOT UPGRADED   |

'-----------+-------------+----------------------+----------------'

[2019-11-14 14:23:44] Removing Old TFA /u01/app/18.0.0/grid/tfa/exsite1c1/tfa_home...

[2019-11-14 14:23:45] Cleanup serializable files

[2019-11-14 14:23:45]

[root@exsite1c1 ~]#

[root@exsite1c1 ~]# cat /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfapatch.log

TFA will be upgraded on Node exsite1c1:

Upgrading TFA on exsite1c1 :

Stopping TFA Support Tools...

Shutting down TFA for Patching...

Shutting down TFA

. . . . .

. . .

Successfully shutdown TFA..

No Berkeley DB upgrade required

Copying TFA Certificates...

Starting TFA in exsite1c1...

Starting TFA..

Waiting up to 100 seconds for TFA to be started..

. . . . .

Successfully started TFA Process..

. . . . .

TFA Started and listening for commands

Enabling Access for Non-root Users on exsite1c1...

[root@exsite1c1 ~]#

 

One known problem occurs when (for some reason) the cluster nodes lose TFA synchronization. I tried to sync the nodes, and this gave one clue:

 

[root@exsite1c1 ~]# /u01/app/19.0.0.0/grid/tfa/bin/tfactl syncnodes

WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.

/u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/bin/synctfanodes.sh: line 237: /u01/app/18.0.0/grid/perl/bin/perl: No such file or directory

TFA-00002 Oracle Trace File Analyzer (TFA) is not running

Current Node List in TFA :

1.

Unable to determine Node List to be synced. Please update manually.

Do you want to update this node list? [Y|N] [N]: ^C[root@exsite1c1 ~]#

[root@exsite1c1 ~]#

 

As you can see, synctfanodes.sh made a reference to the old 18c GI home. Inside the sync script, line 237 (my mark below) uses $PERL, and this value comes from the file tfa_setup.txt:

 

[root@exsite1c1 ~]# vi /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/bin/synctfanodes.sh
...
...
        if [ `$GREP -c '^PERL=' $tfa_home/tfa_setup.txt` -ge 1 ]    <== TFA CHECK
        then
                PERL=`$GREP '^PERL=' $tfa_home/tfa_setup.txt | $AWK -F"=" '{print $2}'`;
        fi

        if [ `$GREP -c '^CRS_HOME=' $tfa_home/tfa_setup.txt` -ge 1 ]
        then
                CRS_HOME=`$GREP '^CRS_HOME=' $tfa_home/tfa_setup.txt | $AWK -F"=" '{print $2}'`;
        fi

        if [ `$GREP -c '^RUN_MODE=' $tfa_home/tfa_setup.txt` -ge 1 ]
        then
                RUN_MODE=`$GREP '^RUN_MODE=' $tfa_home/tfa_setup.txt | $AWK -F"=" '{print $2}'`;
        fi
fi

RUSER=`$RUID | $AWK '{print $1}' | $AWK -F\( '{print $2}' | $AWK -F\) '{print $1}'`;

if [ $RUSER != $DAEMON_OWNER ]
then
        $ECHO "User '$RUSER' does not have permissions to run this script.";
        exit 1;
fi

SSH_USER="$DAEMON_OWNER";

HOSTNAME=`hostname | $CUT -d. -f1 | $PERL -ne 'print lc'`;    <===== LINE 237
...
...
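Since the script simply trusts whatever tfa_setup.txt says, a quick grep of the relevant keys shows immediately which home the configuration points to (a minimal check, using the tfa_home path from this environment):

grep -E '^(CRS_HOME|JAVA_HOME|PERL)=' /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/tfa_setup.txt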

 

Checking tfa_setup.txt

Looking at the file, we can see the problem:

 

[root@exsite1c1 ~]# cat /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/tfa_setup.txt

CRS_HOME=/u01/app/18.0.0/grid

exsite1c1%CRS_INSTALLED=1

NODE_NAMES=exsite1c1

ORACLE_BASE=/u01/app/grid

JAVA_HOME=/u01/app/18.0.0/grid/jdk/jre

exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/OPatch/crs/log

exsite1c1%CFGTOOLS%DIAGDEST=/u01/app/12.1.0.2/grid/cfgtoollogs

exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/crf/db/exsite1c1

exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/crs/log

exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/cv/log

exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/evm/admin/log

exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/evm/admin/logger

exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/evm/log

exsite1c1%INSTALL%DIAGDEST=/u01/app/12.1.0.2/grid/install

exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/log

exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/network/log

exsite1c1%DBWLM%DIAGDEST=/u01/app/12.1.0.2/grid/oc4j/j2ee/home/log

exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/opmn/logs

exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/racg/log

exsite1c1%ASM%DIAGDEST=/u01/app/12.1.0.2/grid/rdbms/log

exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/scheduler/log

exsite1c1%CRS%DIAGDEST=/u01/app/12.1.0.2/grid/srvm/log

exsite1c1%ACFS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/acfs

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/core

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/crsconfig

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/crsdiag

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/cvu

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/evm

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/output

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/trace

exsite1c1%INSTALL%DIAGDEST=/u01/app/oraInventory/ContentsXML

exsite1c1%INSTALL%DIAGDEST=/u01/app/oraInventory/logs

TRACE_LEVEL=1

INSTALL_TYPE=GI

PERL=/u01/app/18.0.0/grid/perl/bin/perl

RDBMS_ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1||

RDBMS_ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1||

RDBMS_ORACLE_HOME=/u01/app/12.2.0.1/grid||

TZ=Europe/Luxembourg

RDBMS_ORACLE_HOME=/u01/app/18.0.0/grid||

localnode%ADRBASE=/u01/app/grid

RDBMS_ORACLE_HOME=/u01/app/oracle/product/18.0.0/dbhome_1||

localnode%ADRBASE=/u01/app/oracle

RDBMS_ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/financ||

localnode%ADRBASE=/u01/app/oracle

RDBMS_ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/financ||

localnode%ADRBASE=/u01/app/oracle

DAEMON_OWNER=root

RDBMS_ORACLE_HOME=/u01/app/oracle/agent/13.2.0/agent_13.2.0.0.0||

RDBMS_ORACLE_HOME=/u01/app/12.1.0.2/grid||

RDBMS_ORACLE_HOME=/u01/app/19.0.0.0/grid||

localnode%ADRBASE=/u01/app/grid

CRS_ACTIVE_VERSION=

[root@exsite1c1 ~]#

 

As you can see above, the CRS_HOME, JAVA_HOME, and PERL parameters point to the old GI home, and several DIAGDEST entries still point to an even older 12.1.0.2 home. As a workaround, I edited tfa_setup.txt on both nodes and fixed the GI home references to 19.0.0.0 (the stale RDBMS_ORACLE_HOME entries were cleaned up as well, and CRS_ACTIVE_VERSION was set):
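The same fix can also be scripted instead of done by hand in vi. This is only a sketch, assuming the homes of this environment; back up the file first and review the result, because the blind substitution also rewrites the RDBMS_ORACLE_HOME lines, which I ended up cleaning manually:

# Run as root on each node; the tfa_home directory carries the node name.
F=/u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/tfa_setup.txt
cp $F $F.bak
sed -i -e 's|/u01/app/18.0.0/grid|/u01/app/19.0.0.0/grid|g' \
       -e 's|/u01/app/12.1.0.2/grid|/u01/app/19.0.0.0/grid|g' $F
# Quick verification of the critical keys:
grep -E '^(CRS_HOME|JAVA_HOME|PERL)=' $F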

 

[root@exsite1c1 ~]# vi /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/tfa_setup.txt

[root@exsite1c1 ~]#

[root@exsite1c1 ~]#

[root@exsite1c1 ~]#

[root@exsite1c1 ~]# cat /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home/tfa_setup.txt

CRS_HOME=/u01/app/19.0.0.0/grid

exsite1c1%CRS_INSTALLED=1

NODE_NAMES=exsite1c1

ORACLE_BASE=/u01/app/grid

JAVA_HOME=/u01/app/19.0.0.0/grid/jdk/jre

exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/OPatch/crs/log

exsite1c1%CFGTOOLS%DIAGDEST=/u01/app/19.0.0.0/grid/cfgtoollogs

exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/crf/db/exsite1c1

exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/crs/log

exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/cv/log

exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/evm/admin/log

exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/evm/admin/logger

exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/evm/log

exsite1c1%INSTALL%DIAGDEST=/u01/app/19.0.0.0/grid/install

exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/log

exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/network/log

exsite1c1%DBWLM%DIAGDEST=/u01/app/19.0.0.0/grid/oc4j/j2ee/home/log

exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/opmn/logs

exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/racg/log

exsite1c1%ASM%DIAGDEST=/u01/app/19.0.0.0/grid/rdbms/log

exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/scheduler/log

exsite1c1%CRS%DIAGDEST=/u01/app/19.0.0.0/grid/srvm/log

exsite1c1%ACFS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/acfs

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/core

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/crsconfig

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/crsdiag

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/cvu

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/evm

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/output

exsite1c1%CRS%DIAGDEST=/u01/app/grid/crsdata/exsite1c1/trace

exsite1c1%INSTALL%DIAGDEST=/u01/app/oraInventory/ContentsXML

exsite1c1%INSTALL%DIAGDEST=/u01/app/oraInventory/logs

TRACE_LEVEL=1

INSTALL_TYPE=GI

PERL=/u01/app/19.0.0.0/grid/perl/bin/perl

RDBMS_ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1||

RDBMS_ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1||

TZ=Europe/Luxembourg

RDBMS_ORACLE_HOME=/u01/app/oracle/product/18.0.0/dbhome_1||

localnode%ADRBASE=/u01/app/oracle

RDBMS_ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/financ||

localnode%ADRBASE=/u01/app/oracle

RDBMS_ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/financ||

localnode%ADRBASE=/u01/app/oracle

DAEMON_OWNER=root

RDBMS_ORACLE_HOME=/u01/app/oracle/agent/13.2.0/agent_13.2.0.0.0||

RDBMS_ORACLE_HOME=/u01/app/19.0.0.0/grid||

localnode%ADRBASE=/u01/app/grid

CRS_ACTIVE_VERSION=19.0.0.0

[root@exsite1c1 ~]#

 

And after the edit, it was possible to start TFA correctly:

 

[root@exsite1c1 ~]# /etc/init.d/init.tfa start

Starting TFA..

Waiting up to 100 seconds for TFA to be started..

. . . . .

Successfully started TFA Process..

. . . . .

WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.

TFA Started and listening for commands

[root@exsite1c1 ~]#

[root@exsite1c1 ~]#

[root@exsite1c1 ~]# ps -ef |grep tfa

root     113905      1  0 11:31 ?        00:00:00 /bin/sh /etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null

root     115917      1 99 11:31 ?        00:00:24 /u01/app/19.0.0.0/grid/jdk/jre/bin/java -server -Xms256m -Xmx512m -Djava.awt.headless=true -Ddisable.checkForUpdate=true -XX:ParallelGCThreads=5 oracle.rat.tfa.TFAMain /u01/app/19.0.0.0/grid/tfa/exsite1c1/tfa_home

root     117853  37137  0 11:31 pts/0    00:00:00 grep --color=auto tfa

[root@exsite1c1 ~]#
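Besides ps, you can ask TFA itself whether the daemon is up on every node; the print status command reports this per host:

/u01/app/19.0.0.0/grid/tfa/bin/tfactl print status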

 

And to run the diagcollect:

 

[root@exsite1c1 ~]# /u01/app/19.0.0.0/grid/tfa/bin/tfactl diagcollect

WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.

 

By default TFA will collect diagnostics for the last 12 hours. This can result in large collections

For more targeted collections enter the time of the incident, otherwise hit <RETURN> to collect for the last 12 hours

[YYYY-MM-DD HH24:MI:SS,<RETURN>=Collect for last 12 hours] :

 

Collecting data for the last 12 hours for all components...

Collecting data for all nodes

 

Collection Id : 20191122124148exsite1c1

 

Detailed Logging at : /u01/app/grid/tfa/repository/collection_Fri_Nov_22_12_41_49_CET_2019_node_all/diagcollect_20191122124148_exsite1c1.log

2019/11/22 12:41:53 CET : NOTE : Any file or directory name containing the string .com will be renamed to replace .com with dotcom

2019/11/22 12:41:53 CET : Collection Name : tfa_Fri_Nov_22_12_41_49_CET_2019.zip

2019/11/22 12:41:54 CET : Collecting diagnostics from hosts : [exsite1c1, exsite1c2]

2019/11/22 12:41:54 CET : Scanning of files for Collection in progress...

2019/11/22 12:41:54 CET : Collecting additional diagnostic information...

2019/11/22 12:44:13 CET : Completed collection of additional diagnostic information...

2019/11/22 13:15:39 CET : Getting list of files satisfying time range [11/22/2019 00:41:53 CET, 11/22/2019 13:15:39 CET]

2019/11/22 13:40:42 CET : Collecting ADR incident files...

2019/11/22 13:40:48 CET : Completed Local Collection

2019/11/22 13:40:48 CET : Remote Collection in Progress...

.---------------------------------------.

|           Collection Summary          |

+-----------+-----------+-------+-------+

| Host      | Status    | Size  | Time  |

+-----------+-----------+-------+-------+

| exsite1c2 | Completed | 412MB |  318s |

| exsite1c1 | Completed | 284MB | 3534s |

'-----------+-----------+-------+-------'

 

Logs are being collected to: /u01/app/grid/tfa/repository/collection_Fri_Nov_22_12_41_49_CET_2019_node_all

/u01/app/grid/tfa/repository/collection_Fri_Nov_22_12_41_49_CET_2019_node_all/exsite1c1.tfa_Fri_Nov_22_12_41_49_CET_2019.zip

/u01/app/grid/tfa/repository/collection_Fri_Nov_22_12_41_49_CET_2019_node_all/exsite1c2.tfa_Fri_Nov_22_12_41_49_CET_2019.zip

[root@exsite1c1 ~]#

[root@exsite1c1 ~]# /u01/app/19.0.0.0/grid/tfa/bin/tfactl diagcollect -since 1h

WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.

Collecting data for all nodes

 

Collection Id : 20191122134319exsite1c1

 

Detailed Logging at : /u01/app/grid/tfa/repository/collection_Fri_Nov_22_13_43_20_CET_2019_node_all/diagcollect_20191122134319_exsite1c1.log

2019/11/22 13:43:24 CET : NOTE : Any file or directory name containing the string .com will be renamed to replace .com with dotcom

2019/11/22 13:43:24 CET : Collection Name : tfa_Fri_Nov_22_13_43_20_CET_2019.zip

2019/11/22 13:43:24 CET : Collecting diagnostics from hosts : [exsite1c1, exsite1c2]

2019/11/22 13:43:24 CET : Scanning of files for Collection in progress...

2019/11/22 13:43:24 CET : Collecting additional diagnostic information...

2019/11/22 13:44:49 CET : Getting list of files satisfying time range [11/22/2019 12:43:24 CET, 11/22/2019 13:44:49 CET]

2019/11/22 13:45:50 CET : Completed collection of additional diagnostic information...

2019/11/22 13:59:19 CET : Collecting ADR incident files...

2019/11/22 13:59:19 CET : Completed Local Collection

2019/11/22 13:59:19 CET : Remote Collection in Progress...

.--------------------------------------.

|          Collection Summary          |

+-----------+-----------+-------+------+

| Host      | Status    | Size  | Time |

+-----------+-----------+-------+------+

| exsite1c2 | Completed | 230MB | 295s |

| exsite1c1 | Completed | 105MB | 955s |

'-----------+-----------+-------+------'

 

Logs are being collected to: /u01/app/grid/tfa/repository/collection_Fri_Nov_22_13_43_20_CET_2019_node_all

/u01/app/grid/tfa/repository/collection_Fri_Nov_22_13_43_20_CET_2019_node_all/exsite1c2.tfa_Fri_Nov_22_13_43_20_CET_2019.zip

/u01/app/grid/tfa/repository/collection_Fri_Nov_22_13_43_20_CET_2019_node_all/exsite1c1.tfa_Fri_Nov_22_13_43_20_CET_2019.zip

[root@exsite1c1 ~]#

 

TFA error #2

Another error that I got on another cluster, which went through the same update/upgrade process, was related to the *.ser files in the TFA home. When I tried to use TFA (diagcollect, for example), I received this error:

 

[root@exsite2c1 ~]# /u01/app/19.0.0.0/grid/tfa/exsite2c1/tfa_home/bin/tfactl diagcollect

WARNING - TFA Software is older than 180 days. Please consider upgrading TFA to the latest version.

Storable binary image v2.10 contains data of type 101. This Storable is v2.9 and can only handle data types up to 30 at /usr/lib64/perl5/vendor_perl/Storable.pm line 381, at /u01/app/19.0.0.0/grid/tfa/exsite2c1/tfa_home/bin/common/tfactlshare.pm line 25611.

[root@exsite2c1 ~]#   

 

If you search MOS, the notes point to the Perl version. But that is not the case here: the Perl shipped with this Exadata version is newer than 5.10. The solution was to move the *.ser files to another folder (out of the TFA home), or delete them. After that, no more Storable binary errors (but the tfa_setup.txt issue described above remained):

 

[root@exsite2c1 ~]# mv /u01/app/19.0.0.0/grid/tfa/exsite2c1/tfa_home/internal/*ser /tmp

[root@exsite2c1 ~]# ls -l /u01/app/19.0.0.0/grid/tfa/exsite2c1/tfa_home/internal/*ser

ls: cannot access /u01/app/19.0.0.0/grid/tfa/exsite2c1/tfa_home/internal/*ser: No such file or directory

[root@exsite2c1 ~]#
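A slightly more careful variant of the same workaround is to stop TFA before moving the serialized files and start it again afterwards (a sketch only; stopping first avoids the daemon touching the files during the move):

# Run as root; the tfa_home directory carries the node name.
/etc/init.d/init.tfa stop
mkdir -p /tmp/tfa_ser_backup
mv /u01/app/19.0.0.0/grid/tfa/exsite2c1/tfa_home/internal/*ser /tmp/tfa_ser_backup/
/etc/init.d/init.tfa start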

 

Problem and Solution

The source of the problem is not clear in this case. As you saw above, the logs of the GI upgrade from 18c to 19c reported success, even for the TFA step. But it is clear that tfa_setup.txt was left with wrong parameters inside. And if you look closely, you can see that the first version of the file already contains a reference to the new GI home (one RDBMS_ORACLE_HOME entry), while the critical parameters still point to the old one.

Unfortunately, the needed parameters were left with the wrong values. The workaround was just to edit tfa_setup.txt and fix the wrong paths in those parameters. I did not test whether running $GI_HOME/crs/install/tfa_setup -silent -crshome $GI_HOME regenerates the file correctly, but you can try it. The idea here was to identify the issue instead of just removing TFA and reinstalling it.

Again, this is a workaround that was tested in my environment and worked. You need to verify the logs and the files shown here to check whether you hit the same issues. If so, at least you have something to try.
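To make the check repeatable after future upgrades, a small helper can flag stale entries. This is a hypothetical script, not part of TFA; adjust ACTIVE_HOME to your GI home:

#!/bin/bash
# check_tfa_setup.sh - flag tfa_setup.txt keys that do not point to the active GI home.
ACTIVE_HOME=/u01/app/19.0.0.0/grid
SETUP=$ACTIVE_HOME/tfa/$(hostname -s)/tfa_home/tfa_setup.txt

for key in CRS_HOME JAVA_HOME PERL; do
  val=$(grep "^$key=" "$SETUP" | awk -F= '{print $2}')
  case "$val" in
    "$ACTIVE_HOME"*) echo "OK    : $key=$val" ;;
    *)               echo "STALE : $key=$val" ;;
  esac
done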

 
 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my employer’s positions, strategies, or opinions. The information here was edited to be useful for general purposes; specific data and identifications were removed to make it suitable for a wider audience and useful for the community.”