
ZDLRA, Configuring Replication Network

It is common for our systems to grow over time, and the environment that sustains them needs to grow too. The same happens with ZDLRA. Imagine that you added a new datacenter, bought a new ZDLRA, and now want to replicate between them, or that you simply want to enable replication on appliances that never had it configured.
This is possible and not complicated to do. In this post, I will show how to configure the replication network for a ZDLRA that was already deployed, basically a post-install procedure.

 

Replication network

 

In my previous post, I already explained the basics of how replication works for ZDLRA. Going deeper into the replication, the ZDLRA has a dedicated physical network for it:

 

 

And according to the documentation:
The optional replication network connects the local Recovery Appliance (the upstream appliance) with a remote Recovery Appliance (the downstream appliance). Oracle recommends a broadband, encrypted network, instead of an insecure public network, wherever possible.
It is optional because you can share the “Ingest Network” to receive replication, but this is not recommended. Whatever mode you choose, the “Replication Network” must be a separate subnet that does not overlap with, or belong to, the “Ingest Network”.

 

ZDLRA Environment

 

In this post I have two ZDLRAs on which to enable replication, but this can vary in your environment. I will show how to do it on both nodes of both ZDLRAs (four nodes in total).
Use this as a guide, but remember to adapt it to your environment. IPs, physical networks (bonded or not), and routes are examples of details that will change. In this post I will focus on how to configure the GI and the ZDLRA itself.
Most important of all, I recommend opening a proactive SR at MOS describing what you plan to do and clearing any doubts before you start. A good starting point is the note “Post Install – Replication Network Configuration for ZDLRA (Doc ID 2126047.1)” at MOS.

 

Basic network config

 

The network configuration follows normal Linux network configuration: you need to define separate IPs for both ZDLRAs, decide whether you will use bonding, and adjust the route table. Since this depends on each environment, I will not cover it in detail here.

 

But basically:

 

1. Hostname: you need to choose at least one hostname in your replication network for each node, plus one hostname for the VIP of each node.

2. Scan Hostname: one hostname to be used as the SCAN for the replication data exchange. One SCAN is needed for each ZDLRA that you are configuring.

3. Configure network files: configure the ifcfg files: ifcfg-eth2, ifcfg-eth3, and ifcfg-bondeth1 if you will use bonding, or just ifcfg-eth2 if you will use a single connection. This is done on all nodes of the ZDLRA appliance (see the sketch after this list).

4. Route configuration: you need to guarantee that traffic for the replication network is not routed through the normal ingest network. This depends on your network architecture, but you may need to change the “route-*” files on each node.
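To make items 3 and 4 more concrete, here is a hedged sketch of what the files could look like on one node, assuming a bonded setup. The device names follow this post, but the physical IP (IPADDR) is a hypothetical address on the replication subnet (the VIPs are managed by the GI later), and the exact options must be adapted to your appliance:

# /etc/sysconfig/network-scripts/ifcfg-bondeth1 (replication bond, example values only)
DEVICE=bondeth1
TYPE=Bonding
BONDING_OPTS="mode=active-backup miimon=100"
ONBOOT=yes
BOOTPROTO=none
IPADDR=200.254.255.11
NETMASK=255.255.255.0

# /etc/sysconfig/network-scripts/ifcfg-eth2 (slave interface; ifcfg-eth3 is analogous)
DEVICE=eth2
MASTER=bondeth1
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/route-bondeth1 (keep replication traffic on this interface)
200.254.255.0/24 dev bondeth1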
 
My current system is:

 

ZDLRAS1: ZDLRA installed on site 1, with two nodes: zdlras1n1 and zdlras1n2. It will be the upstream (the one that sends backups).

ZDLRAS2: ZDLRA installed on site 2, with two nodes: zdlras2n1 and zdlras2n2. It will be the downstream (the one that receives backups).
 
What I will add:

 

ZDLRAS1: I will add zdlras1n1-rvip (200.254.255.21) as the replication VIP for zdlras1n1, zdlras1n2-rvip (200.254.255.22) as the replication VIP for zdlras1n2, and the SCAN zdlras1-rep.

ZDLRAS2: I will add zdlras2n1-rvip (200.254.255.23) as the replication VIP for zdlras2n1, zdlras2n2-rvip (200.254.255.24) as the replication VIP for zdlras2n2, and the SCAN zdlras2-rep.

 

GI Configuration

The next step after the basic configuration is to configure the GI to add the network, VIPs, and SCAN.

 

Checking interfaces

 

After you finish the Linux part of the basic network configuration, you can check whether the new interface is visible to the GI. The first step is to check this on both nodes of both ZDLRAs with the command “oifcfg iflist” (the output below is cropped to show just what is needed):

 

############## Upstream ZDLRA

[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/oifcfg iflist -p -n



bondeth1  200.254.255.0  UNKNOWN  255.255.255.0



[root@zdlras1n1 ~]#




############## Downstream ZDLRA

[root@zdlras2n1 ~]#  /u01/app/19.0.0.0/grid/bin/oifcfg iflist -p -n



bondeth1  200.254.255.0  UNKNOWN  255.255.255.0



[root@zdlras2n1 ~]#

 

Add network

Since the interfaces are visible and unused, we can add network number 2 at the GI level. This is done on just one node per ZDLRA. To do that we use the command “srvctl add network” as the root user on both ZDLRAs that we are configuring:

 

############## Upstream ZDLRA

[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl config network -k 2

PRCR-1001 : Resource ora.net2.network does not exist

[root@zdlras1n1 ~]#

[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add network -netnum 2 -subnet 200.254.255.0/255.255.255.0/bondeth1

[root@zdlras1n1 ~]#

[root@zdlras1n1 ~]# srvctl config network -k 2

Network 2 exists

Subnet IPv4: 200.254.255.0/255.255.255.0/bondeth1, static

Subnet IPv6:

Ping Targets:

Network is enabled

Network is individually enabled on nodes:

Network is individually disabled on nodes:

[root@zdlras1n1 ~]#




############## Downstream ZDLRA

[root@zdlras2n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl config network -k 2

PRCR-1001 : Resource ora.net2.network does not exist

[root@zdlras2n1 ~]#

[root@zdlras2n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add network -netnum 2 -subnet 200.254.255.0/255.255.255.0/bondeth1

[root@zdlras2n1 ~]#

[root@zdlras2n1 ~]# srvctl config network -k 2

Network 2 exists

Subnet IPv4: 200.254.255.0/255.255.255.0/bondeth1, static

Subnet IPv6:

Ping Targets:

Network is enabled

Network is individually enabled on nodes:

Network is individually disabled on nodes:

[root@zdlras2n1 ~]#

 

Note above that we are using bondeth1 as the interface for this network (bondeth0 is used for the ingest network), and that I first checked whether network 2 already existed (it needs to be unused).

 

Add VIP

Next we can add the VIP for each node of each ZDLRA that we are configuring (done with “srvctl add vip”):

 

############## Upstream ZDLRA

[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add vip -n zdlras1n1 -A zdlras1n1-rvip.oralocal/255.255.255.0/bondeth1 -k 2

[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add vip -n zdlras1n2 -A zdlras1n2-rvip.oralocal/255.255.255.0/bondeth1 -k 2

[root@zdlras1n1 ~]#




############## Downstream ZDLRA

[root@zdlras2n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add vip -n zdlras2n1 -A zdlras2n1-rvip.oralocal/255.255.255.0/bondeth1 -k 2

[root@zdlras2n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add vip -n zdlras2n2 -A zdlras2n2-rvip.oralocal/255.255.255.0/bondeth1 -k 2

[root@zdlras2n1 ~]#

 

Note that I defined the hostname of the VIP for each node and used the “-k” parameter to define the network where the VIP will be created. Be careful with the “-n” parameter, which defines the node name.

 

Add SCAN

 

After adding the VIPs, we can add the SCAN for each ZDLRA cluster with “srvctl add scan”:

 

############## Upstream ZDLRA

[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add scan -netnum 2 -scanname zdlras1-rep.oralocal

[root@zdlras1n1 ~]#




############## Downstream ZDLRA

[root@zdlras2n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add scan -netnum 2 -scanname zdlras2-rep.oralocal

[root@zdlras2n1 ~]#

 

Be careful to use the correct “-netnum” parameter value (it needs to point to the second network) and the correct SCAN name.

Start VIP

After the SCAN, we can start the VIPs on each node of each ZDLRA and check their status:
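The outputs below show only the status check; for completeness, here is a hedged sketch of the start commands (same VIP names as created above, run as root, one node per appliance):

[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl start vip -i zdlras1n1-rvip.oralocal
[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl start vip -i zdlras1n2-rvip.oralocal
[root@zdlras2n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl start vip -i zdlras2n1-rvip.oralocal
[root@zdlras2n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl start vip -i zdlras2n2-rvip.oralocal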

 

############## Upstream ZDLRA

[root@zdlras1n1 ~]# srvctl status vip -i zdlras1n1-rvip.oralocal

VIP 200.254.255.21 is enabled

VIP 200.254.255.21 is running on node: zdlras1n1

[root@zdlras1n1 ~]#

[root@zdlras1n1 ~]# srvctl status vip -i zdlras1n2-rvip.oralocal

VIP 200.254.255.22 is enabled

VIP 200.254.255.22 is running on node: zdlras1n2

[root@zdlras1n1 ~]#




############## Downstream ZDLRA

[root@zdlras2n1 ~]# srvctl status vip -i zdlras2n1-rvip.oralocal

VIP 200.254.255.23 is enabled

VIP 200.254.255.23 is running on node: zdlras2n1

[root@zdlras2n1 ~]#

[root@zdlras2n1 ~]# srvctl status vip -i zdlras2n2-rvip.oralocal

VIP 200.254.255.24 is enabled

VIP 200.254.255.24 is running on node: zdlras2n2

[root@zdlras2n1 ~]#


 

Create Listener

 

The last step of the GI configuration is to create and start the listener and SCAN listener in each ZDLRA cluster:

############## Upstream ZDLRA

[oracle@zdlras1n1 ~]$ export ORACLE_HOME=/u01/app/19.0.0.0/grid

[oracle@zdlras1n1 ~]$ export PATH=$ORACLE_HOME/bin:$PATH

[oracle@zdlras1n1 ~]$  

[oracle@zdlras1n1 ~]$ srvctl add listener -l LISTENER_REPL -p 1522 -k 2

[oracle@zdlras1n1 ~]$ srvctl start listener -l LISTENER_REPL

[oracle@zdlras1n1 ~]$ srvctl status listener -l LISTENER_REPL

Listener LISTENER_REPL is enabled

Listener LISTENER_REPL is running on node(s): zdlras1n1,zdlras1n2

[oracle@zdlras1n1 ~]$

[oracle@zdlras1n1 ~]$ srvctl add scan_listener -netnum 2 -listener LISTENER_REPL -endpoints TCP:1522

[oracle@zdlras1n1 ~]$ srvctl start scan_listener -netnum 2

[oracle@zdlras1n1 ~]$ srvctl status scan_listener -netnum 2

SCAN Listener LISTENER_REPL_SCAN1_NET2 is enabled

SCAN listener LISTENER_REPL_SCAN1_NET2 is running on node zdlras1n1

SCAN Listener LISTENER_REPL_SCAN2_NET2 is enabled

SCAN listener LISTENER_REPL_SCAN2_NET2 is running on node zdlras1n2

SCAN Listener LISTENER_REPL_SCAN3_NET2 is enabled

SCAN listener LISTENER_REPL_SCAN3_NET2 is running on node zdlras1n1

[oracle@zdlras1n1 ~]$




############## Downstream ZDLRA

[root@zdlras2n1 ~]# su - oracle

Last login: Sat Nov 23 21:28:03 CET 2019

[oracle@zdlras2n1 ~]$ export ORACLE_HOME=/u01/app/19.0.0.0/grid

[oracle@zdlras2n1 ~]$ export PATH=$ORACLE_HOME/bin:$PATH

[oracle@zdlras2n1 ~]$

[oracle@zdlras2n1 ~]$ srvctl add listener -l LISTENER_REPL -p 1522 -k 2

[oracle@zdlras2n1 ~]$ srvctl start listener -l LISTENER_REPL

[oracle@zdlras2n1 ~]$ srvctl status listener -l LISTENER_REPL

Listener LISTENER_REPL is enabled

Listener LISTENER_REPL is running on node(s): zdlras2n1,zdlras2n2

[oracle@zdlras2n1 ~]$

[oracle@zdlras2n1 ~]$

[oracle@zdlras2n1 ~]$ srvctl add scan_listener -netnum 2 -listener LISTENER_REPL -endpoints TCP:1522

[oracle@zdlras2n1 ~]$ srvctl start scan_listener -netnum 2

[oracle@zdlras2n1 ~]$ srvctl status scan_listener -netnum 2

SCAN Listener LISTENER_REPL_SCAN1_NET2 is enabled

SCAN listener LISTENER_REPL_SCAN1_NET2 is running on node zdlras2n1

SCAN Listener LISTENER_REPL_SCAN2_NET2 is enabled

SCAN listener LISTENER_REPL_SCAN2_NET2 is running on node zdlras2n2

SCAN Listener LISTENER_REPL_SCAN3_NET2 is enabled

SCAN listener LISTENER_REPL_SCAN3_NET2 is running on node zdlras2n1

[oracle@zdlras2n1 ~]$

[oracle@zdlras2n1 ~]$

 

The important details here:

  • Port: it is 1522; see the “-endpoints” parameter.
  • Network: the listener is added on network #2 (parameter “-k”).
  • Listener name: we use LISTENER_REPL to follow the ZDLRA default config.
  • User: all the commands are run as the oracle user.

All the commands are executed on just one node of the cluster because they affect all nodes at the same time.
With that, we now have the network configured on each node of both ZDLRAs, all the VIPs and SCANs up and running, and the listeners listening on this new network.
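Before moving to the ZDLRA configuration, it may be worth a quick sanity check of the new resources; a hedged sketch (commands only, using the names created above):

[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl config scan -netnum 2
[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl config scan_listener -netnum 2
[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl config vip -n zdlras1n1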

 

 

ZDLRA Configuration

 

Since all nodes have everything configured, we can add this network to the ZDLRA config. The idea is to allow the ZDLRA to receive and send backups through this new network, and this is incredibly easy.
We need to change just two settings in the ZDLRA config tables: REPLICATION_IP_ADDRESS and BACKUP_IP_ADDRESS. These changes touch internal ZDLRA tables, so before doing them, review everything and check with Oracle Support at MOS that it is OK to proceed.

 

REPLICATION_IP_ADDRESS

 

This setting resides in the internal “rasys.host” table of the ZDLRA, so we update the table column:

 

############## Upstream ZDLRA

[oracle@zdlras1n1 ~]$ sqlplus rasys/change^Me2




SQL*Plus: Release 19.0.0.0.0 - Production on Sat Nov 23 23:22:09 2019

Version 19.3.0.0.0




Copyright (c) 1982, 2019, Oracle.  All rights reserved.




Last Successful login time: Sat Nov 23 2019 21:33:26 +01:00




Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.3.0.0.0




SQL> set linesize 250

SQL>

SQL> col node_name format a50

SQL> col REPLICATION_IP_ADDRESS format a50

SQL> select node_name,replication_ip_address from host;




NODE_NAME                                          REPLICATION_IP_ADDRESS

-------------------------------------------------- --------------------------------------------------

zdlras1n1.oralocal

zdlras1n2.oralocal




SQL> update HOST set REPLICATION_IP_ADDRESS='200.254.255.21' where NODE_NAME = 'zdlras1n1.oralocal';




1 row updated.




SQL> update HOST set REPLICATION_IP_ADDRESS='200.254.255.22' where NODE_NAME = 'zdlras1n2.oralocal';




1 row updated.




SQL> select node_name,replication_ip_address from host;




NODE_NAME                                          REPLICATION_IP_ADDRESS

-------------------------------------------------- --------------------------------------------------

zdlras1n1.oralocal                                 200.254.255.21

zdlras1n2.oralocal                                 200.254.255.22




SQL>




############## Downstream ZDLRA

[oracle@zdlras2n1 ~]$ sqlplus rasys/change^Me2




SQL*Plus: Release 19.0.0.0.0 - Production on Sat Nov 23 23:39:13 2019

Version 19.3.0.0.0




Copyright (c) 1982, 2019, Oracle.  All rights reserved.




Last Successful login time: Sat Nov 23 2019 19:54:17 +01:00




Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.3.0.0.0




SQL> set linesize 250

SQL> col node_name format a50

SQL> col REPLICATION_IP_ADDRESS format a50

SQL>

SQL> select node_name,replication_ip_address from host;




NODE_NAME                                          REPLICATION_IP_ADDRESS

-------------------------------------------------- --------------------------------------------------

zdlras2n2.oralocal

zdlras2n1.oralocal




SQL>

SQL> update HOST set REPLICATION_IP_ADDRESS='200.254.255.23' where NODE_NAME = 'zdlras2n1.oralocal';




1 row updated.




SQL> update HOST set REPLICATION_IP_ADDRESS='200.254.255.24' where NODE_NAME = 'zdlras2n2.oralocal';




1 row updated.




SQL>

SQL> select node_name,replication_ip_address from host;




NODE_NAME                                          REPLICATION_IP_ADDRESS

-------------------------------------------------- --------------------------------------------------

zdlras2n2.oralocal                                 200.254.255.24

zdlras2n1.oralocal                                 200.254.255.23




SQL>

 

As you can see above, the REPLICATION_IP_ADDRESS for each node reflects the IP at the replication network.

 

BACKUP_IP_ADDRESS

This setting defines on which IPs the ZDLRA accepts backup ingest. Since replication is basically an ingest coming from another network, we need to allow it. To do that we change the internal table rai_host and, of course, only commit if everything is OK.

 

############## Upstream ZDLRA

SQL> col ADMIN_IP_ADDRESS format a30

SQL> col BACKUP_IP_ADDRESS format a30

SQL> SELECT * FROM rai_host;




NODE_NAME                                          ADMIN_IP_ADDRESS               BACKUP_IP_ADDRESS              REPLICATION_IP_ADDRESS

-------------------------------------------------- ------------------------------ ------------------------------ --------------------------------------------------

zdlras1n1.oralocal                                 10.160.5.1                     10.160.5.2                     200.254.255.21

zdlras1n2.oralocal                                 10.160.5.3                     10.160.5.4                     200.254.255.22




SQL> UPDATE rai_host SET backup_ip_address='200.254.255.21,'|| backup_ip_address WHERE node_name = 'zdlras1n1.oralocal';




1 row updated.




SQL> UPDATE rai_host SET backup_ip_address='200.254.255.22,'|| backup_ip_address WHERE node_name = 'zdlras1n2.oralocal';




1 row updated.




SQL> SELECT * FROM rai_host;




NODE_NAME                                          ADMIN_IP_ADDRESS               BACKUP_IP_ADDRESS              REPLICATION_IP_ADDRESS

-------------------------------------------------- ------------------------------ ------------------------------ --------------------------------------------------

zdlras1n1.oralocal                                 10.160.5.1                     200.254.255.21,10.160.5.2      200.254.255.21

zdlras1n2.oralocal                                 10.160.5.3                     200.254.255.22,10.160.5.4      200.254.255.22




SQL> commit;




Commit complete.




SQL>




############## Downstream ZDLRA

SQL> col ADMIN_IP_ADDRESS format a30

SQL> col BACKUP_IP_ADDRESS format a30

SQL>

SQL> SELECT * FROM rai_host;




NODE_NAME                                          ADMIN_IP_ADDRESS               BACKUP_IP_ADDRESS              REPLICATION_IP_ADDRESS

-------------------------------------------------- ------------------------------ ------------------------------ --------------------------------------------------

zdlras2n2.oralocal                                 10.160.6.3                     10.160.6.4                     200.254.255.24

zdlras2n1.oralocal                                 10.160.6.1                     10.160.6.2                     200.254.255.23




SQL>

SQL> UPDATE rai_host SET backup_ip_address='200.254.255.23,'|| backup_ip_address WHERE node_name = 'zdlras2n1.oralocal';




1 row updated.




SQL> UPDATE rai_host SET backup_ip_address='200.254.255.24,'|| backup_ip_address WHERE node_name = 'zdlras2n2.oralocal';




1 row updated.




SQL> SELECT * FROM rai_host;




NODE_NAME                                          ADMIN_IP_ADDRESS               BACKUP_IP_ADDRESS              REPLICATION_IP_ADDRESS

-------------------------------------------------- ------------------------------ ------------------------------ --------------------------------------------------

zdlras2n2.oralocal                                 10.160.6.3                     200.254.255.24,10.160.6.4      200.254.255.24

zdlras2n1.oralocal                                 10.160.6.1                     200.254.255.23,10.160.6.2      200.254.255.23




SQL> commit;




Commit complete.




SQL>

 

After that, I recommend a reboot, or at least a restart of the ZDLRA database, to reload the configs. Again, remember that the values here (IPs, networks, interfaces, and routes) will differ in your environment; use this as a guide.
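If a full reboot is not desired, a hedged sketch of a database-only restart (the appliance database resource name varies, so treat the name below as a placeholder and check it first):

# check the name of the database registered in Clusterware
srvctl config database
# restart it using the name returned above (placeholder shown here)
srvctl stop database -d <zdlra_db_name>
srvctl start database -d <zdlra_db_name>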

Replication

Replication for ZDLRA has some steps to be done. Doing it during deployment is easier (since it is done automatically by the installer), but doing it afterwards (when adding an appliance or preparing the environment) is not complicated either; we just need to be aware of some details.
Be careful with the basic configuration such as the interfaces (bonded or not) and IPs. They will change in your scenario, but the most important thing is to check the routes: the packets need to travel only inside the replication network, because the ZDLRA configuration (in the internal tables) expects/listens on just one IP per node.
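A simple, hedged way to check the routing on each node is to ask the kernel which interface it would use to reach the peer appliance's replication address; a sketch from the upstream node, using the downstream VIP created earlier:

# from zdlras1n1: the answer should show "dev bondeth1" (the replication interface), not the ingest one
ip route get 200.254.255.23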
As for the ZDLRA internal changes, they occur in internal tables, so always review the values carefully to avoid errors (before the commit). After that, we can use DBMS_RA.CREATE_REPLICATION_SERVER to create the replication server config; I will show this in another post.
 

 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community.”



ZDLRA, Replication

Replication for ZDLRA works differently from the “normal” replication for Oracle Database that uses Data Guard (or even GoldenGate). The point is to replicate the ingested backups “as is” between ZDLRAs, not to replicate datafile blocks. And, of course, it is completely different from tape clones.
ZDLRA replication is not just sending backups from one site to another; it increases your protection and is part of the disaster recovery strategy. The replication does not occur only for RMAN backups, but also for the archivelogs generated by Real-Time Redo. Moreover, integrating ZDLRA into your MAA architecture is what makes the difference in how you protect your environment and reach zero RPO. There are several points about replication, how it operates, its modes, and its integration with the Oracle MAA universe; I will discuss some of them in this post.

 

The architecture


The architecture for ZDLRA replication is simple. There are two important definitions:

  • Upstream: the ZDLRA that receives the backup from the protected database and forwards it to another ZDLRA.
  • Downstream: the ZDLRA that receives the backup from another ZDLRA.

Basically, it is this:
And the configuration can be:

 

  • One-Way: the data flows in one direction only; only one ZDLRA forwards backups.

  • Bi-Directional: both ZDLRAs send backups to each other. In this case, the protected databases of each ZDLRA (usually one appliance per datacenter) are replicated between them, since both operate as upstream and downstream.

  • Hub-Spoke: one ZDLRA receives backups from several ZDLRAs, and this “hub” ZDLRA is responsible for archiving the backups to tape.

It is more or less like the picture below:
One important detail is that every ZDLRA can clone backups to tape; it is not only the hub in a Hub-Spoke design.
For network connections, it is possible to dedicate one network port just to replication to avoid concurrent usage, but it is also possible to share the same physical interface used for backup ingest. Whatever the chosen mode, the replication network is a different subnet.

Replication and Index

I already wrote about some internals of ZDLRA, the Virtual Full Backup and what you need to understand it, and also about the INDEX_BACKUP task (here and here). If you check those posts, you already understand that ZDLRA “sees” an RMAN backup in a different way.

But for replication, some details are worth highlighting here. The replication is done for (and only for) every backupset that is ingested, so the virtual full backup itself is not replicated. On the other hand, every downstream ZDLRA (the one that receives the backup) constructs its own virtual full backup.
This is important for several reasons, but done this way the replication occurs as soon as possible. To understand why, you need to put several features together. A common use of ZDLRA is to reduce the backup load by taking just incrementals (usually in big environments with several TBs), so if ZDLRA waited for the virtual full backup to be generated before replicating, this could take some time (depending on the datafile size it can be hours). Also, if you waited for the virtual full backup, the amount of data transferred would be huge: instead of replicating the incremental (which may be just a few GBs), you would replicate TBs of the full. Another point is the unprotected window this would add to the environment: for some time frame the backup would exist on just one side, and in case of a disaster those backups could be lost. And besides all of that, because of the replication, you will have on both sides the incremental and the virtual full, both validated against errors.

The usage

Replication for ZDLRA is more than just sending backups from one side to another. Again, ZDLRA is more than a way to reduce backup load; it is a pillar of your architecture and a key part of MAA.
Whoever deploys ZDLRA usually has a big environment, needs to protect several databases and several sites, and usually already follows MAA practices. The point is that ZDLRA can be used to protect all databases, from the single-instance database to the multi-site database. With ZDLRA replication, the ZDLRA at each site protects its local databases, but also replicates single databases to improve the disaster recovery strategy. And if you add tape clones, the protection is complete.

 

 

Think of the example above: we have two different sites. Some databases on the left side (Site A) are already protected by Oracle DG (and we could have the same from the right, Site B, replicating to the left, Site A, too), and the other databases, without DG, are protected by ZDLRA replication. What we have in this scenario:

 

  • For DG-protected databases: the backup is not replicated between ZDLRAs, because each side takes its own backup; the replication is done by DG.
  • Other databases: backups are replicated between ZDLRAs (remember that the replication can be multi-directional).

 

As you can see, ZDLRA adds a new layer of protection/security to your environment and can be used to protect every kind of database. These kinds of architectures are shown in some detail in Maximum Availability Architecture (MAA) – On-Premises HA Reference Architectures 2019, and in the previous version of the same doc. I already made a webinar discussing this too. And for multi-site protection with ZDLRA, I already posted about it, and you can read it here. Whatever the mode of replication, you can reach zero RPO for all databases (even in case of a site disaster) because you have the backups/archivelogs needed to restore in the other site (via DG or ZDLRA replication).

 

So, the usage of ZDLRA needs to be integrated into your architecture; it can be used from Bronze to Platinum tier databases.

 

Next Posts

 

There are several other details to cover about replication with ZDLRA. This post was just a short introduction to ZDLRA native replication; the idea was to point out how much more it is than simply sending backups from one site to another.
Adding more than one ZDLRA to your environment improves the MAA strategy in several ways. The important thing is to think about the architecture of your environment, understand the features of ZDLRA, and see how you can use it to reduce single points of failure and improve the disaster recovery strategy. With ZDLRA replication (and multiple ZDLRAs), zero RPO can be reached from the single database up to the multi-site Data Guard database.
In the next posts I will show how to configure ZDLRA replication, how the replication works, what changes for policies, and other details.
 

 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community.”



Fast-Start Failover, Observe-Only Mode and Health Conditions

Oracle Data Guard Broker allows database administrators to automate some tasks and provides an easy way to properly configure a lot of features and details of Data Guard environments. Fast-Start Failover (FSFO) allows the broker to automatically fail over to the standby database in case of failure of the primary. Until 19c, the only option was to always trigger the failover; this changed in 19c with a nice new feature that allows us to put FSFO in Observe-Only Mode.
In this post, I will focus only on FSFO features such as Observe-Only Mode and Health Conditions. Lag and other details will not be covered here.

 

Observe-Only Mode

The Observe-Only Mode is a simple change that puts FSFO in a mode where it just observes/monitors the DG environment but, in case of failure, does not change the roles between primary and standby. Simple as that. As the Broker documentation for Observe-Only Mode says:
The observe-only mode enables you to test the impact of using fast-start failover in your configuration, without making any actual changes to the configuration.
More details can be checked in the documentation too. In short, FSFO works like this:

 

 

Enable Observe-Only

Enabling it is very simple; you just need to call “ENABLE FAST_START FAILOVER OBSERVE ONLY”:

 

DGMGRL> ENABLE FAST_START FAILOVER OBSERVE ONLY;

Enabled in Observe-Only Mode.

DGMGRL>

 

And in the drc* trace file on the primary side we can see:

 

2020-06-11T23:45:19.329+02:00

ENABLE FAST_START FAILOVER OBSERVE ONLY

FSFO SetState(st=47 "ENABLE OBONLY", fl=0x0 "", ob=0x2b621d39, tgt=2, v=0)

Setup log_archive_dest_n of GROUP=0 PRIORITY=0 with 'golds19c' as FSFO target

Fast-Start Failover (FSFO) has been enabled under observe-only mode between:

  Primary = "gold19c"

  Standby = "golds19c"

2020-06-11T23:45:20.527+02:00

ENABLE FAST_START FAILOVER OBSERVE ONLY completed successfully

 

And the result is FSFO in Observe-Only Mode:

 

DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>

 

And after we force the shutdown of the primary database, we can see that the roles do not change.

 

[oracle@goldpn1 ~]$ srvctl stop database -d gold19c -o abort

[oracle@goldpn1 ~]$

 

In the observer log file we can see that the error with the primary was detected, but nothing is done since FSFO is in observe-only mode:

 



Unable to connect to database using gold19c

[W000 2020-06-12T00:13:22.248+02:00] Primary database cannot be reached.

[W000 2020-06-12T00:13:22.248+02:00] Fast-Start Failover threshold has expired.

[W000 2020-06-12T00:13:22.248+02:00] Try to connect to the standby.

[W000 2020-06-12T00:13:22.248+02:00] Making a last connection attempt to primary database before proceeding with Fast-Start Failover.

[W000 2020-06-12T00:13:22.248+02:00] Check if the standby is ready for failover.

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor




Unable to connect to database using gold19c

[W000 2020-06-12T00:13:22.261+02:00] A fast-start failover would have been initiated...

[W000 2020-06-12T00:13:22.261+02:00] Unable to failover since this observer is in observe-only mode

[W000 2020-06-12T00:13:22.261+02:00] Fast-Start Failover is not possible because observe-only mode.

[W000 2020-06-12T00:13:22.261+02:00] Try to connect to the primary.

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor




Unable to connect to database using gold19c

[W000 2020-06-12T00:13:22.269+02:00] Primary database cannot be reached.

[W000 2020-06-12T00:13:22.269+02:00] Fast-Start Failover observe-only mode enabled.

[W000 2020-06-12T00:13:22.269+02:00] Will not attempt a Fast-Start Failover.

[W000 2020-06-12T00:13:22.269+02:00] Retry connecting to primary.

[W000 2020-06-12T00:13:23.270+02:00] Try to connect to the primary.

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor




Unable to connect to database using gold19c

[W000 2020-06-12T00:13:23.277+02:00] Primary database cannot be reached.

[W000 2020-06-12T00:13:24.278+02:00] Try to connect to the primary.

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

 

And in the drc* trace file on the standby side we can see:

 

2020-06-12T00:13:21.103+02:00

Fast-Start Failover cannot proceed because: "observe-only mode"

 

Up to now, this means that the error with the primary was detected and logged, but no action was taken; the roles remain the same. The show report confirms this too:

 

DGMGRL> show configuration verbose;




Configuration - gold19c




  Protection Mode: MaxAvailability

  Members:

  gold19c  - Primary database

    golds19c - (*) Physical standby database




  (*) Fast-Start Failover target




  Properties:

    FastStartFailoverThreshold      = '30'

    OperationTimeout                = '30'

    TraceLevel                      = 'USER'

    FastStartFailoverLagLimit       = '0'

    CommunicationTimeout            = '180'

    ObserverReconnect               = '0'

    FastStartFailoverAutoReinstate  = 'TRUE'

    FastStartFailoverPmyShutdown    = 'TRUE'

    BystandersFollowRoleChange      = 'ALL'

    ObserverOverride                = 'FALSE'

    ExternalDestination1            = ''

    ExternalDestination2            = ''

    PrimaryLostWriteAction          = 'CONTINUE'

    ConfigurationWideServiceName    = 'gold19c_CFG'




Fast-Start Failover: Enabled in Observe-Only Mode

  Lag Limit:          0 seconds

  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configuration Status:

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

ORA-16625: cannot reach member "gold19c"

DGM-17017: unable to determine configuration status




DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>

 

In some scenarios this can be good because it allows us to fix the problem without going through a failover event (manual reinstate and so on, in some cases). Another option is to use the Observe-Only mode to do what the name says: just observe. Think of an environment where you want to test some conditions and the health of the environment (network and others) before you really enable FSFO.

So, if the primary database comes back, FSFO returns to normal:

 

[oracle@goldpn1 ~]$ srvctl start database -d gold19c

[oracle@goldpn1 ~]$

 

In the drc* file on the standby:

 

2020-06-12T00:16:52.837+02:00

Primary connected to this instance.

2020-06-12T00:17:00.186+02:00

FSFO SetState(st=2 "UNSYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=11)

2020-06-12T00:17:06.951+02:00

FSFO SetState(st=1 "SYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=12)

 

At Broker:

 

DGMGRL> show configuration;




Configuration - gold19c




  Protection Mode: MaxAvailability

  Members:

  gold19c  - Primary database

    golds19c - (*) Physical standby database




Fast-Start Failover: Enabled in Observe-Only Mode




Configuration Status:

SUCCESS   (status updated 51 seconds ago)




DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>

 

Upgrade and Downgrade modes

If FSFO is operating in Observe-Only mode, it is impossible to “upgrade” it directly to normal mode:

 

DGMGRL>  ENABLE FAST_START FAILOVER

Error: ORA-16889: Fast-start failover mode cannot be changed between normal and observe-only modes.




Failed.

DGMGRL>

 

To do that, we need to disable FSFO and enable it again in normal mode:

 

DGMGRL>  ENABLE FAST_START FAILOVER

Error: ORA-16889: Fast-start failover mode cannot be changed between normal and observe-only modes.




Failed.

DGMGRL>

DGMGRL> DISABLE FAST_START FAILOVER ;

Disabled.

DGMGRL> ENABLE FAST_START FAILOVER ;

Enabled in Zero Data Loss Mode.

DGMGRL>

 

Downgrading is the same: we cannot downgrade directly, we need to disable FSFO and then enable it in Observe-Only mode:

 

DGMGRL> ENABLE FAST_START FAILOVER OBSERVE ONLY;

Error: ORA-16889: Fast-start failover mode cannot be changed between normal and observe-only modes.




Failed.

DGMGRL>

 

Health Conditions

This is not a new feature in 19c, but it helps to control the scenarios where FSFO is triggered. It is possible to manage the Health Conditions and enable/disable some conditions like corrupted controlfiles or a stuck archiver. All options can be checked in the documentation.
Look below at “Configurable Failover Conditions”; everything there can be set:

 

DGMGRL>  show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>

 

Some examples

 

DGMGRL> ENABLE FAST_START FAILOVER CONDITION "Inaccessible Logfile";

Succeeded.

DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile           YES

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>




DGMGRL> ENABLE FAST_START FAILOVER CONDITION "Corrupted Dictionary";

Succeeded.

DGMGRL> DISABLE FAST_START FAILOVER CONDITION "Inaccessible Logfile";

Succeeded.

DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>

 

Another option is to enable (or disable) a specific Oracle error as a failover condition. The controlfile error ORA-240, for example, can be set as a trigger:

 

DGMGRL> ENABLE FAST_START FAILOVER CONDITION 240;

Succeeded.

DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    ORA-240: control file enqueue held for more than %s seconds




DGMGRL> DISABLE FAST_START FAILOVER CONDITION 240;

Succeeded.

DGMGRL>

 

But this works only for some errors, like ORA-240; others, such as ORA-600, are not supported:

 

DGMGRL> ENABLE FAST_START FAILOVER CONDITION 600;

Error: ORA-16524: unsupported command, option, or argument




Failed.

DGMGRL>

 

Observe-Only and Conditions

The new Observe-Only mode in 19c is a good feature because it allows more control over where and when FSFO is triggered. Before it, the only option was ON or OFF, and scenarios where you wanted to test or validate the environment before enabling it for real were impossible.
And if we combine it with the Health Condition checks, it gives powerful control over the DG environment and allows better tuning.
 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community.”

 

 



ZDLRA, ORDERING_WAIT task state

Tasks are the pillar of how ZDLRA processes backups; everything is a task. So, when you ingest an incremental backup, one task is created, but it can happen that it gets stuck in the ORDERING_WAIT state. These tasks are hard to identify and can create a big problem for your virtual full backups and backup strategy. Below I will show how they occur and how to solve the problem.

 

Incremental backups

To understand how they appear, I need to explain a little how incremental backups work. Look at the example below: one datafile with some blocks. When you take a level 0 backup, all the blocks are copied; when you take an incremental level 1 backup, just the “new” blocks are copied (only the blocks that changed).

 

 

As you can see above, the first backup of the datafile (level 0) has SCN 3333. The subsequent level 1 backup copies everything that changed after that (in this example, up to SCN 4444). The next incremental backup picks up everything since the last backup, in this example every block changed since SCN 4444, so it generates a new backup up to SCN 5555.
As you know, this is the definition of an incremental backup, as stated in “About RMAN Incremental Backups” in the docs:
“An incremental backup copies only those data blocks that have changed since a previous backup. You can use RMAN to create incremental backups of data files, tablespaces, or the whole database.”
The point is that the database/RMAN knows the SCN of the last backup, and when it takes an incremental backup it copies everything since that SCN. Each incremental backup “knows” (internally, from the database blocks inside it) its start SCN and end SCN. So, to “reconstruct” the datafile, the database/RMAN uses the full backup and all subsequent incremental backups.

 

Incremental backup and ZDLRA

 

As I already wrote about incremental backups and ZDLRA, they are used to construct the “Virtual Full Backup”. In a very summarized way, ZDLRA merges the stored backups and creates the virtual full backup (as I explained here too).

 

 

But even with this virtual backup, the way incremental backups work does not change. The procedure is the same: check the SCN of the last backup and copy all blocks changed since then. As you can see in the image above, the full backups (blue in the image) are created by merging the previous full with the ingested backup and are used as the base for the subsequent incrementals.

 

ORDERING_WAIT

The ORDERING_WAIT state occurs when the INDEX_BACKUP task that creates the index (and the virtual full backup) cannot finish because it does not have all the required data. This happens because (for some reason) a backup was created and not stored at the ZDLRA. It can even be a duplicate used to create a standby (remember that a duplicate is basically an RMAN backup copied from one side to the other).

 

 

Look at the image above: after SCN 4444 (the last backup stored at ZDLRA), another backup was taken that is not inside the ZDLRA. So, when the new incremental backup is taken, it copies all blocks changed since the last backup, but that last backup is the one that is not at the ZDLRA (RMAN does not care where the backup is; by definition, an incremental backup is taken from the last backup, whatever or wherever it is).

 

When this incremental backup is ingested at ZDLRA, it tries to create the virtual full. But since the last stored backup has SCN 4444, and the new incremental picks up blocks changed since SCN 5555 up to SCN 6666, ZDLRA knows there is a gap when it opens the ingested backup: it does not have the blocks between SCN 4444 and 5555 (look at the yellow block, that backup exists only outside the ZDLRA).

 

So, it is impossible to create the virtual full backup for SCN 6666, and the INDEX_BACKUP task will be held in the ORDERING_WAIT state. To solve it there are two options: you can take a new level 0 backup, or you can use the BACKUP [CUMULATIVE] INCREMENTAL LEVEL 1 … FOR RECOVER OF TAG ‘<TAG>’ command. I will show you below how to do that.
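Just to make the second option concrete, a minimal sketch of the command for this scenario (hedged: ‘<TAG>’ must be replaced with the tag of the last backup that the ZDLRA actually has, that is, its last virtual full):

RMAN> BACKUP CUMULATIVE INCREMENTAL LEVEL 1 FOR RECOVER OF TAG '<TAG>' DEVICE TYPE SBT DATAFILE 12;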

 

How this occurs and how to solve

Below, I will show how you can identify and solve the problem, using the solution described in the last paragraph. You can also check the internal details of how it occurs, how to identify it, and how to solve the issue.
In this scenario, I will use one datafile (number 12, the USERS tablespace of one PDB). First, check the backups for datafile 12:

 

RMAN> list backup of datafile 12 completed after "sysdate - 20/1440";







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24283   Incr 1  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24284   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282I   Media:

  List of Datafiles in backup set 24283

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24287   Incr 0  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24288   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282_12   Media:

  List of Datafiles in backup set 24287

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

 

As you can see, we have Incr 1 and Incr 0 backups with the same SCN, and the handle name starts with VB$. This means that the virtual full backup is OK and was created by ZDLRA.
And if I take a new incremental backup, ZDLRA generates a new virtual full backup:

 



RMAN> BACKUP INCREMENTAL LEVEL 1 DEVICE TYPE SBT FILESPERSET 1 DATAFILE 12;




Starting backup at 20/04/2020 23:51:58

using channel ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set

channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: starting piece 1 at 20/04/2020 23:51:59

channel ORA_SBT_TAPE_1: finished piece 1 at 20/04/2020 23:52:02

piece handle=ORCL18C_a1uu5dsv_1_1 tag=TAG20200420T235158 comment=API Version 2.0,MMS Version 12.2.0.2

channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:00:03

Finished backup at 20/04/2020 23:52:02




Starting Control File and SPFILE Autobackup at 20/04/2020 23:52:02

piece handle=c-558466555-20200420-0b comment=API Version 2.0,MMS Version 12.2.0.2

Finished Control File and SPFILE Autobackup at 20/04/2020 23:52:10




RMAN> list backup of datafile 12 completed after "sysdate - 20/1440";







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24283   Incr 1  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24284   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282I   Media:

  List of Datafiles in backup set 24283

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24287   Incr 0  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24288   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282_12   Media:

  List of Datafiles in backup set 24287

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24355   Incr 1  40.00K     SBT_TAPE    00:00:03     20/04/2020 23:52:02

        BP Key: 24356   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24354I   Media:

  List of Datafiles in backup set 24355

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013810    20/04/2020 23:51:59              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24359   Incr 0  40.00K     SBT_TAPE    00:00:03     20/04/2020 23:52:02

        BP Key: 24360   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24354_12   Media:

  List of Datafiles in backup set 24359

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013810    20/04/2020 23:51:59              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

 

Simulating the error

Just to remember: this new incremental has all SCNs from 2013573 up to 2013810.
But for some reason one full backup is taken to another place (like disk, or a duplicate for a standby). Note that the channel type is not the ZDLRA:

 

RMAN> BACKUP INCREMENTAL LEVEL 0 DEVICE TYPE disk format '/tmp/%U' DATAFILE 12 TAG 'BKP-DBF-TO-DISK';




Starting backup at 20/04/2020 23:54:01

released channel: ORA_SBT_TAPE_1

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=65 device type=DISK

channel ORA_DISK_1: starting incremental level 0 datafile backup set

channel ORA_DISK_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_DISK_1: starting piece 1 at 20/04/2020 23:54:01

channel ORA_DISK_1: finished piece 1 at 20/04/2020 23:54:02

piece handle=/tmp/a3uu5e0p_1_1 tag=BKP-DBF-TO-DISK comment=NONE

channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01

Finished backup at 20/04/2020 23:54:02




Starting Control File and SPFILE Autobackup at 20/04/2020 23:54:02

piece handle=/u01/app/oracle/oradata/ORCL18C/autobackup/2020_04_20/o1_mf_s_1038268443_h9w6hwht_.bkp comment=NONE

Finished Control File and SPFILE Autobackup at 20/04/2020 23:54:10




RMAN> list backup tag = 'BKP-DBF-TO-DISK';







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24391   Incr 0  1.18M      DISK        00:00:00     20/04/2020 23:54:01

        BP Key: 24394   Status: AVAILABLE  Compressed: NO  Tag: BKP-DBF-TO-DISK

        Piece Name: /tmp/a3uu5e0p_1_1

  List of Datafiles in backup set 24391

  Container ID: 3, PDB Name: ORCL18P

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013972    20/04/2020 23:54:01              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

 

Now, if I execute a new incremental level 1, the ingested backup does not generate a new virtual full backup:

 

RMAN> BACKUP INCREMENTAL LEVEL 1 DEVICE TYPE SBT FILESPERSET 1 DATAFILE 12;




Starting backup at 20/04/2020 23:55:23

released channel: ORA_DISK_1

allocated channel: ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: SID=65 device type=SBT_TAPE

channel ORA_SBT_TAPE_1: RA Library (ZDLRAS1) SID=A3C0F4C16DAA11FAE053010310ACC1C4

channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set

channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: starting piece 1 at 20/04/2020 23:55:24

channel ORA_SBT_TAPE_1: finished piece 1 at 20/04/2020 23:55:27

piece handle=ORCL18C_a5uu5e3c_1_1 tag=TAG20200420T235524 comment=API Version 2.0,MMS Version 12.2.0.2

channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:00:03

Finished backup at 20/04/2020 23:55:27




Starting Control File and SPFILE Autobackup at 20/04/2020 23:55:27

piece handle=c-558466555-20200420-0d comment=API Version 2.0,MMS Version 12.2.0.2

Finished Control File and SPFILE Autobackup at 20/04/2020 23:55:35




RMAN> list backup of datafile 12 completed after "sysdate - 20/1440";







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24283   Incr 1  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24284   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282I   Media:

  List of Datafiles in backup set 24283

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24287   Incr 0  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24288   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282_12   Media:

  List of Datafiles in backup set 24287

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24355   Incr 1  40.00K     SBT_TAPE    00:00:03     20/04/2020 23:52:02

        BP Key: 24356   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24354I   Media:

  List of Datafiles in backup set 24355

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013810    20/04/2020 23:51:59              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24359   Incr 0  40.00K     SBT_TAPE    00:00:03     20/04/2020 23:52:02

        BP Key: 24360   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24354_12   Media:

  List of Datafiles in backup set 24359

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013810    20/04/2020 23:51:59              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24391   Incr 0  1.18M      DISK        00:00:00     20/04/2020 23:54:01

        BP Key: 24394   Status: AVAILABLE  Compressed: NO  Tag: BKP-DBF-TO-DISK

        Piece Name: /tmp/a3uu5e0p_1_1

  List of Datafiles in backup set 24391

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013972    20/04/2020 23:54:01              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24430   Incr 1  256.00K    SBT_TAPE    00:00:00     20/04/2020 23:55:24

        BP Key: 24431   Status: AVAILABLE  Compressed: NO  Tag: TAG20200420T235524

        Handle: ORCL18C_a5uu5e3c_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24430

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2014089    20/04/2020 23:55:24              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

 

Note above that the last incremental backup did not generate a new level 0; it is backupset 24430 (and backup piece 24431).

 

Task inside ZDLRA

If we look inside the ZDLRA, we can see that the task responsible for indexing the backup piece 24431 is in the ORDERING_WAIT state. You can query rasys.ra_task and filter by the STATE column.

 

SQL> select TASK_ID, TASK_TYPE, STATE, WAITING_ON, DB_KEY, DB_UNIQUE_NAME, CREATION_TIME, ERROR_COUNT, INTERRUPT_COUNT, BP_KEY,BS_KEY,DF_KEY,VB_KEY from rasys.ra_task where db_unique_name = 'ORCL18C' and state = 'ORDERING_WAIT' order by 5,2,7,10,11,12,13;




   TASK_ID TASK_TYPE       STATE                     WAITING_ON     DB_KEY DB_UNIQUE_NAME                 CREATION_TIME                       ERROR_COUNT INTERRUPT_COUNT     BP_KEY     BS_KEY     DF_KEY     VB_KEY

---------- --------------- ------------------------- ---------- ---------- ------------------------------ ----------------------------------- ----------- --------------- ---------- ---------- ---------- ----------

     40203 INDEX_BACKUP    ORDERING_WAIT                             15993 ORCL18C                        20-APR-20 11.55.26.779191 PM +02:00           0               1      24431




SQL>

 

And there is no incident for this error or state:

 

SQL> select incident_id, error_text last_seen from ra_incident_log where db_unique_name = 'ORCL18C' and status not in ('FIXED', 'RESET') order by last_seen desc;




no rows selected




SQL>

 

Unfortunately, this is how ZDLRA behaves: even if there is a task in the ORDERING_WAIT state, it is not reported as an error. And if you think about it, the virtual full backup is not generated, so the key feature (the virtual full backup) is not in place for these datafiles.
But even with tasks in this state you are not unprotected: the ingested backup is there and can be restored.
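For instance, a quick way to confirm that the datafile is still restorable, without actually restoring it, is to validate the restore. A minimal sketch, using the same datafile of this example:

RMAN> RESTORE DATAFILE 12 VALIDATE;

This only reads the needed backup pieces from the ZDLRA and reports missing or corrupt pieces; the datafile on disk is not touched.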

 

ZDLRA Internals

 

If we look inside the ZDLRA we can see more details (this is one example of how to navigate the internal tables of the ZDLRA RMAN catalog and find the same information that LIST BACKUPSET shows inside RMAN):

 

SQL> select bs_key, db_key, pdb_key from bp where bp_key = 24431;




    BS_KEY     DB_KEY    PDB_KEY

---------- ---------- ----------

     24430      15993      16000




SQL> select * from rc_database where db_key = 15993;




    DB_KEY  DBINC_KEY       DBID NAME     RESETLOGS_CHANGE# RESETLOGS FINAL_CHANGE#

---------- ---------- ---------- -------- ----------------- --------- -------------

     15993      15994  558466555 ORCL18C            1477662 11-AUG-19




SQL> select df_key from df where dbinc_key = 15994 and  file# = 12;




    DF_KEY

----------

     16026




SQL> select bdf_key, ckp_scn from bdf where bs_key = 24430;




   BDF_KEY    CKP_SCN

---------- ----------

     24432    2014089




SQL>
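For convenience, the same navigation can be condensed into a single query. This is only a sketch: it assumes the internal BDF table also exposes a FILE# column (not shown in the individual queries above) to join back to DF:

SQL> select bp.bp_key, bp.bs_key, rcd.name, df.df_key, bdf.ckp_scn from bp join rc_database rcd on rcd.db_key = bp.db_key join bdf on bdf.bs_key = bp.bs_key join df on df.dbinc_key = rcd.dbinc_key and df.file# = bdf.file# where bp.bp_key = 24431;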

###################################################################

RMAN> list backupset 24430;







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24430   Incr 1  256.00K    SBT_TAPE    00:00:00     20/04/2020 23:55:24

        BP Key: 24431   Status: AVAILABLE  Compressed: NO  Tag: TAG20200420T235524

        Handle: ORCL18C_a5uu5e3c_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24430

  Container ID: 3, PDB Name: ORCL18P

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2014089    20/04/2020 23:55:24              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

 

Using the BP table we can find the DB_KEY, and with that we can go to RC_DATABASE to get the incarnation key (DBINC_KEY). With that, at the DF table we can discover the DF_KEY for datafile 12 in this database incarnation. And with the BDF table, we can find the CKP_SCN for this datafile.
But if we go to the BLOCKS table (the ZDLRA table that stores the index and, "more or less", the virtual full), there are no database blocks with the SCN 2014089 for the backupset 24430. If you want to understand how this works, you can read my post about it.

 

SQL> select * from blocks where df_key = 16026 and scn >= 2014089;




no rows selected




SQL>

 

And if we check for SCN 2013972 (the one that came from the backup written to disk), there is nothing either (as expected):

 

SQL> select * from blocks where df_key = 16026 and scn >= 2013972;




no rows selected




SQL>

 

But if we check the last SCN known by ZDLRA, 2013810 (the last virtual full backup, backupset 24359), we can see which blocks are there:

 

SQL> select * from blocks where df_key = 16026 and scn >= 2013810 order by scn, chunkno ;




    DF_KEY    BLOCKNO        SCN     CKP_ID    CHUNKNO    COFFSET       USED  DBINC_KEY     ENDBLK

---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------

     16026          1    2013810    2013810      25601      33121        338      15994

     16026          0    2013810    2013810      25601       8192      24576      15994




SQL>

 

And there are PLANS only for this virtual backup (nothing was generated for the subsequent backups):

 

SQL> select VB_KEY, DF_KEY, CKP_SCN, SRCBP_KEY, VCBP_KEY from vbdf where CKP_SCN >= 2013810 and df_key = 16026;




    VB_KEY     DF_KEY    CKP_SCN  SRCBP_KEY   VCBP_KEY

---------- ---------- ---------- ---------- ----------

     24354      16026    2013810      24293      24356




SQL> select * from plans_details where VB_KEY = 24354;




    DF_KEY       TYPE     VB_KEY    BLKRANK    BLOCKNO    CHUNKNO    NUMBLKS    COFFSET   NUMBYTES

---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------

     16026          1      24354          1          0      25601          1       8192      24576

     16026          1      24354          1          2      16385          2      32788        256

     16026          1      24354          1          4      14337        132      33045      14000

     16026          1      24354          1        136      16385          3      33044        553

     16026          1      24354          1        139      23553          1      32788        327

     16026          1      24354          1        140      16385          2      33786        264

     16026          1      24354          1        142      25601          1      32788        333

     16026          1      24354          1        143      16385          1      34182        132

     16026          1      24354          1        191      14337          3      47045        252

     16026          1      24354          1 4294967295      25601          1      33121        338




10 rows selected.




SQL>

 

As you can figure out, ZDLRA can’t fill the gap to create the virtual full backup.

 

Recurring error

If you do not solve the problem and continue to ingest backups, the tasks will remain in ORDERING_WAIT:

 

RMAN> BACKUP INCREMENTAL LEVEL 1 DEVICE TYPE SBT FILESPERSET 1 DATAFILE 12;




Starting backup at 21/04/2020 00:04:00

using channel ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set

channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: starting piece 1 at 21/04/2020 00:04:01

channel ORA_SBT_TAPE_1: finished piece 1 at 21/04/2020 00:04:04

piece handle=ORCL18C_a7uu5ejh_1_1 tag=TAG20200421T000400 comment=API Version 2.0,MMS Version 12.2.0.2

channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:00:03

Finished backup at 21/04/2020 00:04:04




Starting Control File and SPFILE Autobackup at 21/04/2020 00:04:04

piece handle=c-558466555-20200421-00 comment=API Version 2.0,MMS Version 12.2.0.2

Finished Control File and SPFILE Autobackup at 21/04/2020 00:04:13




RMAN> list backup of datafile 12 completed after "sysdate - 5/1440";







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24511   Incr 1  256.00K    SBT_TAPE    00:00:01     21/04/2020 00:04:02

        BP Key: 24512   Status: AVAILABLE  Compressed: NO  Tag: TAG20200421T000400

        Handle: ORCL18C_a7uu5ejh_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24511

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2021757    21/04/2020 00:04:01              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

####################################

SQL> select TASK_ID, TASK_TYPE, STATE, WAITING_ON, DB_KEY, DB_UNIQUE_NAME, CREATION_TIME, ERROR_COUNT, INTERRUPT_COUNT, BP_KEY,BS_KEY,DF_KEY,VB_KEY from rasys.ra_task where db_unique_name = 'ORCL18C' and state = 'ORDERING_WAIT' order by 5,2,7,10,11,12,13;




   TASK_ID TASK_TYPE       STATE                     WAITING_ON     DB_KEY DB_UNIQUE_NAME                 CREATION_TIME                       ERROR_COUNT INTERRUPT_COUNT     BP_KEY     BS_KEY     DF_KEY     VB_KEY

---------- --------------- ------------------------- ---------- ---------- ------------------------------ ----------------------------------- ----------- --------------- ---------- ---------- ---------- ----------

     40203 INDEX_BACKUP    ORDERING_WAIT                             15993 ORCL18C                        20-APR-20 11.55.26.779191 PM +02:00           0               1      24431

     40210 INDEX_BACKUP    ORDERING_WAIT                             15993 ORCL18C                        21-APR-20 12.04.05.207342 AM +02:00           0               1      24512




SQL>

 

Even if you try to do a cumulative incremental backup, the problem will be the same:

 

RMAN> BACKUP CUMULATIVE INCREMENTAL LEVEL 1 DEVICE TYPE SBT FILESPERSET 1 DATAFILE 12;




Starting backup at 21/04/2020 00:07:13

using channel ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set

channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: starting piece 1 at 21/04/2020 00:07:13

channel ORA_SBT_TAPE_1: finished piece 1 at 21/04/2020 00:07:20

piece handle=ORCL18C_a9uu5eph_1_1 tag=TAG20200421T000713 comment=API Version 2.0,MMS Version 12.2.0.2

channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:00:07

Finished backup at 21/04/2020 00:07:20




Starting Control File and SPFILE Autobackup at 21/04/2020 00:07:20

piece handle=c-558466555-20200421-01 comment=API Version 2.0,MMS Version 12.2.0.2

Finished Control File and SPFILE Autobackup at 21/04/2020 00:07:28




RMAN> list backup of datafile 12 completed after "sysdate - 2/1440";







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24604   Incr 1  256.00K    SBT_TAPE    00:00:02     21/04/2020 00:07:15

        BP Key: 24605   Status: AVAILABLE  Compressed: NO  Tag: TAG20200421T000713

        Handle: ORCL18C_a9uu5eph_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24604

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2021959    21/04/2020 00:07:13              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

#############################################

SQL> select TASK_ID, TASK_TYPE, STATE, WAITING_ON, DB_KEY, DB_UNIQUE_NAME, CREATION_TIME, ERROR_COUNT, INTERRUPT_COUNT, BP_KEY,BS_KEY,DF_KEY,VB_KEY from rasys.ra_task where db_unique_name = 'ORCL18C' and state = 'ORDERING_WAIT' order by 5,2,7,10,11,12,13;




   TASK_ID TASK_TYPE       STATE                     WAITING_ON     DB_KEY DB_UNIQUE_NAME                 CREATION_TIME                       ERROR_COUNT INTERRUPT_COUNT     BP_KEY     BS_KEY     DF_KEY     VB_KEY

---------- --------------- ------------------------- ---------- ---------- ------------------------------ ----------------------------------- ----------- --------------- ---------- ---------- ---------- ----------

     40203 INDEX_BACKUP    ORDERING_WAIT                             15993 ORCL18C                        20-APR-20 11.55.26.779191 PM +02:00           0               1      24431

     40210 INDEX_BACKUP    ORDERING_WAIT                             15993 ORCL18C                        21-APR-20 12.04.05.207342 AM +02:00           0               1      24512

     40215 INDEX_BACKUP    ORDERING_WAIT                             15993 ORCL18C                        21-APR-20 12.07.18.140618 AM +02:00           0               1      24605




SQL>

 

Solving ORDERING_WAIT

There are two ways to solve the issue, and for both the idea is the same: ingest the missing database blocks into ZDLRA. And we do this by performing a backup.

 

FOR RECOVER OF TAG

The first is to use the command “BACKUP [CUMULATIVE] INCREMENTAL LEVEL 1 … FOR RECOVER OF TAG ‘<TAG>’”. The idea here is to create one incremental backup that covers all changes since one specific tag. You can check the documentation for more details if you want; it exists since Oracle 10g.
As you can imagine, the critical point here is to define the correct TAG to be used as a reference. In this case, the tag is the one of the last full backup (virtual or not) that is inside of ZDLRA. Doing this, we ingest all changed blocks and fill the gap that is holding the task.
Here I used a normal incremental backup. Note that the tag comes from the last virtual full backup that is inside of ZDLRA for this datafile (a sketch on how to identify it follows, then the fix itself):
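If you need to identify that tag first, a minimal sketch is to list the backups of the datafile; the virtual backups appear with handles starting with VB$_, and the tag of the most recent level 0 among them is the one to reuse:

RMAN> LIST BACKUP OF DATAFILE 12;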

 

RMAN> BACKUP INCREMENTAL LEVEL 1 DEVICE TYPE SBT FOR RECOVER OF TAG 'TAG20200420T235158' DATAFILE 12;




Starting backup at 21/04/2020 00:12:17

using channel ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set

channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: starting piece 1 at 21/04/2020 00:12:17

channel ORA_SBT_TAPE_1: finished piece 1 at 21/04/2020 00:12:20

piece handle=ORCL18C_abuu5f31_1_1 tag=TAG20200420T235158 comment=API Version 2.0,MMS Version 12.2.0.2

channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:00:03

Finished backup at 21/04/2020 00:12:20




Starting Control File and SPFILE Autobackup at 21/04/2020 00:12:20

piece handle=c-558466555-20200421-02 comment=API Version 2.0,MMS Version 12.2.0.2

Finished Control File and SPFILE Autobackup at 21/04/2020 00:12:28




RMAN>

 

And we can see that a new virtual full backup was created from this incremental backup (look at the last two backup sets in the list):

 

RMAN> list backup of datafile 12 completed after "sysdate - 40/1440";







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24283   Incr 1  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24284   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282I   Media:

  List of Datafiles in backup set 24283

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24287   Incr 0  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24288   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282_12   Media:

  List of Datafiles in backup set 24287

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24355   Incr 1  40.00K     SBT_TAPE    00:00:03     20/04/2020 23:52:02

        BP Key: 24356   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24354I   Media:

  List of Datafiles in backup set 24355

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013810    20/04/2020 23:51:59              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24359   Incr 0  40.00K     SBT_TAPE    00:00:03     20/04/2020 23:52:02

        BP Key: 24360   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24354_12   Media:

  List of Datafiles in backup set 24359

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013810    20/04/2020 23:51:59              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24391   Incr 0  1.18M      DISK        00:00:00     20/04/2020 23:54:01

        BP Key: 24394   Status: AVAILABLE  Compressed: NO  Tag: BKP-DBF-TO-DISK

        Piece Name: /tmp/a3uu5e0p_1_1

  List of Datafiles in backup set 24391

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013972    20/04/2020 23:54:01              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24430   Incr 1  256.00K    SBT_TAPE    00:00:00     20/04/2020 23:55:24

        BP Key: 24431   Status: AVAILABLE  Compressed: NO  Tag: TAG20200420T235524

        Handle: ORCL18C_a5uu5e3c_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24430

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2014089    20/04/2020 23:55:24              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24511   Incr 1  256.00K    SBT_TAPE    00:00:01     21/04/2020 00:04:02

        BP Key: 24512   Status: AVAILABLE  Compressed: NO  Tag: TAG20200421T000400

        Handle: ORCL18C_a7uu5ejh_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24511

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2021757    21/04/2020 00:04:01              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24604   Incr 1  256.00K    SBT_TAPE    00:00:02     21/04/2020 00:07:15

        BP Key: 24605   Status: AVAILABLE  Compressed: NO  Tag: TAG20200421T000713

        Handle: ORCL18C_a9uu5eph_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24604

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2021959    21/04/2020 00:07:13              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24814   Incr 1  40.00K     SBT_TAPE    00:00:03     21/04/2020 00:12:20

        BP Key: 24815   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24813I   Media:

  List of Datafiles in backup set 24814

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2025613    21/04/2020 00:12:17              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24818   Incr 0  40.00K     SBT_TAPE    00:00:03     21/04/2020 00:12:20

        BP Key: 24819   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24813_12   Media:

  List of Datafiles in backup set 24818

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2025613    21/04/2020 00:12:17              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

 

And inside the ZDLRA, we can see that the tasks previously in ORDERING_WAIT have finished (check the BP_KEY column):

 

SQL> select TASK_ID, TASK_TYPE, STATE, WAITING_ON, DB_KEY, DB_UNIQUE_NAME, CREATION_TIME, COMPLETION_TIME, ERROR_COUNT, INTERRUPT_COUNT, BP_KEY,BS_KEY,DF_KEY,VB_KEY from rasys.ra_task where task_id IN (40203,40210,40215);




   TASK_ID TASK_TYPE       STATE                     WAITING_ON     DB_KEY DB_UNIQUE_NAME                 CREATION_TIME                       COMPLETION_TIME                     ERROR_COUNT INTERRUPT_COUNT     BP_KEY     BS_KEY     DF_KEY     VB_KEY

---------- --------------- ------------------------- ---------- ---------- ------------------------------ ----------------------------------- ----------------------------------- ----------- --------------- ---------- ---------- ---------- ----------

     40203 INDEX_BACKUP    COMPLETED                                 15993 ORCL18C                        20-APR-20 11.55.26.779191 PM +02:00 21-APR-20 12.13.09.253916 AM +02:00           0               1      24431

     40210 INDEX_BACKUP    COMPLETED                                 15993 ORCL18C                        21-APR-20 12.04.05.207342 AM +02:00 21-APR-20 12.13.21.663869 AM +02:00           0               1      24512

     40215 INDEX_BACKUP    COMPLETED                                 15993 ORCL18C                        21-APR-20 12.07.18.140618 AM +02:00 21-APR-20 12.13.42.179087 AM +02:00           0               1      24605




SQL>

 

And if we check the BLOCKS for this datafile, we can see that new rows were registered with SCNs higher than the last full backup before the error, going up to the last backup made:

 

SQL> select * from blocks where df_key = 16026 and scn >= 2013810 order by scn, chunkno ;




    DF_KEY    BLOCKNO        SCN     CKP_ID    CHUNKNO    COFFSET       USED  DBINC_KEY     ENDBLK

---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------

     16026          0    2013810    2013810      25601       8192      24576      15994

     16026          1    2013810    2013810      25601      33121        338      15994

     16026        142    2021942    2025613      26625      32788        414      15994

     16026          1    2025613    2025613      26625      33202        336      15994

     16026          0    2025613    2025613      26625       8192      24576      15994




SQL>

 

And we can see that PLANS now exist for the virtual full backups from before the error (SCN 2013810) and from after the fix (SCN 2025613):

 

SQL> select VB_KEY, DF_KEY, CKP_SCN, SRCBP_KEY, VCBP_KEY from vbdf where CKP_SCN >= 2013810 and DF_KEY = 16026;




    VB_KEY     DF_KEY    CKP_SCN  SRCBP_KEY   VCBP_KEY

---------- ---------- ---------- ---------- ----------

     24354      16026    2013810      24293      24356

     24813      16026    2025613      24706      24815




SQL>

SQL> select * from plans_details where VB_KEY IN (24354,24813) order by VB_KEY,BLOCKNO;




    DF_KEY       TYPE     VB_KEY    BLKRANK    BLOCKNO    CHUNKNO    NUMBLKS    COFFSET   NUMBYTES

---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------

     16026          1      24354          1          0      25601          1       8192      24576

     16026          1      24354          1          2      16385          2      32788        256

     16026          1      24354          1          4      14337        132      33045      14000

     16026          1      24354          1        136      16385          3      33044        553

     16026          1      24354          1        139      23553          1      32788        327

     16026          1      24354          1        140      16385          2      33786        264

     16026          1      24354          1        142      25601          1      32788        333

     16026          1      24354          1        143      16385          1      34182        132

     16026          1      24354          1        191      14337          3      47045        252

     16026          1      24354          1 4294967295      25601          1      33121        338

     16026          1      24813          1          0      26625          1       8192      24576

     16026          1      24813          1          2      16385          2      32788        256

     16026          1      24813          1          4      14337        132      33045      14000

     16026          1      24813          1        136      16385          3      33044        553

     16026          1      24813          1        139      23553          1      32788        327

     16026          1      24813          1        140      16385          2      33786        264

     16026          1      24813          1        142      26625          1      32788        414

     16026          1      24813          1        143      16385          1      34182        132

     16026          1      24813          1        191      14337          3      47045        252

     16026          1      24813          1 4294967295      26625          1      33202        336




20 rows selected.




SQL>

 

So, this means that the incremental backup that we made with FOR RECOVER OF TAG was ingested and used to fill the gap.
And if we try to recover datafile 12, we can do it without a problem. Note that the backup used for the restore was the last virtual full backup generated from the FOR RECOVER OF TAG command:

 

RMAN> run{

2> ALTER PLUGGABLE DATABASE ORCL18P CLOSE IMMEDIATE INSTANCES=ALL;

3> RESTORE DATAFILE 12;

4> RECOVER DATAFILE 12;

5> ALTER PLUGGABLE DATABASE ORCL18P OPEN INSTANCES=ALL;

6> }




Statement processed

starting full resync of recovery catalog

full resync complete




Starting restore at 21/04/2020 00:57:53

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=88 device type=DISK

allocated channel: ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: SID=70 device type=SBT_TAPE

channel ORA_SBT_TAPE_1: RA Library (ZDLRAS1) SID=A3C1D44670B428B0E053010310AC5DA9




channel ORA_SBT_TAPE_1: starting datafile backup set restore

channel ORA_SBT_TAPE_1: specifying datafile(s) to restore from backup set

channel ORA_SBT_TAPE_1: restoring datafile 00012 to /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: reading from backup piece VB$_1891149551_24813_12

channel ORA_SBT_TAPE_1: piece handle=VB$_1891149551_24813_12 tag=TAG20200420T235158

channel ORA_SBT_TAPE_1: restored backup piece 1

channel ORA_SBT_TAPE_1: restore complete, elapsed time: 00:00:25

Finished restore at 21/04/2020 00:58:21




Starting recover at 21/04/2020 00:58:21

using channel ORA_DISK_1

using channel ORA_SBT_TAPE_1




starting media recovery

media recovery complete, elapsed time: 00:00:00




Finished recover at 21/04/2020 00:58:22




Statement processed

starting full resync of recovery catalog

full resync complete




RMAN>

 

Unfortunately, backup tags for ZDLRA can be tricky when you specify them directly during the backup phase. Several backups can share the same tag, and choosing which one to reference in FOR RECOVER OF TAG becomes harder. One option is to combine both ideas and execute the command BACKUP CUMULATIVE INCREMENTAL LEVEL 1 DEVICE TYPE SBT FOR RECOVER OF TAG ‘<TAG>’ DATAFILE XX.
Doing this, the command will pick up all the blocks changed since the last full backup that has the tag that you defined. The result is the same:

 

RMAN> BACKUP CUMULATIVE INCREMENTAL LEVEL 1 DEVICE TYPE SBT FOR RECOVER OF TAG 'TAG20200419T232006' DATAFILE 12;




Starting backup at 19/04/2020 23:38:44

using channel ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set

channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: starting piece 1 at 19/04/2020 23:38:44

channel ORA_SBT_TAPE_1: finished piece 1 at 19/04/2020 23:38:59

piece handle=ORCL18C_97uu2oo4_1_1 tag=TAG20200419T232006 comment=API Version 2.0,MMS Version 12.2.0.2

channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:00:15

Finished backup at 19/04/2020 23:38:59




Starting Control File and SPFILE Autobackup at 19/04/2020 23:39:00

piece handle=c-558466555-20200419-04 comment=API Version 2.0,MMS Version 12.2.0.2

Finished Control File and SPFILE Autobackup at 19/04/2020 23:39:16




RMAN>

 

BACKUP FULL

The other option to solve the ORDERING_WAIT is to do a full backup of the datafile. With this, all the blocks are read and ingested into ZDLRA.
The procedure is the same as above and so is the result. The only point is that for huge datafiles a full backup can take a long time, so the incremental approach above can be more suitable.
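A minimal sketch of this option, using the same datafile of the examples above:

RMAN> BACKUP FULL DEVICE TYPE SBT FILESPERSET 1 DATAFILE 12;

After it finishes, check rasys.ra_task again to confirm that the stuck INDEX_BACKUP tasks moved to COMPLETED.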

 

Monitoring the Tasks

 

So, as you can see above, the ORDERING_WAIT state can have a lot of collateral effects for your ZDLRA. Unfortunately, it does not generate incidents that are reported; you need to write a query to check it directly in the ra_task table.
Whatever method you choose to solve the problem (FOR RECOVER OF TAG, or a full backup), always verify that the new virtual full backup was generated. It is good practice to do this to avoid errors and to double-check the tasks.
It is a simple query and a simple thing to monitor, but it will avoid a huge problem. A sketch of such a query follows.
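A minimal monitoring sketch (the same query used throughout this post, reduced to the columns needed for an alert):

SQL> select task_id, task_type, state, db_unique_name, creation_time, bp_key from rasys.ra_task where state = 'ORDERING_WAIT' order by creation_time;

Any row returned here deserves investigation, since the affected datafiles will not receive new virtual full backups until the gap is filled.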
 
Reference:
 

 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community.”


How to upgrade ODA Patch: 18.8.0.0 for Virtualized System
Category: Engineer System Author: Andre Luiz Dutra Ontalba (Board Member) Date: 4 years ago Comments: 0

How to upgrade ODA Patch: 18.8.0.0 for Virtualized System

Introduction

The goal of this document is to describe, step by step, how to upgrade an ODA Virtualized System.

Prerequisites

For this upgrade, the ODA must be at version 18.3.

Oracle Database Appliance Documentation (check the latest version of the patch)

Start Upgrade

1 – Backup of ODA_BASE (both nodes): it can take up to 2 hours.

From DOM0:
As root user:
Node 1:

oakcli stop oda_base

mkdir -p /backup/odax58duts1/odax58duts1_$(date +"%Y%m%d")

nohup tar -cvzf oakDom1.odax58duts1_dom0.tar.gz /OVS/Repositories/odabaseRepo/VirtualMachines/oakDom1 &

After the backup completes:

oakcli start oda_base

Node 2:

oakcli stop oda_base

mkdir -p /backup/odax58duts2/odax58duts2_$(date +"%Y%m%d")

nohup tar -cvzf oakDom1.odax58duts2_dom0.tar.gz /OVS/Repositories/odabaseRepo/VirtualMachines/oakDom1 &

After the backup completes:

oakcli start oda_base

2 – Download the patch to a shared directory, or to a separate directory on both ODA servers:

OS user: root

mkdir -p /backup/patchODA2020

Download all required files to this directory:

/backup/patchODA2020

There are 2 .zip files to download for patch 30518438:

p30518438_188000_Linux-x86-64_1of2.zip

p30518438_188000_Linux-x86-64_2of2.zip

Note: We must guarantee a minimum amount of free space (20GB) in the ODA_BASE file systems “/” and “/u01”.
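A quick check of the available space (a simple sketch):

df -h / /u01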
– Purge old logs from the ODA, as the root user:

oakcli manage cleanrepo --ver 18.3.0.0.0

oakcli manage cleanrepo --ver 18.6.0.0.0

/usr/local/bin/purgeODALog -orcl 20 -tfa 10 -osw 10 -oak 10

– Clean up old patches from the GRID_HOME (affects “/u01”):

su - grid

. oraenv

+ASM1 or +ASM2

cd $ORACLE_HOME/OPatch

./opatch util cleanup

– Clean up old patches from the ORACLE_HOME (affects “/u01”):

su - oracle

. oraenv <SID>

cd $ORACLE_HOME/OPatch

./opatch util cleanup

Note:
– This cleanup must be performed in every ORACLE_HOME version that exists on the server (see the sketch below).
– Look at “/home/oracle“, “/home/grid” and “/tmp” to perform some cleanup and release space in the “/“ file system.
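If you are not sure how many Oracle homes exist on the server, a simple sketch is to list them from /etc/oratab before repeating the cleanup in each one:

grep -v '^#' /etc/oratab | awk -F: '{print $2}' | sort -u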

3 – Unpack the downloaded patch on both ODA nodes:

cd /backup/patchODA2020

oakcli unpack -package p30518438_188000_Linux-x86-64_1of2.zip

oakcli unpack -package p30518438_188000_Linux-x86-64_2of2.zip

4 – Verify and validate the upgrade of the ODA components (OS):

export EXTRA_OS_RPMS_LOC=/backup/patchODA2020

oakcli validate -c ospatch -ver 18.8.0.0.0

Note:
Solving RPM conflicts for version 18.8.0.0.0.

Example: dry-run output with errors and conflicts

BEGIN OUTPUT:
NODE 1:

[root@odax58duts1 patchODA2020]# oakcli validate -c ospatch -ver 18.8.0.0.0

INFO: Validating the OS patch for the version 18.8.0.0.0

INFO: 2020-06-06 14:06:16: Performing a dry run for OS patching

ERROR: 2020-06-06 14:06:31: Unable to run the command : /usr/bin/yum --exclude=kmod-mpt2sas,ibutils-libs,dapl,libcxgb3,libipathverbs,libmthca,libnes,ofed-docs update --disablerepo=* --enablerepo=ODA_REPOS_LOC -y

ERROR: 2020-06-06 14:06:31: Loaded plugins: rhnplugin, ulninfo, versionlock

This system is not registered with ULN.

You can use uln_register to register.

ULN support will be disabled.

Repository ol6_latest is listed more than once in the configuration

Repository ol6_addons is listed more than once in the configuration

Repository ol6_UEK_latest is listed more than once in the configuration

Setting up Update Process

Resolving Dependencies

--> Running transaction check

---> Package cpupowerutils.x86_64 0:1.3-2.el6 will be updated

---> Package cpupowerutils.x86_64 0:1.3-2.0.1.el6 will be an update

---> Package cups-libs.x86_64 1:1.4.2-79.el6 will be updated

---> Package cups-libs.x86_64 1:1.4.2-81.el6_10 will be an update

---> Package dbus.x86_64 1:1.2.24-9.0.1.el6 will be updated

---> Package dbus.x86_64 1:1.2.24-11.0.1.el6_10 will be an update

---> Package dbus-libs.x86_64 1:1.2.24-9.0.1.el6 will be updated

---> Package dbus-libs.x86_64 1:1.2.24-11.0.1.el6_10 will be an update

---> Package dracut.noarch 0:004-411.0.3.el6 will be updated

---> Package dracut.noarch 0:004-411.0.4.el6 will be an update

---> Package dracut-kernel.noarch 0:004-411.0.3.el6 will be updated

---> Package dracut-kernel.noarch 0:004-411.0.4.el6 will be an update

---> Package glibc.x86_64 0:2.12-1.212.0.2.el6 will be updated

---> Package glibc.x86_64 0:2.12-1.212.0.3.el6_10.3 will be an update

---> Package glibc-common.x86_64 0:2.12-1.212.0.2.el6 will be updated

---> Package glibc-common.x86_64 0:2.12-1.212.0.3.el6_10.3 will be an update

---> Package glibc-devel.i686 0:2.12-1.212.0.2.el6 will be updated

---> Package glibc-devel.x86_64 0:2.12-1.212.0.2.el6 will be updated

---> Package glibc-devel.i686 0:2.12-1.212.0.3.el6_10.3 will be an update

---> Package glibc-devel.x86_64 0:2.12-1.212.0.3.el6_10.3 will be an update

---> Package glibc-headers.x86_64 0:2.12-1.212.0.2.el6 will be updated

---> Package glibc-headers.x86_64 0:2.12-1.212.0.3.el6_10.3 will be an update

---> Package initscripts.x86_64 0:9.03.61-1.0.3.el6 will be updated

---> Package initscripts.x86_64 0:9.03.61-1.0.6.el6 will be an update

---> Package kernel-headers.x86_64 0:2.6.32-754.11.1.el6 will be updated

---> Package kernel-headers.x86_64 0:2.6.32-754.18.2.el6 will be an update

---> Package kernel-uek.x86_64 0:4.1.12-124.33.4.el6uek will be installed

---> Package kernel-uek-firmware.noarch 0:4.1.12-124.33.4.el6uek will be installed

---> Package kexec-tools.x86_64 0:2.0.7-1.0.27.el6 will be updated

---> Package kexec-tools.x86_64 0:2.0.7-1.0.28.el6 will be an update

---> Package ksplice.x86_64 0:1.0.38-1.el6 will be updated

---> Package ksplice.x86_64 0:1.0.43-1.el6 will be an update

---> Package ksplice-core0.x86_64 0:1.0.38-1.el6 will be updated

---> Package ksplice-core0.x86_64 0:1.0.43-1.el6 will be an update

---> Package ksplice-offline.x86_64 0:1.0.38-1.el6 will be updated

---> Package ksplice-offline.x86_64 0:1.0.43-1.el6 will be an update

---> Package ksplice-tools.x86_64 0:1.0.38-1.el6 will be updated

---> Package ksplice-tools.x86_64 0:1.0.43-1.el6 will be an update

---> Package libgudev1.x86_64 0:147-2.73.0.1.el6_8.2 will be updated

---> Package libgudev1.x86_64 0:147-2.73.0.2.el6_8.2 will be an update

---> Package libudev.x86_64 0:147-2.73.0.1.el6_8.2 will be updated

---> Package libudev.x86_64 0:147-2.73.0.2.el6_8.2 will be an update

---> Package mailx.x86_64 0:12.4-8.el6_6 will be updated

---> Package mailx.x86_64 0:12.4-10.el6_10 will be an update

---> Package openssl.x86_64 0:1.0.1e-57.0.6.el6 will be updated

---> Package openssl.x86_64 0:1.0.1e-58.0.1.el6_10 will be an update

---> Package oracle-ofed-release.x86_64 0:1.0.0-50.el6 will be updated

---> Package oracle-ofed-release.x86_64 0:1.0.0-51.el6 will be an update

---> Package perf.x86_64 0:2.6.32-754.11.1.el6 will be updated

---> Package perf.x86_64 0:2.6.32-754.18.2.el6 will be an update

---> Package python.x86_64 0:2.6.6-66.0.1.el6_8 will be updated

---> Package python.x86_64 0:2.6.6-68.0.1.el6_10 will be an update

---> Package python-libs.x86_64 0:2.6.6-66.0.1.el6_8 will be updated

---> Package python-libs.x86_64 0:2.6.6-68.0.1.el6_10 will be an update

---> Package rdma.noarch 2:3.10-3.0.40.el6 will be updated

---> Package rdma.noarch 2:3.10-3.0.41.el6 will be an update

---> Package samba.x86_64 0:3.6.23-51.0.1.el6 will be updated

---> Package samba.x86_64 0:3.6.23-52.0.1.el6_10 will be an update

---> Package samba-client.x86_64 0:3.6.23-51.0.1.el6 will be updated

---> Package samba-client.x86_64 0:3.6.23-52.0.1.el6_10 will be an update

---> Package samba-common.x86_64 0:3.6.23-51.0.1.el6 will be updated

---> Package samba-common.x86_64 0:3.6.23-52.0.1.el6_10 will be an update

---> Package samba-winbind.x86_64 0:3.6.23-51.0.1.el6 will be updated

---> Package samba-winbind.x86_64 0:3.6.23-52.0.1.el6_10 will be an update

---> Package samba-winbind-clients.x86_64 0:3.6.23-51.0.1.el6 will be updated

---> Package samba-winbind-clients.x86_64 0:3.6.23-52.0.1.el6_10 will be an update

---> Package sudo.x86_64 0:1.8.6p3-29.el6_9 will be updated

---> Package sudo.x86_64 0:1.8.6p3-29.0.1.el6_10.2 will be an update

---> Package tzdata.noarch 0:2019a-1.el6 will be updated

---> Package tzdata.noarch 0:2019c-1.el6 will be an update

---> Package tzdata-java.noarch 0:2018e-3.el6 will be updated

---> Package tzdata-java.noarch 0:2019c-1.el6 will be an update

---> Package udev.x86_64 0:147-2.73.0.1.el6_8.2 will be updated

---> Package udev.x86_64 0:147-2.73.0.2.el6_8.2 will be an update

---> Package vim-common.x86_64 2:7.4.629-5.el6_8.1 will be updated

---> Package vim-common.x86_64 2:7.4.629-5.el6_10.2 will be an update

---> Package vim-enhanced.x86_64 2:7.4.629-5.el6_8.1 will be updated

---> Package vim-enhanced.x86_64 2:7.4.629-5.el6_10.2 will be an update

---> Package vim-minimal.x86_64 2:7.4.629-5.el6_8.1 will be updated

---> Package vim-minimal.x86_64 2:7.4.629-5.el6_10.2 will be an update

---> Package xorg-x11-server-Xorg.x86_64 0:1.17.4-17.0.1.el6 will be updated

---> Package xorg-x11-server-Xorg.x86_64 0:1.17.4-17.0.2.el6 will be an update

---> Package xorg-x11-server-common.x86_64 0:1.17.4-17.0.1.el6 will be updated

---> Package xorg-x11-server-common.x86_64 0:1.17.4-17.0.2.el6 will be an update

--> Finished Dependency Resolution

Dependencies Resolved

================================================================================

 Package                 Arch    Version                   Repository      Size

================================================================================

Installing:

 kernel-uek              x86_64  4.1.12-124.33.4.el6uek    ODA_REPOS_LOC   42 M

 kernel-uek-firmware     noarch  4.1.12-124.33.4.el6uek    ODA_REPOS_LOC  2.6 M

Updating:

 cpupowerutils           x86_64  1.3-2.0.1.el6             ODA_REPOS_LOC   77 k

 cups-libs               x86_64  1:1.4.2-81.el6_10         ODA_REPOS_LOC  322 k

 dbus                    x86_64  1:1.2.24-11.0.1.el6_10    ODA_REPOS_LOC  211 k

 dbus-libs               x86_64  1:1.2.24-11.0.1.el6_10    ODA_REPOS_LOC  127 k

 dracut                  noarch  004-411.0.4.el6           ODA_REPOS_LOC  129 k

 dracut-kernel           noarch  004-411.0.4.el6           ODA_REPOS_LOC   29 k

 glibc                   x86_64  2.12-1.212.0.3.el6_10.3   ODA_REPOS_LOC  3.8 M

 glibc-common            x86_64  2.12-1.212.0.3.el6_10.3   ODA_REPOS_LOC   14 M

 glibc-devel             i686    2.12-1.212.0.3.el6_10.3   ODA_REPOS_LOC  992 k

 glibc-devel             x86_64  2.12-1.212.0.3.el6_10.3   ODA_REPOS_LOC  991 k

 glibc-headers           x86_64  2.12-1.212.0.3.el6_10.3   ODA_REPOS_LOC  620 k

 initscripts             x86_64  9.03.61-1.0.6.el6         ODA_REPOS_LOC  952 k

 kernel-headers          x86_64  2.6.32-754.18.2.el6       ODA_REPOS_LOC  4.6 M

 kexec-tools             x86_64  2.0.7-1.0.28.el6          ODA_REPOS_LOC  339 k

 ksplice                 x86_64  1.0.43-1.el6              ODA_REPOS_LOC  9.1 k

 ksplice-core0           x86_64  1.0.43-1.el6              ODA_REPOS_LOC  271 k

 ksplice-offline         x86_64  1.0.43-1.el6              ODA_REPOS_LOC  7.9 k

 ksplice-tools           x86_64  1.0.43-1.el6              ODA_REPOS_LOC   92 k

 libgudev1               x86_64  147-2.73.0.2.el6_8.2      ODA_REPOS_LOC   65 k

 libudev                 x86_64  147-2.73.0.2.el6_8.2      ODA_REPOS_LOC   78 k

 mailx                   x86_64  12.4-10.el6_10            ODA_REPOS_LOC  235 k

 openssl                 x86_64  1.0.1e-58.0.1.el6_10      ODA_REPOS_LOC  1.5 M

 oracle-ofed-release     x86_64  1.0.0-51.el6              ODA_REPOS_LOC   16 k

 perf                    x86_64  2.6.32-754.18.2.el6       ODA_REPOS_LOC  4.8 M

 python                  x86_64  2.6.6-68.0.1.el6_10       ODA_REPOS_LOC   76 k

 python-libs             x86_64  2.6.6-68.0.1.el6_10       ODA_REPOS_LOC  5.3 M

 rdma                    noarch  2:3.10-3.0.41.el6         ODA_REPOS_LOC   76 k

 samba                   x86_64  3.6.23-52.0.1.el6_10      ODA_REPOS_LOC  5.1 M

 samba-client            x86_64  3.6.23-52.0.1.el6_10      ODA_REPOS_LOC   11 M

 samba-common            x86_64  3.6.23-52.0.1.el6_10      ODA_REPOS_LOC   10 M

 samba-winbind           x86_64  3.6.23-52.0.1.el6_10      ODA_REPOS_LOC  2.2 M

 samba-winbind-clients   x86_64  3.6.23-52.0.1.el6_10      ODA_REPOS_LOC  2.0 M

 sudo                    x86_64  1.8.6p3-29.0.1.el6_10.2   ODA_REPOS_LOC  712 k

 tzdata                  noarch  2019c-1.el6               ODA_REPOS_LOC  507 k

 tzdata-java             noarch  2019c-1.el6               ODA_REPOS_LOC  188 k

 udev                    x86_64  147-2.73.0.2.el6_8.2      ODA_REPOS_LOC  360 k

 vim-common              x86_64  2:7.4.629-5.el6_10.2      ODA_REPOS_LOC  6.7 M

 vim-enhanced            x86_64  2:7.4.629-5.el6_10.2      ODA_REPOS_LOC  1.0 M

 vim-minimal             x86_64  2:7.4.629-5.el6_10.2      ODA_REPOS_LOC  421 k

 xorg-x11-server-Xorg    x86_64  1.17.4-17.0.2.el6         ODA_REPOS_LOC  1.4 M

 xorg-x11-server-common  x86_64  1.17.4-17.0.2.el6         ODA_REPOS_LOC   51 k

Transaction Summary

================================================================================

Install       2 Package(s)

Upgrade      41 Package(s)

Total download size: 126 M

Downloading Packages:

--------------------------------------------------------------------------------

Total                                           190 MB/s | 126 MB     00:00

Running rpm_check_debug

Running Transaction Test

Transaction Check Error:

  file /usr/bin/ldd from install of glibc-common-2.12-1.212.0.3.el6_10.3.x86_64 conflicts with file from package glibc-common-2.12-1.212.0.3.el6_10.3.i686

  file /usr/lib/locale/locale-archive.tmpl from install of glibc-common-2.12-1.212.0.3.el6_10.3.x86_64 conflicts with file from package glibc-common-2.12-1.212.0.3.el6_10.3.i686

Error Summary

-------------

WARNING: 2020-06-06 14:06:31: OS Upgrade is not successful. Need to resolve conflicts

INFO: 2020-06-06 14:06:31: Copy the required RPMs to a location and set EXTRA_OS_RPMS_LOC to that location

Here we need to solve the dependency conflict; in this case we will remove the conflicting package:

[root@odax58duts1 patchODA2020]# rpm -e glibc-common-2.12-1.212.0.3.el6_10.3.i686 --nodeps

You have new mail in /var/spool/mail/root

[root@odax58duts1 patchODA2020]# oakcli validate -c ospatch -ver 18.8.0.0.0

INFO: Validating the OS patch for the version 18.8.0.0.0

INFO: 2020-06-06 14:09:51: Performing a dry run for OS patching

INFO: 2020-06-06 14:10:09: No conflict detected during the OS update, dry run check.

NODE 2:

[root@odax58duts2 patchODA2020]# oakcli validate -c ospatch -ver 18.8.0.0.0

INFO: Validating the OS patch for the version 18.8.0.0.0

INFO: 2020-06-06 14:15:59: Performing a dry run for OS patching

ERROR: 2020-06-06 14:16:18: Unable to run the command : /usr/bin/yum --exclude=kmod-mpt2sas,ibutils-libs,dapl,libcxgb3,libipathverbs,libmthca,libnes,ofed-docs update --disablerepo=* --enablerepo=ODA_REPOS_LOC -y

ERROR: 2020-06-06 14:16:18: Loaded plugins: rhnplugin, ulninfo, versionlock

This system is not registered with ULN.

You can use uln_register to register.

ULN support will be disabled.

Repository ol6_latest is listed more than once in the configuration

Repository ol6_addons is listed more than once in the configuration

Repository ol6_UEK_latest is listed more than once in the configuration

Setting up Update Process

Resolving Dependencies

--> Running transaction check

---> Package cpupowerutils.x86_64 0:1.3-2.el6 will be updated

---> Package cpupowerutils.x86_64 0:1.3-2.0.1.el6 will be an update

---> Package cups-libs.x86_64 1:1.4.2-79.el6 will be updated

---> Package cups-libs.x86_64 1:1.4.2-81.el6_10 will be an update

---> Package dbus.x86_64 1:1.2.24-9.0.1.el6 will be updated

---> Package dbus.x86_64 1:1.2.24-11.0.1.el6_10 will be an update

---> Package dbus-libs.x86_64 1:1.2.24-9.0.1.el6 will be updated

---> Package dbus-libs.x86_64 1:1.2.24-11.0.1.el6_10 will be an update

---> Package dracut.noarch 0:004-411.0.3.el6 will be updated

---> Package dracut.noarch 0:004-411.0.4.el6 will be an update

---> Package dracut-kernel.noarch 0:004-411.0.3.el6 will be updated

---> Package dracut-kernel.noarch 0:004-411.0.4.el6 will be an update

---> Package glibc.x86_64 0:2.12-1.212.0.2.el6 will be updated

---> Package glibc.x86_64 0:2.12-1.212.0.3.el6_10.3 will be an update

---> Package glibc-devel.i686 0:2.12-1.212.0.2.el6 will be updated

---> Package glibc-devel.x86_64 0:2.12-1.212.0.2.el6 will be updated

---> Package glibc-devel.i686 0:2.12-1.212.0.3.el6_10.3 will be an update

---> Package glibc-devel.x86_64 0:2.12-1.212.0.3.el6_10.3 will be an update

---> Package glibc-headers.x86_64 0:2.12-1.212.0.2.el6 will be updated

---> Package glibc-headers.x86_64 0:2.12-1.212.0.3.el6_10.3 will be an update

---> Package initscripts.x86_64 0:9.03.61-1.0.3.el6 will be updated

---> Package initscripts.x86_64 0:9.03.61-1.0.6.el6 will be an update

---> Package kernel-headers.x86_64 0:2.6.32-754.11.1.el6 will be updated

---> Package kernel-headers.x86_64 0:2.6.32-754.18.2.el6 will be an update

---> Package kernel-uek.x86_64 0:4.1.12-124.33.4.el6uek will be installed

---> Package kernel-uek-firmware.noarch 0:4.1.12-124.33.4.el6uek will be installed

---> Package kexec-tools.x86_64 0:2.0.7-1.0.27.el6 will be updated

---> Package kexec-tools.x86_64 0:2.0.7-1.0.28.el6 will be an update

---> Package ksplice.x86_64 0:1.0.38-1.el6 will be updated

---> Package ksplice.x86_64 0:1.0.43-1.el6 will be an update

---> Package ksplice-core0.x86_64 0:1.0.38-1.el6 will be updated

---> Package ksplice-core0.x86_64 0:1.0.43-1.el6 will be an update

---> Package ksplice-offline.x86_64 0:1.0.38-1.el6 will be updated

---> Package ksplice-offline.x86_64 0:1.0.43-1.el6 will be an update

---> Package ksplice-tools.x86_64 0:1.0.38-1.el6 will be updated

---> Package ksplice-tools.x86_64 0:1.0.43-1.el6 will be an update

---> Package libgudev1.x86_64 0:147-2.73.0.1.el6_8.2 will be updated

---> Package libgudev1.x86_64 0:147-2.73.0.2.el6_8.2 will be an update

---> Package libudev.x86_64 0:147-2.73.0.1.el6_8.2 will be updated

---> Package libudev.x86_64 0:147-2.73.0.2.el6_8.2 will be an update

---> Package mailx.x86_64 0:12.4-8.el6_6 will be updated

---> Package mailx.x86_64 0:12.4-10.el6_10 will be an update

---> Package openssl.x86_64 0:1.0.1e-57.0.6.el6 will be updated

---> Package openssl.x86_64 0:1.0.1e-58.0.1.el6_10 will be an update

---> Package oracle-ofed-release.x86_64 0:1.0.0-50.el6 will be updated

---> Package oracle-ofed-release.x86_64 0:1.0.0-51.el6 will be an update

---> Package perf.x86_64 0:2.6.32-754.11.1.el6 will be updated

---> Package perf.x86_64 0:2.6.32-754.18.2.el6 will be an update

---> Package python.x86_64 0:2.6.6-66.0.1.el6_8 will be updated

---> Package python.x86_64 0:2.6.6-68.0.1.el6_10 will be an update

---> Package python-libs.x86_64 0:2.6.6-66.0.1.el6_8 will be updated

---> Package python-libs.x86_64 0:2.6.6-68.0.1.el6_10 will be an update

---> Package rdma.noarch 2:3.10-3.0.40.el6 will be updated

---> Package rdma.noarch 2:3.10-3.0.41.el6 will be an update

---> Package samba.x86_64 0:3.6.23-51.0.1.el6 will be updated

---> Package samba.x86_64 0:3.6.23-52.0.1.el6_10 will be an update

---> Package samba-client.x86_64 0:3.6.23-51.0.1.el6 will be updated

---> Package samba-client.x86_64 0:3.6.23-52.0.1.el6_10 will be an update

---> Package samba-common.x86_64 0:3.6.23-51.0.1.el6 will be updated

---> Package samba-common.x86_64 0:3.6.23-52.0.1.el6_10 will be an update

---> Package samba-winbind.x86_64 0:3.6.23-51.0.1.el6 will be updated

---> Package samba-winbind.x86_64 0:3.6.23-52.0.1.el6_10 will be an update

---> Package samba-winbind-clients.x86_64 0:3.6.23-51.0.1.el6 will be updated

---> Package samba-winbind-clients.x86_64 0:3.6.23-52.0.1.el6_10 will be an update

---> Package sudo.x86_64 0:1.8.6p3-29.el6_9 will be updated

---> Package sudo.x86_64 0:1.8.6p3-29.0.1.el6_10.2 will be an update

---> Package tzdata.noarch 0:2019a-1.el6 will be updated

---> Package tzdata.noarch 0:2019c-1.el6 will be an update

---> Package tzdata-java.noarch 0:2018e-3.el6 will be updated

---> Package tzdata-java.noarch 0:2019c-1.el6 will be an update

---> Package udev.x86_64 0:147-2.73.0.1.el6_8.2 will be updated

---> Package udev.x86_64 0:147-2.73.0.2.el6_8.2 will be an update

---> Package vim-common.x86_64 2:7.4.629-5.el6_8.1 will be updated

---> Package vim-common.x86_64 2:7.4.629-5.el6_10.2 will be an update

---> Package vim-enhanced.x86_64 2:7.4.629-5.el6_8.1 will be updated

---> Package vim-enhanced.x86_64 2:7.4.629-5.el6_10.2 will be an update

---> Package vim-minimal.x86_64 2:7.4.629-5.el6_8.1 will be updated

---> Package vim-minimal.x86_64 2:7.4.629-5.el6_10.2 will be an update

---> Package xorg-x11-server-Xorg.x86_64 0:1.17.4-17.0.1.el6 will be updated

---> Package xorg-x11-server-Xorg.x86_64 0:1.17.4-17.0.2.el6 will be an update

---> Package xorg-x11-server-common.x86_64 0:1.17.4-17.0.1.el6 will be updated

---> Package xorg-x11-server-common.x86_64 0:1.17.4-17.0.2.el6 will be an update

--> Finished Dependency Resolution

Dependencies Resolved

================================================================================

 Package                 Arch    Version                   Repository      Size

================================================================================

Installing:

 kernel-uek              x86_64  4.1.12-124.33.4.el6uek    ODA_REPOS_LOC   42 M

 kernel-uek-firmware     noarch  4.1.12-124.33.4.el6uek    ODA_REPOS_LOC  2.6 M

Updating:

 cpupowerutils           x86_64  1.3-2.0.1.el6             ODA_REPOS_LOC   77 k

 cups-libs               x86_64  1:1.4.2-81.el6_10         ODA_REPOS_LOC  322 k

 dbus                    x86_64  1:1.2.24-11.0.1.el6_10    ODA_REPOS_LOC  211 k

 dbus-libs               x86_64  1:1.2.24-11.0.1.el6_10    ODA_REPOS_LOC  127 k

 dracut                  noarch  004-411.0.4.el6           ODA_REPOS_LOC  129 k

 dracut-kernel           noarch  004-411.0.4.el6           ODA_REPOS_LOC   29 k

 glibc                   x86_64  2.12-1.212.0.3.el6_10.3   ODA_REPOS_LOC  3.8 M

 glibc-devel             i686    2.12-1.212.0.3.el6_10.3   ODA_REPOS_LOC  992 k

 glibc-devel             x86_64  2.12-1.212.0.3.el6_10.3   ODA_REPOS_LOC  991 k

 glibc-headers           x86_64  2.12-1.212.0.3.el6_10.3   ODA_REPOS_LOC  620 k

 initscripts             x86_64  9.03.61-1.0.6.el6         ODA_REPOS_LOC  952 k

 kernel-headers          x86_64  2.6.32-754.18.2.el6       ODA_REPOS_LOC  4.6 M

 kexec-tools             x86_64  2.0.7-1.0.28.el6          ODA_REPOS_LOC  339 k

 ksplice                 x86_64  1.0.43-1.el6              ODA_REPOS_LOC  9.1 k

 ksplice-core0           x86_64  1.0.43-1.el6              ODA_REPOS_LOC  271 k

 ksplice-offline         x86_64  1.0.43-1.el6              ODA_REPOS_LOC  7.9 k

 ksplice-tools           x86_64  1.0.43-1.el6              ODA_REPOS_LOC   92 k

 libgudev1               x86_64  147-2.73.0.2.el6_8.2      ODA_REPOS_LOC   65 k

 libudev                 x86_64  147-2.73.0.2.el6_8.2      ODA_REPOS_LOC   78 k

 mailx                   x86_64  12.4-10.el6_10            ODA_REPOS_LOC  235 k

 openssl                 x86_64  1.0.1e-58.0.1.el6_10      ODA_REPOS_LOC  1.5 M

 oracle-ofed-release     x86_64  1.0.0-51.el6              ODA_REPOS_LOC   16 k

 perf                    x86_64  2.6.32-754.18.2.el6       ODA_REPOS_LOC  4.8 M

 python                  x86_64  2.6.6-68.0.1.el6_10       ODA_REPOS_LOC   76 k

 python-libs             x86_64  2.6.6-68.0.1.el6_10       ODA_REPOS_LOC  5.3 M

 rdma                    noarch  2:3.10-3.0.41.el6         ODA_REPOS_LOC   76 k

 samba                   x86_64  3.6.23-52.0.1.el6_10      ODA_REPOS_LOC  5.1 M

 samba-client            x86_64  3.6.23-52.0.1.el6_10      ODA_REPOS_LOC   11 M

 samba-common            x86_64  3.6.23-52.0.1.el6_10      ODA_REPOS_LOC   10 M

 samba-winbind           x86_64  3.6.23-52.0.1.el6_10      ODA_REPOS_LOC  2.2 M

 samba-winbind-clients   x86_64  3.6.23-52.0.1.el6_10      ODA_REPOS_LOC  2.0 M

 sudo                    x86_64  1.8.6p3-29.0.1.el6_10.2   ODA_REPOS_LOC  712 k

 tzdata                  noarch  2019c-1.el6               ODA_REPOS_LOC  507 k

 tzdata-java             noarch  2019c-1.el6               ODA_REPOS_LOC  188 k

 udev                    x86_64  147-2.73.0.2.el6_8.2      ODA_REPOS_LOC  360 k

 vim-common              x86_64  2:7.4.629-5.el6_10.2      ODA_REPOS_LOC  6.7 M

 vim-enhanced            x86_64  2:7.4.629-5.el6_10.2      ODA_REPOS_LOC  1.0 M

 vim-minimal             x86_64  2:7.4.629-5.el6_10.2      ODA_REPOS_LOC  421 k

 xorg-x11-server-Xorg    x86_64  1.17.4-17.0.2.el6         ODA_REPOS_LOC  1.4 M

 xorg-x11-server-common  x86_64  1.17.4-17.0.2.el6         ODA_REPOS_LOC   51 k

Transaction Summary

================================================================================

Install       2 Package(s)

Upgrade      40 Package(s)

Total download size: 112 M

Downloading Packages:

--------------------------------------------------------------------------------

Total                                           187 MB/s | 112 MB     00:00

Running rpm_check_debug

ERROR with rpm_check_debug vs depsolve:

glibc = 2.12-1.212.0.1.el6 is needed by (installed) nscd-2.12-1.212.0.1.el6.x86_64

** Found 11 pre-existing rpmdb problem(s), 'yum check' output follows:

glibc-2.12-1.212.0.2.el6.x86_64 has missing requires of glibc-common = ('0', '2.12', '1.212.0.2.el6')

glibc-2.12-1.212.0.3.el6_10.3.i686 is a duplicate with glibc-2.12-1.212.0.2.el6.x86_64

libgcc-4.4.7-23.0.1.el6.x86_64 is a duplicate with libgcc-4.4.7-18.el6.i686

nscd-2.12-1.212.0.1.el6.x86_64 has missing requires of glibc = ('0', '2.12', '1.212.0.1.el6')

oak-18.7.0.0.0_LINUX.X64_190915-1.x86_64 has missing requires of libnfsodm18.so()(64bit)

oak-18.7.0.0.0_LINUX.X64_190915-1.x86_64 has missing requires of perl(GridDefParams)

oak-18.7.0.0.0_LINUX.X64_190915-1.x86_64 has missing requires of perl(oakosdiskinfo)

oak-18.7.0.0.0_LINUX.X64_190915-1.x86_64 has missing requires of perl(oaksharedstorageinfo)

oak-18.7.0.0.0_LINUX.X64_190915-1.x86_64 has missing requires of perl(oakstoragetopology)

oak-18.7.0.0.0_LINUX.X64_190915-1.x86_64 has missing requires of perl(ol5_to_ol6_upgrade)

oak-18.7.0.0.0_LINUX.X64_190915-1.x86_64 has missing requires of perl(s_GridSteps)

Your transaction was saved, rerun it with:

 yum load-transaction /tmp/yum_save_tx-2020-06-06-14-16PkHNuJ.yumtx

WARNING: 2020-06-06 14:16:18: OS Upgrade is not successful. Need to resolve conflicts

INFO: 2020-06-06 14:16:18: Copy the required RPMs to a location and set EXTRA_OS_RPMS_LOC to that location

Here we need to solve the dependency problem; in this case, we will remove the conflicting package:

[root@odax58duts2 patchODA2020]# rpm -e nscd-2.12-1.212.0.1.el6 --nodeps

[root@odax58duts2 patchODA2020]# oakcli validate -c ospatch -ver 18.8.0.0.0

INFO: Validating the OS patch for the version 18.8.0.0.0

INFO: 2020-06-06 14:16:38: Performing a dry run for OS patching

INFO: 2020-06-06 14:16:55: No conflict detected during the OS update, dry run check.

Now we have the environment validated for updating the operating system

5 – Verify which components will need to be updated:

oakcli update -patch 18.8.0.0.0 --verify

e.g:

[root@odax58duts1 patchODA2020]#   oakcli update -patch 18.8.0.0.0 --verify

INFO: 2020-06-06 14:27:34: Reading the metadata file now...

                Component Name            Installed Version         Proposed Patch Version

                ---------------           ------------------        -----------------

                Controller_INT            11.05.03.00               Up-to-date

                Controller_EXT            11.05.03.00               Up-to-date

                Expander                  001E                      Up-to-date

                SSD_SHARED                944A                      Up-to-date

                HDD_LOCAL                 A7E0                      Up-to-date

                HDD_SHARED                A7E0                      Up-to-date

                ILOM                      4.0.4.40 r130079          5.0.0.20 r133445

                BIOS                      25080100                  Up-to-date

                IPMI                      1.8.15.0                  Up-to-date

                HMP                       2.4.5.0.1                 Up-to-date

                OAK                       18.7.0.0.0                18.8.0.0.0

                OL                        6.10                      Up-to-date

                OVM                       3.4.4                     Up-to-date

                GI_HOME                   18.7.0.0.190716           18.8.0.0.191015

                DB_HOME {

                [ OraDb12102_home1 ]      12.1.0.2.190716           12.1.0.2.191015

                [ OraDb11204_home1 ]      11.2.0.4.190716           11.2.0.4.191015

                [ OraDB18Home1 ]          18.7.1.0.191015           18.8.0.0.191015

                [ OraDb11203_home2 ]      11.2.0.3.15               No-update

                             }

[root@odax58duts2 patchODA2020]#   oakcli update -patch 18.8.0.0.0 --verify

INFO: 2020-06-06 14:27:39: Reading the metadata file now...

                Component Name            Installed Version         Proposed Patch Version

                ---------------           ------------------        -----------------

                Controller_INT            11.05.03.00               Up-to-date

                Controller_EXT            11.05.03.00               Up-to-date

                Expander                  001E                      Up-to-date

                SSD_SHARED                944A                      Up-to-date

                HDD_LOCAL                 A7E0                      Up-to-date

                HDD_SHARED                A7E0                      Up-to-date

                ILOM                      4.0.4.40 r130079          5.0.0.20 r133445

                BIOS                      25080100                  Up-to-date

                IPMI                      1.8.15.0                  Up-to-date

                HMP                       2.4.5.0.1                 Up-to-date

                OAK                       18.7.0.0.0                18.8.0.0.0

                OL                        6.10                      Up-to-date

                OVM                       3.4.4                     Up-to-date

                GI_HOME                   18.7.0.0.190716           18.8.0.0.191015

                DB_HOME {

                [ OraDb12102_home1 ]      12.1.0.2.190716           12.1.0.2.191015

                [ OraDb11204_home1 ]      11.2.0.4.190716           11.2.0.4.191015

                [ OraDB18Home1 ]          18.7.1.0.191015           18.8.0.0.191015

                [ OraDb11203_home2 ]      11.2.0.3.15               No-update

Note:

– Stop all databases and clusterware resources before patching ILOM on both servers
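A minimal sketch of that shutdown (the database name and the grid home path are placeholders; adapt them to your environment):

# As oracle, stop each database managed by the cluster
srvctl stop database -d <dbname> -o immediate

# As root, stop the clusterware stack on the node being patched
<GI_HOME>/bin/crsctl stop crs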

6 – Update ILOM

I prefer to update ILOM first so that, if there is a problem during the patch, I can still access the environment via ILOM.
 
If you prefer to update ILOM together with the bundle patch, that is also a valid choice.
 
 
Note: Upgrade ILOM on both nodes, but not at the same time.
 
Download the ILOM patch separately: Sun Server X5-8 (ILOM 5.0.0.20 133445)
Custom File Name: p30802633_300_Generic.zip
The firmware is inside this .zip file at the following path:
C:\Users\andre.ontalba\Downloads\p30802633_300_Generic\Oracle_Server_X5-8-3.0.0.91223-FIRMWARE_PACK\Firmware\service-processor\ILOM-5_0_0_20_r133445-Oracle_Server_X5-4_X5-8.pkg
You must copy this file to the following directory on the server that will be used to transfer the package (in my case, dutsLinux1).

/patch/ILOM

This file must be owned by the “oracle” user and the “oinstall” group.
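A minimal sketch of that copy (assuming the /patch/ILOM directory mentioned above and a workstation that already holds the extracted .pkg file):

# Copy the extracted firmware package to the gateway server
scp ILOM-5_0_0_20_r133445-Oracle_Server_X5-4_X5-8.pkg oracle@dutsLinux1:/patch/ILOM/

# On the gateway, set the expected ownership
chown oracle:oinstall /patch/ILOM/ILOM-5_0_0_20_r133445-Oracle_Server_X5-4_X5-8.pkg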
Connect to the gateway machine: dutsLinux1

ssh xxxx@dutsLinux1

Connect to the first ILOM from the gateway machine:

ssh root@odax58duts1-ilom

Enter remote user password: **********

Check current version:
e.g:
Type:

  -> version

SP firmware 4.0.4.40

SP firmware build number: 130079

SP firmware date: Thu May 07 09:54:31 CST 2019

SP filesystem version: 0.2.10

Stopping DOM0 and ODA_BASE: (Connect to the ILOM):

stop /SYS

Check status:

show /SYS

e.g: Section “Properties -> power_state = Off”

  Properties:

      type = Host System

      ipmi_name = /SYS

      product_name = SUN SERVER X5-8

      product_part_number = XXXXXXXXXXXXXXX

      product_serial_number = XXXXXXXXXXXXXXX

      product_manufacturer = Oracle Corporation

      fault_state = OK

      clear_fault_action = (none)

      power_state = Off

Load new ILOM image:

load -source scp://xxxx@dutsLinux1/ILOM/ILOM-5_0_0_20_r133445-Oracle_Server_X5-4_X5-8.pkg

e.g:

load -source scp://root@dutsLinux1/ILOM/ILOM-5_0_0_20_r133445-Oracle_Server_X5-4_X5-8.pkg

Enter remote user password: **********

NOTE: An upgrade takes several minutes to complete. ILOM

      will enter a special mode to load new firmware. No

      other tasks can be performed in ILOM until the

      firmware upgrade is complete and ILOM is reset.

    You can choose to postpone the server BIOS upgrade until the

    next server poweroff. If you do not do that, you should

    perform a clean shutdown of the server before continuing.

Are you sure you want to load the specified file (y/n)? y

Preserve existing SP configuration (y/n)? y

Preserve existing BIOS configuration (y/n)? y

Delay BIOS upgrade until next server poweroff or reset (y/n)? n

...

After the automatic reboot performed by the ILOM, you can validate the new firmware and BIOS versions:

version

Hostname: odax58duts1-ilom

-> version

SP firmware 5.0.0.20

SP firmware build number: 133445

SP firmware date: Thu Feb 06 09:54:31 CST 2020

SP filesystem version: 0.2.10

show /SYS/MB/BIOS

/SYS/MB/BIOS

  Targets:

  Properties:

      type = BIOS

      ipmi_name = MB/BIOS

      fru_name = SYSTEM BIOS

      fru_manufacturer = AMERICAN MEGATRENDS

      fru_version = 25080100

      fru_part_number = APTIO

Start the DOM0 and ODA_BASE after upgrade ILOM:

start /SYS

exit

Note:  Repeat this procedure on the second node

7 – Upgrade ODA Servers

Note: Run this only from the first server, and make sure it is the master node: oakcli show ismaster ⇒ “OAKD is in Master Mode”

oakcli update -patch 18.8.0.0.0 --server

e.g:

[root@odax58duts2 patchODA2020]# oakcli update -patch 18.8.0.0.0 --server

This procedure can take between 2 and 3 hours to execute on both nodes.

INFO: DB, ASM, Clusterware may be stopped during the patch if required

INFO: Both Nodes may get rebooted automatically during the patch if required

Do you want to continue: [Y/N]?: y

INFO: User has confirmed for the reboot

INFO: Patch bundle must be unpacked on the second Node also before applying the patch

Did you unpack the patch bundle on the second Node? : [Y/N]? : y

INFO: All the VMs except the ODABASE will be shutdown forcefully if needed

Do you want to continue : [Y/N]? : y

To monitor the patch application, we can follow the logs in the directories below:
Log file directory node 1:   /opt/oracle/oak/log/odax58duts1/patch/18.8.0.0.0/
Log file directory node 2:   /opt/oracle/oak/log/odax58duts2/patch/18.8.0.0.0/
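For example, to follow the progress while the patch runs (the exact log file names change per run, so this is only an illustration):

# On node 1, list the logs for this patch version and follow the newest one
cd /opt/oracle/oak/log/odax58duts1/patch/18.8.0.0.0/
ls -ltr
tail -f <latest_log_file>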

Let’s check how the server update went.
Node 1

[root@odax58duts1 patchODA2020]# oakcli show version -detail

INFO: 2020-06-06 17:17:34: Reading the metadata file now...

                Component Name            Installed Version         Proposed Patch Version

                ---------------           ------------------        -----------------

                Controller_INT            11.05.03.00               Up-to-date

                Controller_EXT            11.05.03.00               Up-to-date

                Expander                  001E                      Up-to-date

                SSD_SHARED                944A                      Up-to-date

                HDD_LOCAL                 A7E0                      Up-to-date

                HDD_SHARED                A7E0                      Up-to-date

                ILOM                      5.0.0.20 r133445          Up-to-date

                BIOS                      25080100                  Up-to-date

                IPMI                      1.8.15.0                  Up-to-date

                HMP                       2.4.5.0.1                 Up-to-date

                OAK                       18.8.0.0.0                Up-to-date

                OL                        6.10                      Up-to-date

                OVM                       3.4.4                     Up-to-date

                GI_HOME                   18.8.0.0.191015           Up-to-date

                DB_HOME {

                [ OraDb12102_home1 ]      12.1.0.2.190716           12.1.0.2.191015

                [ OraDb11204_home1 ]      11.2.0.4.190716           11.2.0.4.191015

                [ OraDB18Home1 ]          18.7.1.0.191015           18.8.0.0.191015

                [ OraDb11203_home2 ]      11.2.0.3.15               No-update

Node 2

[root@odax58duts2 patchODA2020]#   oakcli show version -detail

INFO: 2020-06-06 17:18:39: Reading the metadata file now...

                Component Name            Installed Version         Proposed Patch Version

                ---------------           ------------------        -----------------

                Controller_INT            11.05.03.00               Up-to-date

                Controller_EXT            11.05.03.00               Up-to-date

                Expander                  001E                      Up-to-date

                SSD_SHARED                944A                      Up-to-date

                HDD_LOCAL                 A7E0                      Up-to-date

                HDD_SHARED                A7E0                      Up-to-date

                ILOM                      5.0.0.20 r133445          Up-to-date

                BIOS                      25080100                  Up-to-date

                IPMI                      1.8.15.0                  Up-to-date

                HMP                       2.4.5.0.1                 Up-to-date

                OAK                       18.8.0.0.0                Up-to-date

                OL                        6.10                      Up-to-date

                OVM                       3.4.4                     Up-to-date

                GI_HOME                   18.8.0.0.191015           Up-to-date

                DB_HOME {

                [ OraDb12102_home1 ]      12.1.0.2.190716           12.1.0.2.191015

                [ OraDb11204_home1 ]      11.2.0.4.190716           11.2.0.4.191015

                [ OraDB18Home1 ]          18.7.1.0.191015           18.8.0.0.191015

                [ OraDb11203_home2 ]      11.2.0.3.15               No-update

8 – ODA Patch: Database Binaries

Now it is time to apply the patch to the Oracle Database binaries (11.2.0.4, 12.1, 12.2, and 18).
Get the database list and Oracle Homes before patching:

oakcli show databases

The first step is to stop TFA (on both nodes, as root):

tfactl stop

To apply the patch to the Oracle binaries (execute only from the first node):

oakcli update -patch 18.8.0.0.0 --database

[root@odax58duts1 18.8.0.0.0]# oakcli update -patch 18.8.0.0.0 --database

INFO: Running pre-install scripts

INFO: Running  prepatching on node 0

INFO: Running  prepatching on node 1

INFO: Completed pre-install scripts

...

...

INFO: 2020-06-06 18:51:24: ------------------Patching DB-------------------------

INFO: 2020-06-06 18:51:24: Getting all the possible Database Homes for patching

...

INFO: 2020-06-06 18:52:03: Patching 11.2.0.4 Database Homes on the Node odax58duts1

Found the following 11.2.0.4 homes possible for patching:

HOME_NAME                      HOME_LOCATION

---------                      -------------

OraDb11204_home1               /u01/app/oracle/product/11.2.0.4/dbhome_1

[Please note that few of the above Database Homes may be already up-to-date. They will be automatically ignored]

Would you like to patch all the above homes: Y | N ? : Y

INFO: 2020-06-06 18:52:15: Updating OPATCH

Verifying Opatch version for home:</u01/app/oracle/product/11.2.0.4/dbhome_1>.

Expecting version:<11.2.0.3.22>

Opatch version on node <odax58duts1> is <11.2.0.3.22>

Opatch version on node <odax58duts2> is <11.2.0.3.22>

INFO: 2020-06-06 18:53:41: Performing the conflict checks...

SUCCESS: 2020-06-06 18:53:53: Conflict checks passed for all the Homes

INFO: 2020-06-06 18:53:53: Checking if the patch is already applied on any of the Homes

INFO: 2020-06-06 18:53:58: Home is not Up-to-date

SUCCESS: 2020-06-06 18:53:59: Successfully stopped the Database consoles

SUCCESS: 2020-06-06 18:54:06: Successfully stopped the EM agents

INFO: 2020-06-06 18:54:11: Applying the patch on oracle home : /u01/app/oracle/product/11.2.0.4/dbhome_1 ...

SUCCESS: 2020-06-06 18:56:20: Successfully applied the patch on the Home : /u01/app/oracle/product/11.2.0.4/dbhome_1

SUCCESS: 2020-06-06 18:56:20: Successfully started the Database consoles

SUCCESS: 2020-06-06 18:56:20: Successfully started the EM Agents

INFO: 2020-06-06 18:56:23: Patching 11.2.0.4 Database Homes on the Node odax58duts2

...

INFO: 2020-06-06 19:00:02: Patching 12.1.0.2 Database Homes on the Node odax58duts1

Found the following 12.1.0.2 homes possible for patching:

HOME_NAME                      HOME_LOCATION

---------                      -------------

OraDb12102_home1               /u01/app/oracle/product/12.1.0.2/dbhome_1

[Please note that few of the above Database Homes may be already up-to-date. They will be automatically ignored]

Would you like to patch all the above homes: Y | N ? : Y

INFO: 2020-06-06 19:06:38: Updating OPATCH

Verifying Opatch version for home:</u01/app/oracle/product/12.1.0.2/dbhome_1>.

Expecting version:<12.2.0.1.18>

Opatch version on node <odax58duts1> is <12.2.0.1.18>

Opatch version on node <odax58duts2> is <12.2.0.1.18>

INFO: 2020-06-06 19:07:37: Rolling back patches on 12.1.0.2.x home if required...

INFO: 2020-06-06 19:07:43: Checking if any patches need to be rolled back on </u01/app/oracle/product/12.1.0.2/dbhome_1>

INFO: 2020-06-06 19:11:35: Performing the conflict checks...

SUCCESS: 2020-06-06 19:11:59: Conflict checks passed for all the Homes

INFO: 2020-06-06 19:11:59: Checking if the patch is already applied on any of the Homes

INFO: 2020-06-06 19:12:11: Home is not Up-to-date

SUCCESS: 2020-06-06 19:12:13: Successfully stopped the Database consoles

SUCCESS: 2020-06-06 19:12:19: Successfully stopped the EM agents

INFO: 2020-06-06 19:12:25: Applying patch on /u01/app/oracle/product/12.1.0.2/dbhome_1 Homes

INFO: 2020-06-06 19:12:25: It may take upto 15 mins. Please wait...

SUCCESS: 2020-06-06 19:17:19: Successfully applied the patch on the Home : /u01/app/oracle/product/12.1.0.2/dbhome_1

SUCCESS: 2020-06-06 19:17:19: Successfully started the Database consoles

SUCCESS: 2020-06-06 19:17:19: Successfully started the EM Agents

INFO: 2020-06-06 19:17:23: Patching 12.1.0.2 Database Homes on the Node odax58duts2

...

INFO: 2020-06-06 19:27:08: Patching 18.0.0.0 Database Homes on the Node odax58duts1

Found the following 18.0.0.0 homes possible for patching:

HOME_NAME                      HOME_LOCATION

---------                      -------------

OraDB18Home1                   /u01/app/oracle/product/18.0.0.0

[Please note that few of the above Database Homes may be already up-to-date. They will be automatically ignored]

Would you like to patch all the above homes: Y | N ? : Y

INFO: 2020-06-06 19:27:15: Updating OPATCH

Verifying Opatch version for home:</u01/app/oracle/product/18.0.0.0>.

Expecting version:<12.2.0.1.18>

Opatch version on node <odax58duts1> is <12.2.0.1.18>

Opatch version on node <odax58duts2> is <12.2.0.1.18>

INFO: 2020-06-06 19:27:26: Rolling back patches on 18.x home if required...

INFO: 2020-06-06 19:27:33: Checking if any patches need to be rolled back on </u01/app/oracle/product/18.0.0.0>

INFO: 2020-06-06 19:28:57: Performing the conflict checks...

SUCCESS: 2020-06-06 19:29:57: Conflict checks passed for all the Homes

INFO: 2020-06-06 19:29:57: Checking if the patch is already applied on any of the Homes

INFO: 2020-06-06 19:30:36: Home is not Up-to-date

SUCCESS: 2020-06-06 19:30:38: Successfully stopped the Database consoles

SUCCESS: 2020-06-06 19:30:44: Successfully stopped the EM agents

INFO: 2020-06-06 19:30:49: Applying patch on /u01/app/oracle/product/18.0.0.0 Homes

INFO: 2020-06-06 19:30:49: It may take up to 15 mins. Please wait...

SUCCESS: 2020-06-06 19:40:34: Successfully applied the patch on the Home : /u01/app/oracle/product/18.0.0.0

SUCCESS: 2020-06-06 19:40:34: Successfully started the Database consoles

SUCCESS: 2020-06-06 19:40:34: Successfully started the EM Agents

INFO: 2020-06-06 19:40:37: Patching 18.0.0.0 Database Homes on the Node odax58duts2

INFO: DB patching summary on node: odax58duts1

SUCCESS: 2020-06-06 19:52:28:  Successfully applied the patch on the Home /u01/app/oracle/product/11.2.0.4/dbhome_1

SUCCESS: 2020-06-06 19:52:28:  Successfully applied the patch on the Home /u01/app/oracle/product/12.1.0.2/dbhome_1

SUCCESS: 2020-06-06 19:52:28:  Successfully applied the patch on the Home /u01/app/oracle/product/18.0.0.0

INFO: DB patching summary on node: odax58duts2

SUCCESS: 2020-06-06 19:52:28:  Successfully applied the patch on the Home /u01/app/oracle/product/11.2.0.4/dbhome_1

SUCCESS: 2020-06-06 19:52:28:  Successfully applied the patch on the Home /u01/app/oracle/product/12.1.0.2/dbhome_1

SUCCESS: 2020-06-06 19:52:28:  Successfully applied the patch on the Home /u01/app/oracle/product/18.0.0.0

INFO: Executing /tmp/pending_actions on both nodes

You have new mail in /var/spool/mail/root

[root@odax58duts1 18.8.0.0.0]#

Let’s check how the database update went.
Node 1

[root@odax58duts1 patchODA2020]# oakcli show version -detail

INFO: 2020-06-06 19:55:34: Reading the metadata file now...

                Component Name            Installed Version         Proposed Patch Version

                ---------------           ------------------        -----------------

                Controller_INT            11.05.03.00               Up-to-date

                Controller_EXT            11.05.03.00               Up-to-date

                Expander                  001E                      Up-to-date

                SSD_SHARED                944A                      Up-to-date

                HDD_LOCAL                 A7E0                      Up-to-date

                HDD_SHARED                A7E0                      Up-to-date

                ILOM                      5.0.0.20 r133445          Up-to-date

                BIOS                      25080100                  Up-to-date

                IPMI                      1.8.15.0                  Up-to-date

                HMP                       2.4.5.0.1                 Up-to-date

                OAK                       18.8.0.0.0                Up-to-date

                OL                        6.10                      Up-to-date

                OVM                       3.4.4                     Up-to-date

                GI_HOME                   18.8.0.0.191015           Up-to-date

                DB_HOME {

                [ OraDb12102_home1 ]      12.1.0.2.191015           Up-to-date

                [ OraDb11204_home1 ]      11.2.0.4.191015           Up-to-date

                [ OraDB18Home1 ]          18.8.0.0.191015           Up-to-date

                [ OraDb11203_home2 ]      11.2.0.3.15               No-update

                             }

Node 2

[root@odax58duts2 patchODA2020]#   oakcli show version -detail

INFO: 2020-06-06 19:56:39: Reading the metadata file now...

                Component Name            Installed Version         Proposed Patch Version

                ---------------           ------------------        -----------------

                Controller_INT            11.05.03.00               Up-to-date

                Controller_EXT            11.05.03.00               Up-to-date

                Expander                  001E                      Up-to-date

                SSD_SHARED                944A                      Up-to-date

                HDD_LOCAL                 A7E0                      Up-to-date

                HDD_SHARED                A7E0                      Up-to-date

                ILOM                      5.0.0.20 r133445          Up-to-date

                BIOS                      25080100                  Up-to-date

                IPMI                      1.8.15.0                  Up-to-date

                HMP                       2.4.5.0.1                 Up-to-date

                OAK                       18.8.0.0.0                Up-to-date

                OL                        6.10                      Up-to-date

                OVM                       3.4.4                     Up-to-date

                GI_HOME                   18.8.0.0.191015           Up-to-date

                DB_HOME {

                [ OraDb12102_home1 ]      12.1.0.2.191015           Up-to-date

                [ OraDb11204_home1 ]      11.2.0.4.191015           Up-to-date

                [ OraDB18Home1 ]          18.8.0.0.191015           Up-to-date

                [ OraDb11203_home2 ]      11.2.0.3.15               No-update

9 – Apply DATAPATCH/CATBUNDLE in the 11.2 / 12.1 / 12.2 / 18.8 databases

export NLS_LANG=AMERICAN_AMERICA.AL32UTF8

$ORACLE_HOME/OPatch/datapatch -verbose

Note: It is required to set the NLS_LANG variable to “AMERICAN_AMERICA.AL32UTF8” in order to avoid a bug during datapatch in Oracle Database 12.1.
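A minimal sketch of this step for one database (the instance SID is a placeholder; the home path is the 12.1 home from this environment; repeat for each database, running datapatch once per database):

# Set the environment for the database being patched (example: the 12.1 home)
export ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1
export ORACLE_SID=<db_instance_sid>
export PATH=$ORACLE_HOME/bin:$PATH
export NLS_LANG=AMERICAN_AMERICA.AL32UTF8

# Apply the SQL changes of the patch inside the database
$ORACLE_HOME/OPatch/datapatch -verbose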

Reference Documents:

Oracle Database Appliance – 18.2, 12.X, and 2.X Supported ODA Versions & Known Issues (Doc ID 888888.1)
https://docs.oracle.com/en/engineered-systems/oracle-database-appliance/18.8/cmtxn/patching-oda.html#GUID-49F5F510-3A38-4E6A-B915-FCBCD36CDDDB

I hope this ODA upgrade procedure was helpful.
Andre Ontalba

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer’s positions, strategies, or opinions. The information here was edited to be useful for general purposes; specific data and identifications were removed so that it can reach a generic audience and remain useful.”


Exadata and ZDLRA, Disable HAIP
Category: Engineer System Author: Fernando Simon (Board Member) Date: 4 years ago Comments: 0

Exadata and ZDLRA, Disable HAIP

HAIP (Highly Available IP) is not supported in the Exadata environment, but it can happen (if you did not create the cluster using OEDA) that HAIP ends up in use. This is particularly true for ZDLRA. So, during the upgrade from the previous version (12.2) to a higher version, HAIP needs to be removed.
Usually, when we upgrade from 12.2 to 18c, HAIP is removed from Exadata. If the upgrade is from 12.1 and HAIP is there, it remains and is not removed by the upgrade process. If you are using HAIP and your GI is 12.1, this procedure as described here can’t be used as-is (it needs some adaptation) because of some requirements from ASM+ACFS+DB. But since this is a preliminary step for a GI upgrade, the focus is to disable HAIP and remove it from the GI.
HAIP is not needed for Exadata because, by architecture, the InfiniBand network already defines two IPs per server to avoid a single point of failure. So there is no need to create an additional layer (HAIP and virtual IPs) that does the same thing the network design already provides.

 

 

*Image removed from Oracle Presentation: Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour Oct 2019
This procedure can be used for a non-Exadata environment too. But before continuing, it is important to be aware of some details about ACFS. If you use ACFS, be careful: the HAIP IP is used by the ASM proxy and needs to be updated; and if ACFS is in place, this procedure is valid only for GI 12.2+.

 

Environment

In this scenario, I am removing HAIP from the ZDLRA GI. HAIP was configured automatically, and we are using ACFS.
The focus is to disable HAIP and remove it from CRS. Another detail is that we also need to remove the dependencies between the CRS resources (like ASM) that rely on HAIP.
One important detail: since we are removing HAIP, we need to swap the HAIP IP used by services (like the ASM Proxy) for something that exists. So, you need highly available IPs to use. For Exadata, we use the interconnect IPs, but if you are doing this in another environment, make sure your network meets the requirements (throughput, failover, and others).

 

GI Upgrade

 

During the GI upgrade from 12.2 to 19c, we need to run runcluvfy, and it will detect whether HAIP is enabled, as shown in the sketch and output below:
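A sketch of the invocation (run as the grid owner from the staged 19c home; the node names are from my environment, and the exact upgrade-related flags may vary with your source and target versions):

cd /u01/app/19.0.0.0/grid
./runcluvfy.sh stage -pre crsinst -n zeroserv01,zeroserv02 -verbose

The relevant part of the output follows: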

 

....

....

Checks did not pass for the following nodes:

        zeroserv02,zeroserv01







Failures were encountered during execution of CVU verification request "stage -pre crsinst".




Verifying Node Connectivity ...FAILED

zeroserv02: PRVG-11068 : Highly Available IP (HAIP) is enabled on the nodes

            "zeroserv01,zeroserv02".




zeroserv01: PRVG-11068 : Highly Available IP (HAIP) is enabled on the nodes

            "zeroserv01,zeroserv02".




Verifying RPM Package Manager database ...INFORMATION

PRVG-11250 : The check "RPM Package Manager database" was not performed because

it needs 'root' user privileges.







CVU operation performed:      stage -pre crsinst

Date:                         Dec 9, 2019 1:38:46 PM

CVU home:                     /u01/app/19.0.0.0/grid/

User:                         oracle

[root@zeroserv01 ~]#

 

Remove HAIP

The steps need to be executed in order to avoid errors during the procedure. Unfortunately, a maintenance window is needed because services will be unavailable; the running databases need to be shut down due to the CRS restart, for example.
Below, the steps are summarized and explained. Please be aware of the IPs involved (they will be different in your case).

 

1 – Shutdown services

So, the first step is to shut down all databases running in this cluster:

 

[oracle@zeroserv01 ~]$ srvctl stop database -d zdlras -o immediate

[oracle@zeroserv01 ~]$

 

2 – Check HAIP, ASM Proxy, and ACFS

We need to check whether HAIP is enabled, and also check the ASM proxy and ACFS:

 

[oracle@zeroserv01 ~]$ $ORACLE_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -t -init

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.cluster_interconnect.haip

      1        ONLINE  ONLINE       zeroserv01               STABLE

--------------------------------------------------------------------------------

[oracle@zeroserv01 ~]$

[oracle@zeroserv01 ~]$ $ORACLE_HOME/bin/crsctl stat res ora.proxy_advm -t

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.proxy_advm

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

--------------------------------------------------------------------------------

[oracle@zeroserv01 ~]$

[oracle@zeroserv01 ~]$ $ORACLE_HOME/bin/crsctl stat res -w "TYPE = ora.acfs.type" -t

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.catalog.raadmin.acfs

               ONLINE  ONLINE       zeroserv01               mounted on /raacfs/r

                                                             aadmin,STABLE

               ONLINE  ONLINE       zeroserv02               mounted on /raacfs/r

                                                             aadmin,STABLE

ora.catalog.raosbadmin.acfs

               ONLINE  ONLINE       zeroserv01               mounted on /osbcat,S

                                                             TABLE

               ONLINE  ONLINE       zeroserv02               mounted on /osbcat,S

                                                             TABLE

--------------------------------------------------------------------------------

[oracle@zeroserv01 ~]$

 

3 – Check New IP

The next step is to check the new IPs that will be used. Since ASM does not use HAIP, we can (on Exadata) pick up the same IPs already used by ASM:

 

[oracle@zeroserv01 ~]$ export ORACLE_SID=+ASM1

[oracle@zeroserv01 ~]$ export ORACLE_HOME=/u01/app/12.2.0.1/grid

[oracle@zeroserv01 ~]$ export PATH=$ORACLE_HOME/bin:$PATH

[oracle@zeroserv01 ~]$

[oracle@zeroserv01 ~]$ sqlplus / as sysasm




SQL*Plus: Release 12.2.0.1.0 Production on Mon Dec 9 15:00:55 2019




Copyright (c) 1982, 2016, Oracle.  All rights reserved.







Connected to:

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production




SQL> show parameter cluster_interconnects




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cluster_interconnects                string      192.168.10.1:192.168.10.2

SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

[oracle@zeroserv01 ~]$

[oracle@zeroserv01 ~]$ $ORACLE_HOME/bin/oifcfg getif

bondeth0  10.208.68.0  global  public

ib0  192.168.8.0  global  cluster_interconnect,asm

ib1  192.168.8.0  global  cluster_interconnect,asm

[oracle@zeroserv01 ~]$

 

In this case, the IPs that will substitute the HAIP IPs are 192.168.10.1/192.168.10.2. And, as you can see above, they are from the interconnect network.

 

4 – Fixing IP for ASM Proxy

The next step is to fix the IP that is used by the ASM Proxy instance. By default, it picks up the HAIP IP during startup. The idea here is to force the same IPs as the ASM instance in the cluster_interconnects parameter.

 

Connecting to the APX instance

Connect to the APX instance and check the parameters (note that they have no values). After that, we create an init file from memory to have a backup:

 

[oracle@zeroserv01 ~]$ export ORACLE_HOME=/u01/app/12.2.0.1/grid

[oracle@zeroserv01 ~]$ export PATH=$ORACLE_HOME/bin:$PATH

[oracle@zeroserv01 ~]$ export ORACLE_SID=+APX1

[oracle@zeroserv01 ~]$

[oracle@zeroserv01 ~]$ sqlplus / as sysasm




SQL*Plus: Release 12.2.0.1.0 Production on Tue Dec 10 10:20:59 2019




Copyright (c) 1982, 2016, Oracle.  All rights reserved.







Connected to:

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production




SQL>  show parameter instance_name




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

instance_name                        string      +APX1

SQL>  show parameter instance_type




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

instance_type                        string      ASMPROXY

SQL> show parameter cluster_interconnects




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cluster_interconnects                string

SQL> create pfile = '/tmp/pfileapx1' from memory;




File created.




SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

[oracle@zeroserv01 ~]$

 

Changing the IP

For the ASM Proxy instance, we need to set the cluster_interconnects parameter with the same IPs used by the ASM instance.
To do that, we need to set it in an init file for the ASM Proxy. Since the ASM Proxy instance doesn’t have one, we create it at $GI_HOME/dbs with the correct name (note the folder and instance name below). Below, I set only the cluster_interconnects parameter, scoped to the instance running on this node.

 

[oracle@zeroserv01 ~]$ echo "+APX1.cluster_interconnects='192.168.10.1:192.168.10.2'" > $ORACLE_HOME/dbs/init+APX1.ora

[oracle@zeroserv01 ~]$

[oracle@zeroserv01 ~]$ cat $ORACLE_HOME/dbs/init+APX1.ora

+APX1.cluster_interconnects='192.168.10.1:192.168.10.2'

[oracle@zeroserv01 ~]$

 

Restart the ASM Proxy

After creating the init file, we restart the ASM Proxy on this node.

 

[root@zeroserv01 ~]# export ORACLE_HOME=/u01/app/12.2.0.1/grid

[root@zeroserv01 ~]# export PATH=$ORACLE_HOME/bin:$PATH

[root@zeroserv01 ~]#

[root@zeroserv01 ~]# $ORACLE_HOME/bin/srvctl stop asm -proxy -node zeroserv01 -force

[root@zeroserv01 ~]#

[root@zeroserv01 ~]# $ORACLE_HOME/bin/srvctl start asm -proxy -node zeroserv01

[root@zeroserv01 ~]#

[root@zeroserv01 ~]# ps -ef |grep APX

oracle   267811      1  0 10:43 ?        00:00:00 apx_pmon_+APX1

oracle   267813      1  0 10:43 ?        00:00:00 apx_clmn_+APX1

oracle   267815      1  0 10:43 ?        00:00:00 apx_psp0_+APX1

oracle   267820      1  1 10:43 ?        00:00:00 apx_vktm_+APX1

oracle   267824      1  0 10:43 ?        00:00:00 apx_gen0_+APX1

oracle   267826      1  0 10:43 ?        00:00:00 apx_mman_+APX1

oracle   267830      1  0 10:43 ?        00:00:00 apx_gen1_+APX1

oracle   267834      1  0 10:43 ?        00:00:00 apx_diag_+APX1

oracle   267836      1  0 10:43 ?        00:00:00 apx_dskm_+APX1

oracle   267838      1  0 10:43 ?        00:00:00 apx_pman_+APX1

oracle   267840      1  0 10:43 ?        00:00:00 apx_dia0_+APX1

oracle   267842      1  0 10:43 ?        00:00:00 apx_lreg_+APX1

oracle   267845      1  0 10:43 ?        00:00:00 apx_pxmn_+APX1

oracle   267847      1  0 10:43 ?        00:00:00 apx_rbal_+APX1

oracle   267849      1  0 10:43 ?        00:00:00 apx_vbg0_+APX1

oracle   267851      1  0 10:43 ?        00:00:00 apx_vdbg_+APX1

oracle   267853      1  0 10:43 ?        00:00:00 apx_vubg_+APX1

root     267979  32720  0 10:43 pts/0    00:00:00 grep --color=auto APX

[root@zeroserv01 ~]#

 

Note that the instance restarted correctly and is up.

And we can check that the parameter has the correct IPs:

 

[oracle@zeroserv01 ~]$ sqlplus / as sysasm




SQL*Plus: Release 12.2.0.1.0 Production on Tue Dec 10 10:51:59 2019




Copyright (c) 1982, 2016, Oracle.  All rights reserved.







Connected to:

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production




SQL> show parameter cluster_interconnects




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cluster_interconnects                string      192.168.10.1:192.168.10.2

SQL> show parameter instance_name




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

instance_name                        string      +APX1

SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

[oracle@zeroserv01 ~]$

 

If the instance does not start up (or the IPs are wrong), you need to check the init file that you created in the $GI_HOME/dbs folder.
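A quick sanity check (a sketch; the file name and IPs are the ones used for node 1 above):

# Confirm the init file exists and contains the expected interconnect IPs
cat $ORACLE_HOME/dbs/init+APX1.ora

# The IPs must be configured on this node's interconnect interfaces
ip addr | grep "192.168.10"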

 

Restart ACFS

 

Since we restarted the ASM Proxy instance, the ACFS mount points went down and need to be brought up again:

 

[root@zeroserv01 ~]# $ORACLE_HOME/bin/crsctl start res -w "TYPE = ora.acfs.type" -n zeroserv01

CRS-2672: Attempting to start 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv01'

CRS-2672: Attempting to start 'ora.CATALOG.RAADMIN.advm' on 'zeroserv01'

CRS-2676: Start of 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv01' succeeded

CRS-2672: Attempting to start 'ora.catalog.raosbadmin.acfs' on 'zeroserv01'

CRS-2676: Start of 'ora.CATALOG.RAADMIN.advm' on 'zeroserv01' succeeded

CRS-2672: Attempting to start 'ora.catalog.raadmin.acfs' on 'zeroserv01'

CRS-2676: Start of 'ora.catalog.raosbadmin.acfs' on 'zeroserv01' succeeded

CRS-2676: Start of 'ora.catalog.raadmin.acfs' on 'zeroserv01' succeeded

[root@zeroserv01 ~]#

 

I started them up using the CRS resource for that.
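To confirm the filesystems are back, a quick check (a sketch using the mount points seen above) could be:

# Verify the ACFS mount points are mounted again
df -h /raacfs/raadmin /osbcat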

 

Other Nodes

After everything is OK on node 1, we can do the same on the other nodes. Here, since I did this for ZDLRA, I only have the second node’s ASM Proxy left to fix. Be careful with the IPs used: they are specific to this node, and remember to use the correct init file.

 

[root@zeroserv02 ~]# su - oracle

Last login: Tue Dec 10 10:53:02 CET 2019

[oracle@zeroserv02 ~]$ export ORACLE_SID=+ASM2

[oracle@zeroserv02 ~]$ export ORACLE_HOME=/u01/app/12.2.0.1/grid

[oracle@zeroserv02 ~]$ export PATH=$ORACLE_HOME/bin:$PATH

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ sqlplus / as sysasm




SQL*Plus: Release 12.2.0.1.0 Production on Tue Dec 10 10:56:24 2019




Copyright (c) 1982, 2016, Oracle.  All rights reserved.







Connected to:

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production




SQL> show parameter cluster_interconnects




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cluster_interconnects                string      192.168.10.3:192.168.10.4

SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ export ORACLE_HOME=/u01/app/12.2.0.1/grid

[oracle@zeroserv02 ~]$ export PATH=$ORACLE_HOME/bin:$PATH

[oracle@zeroserv02 ~]$ export ORACLE_SID=+APX1

[oracle@zeroserv02 ~]$ export ORACLE_SID=+APX2

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ sqlplus / as sysasm




SQL*Plus: Release 12.2.0.1.0 Production on Tue Dec 10 10:57:15 2019




Copyright (c) 1982, 2016, Oracle.  All rights reserved.







Connected to:

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production




SQL>

SQL> show parameter instance_name




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

instance_name                        string      +APX2

SQL> show parameter cluster_interconnects




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cluster_interconnects                string

SQL> create pfile = '/tmp/pfileapx2' from memory;




File created.




SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ ls -l $ORACLE_HOME/dbs/init?APX*

ls: cannot access /u01/app/12.2.0.1/grid/dbs/init?APX*: No such file or directory

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ echo "+APX2.cluster_interconnects='192.168.10.3:192.168.10.4'" > $ORACLE_HOME/dbs/init+APX2.ora

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ cat $ORACLE_HOME/dbs/init+APX2.ora

+APX2.cluster_interconnects='192.168.10.3:192.168.10.4'

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ exit   

[root@zeroserv02 ~]# export ORACLE_HOME=/u01/app/12.2.0.1/grid

[root@zeroserv02 ~]# export PATH=$ORACLE_HOME/bin:$PATH

[root@zeroserv02 ~]#

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# $ORACLE_HOME/bin/srvctl stop asm -proxy -node zeroserv02 -force

[root@zeroserv02 ~]#

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# $ORACLE_HOME/bin/srvctl start asm -proxy -node zeroserv02

[root@zeroserv02 ~]#

[root@zeroserv02 ~]#

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# $ORACLE_HOME/bin/crsctl start res -w "TYPE = ora.acfs.type" -n zeroserv02

CRS-2672: Attempting to start 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv02'

CRS-2672: Attempting to start 'ora.CATALOG.RAADMIN.advm' on 'zeroserv02'

CRS-2676: Start of 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.catalog.raosbadmin.acfs' on 'zeroserv02'

CRS-2676: Start of 'ora.CATALOG.RAADMIN.advm' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.catalog.raadmin.acfs' on 'zeroserv02'

CRS-2676: Start of 'ora.catalog.raosbadmin.acfs' on 'zeroserv02' succeeded

CRS-2676: Start of 'ora.catalog.raadmin.acfs' on 'zeroserv02' succeeded

[root@zeroserv02 ~]#

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# su - oracle

Last login: Tue Dec 10 10:54:10 CET 2019

[oracle@zeroserv02 ~]$ export ORACLE_HOME=/u01/app/12.2.0.1/grid

[oracle@zeroserv02 ~]$ export PATH=$ORACLE_HOME/bin:$PATH

[oracle@zeroserv02 ~]$ export ORACLE_SID=+APX1

[oracle@zeroserv02 ~]$ export ORACLE_SID=+APX2




[oracle@zeroserv02 ~]$ sqlplus / as sysasm




SQL*Plus: Release 12.2.0.1.0 Production on Tue Dec 10 11:00:46 2019




Copyright (c) 1982, 2016, Oracle.  All rights reserved.







Connected to:

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production




SQL> show parameter cluster_interconnects




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cluster_interconnects                string      192.168.10.3:192.168.10.4

SQL> show parameter instance_name




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

instance_name                        string      +APX2

SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

You have new mail in /var/spool/mail/oracle

[oracle@zeroserv02 ~]$

 

5 – ASM Dependency

Since ASM in the CRS stack depends on HAIP, we need to remove this dependency. Here we have a tricky part: it is necessary to completely change the dependencies for ASM, pointing them to CRS/CSS directly.
First, on the first node, we check the current START and STOP dependencies:

 

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res ora.asm -p |grep START_DEPENDENCIES=

START_DEPENDENCIES=hard(ora.ASMNET1LSNR_ASM.lsnr,ora.ASMNET2LSNR_ASM.lsnr) weak(ora.LISTENER.lsnr) pullup(ora.ASMNET1LSNR_ASM.lsnr,ora.ASMNET2LSNR_ASM.lsnr) dispersion:active(site:type:ora.asm.type)

[root@zeroserv01 ~]#

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res ora.asm -p |grep STOP_DEPENDENCIES=

STOP_DEPENDENCIES=hard(intermediate:ora.ASMNET1LSNR_ASM.lsnr,intermediate:ora.ASMNET2LSNR_ASM.lsnr)

[root@zeroserv01 ~]#

 

And now we change it. Look at the parameter values: they are completely different and need to be set to these specific values:

 

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl modify res ora.asm -attr "START_DEPENDENCIES='hard(ora.cssd,ora.ctssd) pullup(ora.cssd,ora.ctssd) weak(ora.drivers.acfs)',STOP_DEPENDENCIES='hard(intermediate:ora.cssd)'" -init

[root@zeroserv01 ~]#

 

If you look closely, HAIP is no longer listed as a dependency; by setting these values, we completely remove the HAIP dependency and inheritance.
And we need to do the same on the other node:

 

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl modify res ora.asm -attr "START_DEPENDENCIES='hard(ora.cssd,ora.ctssd) pullup(ora.cssd,ora.ctssd) weak(ora.drivers.acfs)',STOP_DEPENDENCIES='hard(intermediate:ora.cssd)'" -init

[root@zeroserv02 ~]#

 

6 – Disable HAIP Resource

The next step is to disable the HAIP resource so it no longer starts with CRS. We do this on both nodes.
Node 1:

 

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=0" -init

[root@zeroserv01 ~]#

 

Node 2:

 

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=0" -init

[root@zeroserv02 ~]#

 

Note that we do not stop the HAIP resource yet; it needs to remain up at this moment.
If we check the CRS init resources, it is still there on node 1:

 

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res -t -init

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.asm

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.cluster_interconnect.haip

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.crf

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.crsd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.cssd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.cssdmonitor

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.ctssd

      1        ONLINE  ONLINE       zeroserv01               OBSERVER,STABLE

ora.diskmon

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.drivers.acfs

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.drivers.oka

      1        OFFLINE OFFLINE                               STABLE

ora.evmd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.gipcd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.gpnpd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.mdnsd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.storage

      1        ONLINE  ONLINE       zeroserv01               STABLE

--------------------------------------------------------------------------------

[root@zeroserv01 ~]#

 

And for the other node too:

 

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res -t -init

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.asm

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.cluster_interconnect.haip

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.crf

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.crsd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.cssd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.cssdmonitor

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.ctssd

      1        ONLINE  ONLINE       zeroserv02               ACTIVE:0,STABLE

ora.diskmon

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.drivers.acfs

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.drivers.oka

      1        OFFLINE OFFLINE                               STABLE

ora.evmd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.gipcd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.gpnpd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.mdnsd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.storage

      1        ONLINE  ONLINE       zeroserv02               STABLE

--------------------------------------------------------------------------------

[root@zeroserv02 ~]#

 

7 – Restart CRS

After we configure the dependencies for ASM, we can restart CRS to shut down HAIP (and test the changes we made). I prefer to execute it first on just one node and, if everything goes well, do the same on the others.

 

Stop Cluster

So, first we stop the cluster on the first node:

 

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stop cluster -f

CRS-2673: Attempting to stop 'ora.crsd' on 'zeroserv01'

CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'zeroserv01'

CRS-2673: Attempting to stop 'ora.catalog.raadmin.acfs' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.catalog.raosbadmin.acfs' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.DELTA.dg' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.CATALOG.dg' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.LISTENER_SCAN2.lsnr' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.cvu' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.chad' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.chad' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.LISTENER_SCAN3.lsnr' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.qosmserver' on 'zeroserv01'

CRS-2677: Stop of 'ora.DELTA.dg' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.CATALOG.dg' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.asm' on 'zeroserv01'

CRS-2677: Stop of 'ora.cvu' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.asm' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.ASMNET2LSNR_ASM.lsnr' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.ASMNET1LSNR_ASM.lsnr' on 'zeroserv01'

CRS-2677: Stop of 'ora.LISTENER_SCAN2.lsnr' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.scan2.vip' on 'zeroserv01'

CRS-2677: Stop of 'ora.LISTENER_SCAN3.lsnr' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.scan3.vip' on 'zeroserv01'

CRS-2677: Stop of 'ora.catalog.raosbadmin.acfs' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv01'

CRS-2677: Stop of 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.catalog.raadmin.acfs' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.CATALOG.RAADMIN.advm' on 'zeroserv01'

CRS-2677: Stop of 'ora.CATALOG.RAADMIN.advm' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.proxy_advm' on 'zeroserv01'

CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.zeroserv01.vip' on 'zeroserv01'

CRS-2677: Stop of 'ora.scan3.vip' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.qosmserver' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.scan2.vip' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.zeroserv01.vip' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.ASMNET1LSNR_ASM.lsnr' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.ASMNET2LSNR_ASM.lsnr' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.proxy_advm' on 'zeroserv01' succeeded

CRS-2675: Stop of 'ora.chad' on 'zeroserv02' failed

CRS-2679: Attempting to clean 'ora.chad' on 'zeroserv02'

CRS-2675: Stop of 'ora.chad' on 'zeroserv01' failed

CRS-2679: Attempting to clean 'ora.chad' on 'zeroserv01'

CRS-2681: Clean of 'ora.chad' on 'zeroserv02' succeeded

CRS-2681: Clean of 'ora.chad' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.mgmtdb' on 'zeroserv01'

CRS-2677: Stop of 'ora.mgmtdb' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.MGMTLSNR' on 'zeroserv01'

CRS-2677: Stop of 'ora.MGMTLSNR' on 'zeroserv01' succeeded

CRS-2672: Attempting to start 'ora.MGMTLSNR' on 'zeroserv02'

CRS-2672: Attempting to start 'ora.qosmserver' on 'zeroserv02'

CRS-2672: Attempting to start 'ora.scan2.vip' on 'zeroserv02'

CRS-2672: Attempting to start 'ora.scan3.vip' on 'zeroserv02'

CRS-2672: Attempting to start 'ora.cvu' on 'zeroserv02'

CRS-2672: Attempting to start 'ora.zeroserv01.vip' on 'zeroserv02'

CRS-2676: Start of 'ora.cvu' on 'zeroserv02' succeeded

CRS-2676: Start of 'ora.scan2.vip' on 'zeroserv02' succeeded

CRS-2676: Start of 'ora.zeroserv01.vip' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.LISTENER_SCAN2.lsnr' on 'zeroserv02'

CRS-2676: Start of 'ora.scan3.vip' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.LISTENER_SCAN3.lsnr' on 'zeroserv02'

CRS-2676: Start of 'ora.MGMTLSNR' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.mgmtdb' on 'zeroserv02'

CRS-2676: Start of 'ora.qosmserver' on 'zeroserv02' succeeded

CRS-2676: Start of 'ora.LISTENER_SCAN2.lsnr' on 'zeroserv02' succeeded

CRS-2676: Start of 'ora.LISTENER_SCAN3.lsnr' on 'zeroserv02' succeeded

CRS-2676: Start of 'ora.mgmtdb' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.chad' on 'zeroserv02'

CRS-2676: Start of 'ora.chad' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.ons' on 'zeroserv01'

CRS-2677: Stop of 'ora.ons' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.net1.network' on 'zeroserv01'

CRS-2677: Stop of 'ora.net1.network' on 'zeroserv01' succeeded

CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'zeroserv01' has completed

CRS-2677: Stop of 'ora.crsd' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.ctssd' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.evmd' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.storage' on 'zeroserv01'

CRS-2677: Stop of 'ora.storage' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.asm' on 'zeroserv01'

CRS-2677: Stop of 'ora.ctssd' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.evmd' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.asm' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.cssd' on 'zeroserv01'

CRS-2677: Stop of 'ora.cssd' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.diskmon' on 'zeroserv01'

CRS-2677: Stop of 'ora.diskmon' on 'zeroserv01' succeeded

[root@zeroserv01 ~]#

 

Stop/Start CRS

And if everything was successful, we can stop CRS. This is needed because HAIP is started by the CRS init stack:

 

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stop crs -f

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.gpnpd' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.crf' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.mdnsd' on 'zeroserv01'

CRS-2677: Stop of 'ora.drivers.acfs' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.gpnpd' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.crf' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.gipcd' on 'zeroserv01'

CRS-2677: Stop of 'ora.mdnsd' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.gipcd' on 'zeroserv01' succeeded

CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'zeroserv01' has completed

CRS-4133: Oracle High Availability Services has been stopped.

[root@zeroserv01 ~]#

 

After that, we can start CRS again on this node:

 

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

[root@zeroserv01 ~]#
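CRS can take a few minutes to come fully back up. A simple way to wait for it, assuming the standard crsctl check output, could be something like:

# Poll until the local stack reports Cluster Ready Services online
[root@zeroserv01 ~]# while ! /u01/app/12.2.0.1/grid/bin/crsctl check crs | grep -q "Cluster Ready Services is online"; do sleep 30; done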

 

Check CRS Init

And after some time, we can check that HAIP did not restart (as expected):

 

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res -t -init

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.asm

      1        ONLINE  ONLINE       zeroserv01               Started,STABLE

ora.cluster_interconnect.haip

      1        OFFLINE OFFLINE                               STABLE

ora.crf

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.crsd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.cssd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.cssdmonitor

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.ctssd

      1        ONLINE  ONLINE       zeroserv01               ACTIVE:0,STABLE

ora.diskmon

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.drivers.acfs

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.drivers.oka

      1        OFFLINE OFFLINE                               STABLE

ora.evmd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.gipcd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.gpnpd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.mdnsd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.storage

      1        ONLINE  ONLINE       zeroserv01               STABLE

--------------------------------------------------------------------------------

[root@zeroserv01 ~]#

 

As you can see above, HAIP did not start during CRS initialization. If HAIP is still up, please check topic 5.
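If you want to query just the HAIP resource instead of scrolling the whole -init listing, a direct check like this (same resource name shown above) also works; it should report TARGET=OFFLINE and STATE=OFFLINE:

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res ora.cluster_interconnect.haip -init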

 

Other nodes

After that, we can do the same on the other nodes.

 

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stop cluster -f

CRS-2673: Attempting to stop 'ora.crsd' on 'zeroserv02'

CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'zeroserv02'

CRS-2673: Attempting to stop 'ora.DELTA.dg' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.catalog.raadmin.acfs' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.chad' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.catalog.raosbadmin.acfs' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'zeroserv02'

CRS-2677: Stop of 'ora.DELTA.dg' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.CATALOG.dg' on 'zeroserv02'

CRS-2677: Stop of 'ora.CATALOG.dg' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.asm' on 'zeroserv02'

CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.zeroserv02.vip' on 'zeroserv02'

CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.scan1.vip' on 'zeroserv02'

CRS-2677: Stop of 'ora.catalog.raadmin.acfs' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.CATALOG.RAADMIN.advm' on 'zeroserv02'

CRS-2677: Stop of 'ora.CATALOG.RAADMIN.advm' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.catalog.raosbadmin.acfs' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv02'

CRS-2677: Stop of 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.proxy_advm' on 'zeroserv02'

CRS-2677: Stop of 'ora.asm' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.ASMNET2LSNR_ASM.lsnr' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.ASMNET1LSNR_ASM.lsnr' on 'zeroserv02'

CRS-2677: Stop of 'ora.zeroserv02.vip' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.scan1.vip' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.ASMNET1LSNR_ASM.lsnr' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.ASMNET2LSNR_ASM.lsnr' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.proxy_advm' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.chad' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.scan1.vip' on 'zeroserv01'

CRS-2672: Attempting to start 'ora.zeroserv02.vip' on 'zeroserv01'

CRS-2676: Start of 'ora.zeroserv02.vip' on 'zeroserv01' succeeded

CRS-2676: Start of 'ora.scan1.vip' on 'zeroserv01' succeeded

CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'zeroserv01'

CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.ons' on 'zeroserv02'

CRS-2677: Stop of 'ora.ons' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.net1.network' on 'zeroserv02'

CRS-2677: Stop of 'ora.net1.network' on 'zeroserv02' succeeded

CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'zeroserv02' has completed

CRS-2677: Stop of 'ora.crsd' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.ctssd' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.evmd' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.storage' on 'zeroserv02'

CRS-2677: Stop of 'ora.storage' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.asm' on 'zeroserv02'

CRS-2677: Stop of 'ora.ctssd' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.evmd' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.asm' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.cssd' on 'zeroserv02'

CRS-2677: Stop of 'ora.cssd' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.diskmon' on 'zeroserv02'

CRS-2677: Stop of 'ora.diskmon' on 'zeroserv02' succeeded

[root@zeroserv02 ~]#

[root@zeroserv02 ~]#

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stop crs -f

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.mdnsd' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.crf' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.gpnpd' on 'zeroserv02'

CRS-2677: Stop of 'ora.drivers.acfs' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.crf' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.gipcd' on 'zeroserv02'

CRS-2677: Stop of 'ora.mdnsd' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.gpnpd' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.gipcd' on 'zeroserv02' succeeded

CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'zeroserv02' has completed

CRS-4133: Oracle High Availability Services has been stopped.

[root@zeroserv02 ~]#

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res -t -init

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.asm

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.cluster_interconnect.haip

      1        OFFLINE OFFLINE                               STABLE

ora.crf

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.crsd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.cssd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.cssdmonitor

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.ctssd

      1        ONLINE  ONLINE       zeroserv02               ACTIVE:0,STABLE

ora.diskmon

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.drivers.acfs

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.drivers.oka

      1        OFFLINE OFFLINE                               STABLE

ora.evmd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.gipcd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.gpnpd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.mdnsd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.storage

      1        ONLINE  ONLINE       zeroserv02               STABLE

--------------------------------------------------------------------------------

[root@zeroserv02 ~]#

 

8 – Check if everything is UP

After making the changes on both nodes, we can check that everything is up and running:

 

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res -t

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ob_dbfs

               OFFLINE OFFLINE      zeroserv01               STABLE

               OFFLINE OFFLINE      zeroserv02               STABLE

ora.ASMNET1LSNR_ASM.lsnr

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.ASMNET2LSNR_ASM.lsnr

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.CATALOG.RAADMIN.advm

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.CATALOG.RAOSBADMIN.advm

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.CATALOG.dg

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.DELTA.dg

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.LISTENER.lsnr

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.catalog.raadmin.acfs

               ONLINE  ONLINE       zeroserv01               mounted on /raacfs/r

                                                             aadmin,STABLE

               ONLINE  ONLINE       zeroserv02               mounted on /raacfs/r

                                                             aadmin,STABLE

ora.catalog.raosbadmin.acfs

               ONLINE  ONLINE       zeroserv01               mounted on /osbcat,S

                                                             TABLE

               ONLINE  ONLINE       zeroserv02               mounted on /osbcat,S

                                                             TABLE

ora.chad

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.net1.network

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.ons

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.proxy_advm

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

rep_dbfs

               OFFLINE OFFLINE      zeroserv01               STABLE

               OFFLINE OFFLINE      zeroserv02               STABLE

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.LISTENER_SCAN1.lsnr

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.LISTENER_SCAN2.lsnr

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.LISTENER_SCAN3.lsnr

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.MGMTLSNR

      1        ONLINE  ONLINE       zeroserv01               192.168.10.1 192.168

                                                             .10.2,STABLE

ora.asm

      1        ONLINE  ONLINE       zeroserv01               Started,STABLE

      2        ONLINE  ONLINE       zeroserv02               Started,STABLE

ora.cvu

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.mgmtdb

      1        ONLINE  ONLINE       zeroserv01               Open,STABLE

ora.qosmserver

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.scan1.vip

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.scan2.vip

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.scan3.vip

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.zdlras.db

      1        OFFLINE OFFLINE                               STABLE

      2        OFFLINE OFFLINE                               Instance Shutdown,ST

                                                             ABLE

ora.zeroserv01.vip

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.zeroserv02.vip

      1        ONLINE  ONLINE       zeroserv02               STABLE

--------------------------------------------------------------------------------

[root@zeroserv01 ~]#

 

As you can see, everything is up. ASM is online on both nodes, as are the ASM listeners.
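If you prefer a shorter check than the full resource tree, srvctl from the grid home gives a quick summary. A small sketch (the commands are standard, the output wording may vary slightly by version):

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/srvctl status asm
[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/srvctl status listener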

 

9 – Start Databases

 

To finish, we need to restart all databases that were stopped in topic 1.

 

[oracle@zeroserv01 ~]$ srvctl start database -d zdlras

[oracle@zeroserv01 ~]$
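And a quick confirmation that both instances came back, using the same database unique name as above (srvctl prints one "Instance ... is running on node ..." line per node):

[oracle@zeroserv01 ~]$ srvctl status database -d zdlras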

 

Clean up

After you complete the GI upgrade to 19c, you can remove the ASM Proxy init file (from $GI_HOME/dbs) that was created earlier. It is no longer needed because the upgrade removes HAIP from CRS completely, since it is not in use.
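A minimal sketch of that cleanup, assuming the proxy init file follows the usual init+APX<n>.ora naming and that the 19c grid home sits under /u01/app/19.0.0.0/grid (adjust both to your environment, and repeat on every node):

# List first, and remove only after confirming it is the leftover ASM Proxy pfile
[root@zeroserv01 ~]# ls -l /u01/app/19.0.0.0/grid/dbs/init+APX*.ora
[root@zeroserv01 ~]# rm /u01/app/19.0.0.0/grid/dbs/init+APX*.ora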

Conclusion

As told before, HAIP is not supported in the Exadata environment. If you check the definition of HAIP and the hardware design of Exadata/Engineered Systems, it is completely redundant: the InfiniBand network and the dual channels/ports have the same effect (or even better).
But unfortunately, HAIP can be up for several reasons (as in ZDLRA), and because of the way that ASM Proxy starts and picks up the interconnect IP, HAIP can end up being selected. Removing HAIP is a little tricky because of the ASM dependency, and we need to set some specific parameters.
As you saw before, there are several steps, but you can use and adapt them to your environment. If it is an Exadata/Engineered System (like ZDLRA), they can be used almost as-is (just check the IPs). If it is not, read and adapt.
 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community.”


EXADATA X8M, Workshop
Category: Engineer System Author: Fernando Simon (Board Member) Date: 5 years ago Comments: 0

EXADATA X8M, Workshop

When Exadata X8M was released during the last Open World, I wrote a post about its technical details. You can check it here: Exadata X8M (that post received some good shares and reviews). If you read it, you can see that I focused on the more internal details (like torn blocks, one-sided path, two-sided reads/writes, and others) that differ from the usual analyses of Exadata X8M.
But recently I was invited by Oracle to participate in an exclusive workshop about Exadata X8M, and I wanted to share some details that caught my attention. The workshop was run directly from the Oracle Solution Center at the Santa Clara campus (an amazing place that I had the opportunity to visit in 2015, and that has a rich history; if you have the chance, visit it), and it covered some technical details plus a hands-on part.
Unfortunately, I can't share everything (I don't even know whether I can share anything), but see the info below.

Exadata X8M

A little briefing first. If you don't know, the big change in Exadata X8M was the addition of RDMA-accessible memory directly in the storage server. Basically, Intel Optane memory (which is nonvolatile, by the way) was added in front of everything (including flash cache). Exadata calls it PMEM (Persistent Memory).

 

 

You can see more detail about Exadata in "Exadata Technical Deep Dive: Architecture and Internals", available on the Oracle website.
And just to add, the database server can access it directly, without operating system caches in the middle, since RDMA uses ZDP (Zero-loss Zero-copy Datagram Protocol). The data goes directly from database server memory to storage server memory. And considering that it now uses RoCE (a 100 Gb/s network), it is fast.

 

Workshop

As mentioned, the workshop was more than a simple marketing presentation and had a hands-on part. The focus was to show the PMEM gains over a "traditional" Exadata: gains for log writes (PMEMLog) and gains for read caching (PMEMCache).
The test executed some load (to exercise the log writes) and some reads (to exercise the cache) for a defined period. Two runs were made, one with the features disabled and the other with them enabled, and after that we can compare the results.
Just to be clear, the focus of the tests was not to stress the machine and push it to its limits. The focus was to see the difference between executions with and without PMEMCache and PMEMLog enabled. So, the numbers that you see here are below the limits of the Exadata.

 

Environment

Just to be more precise, the workshop ran on an Exadata X8M Extreme Flash configuration. This means that the disks were flash drives, so here we are comparing flash against memory. You will see the numbers below, but when reading them remember that this is flash, and the non-PMEM results would be even slower in a "normal" environment with spinning disks instead of flash. Check the datasheet for more info.
For PMEM, the Exadata had 1.5 TB of Intel Optane memory, twelve modules attached to each storage server:

PMEMLog

The PMEMLog works in the same way as the flashlog: it speeds up the redo writes coming from the LGWR. For the workshop, one run was first executed without PMEMLog, and the numbers were:
  • Executions per second: 26566
  • Log file sync average time: 310.00 μs
  • Log file parallel write average time: 129.00 μs
In Cloud Control (CC):
And the same run was made with PMEMLog enabled. The results were:
  • Executions per second: 73915
  • Log file sync average time: 56.00 μs
  • Log file parallel write average time: 14.00 μs
And in CC:
As you can see, the difference was:
  • 7x more executions.
  • 2x faster log file sync.
  • 2x faster log writes.
These are good numbers and good savings. Note that the actual performance measured may vary based on your workload. As expected, the use of PMEM speeds up the redo log writes for the same workload (the database was restarted between the runs).
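If you want to pull the same wait-event numbers from your own runs (instead of reading them in CC), a simple query against v$system_event returns the averages in microseconds. A minimal sketch, run as a DBA user (the host name is just illustrative):

[oracle@exa8db01 ~]$ sqlplus -s / as sysdba <<EOF
-- Average wait time in microseconds for the two events discussed above
SELECT event,
       total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0), 2) AS avg_wait_us
  FROM v\$system_event
 WHERE event IN ('log file sync', 'log file parallel write');
EOF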

PMEM Cache

PMEM Cache works as a cache in front of the flash cache in the Exadata software, in the same way as the flash cache but using just the PMEM modules. There is not much to explain; everything (which blocks end up in PMEMCache) is controlled by the Exadata software, but the results are pretty amazing.
During the workshop, one run was executed without PMEMCache enabled, and the result was:
  • Number of read IOPS: 125.688
  • Average single block read latency: 233.00 μs
  • Average MB/s: 500 MB/s
And from the CC view:
And then the run with PMEMCache enabled:
  • Number of read IOPS: 1.020.731
  • Average single block read latency: 16.00 μs
  • Average MB/s: 4000 MB/s (4 GB/s)
In CC you can see how huge the difference is:
The test over the database was the same; both runs were identical. And if you compare the results, with PMEMCache enabled we had:
  • 8x more IOPS.
  • 14x lower latency.
  • 8x more throughput in MB/s.
And again, remember that this Exadata was an EF configuration, so we are comparing flash against memory. Think how huge the difference would be if we compared "normal" hard disks against memory.
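On the database side you can also check whether PMEM-related statistics are being populated; the exact statistic names vary by database version, so this sketch just lists whatever matches the pattern:

[oracle@exa8db01 ~]$ sqlplus -s / as sysdba <<EOF
-- List any PMEM-related cumulative statistics exposed by this database version
SELECT name, value
  FROM v\$sysstat
 WHERE LOWER(name) LIKE '%pmem%'
 ORDER BY name;
EOF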

 

Under the Hood

But under the hood, how does the PMEM appear? It is simple (and tricky at the same time). The modules appear as normal celldisks to the cell (yes, DIMM memory as celldisks):

 

CellCLI> list celldisk

          FD_00_exa8cel01 normal

          FD_01_exa8cel01 normal

          FD_02_exa8cel01 normal

          FD_03_exa8cel01 normal

          FD_04_exa8cel01 normal

          FD_05_exa8cel01 normal

          FD_06_exa8cel01 normal

          FD_07_exa8cel01 normal

          PM_00_exa8cel01 normal

          PM_01_exa8cel01 normal

          PM_02_exa8cel01 normal

          PM_03_exa8cel01 normal

          PM_04_exa8cel01 normal

          PM_05_exa8cel01 normal

          PM_06_exa8cel01 normal

          PM_07_exa8cel01 normal

          PM_08_exa8cel01 normal

          PM_09_exa8cel01 normal

          PM_10_exa8cel01 normal

          PM_11_exa8cel01 normal




CellCLI>

 

And the PMEMLog in the same way as flashlog:

 

CellCLI> list pmemlog detail

          name:                   exa8cel01_PMEMLOG

          cellDisk:               PM_11_exa8cel01,PM_06_exa8cel01,PM_01_exa8cel01,PM_10_exa8cel01,PM_03_exa8cel01,PM_04_exa8cel01,PM_07_exa8cel01,PM_08_exa8cel01,PM_09_exa8cel01,PM_05_exa8cel01,PM_00_exa8cel01,PM_02_exa8cel01

          creationTime:           2020-01-31T21:01:06-08:00

          degradedCelldisks:     

          effectiveSize:          960M

          efficiency:             100.0

          id:                     9ba61418-e8cc-43c6-ba55-c74e4b5bdec8

          size:                   960M

          status:                 normal




CellCLI>

 

And PMEMCache too:

 

CellCLI> list pmemcache detail

          name:                   exa8cel01_PMEMCACHE

          cellDisk:               PM_04_exa8cel01,PM_01_exa8cel01,PM_09_exa8cel01,PM_03_exa8cel01,PM_06_exa8cel01,PM_08_exa8cel01,PM_05_exa8cel01,PM_10_exa8cel01,PM_11_exa8cel01,PM_00_exa8cel01,PM_02_exa8cel01,PM_07_exa8cel01

          creationTime:           2020-01-31T21:23:09-08:00

          degradedCelldisks:     

          effectiveCacheSize:     1.474365234375T

          id:                     d3c71ce8-9e4e-4777-9015-771a4e1a2376

          size:                   1.474365234375T

          status:                 normal




CellCLI>

 

So, the administration on the cell side will be easy. We (as DMAs) don't need to worry about special administration details. It was not the point of the workshop, but I think that when a DIMM needs to be replaced, it will not be hot-swappable, and after the change some "warm-up" process will need to occur to repopulate the cache.
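Since status and degradedCelldisks are already exposed in the detail output above, a quick health check on the cell can be as simple as listing only those attributes:

CellCLI> list pmemcache attributes name, status, degradedCelldisks
CellCLI> list pmemlog attributes name, status, degradedCelldisks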

 

Conclusion

For me, the workshop was a really nice surprise. I have worked with Exadata since 2010, starting with Exadata V2 and passing through X2, X4, X5 EF, and X7, and I have seen a lot of new features over these 10 years, but the addition of PMEM was a good one. I was not expecting much difference in the numbers, but it was possible to see the real difference that PMEM can deliver. The notable point was the latency reduction.
Think about an environment that needs to be fast; these μs make the difference. Think of a database using Data Guard with real-time apply and SYNC redo transport, where the primary waits for the standby to acknowledge the redo write before releasing the commit. Every μs that you save is deducted from the total time.
Of course, not everyone may need the gains from PMEM, but for those who do, it really makes a difference.

 

 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community.”

