Fernando Simon (Board Member) – LUXOUG – LUXEMBOURG ORACLE USERS GROUP


ZDLRA, Creating the Replication Server

Category: Engineer System Author: Fernando Simon (Board Member) Date: 5 years ago


ZDLRA replication can be configured in several ways, from a single upstream/downstream pair to multiple replication configurations, but all of them are set up with the same procedure. The process is not complicated, but there are some details you need to be aware of to avoid having to rebuild (or even losing) replicated data. In this post, I will show the details of creating the replication configuration.

I described the basics of how ZDLRA replication works in this post, and how to configure the replication network in this other post. The network configuration is needed only when you add replication after the ZDLRA has been deployed; if the appliance was deployed with replication enabled, you can skip it. The official documentation about replication can be found here.

Replication Topology

The topology for ZDLRA replication can vary, but the basic options are:

1. One-Way: The data flows in one direction only; only one ZDLRA forwards the backups.

2. Bi-Directional: Both ZDLRAs send backups to each other. In this case, the protected databases of each ZDLRA (usually located in separate datacenters) are replicated between them, since both operate as upstream and downstream.

3. Hub-Spoke: One ZDLRA receives backups from several ZDLRAs, and this “third” ZDLRA is responsible for archiving to tape.

Whatever the replication type, there are always two roles:

1. Upstream: The ZDLRA that receives the backup and forwards it to another ZDLRA.

2. Downstream: The ZDLRA that receives the backup from another ZDLRA.
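The roles belong to each replication link rather than to the appliance, so one ZDLRA can hold both. As a minimal sketch, assuming a topology is described as a hypothetical list of (upstream, downstream) pairs, the following derives the role of each appliance for the three topologies above. ZDLRAS1 and ZDLRAS2 come from this post; HUB, SPOKE_A, and SPOKE_B are made-up names.

```python
from collections import defaultdict


def roles(links):
    """Return {zdlra: sorted list of roles} for a list of replication links.

    links: iterable of (upstream, downstream) pairs, one per replication link.
    """
    found = defaultdict(set)
    for upstream, downstream in links:
        found[upstream].add("upstream")      # forwards backups it received
        found[downstream].add("downstream")  # receives backups from another ZDLRA
    return {zdlra: sorted(r) for zdlra, r in found.items()}


one_way = [("ZDLRAS1", "ZDLRAS2")]
bi_directional = [("ZDLRAS1", "ZDLRAS2"), ("ZDLRAS2", "ZDLRAS1")]
hub_spoke = [("SPOKE_A", "HUB"), ("SPOKE_B", "HUB")]  # hypothetical names

for name, links in [("one-way", one_way), ("bi-directional", bi_directional),
                    ("hub-spoke", hub_spoke)]:
    print(name, roles(links))
```

In the bi-directional case, both appliances report both roles, which is why the same procedure (user, wallet, replication server) has to be repeated in each direction.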

Scenario

In this post (and in others where I use replication) I will use the “One-Way” configuration, with one upstream and one downstream. If you have another topology, follow the same procedure and take care of details such as users, wallets, and credentials.

It will be:

1. Upstream: ZDLRAS1.

2. Downstream: ZDLRAS2.

Creating the Replication

ZDLRA replication operates differently from Oracle Data Guard: it is native replication that uses a procedure similar to backup ingest at ZDLRA. I already wrote about this in my previous post (Replication and Index topic).

To configure the replication we use the procedure DBMS_RA.CREATE_REPLICATION_SERVER, but before that we need to check some details. Replication is based on the protection policy, so all the databases linked to that policy will have their backups replicated. I will write about this in another post; here I will show how to create the replication configuration.

A user at downstream to receive replication

ZDLRA replication requires a specific user to send the backups from the upstream to the downstream. This user is created only in the downstream ZDLRA and never needs to be used to connect with RMAN.

The naming convention (best practice) for this user is REPUSER_FROM_[ZDLRA_UPSTREAM_DB_NAME]. This way you know the source of each connection when your downstream receives backups from more than one upstream. In this example, the user is repusr_from_zdlras1.

So, the first step is to create the user at the downstream:

[root@zdlras2n1 ~]# /opt/oracle.RecoveryAppliance/bin/racli add vpc_user --user_name=repusr_from_zdlras1
[repusr_from_zdlras1] New Password:
Mon Nov 25 23:34:50 2019: Start: Add vpc user repusr_from_zdlras1.
Mon Nov 25 23:34:51 2019:        Add vpc user repusr_from_zdlras1 successfully.
Mon Nov 25 23:34:51 2019: End:   Add vpc user repusr_from_zdlras1.
[root@zdlras2n1 ~]#

Wallet at upstream

To allow the upstream to connect to the downstream and send backups, you need to create a wallet at the upstream ZDLRA with the credentials of the replication user created in the first step. The wallet can be stored in a shared filesystem so that both nodes of the cluster can access it, or each node can store it in a local folder (but the path needs to be the same on both).

The wallet needs to be ALO (auto-login) and can be placed on shared storage (if you have it). To create the wallet at the upstream:

[root@zdlras1n1 ~]# su - oracle
Last login: Mon Nov 25 23:43:26 CET 2019 on pts/3
[oracle@zdlras1n1 ~]$ mkdir /radump/wallrep
[oracle@zdlras1n1 ~]$
[oracle@zdlras1n1 ~]$ mkstore -wrl /radump/wallrep -createALO
Oracle Secret Store Tool Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

[oracle@zdlras1n1 ~]$

After that, we create the credential with the username and password of the user created at the downstream:

[oracle@zdlras1n1 ~]$ mkstore -wrl /radump/wallrep -createCredential zdlras2-rep.oralocal:1522/zdlras2 repusr_from_zdlras1 repuser
Oracle Secret Store Tool Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

[oracle@zdlras1n1 ~]$
[oracle@zdlras1n1 ~]$ mkstore -wrl /radump/wallrep -listCredential
Oracle Secret Store Tool Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
Copyright (c) 2004, 2019, Oracle and/or its affiliates. All rights reserved.

List credential (index: connect_string username)
1: zdlras2-rep.oralocal:1522/zdlras2 repusr_from_zdlras1
[oracle@zdlras1n1 ~]$

You can choose any credential name (alias), but I usually follow the EZCONNECT pattern (host:port/service). This way, I know directly where the credential points to.
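Because the alias is only a string, it can be built and cross-checked before it is passed to DBMS_RA. The following is a small Python sketch. The helper names are hypothetical and not part of any Oracle tool. It builds the alias from its EZCONNECT parts and confirms that it appears in saved `mkstore -listCredential` output, here the listing shown above. It assumes that credential lines keep the `index: connect_string username` layout.

```python
def ezconnect_alias(host, port, service):
    """Build a credential alias following the EZCONNECT host:port/service pattern."""
    return f"{host}:{port}/{service}"


def credentials_from_listing(listing):
    """Parse `mkstore -listCredential` output into {connect_string: username}.

    Credential lines look like '1: <connect_string> <username>'; banner and
    header lines are ignored.
    """
    credentials = {}
    for line in listing.splitlines():
        index, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and index.strip().isdigit() and len(fields) == 2:
            credentials[fields[0]] = fields[1]
    return credentials


# Output copied from the mkstore -listCredential run above.
listing = """List credential (index: connect_string username)
1: zdlras2-rep.oralocal:1522/zdlras2 repusr_from_zdlras1"""

alias = ezconnect_alias("zdlras2-rep.oralocal", 1522, "zdlras2")
credentials = credentials_from_listing(listing)
print(alias, "->", credentials.get(alias, "NOT FOUND"))
```

The same alias string is what goes into the wallet_alias parameter later, so building it once avoids a mismatch between the wallet and the replication server definition.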

DBMS_RA.CREATE_REPLICATION_SERVER

The third and last step is to call the procedure that creates the configuration at the upstream. This is done only at the upstream, and it uses the wallet created in step two.

So, we call DBMS_RA.CREATE_REPLICATION_SERVER with these parameters:

. replication_server_name: Name for the downstream server. You can choose any name.

. sbt_so_name: Always “libra.so”.

. catalog_user_name: The owner of the Recovery Appliance catalog at the downstream. Always RASYS.

. wallet_alias: The credential name (alias) that you defined in the wallet.

. wallet_path: Where the wallet is located.

. max_streams: Maximum number of concurrent replication streams. The default value is 4.

The replication configuration can be checked in RASYS.RA_REPLICATION_SERVER, which stores the information for all replication servers configured at your upstream.
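Since the call takes several string literals, such as the alias with its colon and slash and the wallet path with its `file:` prefix, it can help to render it from a reviewed set of values. The following is a minimal Python sketch that uses a hypothetical helper. It only prints PL/SQL and executes nothing. It leaves max_streams out unless you pass a value, so the procedure's default applies in that case. The parameter names are the ones used in this post.

```python
def create_replication_server_sql(replication_server_name, wallet_alias,
                                  wallet_path, max_streams=None,
                                  sbt_so_name="libra.so",
                                  catalog_user_name="RASYS"):
    """Render a DBMS_RA.CREATE_REPLICATION_SERVER call for review (nothing is executed)."""

    def quote(value):
        # SQL string literal: wrap in single quotes, double any embedded quote.
        return "'" + str(value).replace("'", "''") + "'"

    params = [
        ("replication_server_name", quote(replication_server_name)),
        ("sbt_so_name", quote(sbt_so_name)),
        ("catalog_user_name", quote(catalog_user_name)),
        ("wallet_alias", quote(wallet_alias)),
        ("wallet_path", quote(wallet_path)),
    ]
    if max_streams is not None:
        # Numeric parameter: emitted without quotes.
        params.append(("max_streams", str(int(max_streams))))
    body = ",\n".join(f"    {name} => {value}" for name, value in params)
    return "BEGIN\n  DBMS_RA.CREATE_REPLICATION_SERVER (\n" + body + ");\nEND;\n/"


print(create_replication_server_sql(
    replication_server_name="zdlras2_rep",
    wallet_alias="zdlras2-rep.oralocal:1522/zdlras2",
    wallet_path="file:/radump/wallrep"))
```

Named notation (`=>`) means the parameter order does not matter to PL/SQL, so adding max_streams at the end is safe.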

So, to create the replication configuration:

[oracle@zdlras1n1 ~]$ sqlplus rasys/change^Me2

SQL*Plus: Release 19.0.0.0.0 - Production on Sun Dec 22 20:46:51 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Sun Dec 22 2019 20:33:15 +01:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> SELECT COUNT(*)  FROM RA_REPLICATION_SERVER;

  COUNT(*)
----------
         0

SQL>
SQL> BEGIN
  2  DBMS_RA.CREATE_REPLICATION_SERVER (
  3      replication_server_name => 'zdlras2_rep',
  4      sbt_so_name      => 'libra.so',
  5      catalog_user_name       => 'RASYS',
  6      wallet_alias            => 'zdlras2-rep.oralocal:1522/zdlras2',
  7      wallet_path             => 'file:/radump/wallrep');
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> SELECT COUNT(*)  FROM RA_REPLICATION_SERVER;

  COUNT(*)
----------
         1

SQL>

One important point here is the “max_streams” parameter, which may need tuning: if you replicate many databases, it can be worth increasing this value. To check the queue, query the “rasys.ra_task” table and verify whether replication tasks are waiting. Of course, this also depends on the size of your backup files.

Replication

The steps described here are just a small part of the replication setup. We created the replication server configuration at the upstream (wallet and server information) and at the downstream (user), but the replication workflow is not complete yet.

The missing part is the “logical” definition, such as the policies that will be replicated and the databases linked to those policies. The basic configuration (the replication server config) was done in this post and in the previous ones.

In the next post I will show how to configure the backup policies and the details you need to take care of to define them correctly. If you want to understand more about protection policies, you can check the post I wrote about them.

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer positions, strategies or opinions. The information here was edited to be useful for general purpose, specific data and identifications were removed to allow reach the generic audience and to be useful for the community.”

ZDLRA, Configuring Replication Network

Category: Engineer System Author: Fernando Simon (Board Member) Date: 5 years ago


It is common for our systems to grow over time, and the environment that sustains them needs to grow too. The same happens with ZDLRA. Imagine that you added a new datacenter, bought a new ZDLRA, and want to replicate between them, or that you now want to enable replication on an existing appliance.

This is possible and not complicated, and in this post I will show how to configure the replication network for a ZDLRA that was already deployed. Basically, a post-install procedure.

Replication network

In my previous post, I explained the basics of how ZDLRA replication works. Going deeper into replication, ZDLRA has a dedicated physical network for it.

According to the documentation:

“The optional replication network connects the local Recovery Appliance (the upstream appliance) with a remote Recovery Appliance (the downstream appliance). Oracle recommends a broadband, encrypted network, instead of an insecure public network, wherever possible.”

It is optional because you can share the “Ingest Network” to receive replication, but this is not recommended. Whatever network mode you choose, the “Replication Network” must be a different subnet that does not overlap with the “Ingest Network”.
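The non-overlap rule is easy to verify before touching any configuration file. The following is a minimal Python sketch that uses the standard `ipaddress` module. The replication subnet 200.254.255.0/24 comes from this post. The ingest subnet 10.160.5.0/24 is an assumption: the post only shows ingest addresses such as 10.160.5.2, not the mask.

```python
import ipaddress


def check_replication_subnet(replication_cidr, ingest_cidr):
    """Return the replication network, or raise ValueError if it overlaps ingest."""
    replication = ipaddress.ip_network(replication_cidr)
    ingest = ipaddress.ip_network(ingest_cidr)
    if replication.overlaps(ingest):
        raise ValueError(f"replication {replication} overlaps ingest {ingest}")
    return replication


# Replication subnet from this post; the ingest /24 is an assumption.
print(check_replication_subnet("200.254.255.0/24", "10.160.5.0/24"))
```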

ZDLRA Environment

In this post, I have two ZDLRAs on which to enable replication, but this can vary in your environment. I will show how to configure both nodes of both ZDLRAs (four nodes in total).

So, use this as a guide, but remember to adapt it to your environment. IPs, physical networks (bonded or not), and routes are examples of details that will change. In this post I will focus on how to configure GI and ZDLRA.

Most important: I recommend opening a proactive SR at MOS, explaining what you are doing and asking your questions before you start. A good starting point is the note “Post Install – Replication Network Configuration for ZDLRA (Doc ID 2126047.1)” at MOS.

Basic network config

The network configuration follows the normal Linux network configuration: you need to define the separate IPs for both ZDLRAs, whether you will use bonding, and the routing table. Since this depends on each environment, I will not cover it here.

But basically:

1. Hostname: You need to choose at least one hostname on the replication network for each node, and one hostname for the VIP of each node.

2. SCAN hostname: One hostname to be used as the SCAN for replication data exchange. Each ZDLRA that you are configuring needs its own SCAN.

3. Configure network files: Configure the ifcfg files: ifcfg-eth2, ifcfg-eth3 and ifcfg-bondeth1 if you will use bonding, or just ifcfg-eth2 if you will use a single connection. This is done on all nodes of the ZDLRA appliance.

4. Route configuration: You need to guarantee that traffic on the replication network is not routed through the normal ingest network. This depends on your network architecture, but you may need to change the “route-*” files on each node.

My current system is:

ZDLRAS1: ZDLRA installed on site 1, with two nodes: zdlras1n1 and zdlras1n2. It will be the upstream (which sends backups).

ZDLRAS2: ZDLRA installed on site 2, with two nodes: zdlras2n1 and zdlras2n2. It will be the downstream (which receives backups).

What I will add:

ZDLRAS1: zdlras1n1-rvip (200.254.255.21) as the replication VIP for zdlras1n1, zdlras1n2-rvip (200.254.255.22) as the replication VIP for zdlras1n2, and the SCAN name zdlras1-rep.

ZDLRAS2: zdlras2n1-rvip (200.254.255.23) as the replication VIP for zdlras2n1, zdlras2n2-rvip (200.254.255.24) as the replication VIP for zdlras2n2, and the SCAN name zdlras2-rep.
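A hand-written plan like this makes it easy to copy one node's VIP name into another node's line. The following Python sketch checks a node-to-(VIP name, IP) plan and reports problems. It is a hedged sketch that assumes the `<node>-rvip` naming used in this post. It also assumes the single 200.254.255.0/24 replication subnet that both appliances use in the srvctl commands of this lab. A real two-site setup may use a different subnet per site.

```python
import ipaddress

REPLICATION_SUBNET = ipaddress.ip_network("200.254.255.0/24")

# Plan from this post: node -> (replication VIP hostname, IP).
plan = {
    "zdlras1n1": ("zdlras1n1-rvip", "200.254.255.21"),
    "zdlras1n2": ("zdlras1n2-rvip", "200.254.255.22"),
    "zdlras2n1": ("zdlras2n1-rvip", "200.254.255.23"),
    "zdlras2n2": ("zdlras2n2-rvip", "200.254.255.24"),
}


def validate_vip_plan(plan, subnet, suffix="-rvip"):
    """Return a list of problems found in a node -> (vip_name, ip) plan."""
    problems = []
    seen = {}  # ip -> first node that used it
    for node, (vip_name, ip) in plan.items():
        if vip_name != node + suffix:
            problems.append(f"{node}: VIP name {vip_name} does not match node")
        if ipaddress.ip_address(ip) not in subnet:
            problems.append(f"{node}: {ip} is outside {subnet}")
        if ip in seen:
            problems.append(f"{node}: {ip} already used by {seen[ip]}")
        else:
            seen[ip] = node
    return problems


print(validate_vip_plan(plan, REPLICATION_SUBNET) or "plan OK")
```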

GI Configuration

After the basic configuration, the next step is to configure GI to add the network, the VIPs, and the SCAN.

Checking interfaces

After you configure the Linux side and the basic network configuration, check whether GI can use the new interface. Do this on both nodes of both ZDLRAs with the command “oifcfg iflist” (I cropped the output below to show only what is needed):

############## Upstream ZDLRA
[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/oifcfg iflist -p -n
…
bondeth1  200.254.255.0  UNKNOWN  255.255.255.0
…
[root@zdlras1n1 ~]#

############## Downstream ZDLRA
[root@zdlras2n1 ~]# /u01/app/19.0.0.0/grid/bin/oifcfg iflist -p -n
…
bondeth1  200.254.255.0  UNKNOWN  255.255.255.0
…
[root@zdlras2n1 ~]#

Add network

Since the interfaces are visible and unused, we can add network number 2 at the GI level. This is done on only one node per ZDLRA, with the command “srvctl add network” run as root on both ZDLRAs that we are configuring:

############## Upstream ZDLRA
[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl config network -k 2
PRCR-1001 : Resource ora.net2.network does not exist
[root@zdlras1n1 ~]#
[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add network -netnum 2 -subnet 200.254.255.0/255.255.255.0/bondeth1
[root@zdlras1n1 ~]#
[root@zdlras1n1 ~]# srvctl config network -k 2
Network 2 exists
Subnet IPv4: 200.254.255.0/255.255.255.0/bondeth1, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:
[root@zdlras1n1 ~]#

############## Downstream ZDLRA
[root@zdlras2n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl config network -k 2
PRCR-1001 : Resource ora.net2.network does not exist
[root@zdlras2n1 ~]#
[root@zdlras2n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add network -netnum 2 -subnet 200.254.255.0/255.255.255.0/bondeth1
[root@zdlras2n1 ~]#
[root@zdlras2n1 ~]# srvctl config network -k 2
Network 2 exists
Subnet IPv4: 200.254.255.0/255.255.255.0/bondeth1, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:
[root@zdlras2n1 ~]#

Note above that we are using bondeth1 as the interface for this network (bondeth0 is used for the ingest network). Before adding it, I also checked that network 2 did not exist yet (it needs to be unused).
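The `-subnet` argument uses the subnet/netmask/interface form rather than CIDR notation. The following is a Python sketch with hypothetical helper names. It derives that form from a CIDR with the standard `ipaddress` module and rebuilds the `srvctl add network` command line used above. It only builds a string and runs nothing.

```python
import ipaddress


def srvctl_subnet(cidr, interface):
    """Format a network as subnet/netmask/interface for `srvctl add network -subnet`."""
    network = ipaddress.ip_network(cidr)
    return f"{network.network_address}/{network.netmask}/{interface}"


def add_network_command(netnum, cidr, interface,
                        grid_home="/u01/app/19.0.0.0/grid"):
    """Build the srvctl command line (dry run: nothing is executed)."""
    return (f"{grid_home}/bin/srvctl add network -netnum {netnum} "
            f"-subnet {srvctl_subnet(cidr, interface)}")


print(add_network_command(2, "200.254.255.0/24", "bondeth1"))
```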

Add VIP

Next, add the VIP for each node of each ZDLRA that we are configuring (using “srvctl add vip”):

############## Upstream ZDLRA
[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add vip -n zdlras1n1 -A zdlras1n1-rvip.oralocal/255.255.255.0/bondeth1 -k 2
[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add vip -n zdlras1n2 -A zdlras1n2-rvip.oralocal/255.255.255.0/bondeth1 -k 2
[root@zdlras1n1 ~]#

############## Downstream ZDLRA
[root@zdlras2n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add vip -n zdlras2n1 -A zdlras2n1-rvip.oralocal/255.255.255.0/bondeth1 -k 2
[root@zdlras2n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add vip -n zdlras2n2 -A zdlras2n2-rvip.oralocal/255.255.255.0/bondeth1 -k 2
[root@zdlras2n1 ~]#

Note that each node gets its own VIP hostname, and that the “-k” parameter defines the network where the VIP is created (network 2). Be careful with the “-n” parameter, which defines the node name.

Add Scan

After adding the VIPs, add the SCAN for each ZDLRA cluster with “srvctl add scan”:

############## Upstream ZDLRA
[root@zdlras1n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add scan -netnum 2 -scanname zdlras1-rep.oralocal
[root@zdlras1n1 ~]#

############## Downstream ZDLRA
[root@zdlras2n1 ~]# /u01/app/19.0.0.0/grid/bin/srvctl add scan -netnum 2 -scanname zdlras2-rep.oralocal
[root@zdlras2n1 ~]#

Be careful to use the correct “netnum” value (it needs to point to the second network) and the correct SCAN name.

Start Vip

After the SCAN, we can start the VIPs on each node of each ZDLRA and verify them with “srvctl status vip”:

############## Upstream ZDLRA
[root@zdlras1n1 ~]# srvctl status vip -i zdlras1n1-rvip.oralocal
VIP 200.254.255.21 is enabled
VIP 200.254.255.21 is running on node: zdlras1n1
[root@zdlras1n1 ~]#
[root@zdlras1n1 ~]# srvctl status vip -i zdlras1n2-rvip.oralocal
VIP 200.254.255.22 is enabled
VIP 200.254.255.22 is running on node: zdlras1n2
[root@zdlras1n1 ~]#

############## Downstream ZDLRA
[root@zdlras2n1 ~]# srvctl status vip -i zdlras2n1-rvip.oralocal
VIP 200.254.255.23 is enabled
VIP 200.254.255.23 is running on node: zdlras2n1
[root@zdlras2n1 ~]#
[root@zdlras2n1 ~]# srvctl status vip -i zdlras2n2-rvip.oralocal
VIP 200.254.255.24 is enabled
VIP 200.254.255.24 is running on node: zdlras2n2
[root@zdlras2n1 ~]#
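With two nodes, checking the status output by eye is fine. With more appliances, a small parser helps. The following Python sketch uses a hypothetical helper that maps each running VIP to its node and compares the result with the plan from this post. It assumes the `srvctl status vip` output format shown above, which may differ between GI versions.

```python
import re

STATUS_RE = re.compile(r"^VIP (\S+) is running on node: (\S+)$")


def running_vips(status_output):
    """Map VIP address -> node from `srvctl status vip` output."""
    running = {}
    for line in status_output.splitlines():
        match = STATUS_RE.match(line.strip())
        if match:
            running[match.group(1)] = match.group(2)
    return running


# Upstream output copied from above.
output = """VIP 200.254.255.21 is enabled
VIP 200.254.255.21 is running on node: zdlras1n1
VIP 200.254.255.22 is enabled
VIP 200.254.255.22 is running on node: zdlras1n2"""

expected = {"200.254.255.21": "zdlras1n1", "200.254.255.22": "zdlras1n2"}
print("VIPs match plan:", running_vips(output) == expected)
```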

Create Listener

The last step for GI configuration is to create and start “listener” and “scan listener” in each ZDLRA cluster:

############## Upstream ZDLRA
[oracle@zdlras1n1 ~]$ export ORACLE_HOME=/u01/app/19.0.0.0/grid
[oracle@zdlras1n1 ~]$ export PATH=$ORACLE_HOME/bin:$PATH
[oracle@zdlras1n1 ~]$
[oracle@zdlras1n1 ~]$ srvctl add listener -l LISTENER_REPL -p 1522 -k 2
[oracle@zdlras1n1 ~]$ srvctl start listener -l LISTENER_REPL
[oracle@zdlras1n1 ~]$ srvctl status listener -l LISTENER_REPL
Listener LISTENER_REPL is enabled
Listener LISTENER_REPL is running on node(s): zdlras1n1,zdlras1n2
[oracle@zdlras1n1 ~]$
[oracle@zdlras1n1 ~]$ srvctl add scan_listener -netnum 2 -listener LISTENER_REPL -endpoints TCP:1522
[oracle@zdlras1n1 ~]$ srvctl start scan_listener -netnum 2
[oracle@zdlras1n1 ~]$ srvctl status scan_listener -netnum 2
SCAN Listener LISTENER_REPL_SCAN1_NET2 is enabled
SCAN listener LISTENER_REPL_SCAN1_NET2 is running on node zdlras1n1
SCAN Listener LISTENER_REPL_SCAN2_NET2 is enabled
SCAN listener LISTENER_REPL_SCAN2_NET2 is running on node zdlras1n2
SCAN Listener LISTENER_REPL_SCAN3_NET2 is enabled
SCAN listener LISTENER_REPL_SCAN3_NET2 is running on node zdlras1n1
[oracle@zdlras1n1 ~]$

############## Downstream ZDLRA
[root@zdlras2n1 ~]# su - oracle
Last login: Sat Nov 23 21:28:03 CET 2019
[oracle@zdlras2n1 ~]$ export ORACLE_HOME=/u01/app/19.0.0.0/grid
[oracle@zdlras2n1 ~]$ export PATH=$ORACLE_HOME/bin:$PATH
[oracle@zdlras2n1 ~]$
[oracle@zdlras2n1 ~]$ srvctl add listener -l LISTENER_REPL -p 1522 -k 2
[oracle@zdlras2n1 ~]$ srvctl start listener -l LISTENER_REPL
[oracle@zdlras2n1 ~]$ srvctl status listener -l LISTENER_REPL
Listener LISTENER_REPL is enabled
Listener LISTENER_REPL is running on node(s): zdlras2n1,zdlras2n2
[oracle@zdlras2n1 ~]$
[oracle@zdlras2n1 ~]$ srvctl add scan_listener -netnum 2 -listener LISTENER_REPL -endpoints TCP:1522
[oracle@zdlras2n1 ~]$ srvctl start scan_listener -netnum 2
[oracle@zdlras2n1 ~]$ srvctl status scan_listener -netnum 2
SCAN Listener LISTENER_REPL_SCAN1_NET2 is enabled
SCAN listener LISTENER_REPL_SCAN1_NET2 is running on node zdlras2n1
SCAN Listener LISTENER_REPL_SCAN2_NET2 is enabled
SCAN listener LISTENER_REPL_SCAN2_NET2 is running on node zdlras2n2
SCAN Listener LISTENER_REPL_SCAN3_NET2 is enabled
SCAN listener LISTENER_REPL_SCAN3_NET2 is running on node zdlras2n1
[oracle@zdlras2n1 ~]$

The important details here:

Port: 1522. Look at the “-p” and “-endpoints” parameters.
Network: The listener is added on network #2 (parameter “-k”).
Listener name: We use LISTENER_REPL to follow the ZDLRA default configuration.
User: All the commands are run as the oracle user, with ORACLE_HOME pointing to the Grid Infrastructure home.

All the commands are executed on just one node of the cluster because they affect all nodes at the same time.

With that, we now have the network configured on each node of both ZDLRAs, all the VIPs and SCANs up and running, and the listeners on this new network.
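The SCAN listener names end in `_NET2`, which is a quick way to confirm that they belong to network 2. The following is a hedged Python sketch. It assumes the `srvctl status scan_listener` output format shown above, and the helper name is hypothetical. It lists where each SCAN listener runs and counts them per node.

```python
import re
from collections import Counter

RUNNING_RE = re.compile(r"^SCAN listener (\S+) is running on node (\S+)$")


def scan_listener_nodes(status_output):
    """Map SCAN listener name -> node from `srvctl status scan_listener` output."""
    nodes = {}
    for line in status_output.splitlines():
        match = RUNNING_RE.match(line.strip())
        if match:
            nodes[match.group(1)] = match.group(2)
    return nodes


# Upstream output copied from the transcript above.
output = """SCAN Listener LISTENER_REPL_SCAN1_NET2 is enabled
SCAN listener LISTENER_REPL_SCAN1_NET2 is running on node zdlras1n1
SCAN Listener LISTENER_REPL_SCAN2_NET2 is enabled
SCAN listener LISTENER_REPL_SCAN2_NET2 is running on node zdlras1n2
SCAN Listener LISTENER_REPL_SCAN3_NET2 is enabled
SCAN listener LISTENER_REPL_SCAN3_NET2 is running on node zdlras1n1"""

nodes = scan_listener_nodes(output)
assert all(name.endswith("_NET2") for name in nodes), "listener not on network 2"
print(len(nodes), "SCAN listeners running, per node:", dict(Counter(nodes.values())))
```

The regular expression matches only the lowercase “SCAN listener … is running” lines, so the “SCAN Listener … is enabled” lines are skipped.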

ZDLRA Configuration

Now that all nodes are configured, we can add this network to the ZDLRA configuration. The goal is to allow ZDLRA to receive and send backups through this new network, and this is incredibly easy.

We need to change just two parameters in the ZDLRA configuration tables: REPLICATION_IP_ADDRESS and BACKUP_IP_ADDRESS. These changes modify internal ZDLRA tables, so before making them, review them with Oracle Support (MOS) and confirm that everything is OK and that you can proceed.

REPLICATION_IP_ADDRESS

This configuration resides in the internal “rasys.host” table of ZDLRA, so we update that column:

############## Upstream ZDLRA
[oracle@zdlras1n1 ~]$ sqlplus rasys/change^Me2

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Nov 23 23:22:09 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Sat Nov 23 2019 21:33:26 +01:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> set linesize 250
SQL>
SQL> col node_name format a50
SQL> col REPLICATION_IP_ADDRESS format a50
SQL> select node_name,replication_ip_address from host;

NODE_NAME                                          REPLICATION_IP_ADDRESS
-------------------------------------------------- --------------------------------------------------
zdlras1n1.oralocal
zdlras1n2.oralocal

SQL> update HOST set REPLICATION_IP_ADDRESS='200.254.255.21' where NODE_NAME = 'zdlras1n1.oralocal';

1 row updated.

SQL> update HOST set REPLICATION_IP_ADDRESS='200.254.255.22' where NODE_NAME = 'zdlras1n2.oralocal';

1 row updated.

SQL> select node_name,replication_ip_address from host;

NODE_NAME                                          REPLICATION_IP_ADDRESS
-------------------------------------------------- --------------------------------------------------
zdlras1n1.oralocal                                 200.254.255.21
zdlras1n2.oralocal                                 200.254.255.22

SQL>

############## Downstream ZDLRA
[oracle@zdlras2n1 ~]$ sqlplus rasys/change^Me2

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Nov 23 23:39:13 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Sat Nov 23 2019 19:54:17 +01:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> set linesize 250
SQL> col node_name format a50
SQL> col REPLICATION_IP_ADDRESS format a50
SQL>
SQL> select node_name,replication_ip_address from host;

NODE_NAME                                          REPLICATION_IP_ADDRESS
-------------------------------------------------- --------------------------------------------------
zdlras2n2.oralocal
zdlras2n1.oralocal

SQL>
SQL> update HOST set REPLICATION_IP_ADDRESS='200.254.255.23' where NODE_NAME = 'zdlras2n1.oralocal';

1 row updated.

SQL> update HOST set REPLICATION_IP_ADDRESS='200.254.255.24' where NODE_NAME = 'zdlras2n2.oralocal';

1 row updated.

SQL>
SQL> select node_name,replication_ip_address from host;

NODE_NAME                                          REPLICATION_IP_ADDRESS
-------------------------------------------------- --------------------------------------------------
zdlras2n2.oralocal                                 200.254.255.24
zdlras2n1.oralocal                                 200.254.255.23

SQL>

As you can see above, the REPLICATION_IP_ADDRESS for each node reflects the IP at the replication network.
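Because these are manual updates to an internal table, generating the statements from the same node-to-IP plan used for the VIPs helps avoid typing errors. The following is a minimal Python sketch that uses a hypothetical helper. It reproduces the upstream statements shown above. It only prints SQL for you to review, with Oracle Support as recommended above, and it does not connect to the database.

```python
def host_update_statements(replication_ips):
    """Render UPDATE statements for the internal HOST table (review only)."""

    def quote(value):
        # SQL string literal with embedded quotes doubled.
        return "'" + value.replace("'", "''") + "'"

    return [
        f"update HOST set REPLICATION_IP_ADDRESS={quote(ip)} "
        f"where NODE_NAME = {quote(node)};"
        for node, ip in sorted(replication_ips.items())
    ]


upstream = {"zdlras1n1.oralocal": "200.254.255.21",
            "zdlras1n2.oralocal": "200.254.255.22"}
for statement in host_update_statements(upstream):
    print(statement)
```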

BACKUP_IP_ADDRESS

This configuration defines the IPs on which ZDLRA accepts backup ingest. Since replication is basically ingest coming from another network, we need to allow it. To do that, we change the internal table rai_host and, of course, commit only if everything is OK.

############## Upstream ZDLRA
SQL> col ADMIN_IP_ADDRESS format a30
SQL> col BACKUP_IP_ADDRESS format a30
SQL> SELECT * FROM rai_host;

NODE_NAME                                          ADMIN_IP_ADDRESS               BACKUP_IP_ADDRESS              REPLICATION_IP_ADDRESS
-------------------------------------------------- ------------------------------ ------------------------------ --------------------------------------------------
zdlras1n1.oralocal                                 10.160.5.1                     10.160.5.2                     200.254.255.21
zdlras1n2.oralocal                                 10.160.5.3                     10.160.5.4                     200.254.255.22

SQL> UPDATE rai_host SET backup_ip_address='200.254.255.21,'|| backup_ip_address WHERE node_name = 'zdlras1n1.oralocal';

1 row updated.

SQL> UPDATE rai_host SET backup_ip_address='200.254.255.22,'|| backup_ip_address WHERE node_name = 'zdlras1n2.oralocal';

1 row updated.

SQL> SELECT * FROM rai_host;

NODE_NAME                                          ADMIN_IP_ADDRESS               BACKUP_IP_ADDRESS              REPLICATION_IP_ADDRESS
-------------------------------------------------- ------------------------------ ------------------------------ --------------------------------------------------
zdlras1n1.oralocal                                 10.160.5.1                     200.254.255.21,10.160.5.2      200.254.255.21
zdlras1n2.oralocal                                 10.160.5.3                     200.254.255.22,10.160.5.4      200.254.255.22

SQL> commit;

Commit complete.

SQL>

############## Downstream ZDLRA
SQL> col ADMIN_IP_ADDRESS format a30
SQL> col BACKUP_IP_ADDRESS format a30
SQL>
SQL> SELECT * FROM rai_host;

NODE_NAME                                          ADMIN_IP_ADDRESS               BACKUP_IP_ADDRESS              REPLICATION_IP_ADDRESS
-------------------------------------------------- ------------------------------ ------------------------------ --------------------------------------------------
zdlras2n2.oralocal                                 10.160.6.3                     10.160.6.4                     200.254.255.24
zdlras2n1.oralocal                                 10.160.6.1                     10.160.6.2                     200.254.255.23

SQL>
SQL> UPDATE rai_host SET backup_ip_address='200.254.255.23,'|| backup_ip_address WHERE node_name = 'zdlras2n1.oralocal';

1 row updated.

SQL> UPDATE rai_host SET backup_ip_address='200.254.255.24,'|| backup_ip_address WHERE node_name = 'zdlras2n2.oralocal';

1 row updated.

SQL> SELECT * FROM rai_host;

NODE_NAME                                          ADMIN_IP_ADDRESS               BACKUP_IP_ADDRESS              REPLICATION_IP_ADDRESS
-------------------------------------------------- ------------------------------ ------------------------------ --------------------------------------------------
zdlras2n2.oralocal                                 10.160.6.3                     200.254.255.24,10.160.6.4      200.254.255.24
zdlras2n1.oralocal                                 10.160.6.1                     200.254.255.23,10.160.6.2      200.254.255.23

SQL> commit;

Commit complete.

SQL>
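The `backup_ip_address` UPDATE statements above are not idempotent: running one twice prepends the replication IP twice. Before changing an internal table, it can help to model the new value first. The following is a minimal Python sketch with a hypothetical helper that never touches the database. It computes the comma-separated value that the UPDATE should produce and does not add the IP if it is already in the list.

```python
def prepend_backup_ip(backup_ip_address, replication_ip):
    """Return BACKUP_IP_ADDRESS with replication_ip first, like the UPDATE above.

    Unlike the plain UPDATE, this is idempotent: if the IP is already in the
    comma-separated list, the list is returned without adding it again.
    """
    current = [ip.strip() for ip in (backup_ip_address or "").split(",") if ip.strip()]
    if replication_ip in current:
        return ",".join(current)
    return ",".join([replication_ip] + current)


once = prepend_backup_ip("10.160.5.2", "200.254.255.21")
twice = prepend_backup_ip(once, "200.254.255.21")
print(once)
print(twice)
```

Comparing the computed value with the `SELECT * FROM rai_host` output before the commit is a cheap way to catch a double update.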

After that, I recommend restarting at least the ZDLRA database to reload the configuration. Again, remember that the values here (IPs, networks, interfaces, and routes) will differ in your environment; use this as a guide.

Replication

ZDLRA replication requires several steps. Doing them during deployment is easier, since the installer does them automatically. Doing them afterwards (because you are adding or preparing the environment) is not complicated, but you need to be aware of some details.

Be careful with the basic configuration, such as interfaces (bonded or not) and IPs. They will change in your scenario, but the most important point is to check the routes: packets need to travel only inside the replication network, because the ZDLRA configuration (in its internal tables) expects and listens on just one IP.

The ZDLRA changes are made in internal tables, so always double-check the values before the commit to avoid errors. After that, we can use DBMS_RA.CREATE_REPLICATION_SERVER to create the replication server configuration; I will show this in another post.

ZDLRA, Replication

Category: Engineer System Author: Fernando Simon (Board Member) Date: 5 years ago


ZDLRA replication works differently from the “normal” replication for Oracle Database that uses Data Guard (or even GoldenGate). The point is to replicate the ingested backups “as is” between ZDLRAs, not to replicate datafile blocks. And, of course, it is completely different from tape clones.

ZDLRA replication is not just sending backups from one site to another; it is a way to increase your protection and a part of the disaster recovery strategy. Replication covers not only RMAN backups but also the archivelogs generated by Real-Time Redo. Moreover, replication is how you integrate ZDLRA into your MAA architecture, which is what makes the difference in protecting your environment and reaching zero RPO. There are several points about replication: how it operates, its modes, and its integration into the Oracle MAA universe. I will discuss some of them in this post.

The architecture

The architecture of ZDLRA replication is simple. There are two important definitions:

Upstream: The ZDLRA that receives the backup and forwards it to another ZDLRA
Downstream: The ZDLRA that receives the backup from another ZDLRA

The configuration can be:

One-Way: The data flows in one direction only; only one ZDLRA forwards the backups.

Bi-Directional: Both ZDLRAs send backups to each other. In this case, the protected databases of each ZDLRA (usually located in separate datacenters) are replicated between them, since both operate as upstream and downstream.

Hub-Spoke: One ZDLRA receives backups from several ZDLRAs, and this “third” ZDLRA is responsible for archiving to tape.


One important detail is that every ZDLRA can clone backups to tape; this is not exclusive to the Hub-Spoke design.

For network connections, it is possible to dedicate one network port to replication to avoid concurrent usage, but it is also possible to share the physical interface used for backup ingest. Whatever mode you choose, the replication network is a different subnet.

Replication and Index

I already wrote about some ZDLRA internals: the Virtual Full Backup, what you need to understand about it, and also INDEX_BACKUP (here and here). If you check those posts, you will see that ZDLRA “sees” the RMAN backup in a different way.

For replication, some details are important to highlight. Replication is done for (and only for) every backupset that is ingested, so the virtual full backup is not replicated. Instead, every downstream (the ZDLRA that receives the backups) builds its own virtual full backups.

This is important for several reasons, and it means replication happens as soon as possible. To understand why, you need to combine several features. A common use of ZDLRA is to reduce the backup load by taking only incremental backups (usually in big environments with several TBs). If ZDLRA waited for the virtual full backup generation to finish before replicating, this could take a long time (depending on the datafile size, TBs, it can be hours). And if the virtual full backup were replicated, the amount of data transferred could be huge: instead of replicating the incremental (which can be just a few GBs), it would replicate TBs of the full. Another point is the unprotected window this would add to the environment: unprotected because, for some time, the backup would exist on only one site, and in case of a disaster your backups could be lost. On top of all that, because of replication, both sites end up with the incremental and the full backup, each validated against errors.
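The difference is easy to quantify. The following is a back-of-the-envelope Python sketch with made-up numbers: a 10 TB database, a 2% daily change rate, and a 10 Gbit/s replication link. None of these values come from a real system. It compares the data volume and the ideal transfer time of replicating the daily incremental versus a full backup. Protocol overhead and compression are ignored.

```python
def transfer_hours(size_gb, link_gbit_per_s):
    """Ideal transfer time in hours (ignores protocol overhead and compression)."""
    return size_gb * 8 / link_gbit_per_s / 3600


# Hypothetical values, for illustration only.
full_gb = 10_000            # a 10 TB database (size of a virtual full)
daily_change_rate = 0.02    # 2% of the blocks change per day
link_gbit_per_s = 10        # replication link bandwidth

incremental_gb = full_gb * daily_change_rate   # 200 GB
print(f"incremental: {incremental_gb:.0f} GB, "
      f"{transfer_hours(incremental_gb, link_gbit_per_s) * 60:.1f} min")
print(f"full:        {full_gb:.0f} GB, "
      f"{transfer_hours(full_gb, link_gbit_per_s):.1f} h")
```

With these assumptions, the incremental takes a few minutes while the full takes over two hours. The full's transfer time is also time during which the newest backup exists on only one site.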

The usage

The replication for ZDLRA is more than just sending backups from one side to another. Again, ZDLRA is more than just a way to reduce backup load; it is a pillar of your architecture and a key part of the MAA architecture.

Those who deploy ZDLRA usually have a big environment, need to protect several databases across several sites, and usually already follow MAA practices. But the point is that ZDLRA can be used to protect all databases, from a standalone database to a multi-site database. With ZDLRA replication, the ZDLRA at each site can protect that site's databases and also replicate the backups of individual databases to improve the disaster recovery strategy. Add Tape Clones, and the protection is complete.

Consider the example above: we have two different sites. Some databases on the left side (Site A) are already protected by Oracle DG (and the same could apply from the right side, Site B, replicating to the left, Site A), while other databases without DG are protected by ZDLRA replication. In this scenario we have:

For DG-protected databases: backups are not replicated between the ZDLRAs, because each side has its own backup. Replication is done by DG.
Other databases: backups are replicated between the ZDLRAs (remember that replication can be multi-directional).

As you can see, ZDLRA adds a new layer of protection to your environment and can be used to protect every kind of database. These kinds of architectures are shown in some detail in Maximum Availability Architecture (MAA) – On-Premises HA Reference Architectures 2019, and here in the previous version of the same doc. I also discussed this in a webinar. And for multi-site protection with ZDLRA, I already posted about it, and you can read it here. Whatever the mode of replication, you can reach ZERO RPO for all databases (even in case of a site disaster) because the backups/archivelogs needed to restore are available at the other site (through DG or ZDLRA replication).

So, ZDLRA needs to be integrated into your architecture. It can be used for everything from Bronze to Platinum databases.

There are many other details to cover about ZDLRA replication. This post was just a short introduction to native ZDLRA replication. The idea was to point out how much more complex it is than just sending backups from one site to another.

Adding more than one ZDLRA to your environment improves the MAA strategy in several ways. But the important thing is to think about the architecture of your environment, understand the features of ZDLRA, and see how you can use them to reduce single points of failure and improve the disaster recovery strategy. With ZDLRA replication (and multiple ZDLRAs), zero RPO can extend from a standalone database to a multi-site Data Guard database.

In the next posts, I will show how to configure ZDLRA replication, how replication works, what changes for policies, and other details.

References:

Fast-Start Failover, Observe-Only Mode and Health Conditions

Category: Engineer System Author: Fernando Simon (Board Member) Date: 5 years ago Comments: 0

Oracle Data Guard Broker allows database administrators to automate some tasks and provides an easy way to properly configure many features and details of Data Guard environments. Fast-Start Failover (FSFO) allows the broker to automatically fail over to the standby database if the primary fails. But until 19c, the only option was to always trigger the failover. This changed in 19c with a nice new feature that allows us to put FSFO in Observe-Only Mode.

In this post, I will focus only on FSFO Observe-Only Mode and its Health Conditions. Lag and other details will not be covered here.

Observe-Only Mode

Observe-Only Mode is a simple change that makes FSFO only observe/monitor the DG environment: in case of failure, it does not change the roles between primary and standby. Simple as that. As the Broker documentation for Observe-Only Mode says:

The observe-only mode enables you to test the impact of using fast-start failover in your configuration, without making any actual changes to the configuration.
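A minimal Python model of that behavior, assuming a heavily simplified observer: the real observer also evaluates the lag limit, health conditions and standby connectivity, and the names below (`ObserverConfig`, `observer_decision`) are hypothetical. It shows only where observe-only mode changes the outcome: the same detection logic runs, but the final step logs instead of failing over.

```python
from dataclasses import dataclass


@dataclass
class ObserverConfig:
    threshold_s: int = 30        # like FastStartFailoverThreshold
    observe_only: bool = False   # like ENABLE FAST_START FAILOVER OBSERVE ONLY


def observer_decision(cfg: ObserverConfig, primary_down_s: int, standby_ready: bool) -> str:
    """Simplified decision after the observer loses contact with the primary."""
    if primary_down_s < cfg.threshold_s or not standby_ready:
        return "retry"
    if cfg.observe_only:
        # Log what would have happened; primary/standby roles stay unchanged.
        return "log: a fast-start failover would have been initiated"
    return "failover"
```

With the default 30-second threshold, `observer_decision(ObserverConfig(observe_only=True), 31, True)` returns the log message, while the same call with `observe_only=False` returns `"failover"`.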

More details can be checked in the documentation at this link too. But, in short, that is what it does:

Enable Observe-Only

Enabling it is very simple; just run “ENABLE FAST_START FAILOVER OBSERVE ONLY”:

DGMGRL> ENABLE FAST_START FAILOVER OBSERVE ONLY;

Enabled in Observe-Only Mode.

DGMGRL>

And in the drc* trace file on the primary side we can see:

2020-06-11T23:45:19.329+02:00

ENABLE FAST_START FAILOVER OBSERVE ONLY

FSFO SetState(st=47 "ENABLE OBONLY", fl=0x0 "", ob=0x2b621d39, tgt=2, v=0)

Setup log_archive_dest_n of GROUP=0 PRIORITY=0 with 'golds19c' as FSFO target

Fast-Start Failover (FSFO) has been enabled under observe-only mode between:

  Primary = "gold19c"

  Standby = "golds19c"

2020-06-11T23:45:20.527+02:00

ENABLE FAST_START FAILOVER OBSERVE ONLY completed successfully

And the result is FSFO in Observe-Only Mode:

DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>

If we then force a shutdown (abort) of the primary database, we can see that the roles do not change:

[oracle@goldpn1 ~]$ srvctl stop database -d gold19c -o abort

[oracle@goldpn1 ~]$

In the observer log file, we can see that the failure of the primary was detected, but nothing was done because FSFO is in observe-only mode:

…

Unable to connect to database using gold19c

[W000 2020-06-12T00:13:22.248+02:00] Primary database cannot be reached.

[W000 2020-06-12T00:13:22.248+02:00] Fast-Start Failover threshold has expired.

[W000 2020-06-12T00:13:22.248+02:00] Try to connect to the standby.

[W000 2020-06-12T00:13:22.248+02:00] Making a last connection attempt to primary database before proceeding with Fast-Start Failover.

[W000 2020-06-12T00:13:22.248+02:00] Check if the standby is ready for failover.

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor




Unable to connect to database using gold19c

[W000 2020-06-12T00:13:22.261+02:00] A fast-start failover would have been initiated...

[W000 2020-06-12T00:13:22.261+02:00] Unable to failover since this observer is in observe-only mode

[W000 2020-06-12T00:13:22.261+02:00] Fast-Start Failover is not possible because observe-only mode.

[W000 2020-06-12T00:13:22.261+02:00] Try to connect to the primary.

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor




Unable to connect to database using gold19c

[W000 2020-06-12T00:13:22.269+02:00] Primary database cannot be reached.

[W000 2020-06-12T00:13:22.269+02:00] Fast-Start Failover observe-only mode enabled.

[W000 2020-06-12T00:13:22.269+02:00] Will not attempt a Fast-Start Failover.

[W000 2020-06-12T00:13:22.269+02:00] Retry connecting to primary.

[W000 2020-06-12T00:13:23.270+02:00] Try to connect to the primary.

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor




Unable to connect to database using gold19c

[W000 2020-06-12T00:13:23.277+02:00] Primary database cannot be reached.

[W000 2020-06-12T00:13:24.278+02:00] Try to connect to the primary.

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

…
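If you want to monitor this automatically, a small Python sketch can pull the observe-only events out of the observer log. It assumes only the line layout visible in the excerpt above (`[thread timestamp] message`); the function name is hypothetical and the sample lines are copied from that excerpt.

```python
import re
from typing import List, Tuple

# Observer lines look like: [W000 2020-06-12T00:13:22.261+02:00] message
OBSERVER_LINE = re.compile(r"^\[(?P<thread>\S+) (?P<ts>[^\]]+)\] (?P<msg>.*)$")


def observe_only_events(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, message) for observer lines mentioning observe-only mode."""
    events = []
    for line in lines:
        m = OBSERVER_LINE.match(line.strip())
        if m and "observe-only" in m.group("msg"):
            events.append((m.group("ts"), m.group("msg")))
    return events


sample = [
    "[W000 2020-06-12T00:13:22.261+02:00] A fast-start failover would have been initiated...",
    "[W000 2020-06-12T00:13:22.261+02:00] Unable to failover since this observer is in observe-only mode",
    "ORA-12514: TNS:listener does not currently know of service requested in connect descriptor",
    "[W000 2020-06-12T00:13:22.269+02:00] Fast-Start Failover observe-only mode enabled.",
]
for ts, msg in observe_only_events(sample):
    print(ts, msg)
```

Lines without the bracketed prefix (such as the ORA-12514 messages) are simply skipped.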

And in the drc* trace file on the standby side we can see:

2020-06-12T00:13:21.103+02:00

Fast-Start Failover cannot proceed because: "observe-only mode"

So far, this means that the failure of the primary was detected and logged, but no action was taken. The roles remain the same. The show commands confirm this too:

DGMGRL> show configuration verbose;




Configuration - gold19c




  Protection Mode: MaxAvailability

  Members:

  gold19c  - Primary database

    golds19c - (*) Physical standby database




  (*) Fast-Start Failover target




  Properties:

    FastStartFailoverThreshold      = '30'

    OperationTimeout                = '30'

    TraceLevel                      = 'USER'

    FastStartFailoverLagLimit       = '0'

    CommunicationTimeout            = '180'

    ObserverReconnect               = '0'

    FastStartFailoverAutoReinstate  = 'TRUE'

    FastStartFailoverPmyShutdown    = 'TRUE'

    BystandersFollowRoleChange      = 'ALL'

    ObserverOverride                = 'FALSE'

    ExternalDestination1            = ''

    ExternalDestination2            = ''

    PrimaryLostWriteAction          = 'CONTINUE'

    ConfigurationWideServiceName    = 'gold19c_CFG'




Fast-Start Failover: Enabled in Observe-Only Mode

  Lag Limit:          0 seconds

  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configuration Status:

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

ORA-16625: cannot reach member "gold19c"

DGM-17017: unable to determine configuration status




DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>

In some scenarios this can be useful, because it allows us to fix the problem without going through a failover event (and, in some cases, a manual reinstate and so on). Another option is to use Observe-Only mode for exactly what the name says: just observing. Think of an environment where you want to test some conditions and the health of the environment (network and others) before you really enable FSFO.

And when the primary database comes back, FSFO returns to normal:

[oracle@goldpn1 ~]$ srvctl start database -d gold19c

[oracle@goldpn1 ~]$

In the drc* trace file on the standby:

2020-06-12T00:16:52.837+02:00

Primary connected to this instance.

2020-06-12T00:17:00.186+02:00

FSFO SetState(st=2 "UNSYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=11)

2020-06-12T00:17:06.951+02:00

FSFO SetState(st=1 "SYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=12)
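The FSFO SetState lines in the drc* traces are an easy way to follow the state changes (ENABLE OBONLY, then UNSYNC, then SYNC in this post). Here is a small Python sketch that extracts them, assuming only the line format shown above; the numeric `st` codes are internal and undocumented, so the sketch just reports them alongside the names.

```python
import re
from typing import List, Tuple

# Matches trace lines such as:
# FSFO SetState(st=1 "SYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=12)
SETSTATE = re.compile(r'FSFO SetState\(st=(\d+) "([^"]*)", fl=(0x[0-9a-fA-F]+) "([^"]*)"')


def fsfo_state_changes(trace_lines: List[str]) -> List[Tuple[int, str, str]]:
    """Return (state code, state name, flag name) for each FSFO SetState line."""
    changes = []
    for line in trace_lines:
        m = SETSTATE.search(line)
        if m:
            changes.append((int(m.group(1)), m.group(2), m.group(4)))
    return changes
```

Feeding it the trace excerpts from this post yields `(47, "ENABLE OBONLY", "")`, `(2, "UNSYNC", "AVAIL")` and `(1, "SYNC", "AVAIL")`, in that order.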

In the broker:

DGMGRL> show configuration;




Configuration - gold19c




  Protection Mode: MaxAvailability

  Members:

  gold19c  - Primary database

    golds19c - (*) Physical standby database




Fast-Start Failover: Enabled in Observe-Only Mode




Configuration Status:

SUCCESS   (status updated 51 seconds ago)




DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>

Upgrade and Downgrade modes

If FSFO is operating in Observe-Only mode, it is not possible to “upgrade” it directly to normal mode:

DGMGRL>  ENABLE FAST_START FAILOVER

Error: ORA-16889: Fast-start failover mode cannot be changed between normal and observe-only modes.




Failed.

DGMGRL>

To do that, we need to disable FSFO and then enable it in normal mode:

DGMGRL> DISABLE FAST_START FAILOVER ;

Disabled.

DGMGRL> ENABLE FAST_START FAILOVER ;

Enabled in Zero Data Loss Mode.

DGMGRL>

Downgrading works the same way: we cannot downgrade directly, so we need to disable FSFO and then enable it in Observe-Only mode:

DGMGRL> ENABLE FAST_START FAILOVER OBSERVE ONLY;

Error: ORA-16889: Fast-start failover mode cannot be changed between normal and observe-only modes.




Failed.

DGMGRL>
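The rule shown by these transcripts can be summarized as a tiny state machine. This Python sketch is only a model of the observed behavior, not broker code: the class names are hypothetical, and treating a re-enable in the same mode as a no-op is an assumption that was not tested above.

```python
class FsfoModeError(Exception):
    """Raised for the case the transcript shows as ORA-16889."""


class FsfoState:
    """Toy model of the FSFO mode rules seen above (not broker code)."""

    def __init__(self) -> None:
        self.mode = "disabled"  # "disabled", "normal" or "observe-only"

    def enable(self, observe_only: bool = False) -> None:
        target = "observe-only" if observe_only else "normal"
        if self.mode != "disabled" and self.mode != target:
            # Switching directly between the two enabled modes is refused.
            raise FsfoModeError(
                "ORA-16889: Fast-start failover mode cannot be changed "
                "between normal and observe-only modes."
            )
        # Assumption: re-enabling in the same mode is treated as a no-op here.
        self.mode = target

    def disable(self) -> None:
        self.mode = "disabled"
```

In this model, the only path from observe-only to normal (or back) goes through `disable()`, which matches the DISABLE / ENABLE sequence used above.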

Health Conditions

This is not new in 19c, but it helps reduce the scenarios where FSFO is triggered. It is possible to control the Health Conditions, enabling or disabling conditions such as a corrupted controlfile or a stuck archiver. All options can be checked here in the documentation.

Look at “Configurable Failover Conditions” below; everything listed there can be set:

DGMGRL>  show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>
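To check these settings from a script (for example, across several configurations), here is a Python sketch that reads the Health Conditions rows from `show fast_start failover` output. It relies only on the text layout shown above; the function name is hypothetical, and how you capture the DGMGRL output is left out.

```python
import re
from typing import Dict

ROW = re.compile(r"^(?P<name>.*\S)\s+(?P<value>YES|NO)$")


def parse_health_conditions(show_output: str) -> Dict[str, bool]:
    """Read the 'Health Conditions' rows of `show fast_start failover` output."""
    conditions: Dict[str, bool] = {}
    in_section = False
    for line in show_output.splitlines():
        text = line.strip()
        if text == "Health Conditions:":
            in_section = True
        elif in_section and text.endswith(":"):
            break  # next section, e.g. "Oracle Error Conditions:"
        elif in_section:
            m = ROW.match(text)
            if m:
                conditions[m.group("name")] = m.group("value") == "YES"
    return conditions
```

Blank lines inside the section are ignored, so the function also works on output with extra spacing like the transcripts in this post.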

Some examples

DGMGRL> ENABLE FAST_START FAILOVER CONDITION "Inaccessible Logfile";

Succeeded.

DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile           YES

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>




DGMGRL> ENABLE FAST_START FAILOVER CONDITION "Corrupted Dictionary";

Succeeded.

DGMGRL> DISABLE FAST_START FAILOVER CONDITION "Inaccessible Logfile";

Succeeded.

DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>
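When the desired set of conditions is known, the ENABLE/DISABLE commands can be generated instead of typed. This Python sketch only builds the command strings, using the exact syntax shown in this post: quoted names for health conditions and a bare number for Oracle errors, as in the ORA-240 example. The function name is hypothetical, and it does not check which error numbers the broker accepts.

```python
from typing import Dict, List, Union

Condition = Union[str, int]  # health condition name, or an ORA- error number


def condition_commands(current: Dict[Condition, bool],
                       desired: Dict[Condition, bool]) -> List[str]:
    """Return the DGMGRL commands needed to move from `current` to `desired`."""
    cmds = []
    for cond, wanted in desired.items():
        if current.get(cond, False) == wanted:
            continue  # already in the desired state
        verb = "ENABLE" if wanted else "DISABLE"
        # Health conditions are quoted names; Oracle errors are bare numbers.
        arg = str(cond) if isinstance(cond, int) else f'"{cond}"'
        cmds.append(f"{verb} FAST_START FAILOVER CONDITION {arg};")
    return cmds
```

Starting from the default conditions shown above, asking for `{"Inaccessible Logfile": True, 240: True}` produces the two ENABLE commands used in this post.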

Another option is to enable (or disable) a specific Oracle error as a failover condition. For example, the controlfile error ORA-240 can be set as a trigger:

DGMGRL> ENABLE FAST_START FAILOVER CONDITION 240;

Succeeded.

DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    ORA-240: control file enqueue held for more than %s seconds




DGMGRL> DISABLE FAST_START FAILOVER CONDITION 240;

Succeeded.

DGMGRL>

But not every Oracle error is accepted. ORA-600, for example, is rejected:

DGMGRL> ENABLE FAST_START FAILOVER CONDITION 600;

Error: ORA-16524: unsupported command, option, or argument




Failed.

DGMGRL>

Observe-Only and Conditions

Observe-Only mode is a good new feature in 19c because it gives more control over where and when FSFO is triggered. Until then, the only options were ON or OFF, so testing, or even validating the environment before enabling FSFO for real, was impossible.

And if we combine this with the Health Conditions checks, we get powerful control over the DG environment, allowing better tuning.

ZDLRA, ORDERING_WAIT task state

Category: Engineer System Author: Fernando Simon (Board Member) Date: 5 years ago Comments: 0

Tasks are the pillar of how backups are processed in ZDLRA: everything is a task. When you ingest an incremental backup, a task is created, but it can happen that the task gets stuck in the ORDERING_WAIT state. These tasks are hard to identify and can create a big problem for your virtual full backups and your backup strategy. Below I will show how they occur and how to solve the problem.

Incremental backups

To understand how they appear, I first need to show a little of how incremental backups work. Look at the example below, with one datafile and some blocks. When you take a level 0 backup, all the blocks are copied; when you take an incremental level 1 backup, only the “new blocks” (the blocks that changed) are copied.

As you can see above, the first backup of the datafile (level 0) has SCN 3333. So, the next level 1 backup copies everything that changed since then (and, in this example, it has SCN 4444). The following incremental backup then picks up everything since the last backup, in this example every block changed since SCN 4444, generating a new backup up to SCN 5555.
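The same walk-through as a toy Python model, assuming a datafile is just a map from block number to (SCN of last change, content). The function name and block values are hypothetical; real RMAN compares block SCNs against the parent backup's checkpoint SCN in a similar spirit, but this is only an illustration.

```python
from typing import Dict, Optional, Tuple

Datafile = Dict[int, Tuple[int, str]]  # block number -> (SCN of last change, content)


def take_backup(datafile: Datafile, checkpoint_scn: int,
                since_scn: Optional[int] = None) -> dict:
    """Level 0 when since_scn is None; otherwise an incremental level 1.

    A level 1 copies only the blocks changed after since_scn, which is the
    checkpoint SCN of the previous backup.
    """
    if since_scn is None:
        blocks = dict(datafile)
        from_scn = 0
    else:
        blocks = {no: blk for no, blk in datafile.items() if blk[0] > since_scn}
        from_scn = since_scn
    return {"from_scn": from_scn, "to_scn": checkpoint_scn, "blocks": blocks}


# Walk through the SCNs used in the text: 3333 -> 4444 -> 5555.
df: Datafile = {1: (1000, "a"), 2: (2000, "b"), 3: (3000, "c")}
level0 = take_backup(df, 3333)
df[2] = (4000, "b2")                        # block 2 changes after SCN 3333
incr1 = take_backup(df, 4444, since_scn=level0["to_scn"])
df[3] = (5000, "c2")                        # block 3 changes after SCN 4444
incr2 = take_backup(df, 5555, since_scn=incr1["to_scn"])
print(sorted(level0["blocks"]), sorted(incr1["blocks"]), sorted(incr2["blocks"]))
```

The level 0 holds blocks 1, 2 and 3, while each level 1 holds only the single block that changed since the previous backup.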

This is the definition of an incremental backup, as stated in “About RMAN Incremental Backups” in the docs:

“An incremental backup copies only those data blocks that have changed since a previous backup. You can use RMAN to create incremental backups of data files, tablespaces, or the whole database.”

The point is that the database/RMAN knows the SCN of the last backup, and an incremental backup copies everything changed since that SCN. Each incremental backup “knows” (internally, through the database blocks it contains) its start SCN and end SCN. So, to “reconstruct” the datafile, the database/RMAN uses the full backup and all subsequent incremental backups.

Incremental backup and ZDLRA

As I already wrote about incremental backups and ZDLRA, they are used to construct the “Virtual Full Backup”. In short, ZDLRA merges the stored backups and creates the virtual full backup (as I explained here too).

But even with this virtual backup, the way incremental backups work does not change. The procedure is the same: check the SCN of the last backup and copy all blocks changed since then. As you can see in the image above, the full backups (blue in the image) are created by merging the previous full with the ingested backup, and they are used as the base for the subsequent incremental.
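The merge described above can be sketched in a few lines of Python. This is a conceptual model only, not how ZDLRA stores or indexes blocks: each new (virtual) full is the previous full with the incremental's changed blocks overwritten, and the function name is hypothetical.

```python
from typing import Any, Dict, List

# {"from_scn": int, "to_scn": int, "blocks": {block number: content}}
Backup = Dict[str, Any]


def build_virtual_fulls(level0: Backup, incrementals: List[Backup]) -> List[Backup]:
    """Return one (virtual) full per backup: the level 0, then one per incremental.

    Assumes every incremental starts at the SCN where the previous backup ended.
    """
    fulls = [level0]
    for incr in incrementals:
        previous = fulls[-1]
        blocks = dict(previous["blocks"])   # start from the previous full...
        blocks.update(incr["blocks"])       # ...and overwrite the changed blocks
        fulls.append({"from_scn": 0, "to_scn": incr["to_scn"], "blocks": blocks})
    return fulls
```

With a level 0 at SCN 3333 and incrementals up to 4444 and 5555, this yields three fulls, one per SCN, which is also why a restore never needs to apply the incrementals one by one.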

ORDERING_WAIT

ORDERING_WAIT occurs when the INDEX_BACKUP task, which creates the index (and the virtual full backup), cannot finish because it does not have all the required data. This happens because (for some reason) a backup was created but not stored in ZDLRA. It can even be a duplicate used to create a standby (remember that a duplicate is basically an RMAN backup copied from one side to the other).

Look at the image above: after SCN 4444 (the last backup stored in ZDLRA), another backup was taken and it is not inside ZDLRA. So, when the next incremental backup is taken, it copies all blocks changed since the last backup, but that last backup is the one that is not in ZDLRA (RMAN does not care where the backup is. By definition, an incremental backup is relative to the last backup, whatever or wherever it is).

When this incremental backup is ingested, ZDLRA tries to create the virtual full. But since the last stored backup has SCN 4444, and the new incremental contains blocks changed from SCN 5555 up to SCN 6666, ZDLRA detects a gap when it opens the ingested backup. ZDLRA does not have the blocks changed between SCN 4444 and 5555 (look at the yellow block: that backup exists only outside ZDLRA).

So, it is impossible to create the virtual full backup for SCN 6666, and the INDEX_BACKUP task is held in the ORDERING_WAIT state. To solve it, there are two options: take a new level 0 backup, or use the BACKUP [CUMULATIVE] INCREMENTAL LEVEL 1 … FOR RECOVER OF TAG ‘<TAG>’ command. I will show below how to do that.
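The gap check can be expressed as a one-line comparison. This Python sketch is a simplified model of the reasoning above, not ZDLRA's code: the function name and the returned strings are hypothetical, and it assumes an incremental that starts at or before the stored SCN still covers every change needed.

```python
def index_backup_outcome(stored_full_scn: int, incr_from_scn: int, incr_to_scn: int) -> str:
    """Simplified version of the gap check (not ZDLRA's implementation).

    stored_full_scn: checkpoint SCN of the newest virtual full on the appliance.
    incr_from_scn / incr_to_scn: SCN range covered by the ingested level 1.
    """
    if incr_from_scn > stored_full_scn:
        # Blocks changed between stored_full_scn and incr_from_scn exist only in
        # a backup that never reached the appliance: the chain has a gap.
        return "ORDERING_WAIT"
    return f"virtual full at SCN {incr_to_scn}"


# The example from the text: ZDLRA holds SCN 4444, the outside backup ended at
# 5555, and the new incremental covers 5555 -> 6666.
print(index_backup_outcome(4444, 5555, 6666))
# An incremental based on the stored full (assuming FOR RECOVER OF TAG points
# at the backup that ZDLRA holds) starts at 4444, so the chain is complete.
print(index_backup_outcome(4444, 4444, 6666))
```

A new level 0 fixes the problem for the same reason: it does not depend on any earlier SCN, so there is nothing missing.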

How this occurs and how to solve

Below, I will show how you can identify and solve the problem, using the solution described in the last paragraph. Along the way, you can check the internal details of how it occurs, how to identify it, and how to solve it.

In this scenario, I will use one datafile (number 12, the USERS tablespace of one PDB). So, first, check the backups for datafile 12:

RMAN> list backup of datafile 12 completed after "sysdate - 20/1440";







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24283   Incr 1  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24284   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282I   Media:

  List of Datafiles in backup set 24283

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24287   Incr 0  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24288   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282_12   Media:

  List of Datafiles in backup set 24287

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

As you can see, we have Incr 1 and Incr 0 backups with the same SCN, and the handle names start with VB$. This means that the virtual full backup is OK and was created by ZDLRA.
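This check can be scripted. The Python sketch below reads `list backup of datafile …` output and maps each checkpoint SCN to the levels found in VB$ pieces, so a healthy virtual full shows levels 0 and 1 at the newest SCN. It relies only on the column layout visible in this post; the function name is hypothetical, and other RMAN versions may format the listing differently.

```python
from typing import Dict, Set


def virtual_levels_by_scn(listing: str, file_no: int) -> Dict[int, Set[int]]:
    """From `list backup` output, map checkpoint SCN -> levels held in VB$ pieces."""
    result: Dict[int, Set[int]] = {}
    virtual = False
    for line in listing.splitlines():
        toks = line.split()
        if len(toks) >= 3 and toks[0].isdigit() and toks[1] == "Incr":
            virtual = False                      # backup set header row: new set
        elif toks[:1] == ["Handle:"] and len(toks) >= 2:
            virtual = toks[1].startswith("VB$")  # virtual pieces are named VB$...
        elif (virtual and len(toks) >= 4 and toks[0] == str(file_no)
              and toks[2] == "Incr"):
            # datafile row: File LV Type Ckp-SCN ...
            result.setdefault(int(toks[3]), set()).add(int(toks[1]))
    return result
```

Disk backups show a `Piece Name:` line instead of a VB$ handle, so they are ignored by this check.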

And if I do a new incremental backup, ZDLRA generates a new virtual full backup:



RMAN> BACKUP INCREMENTAL LEVEL 1 DEVICE TYPE SBT FILESPERSET 1 DATAFILE 12;




Starting backup at 20/04/2020 23:51:58

using channel ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set

channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: starting piece 1 at 20/04/2020 23:51:59

channel ORA_SBT_TAPE_1: finished piece 1 at 20/04/2020 23:52:02

piece handle=ORCL18C_a1uu5dsv_1_1 tag=TAG20200420T235158 comment=API Version 2.0,MMS Version 12.2.0.2

channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:00:03

Finished backup at 20/04/2020 23:52:02




Starting Control File and SPFILE Autobackup at 20/04/2020 23:52:02

piece handle=c-558466555-20200420-0b comment=API Version 2.0,MMS Version 12.2.0.2

Finished Control File and SPFILE Autobackup at 20/04/2020 23:52:10




RMAN> list backup of datafile 12 completed after "sysdate - 20/1440";







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24283   Incr 1  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24284   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282I   Media:

  List of Datafiles in backup set 24283

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24287   Incr 0  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24288   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282_12   Media:

  List of Datafiles in backup set 24287

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24355   Incr 1  40.00K     SBT_TAPE    00:00:03     20/04/2020 23:52:02

        BP Key: 24356   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24354I   Media:

  List of Datafiles in backup set 24355

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013810    20/04/2020 23:51:59              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24359   Incr 0  40.00K     SBT_TAPE    00:00:03     20/04/2020 23:52:02

        BP Key: 24360   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24354_12   Media:

  List of Datafiles in backup set 24359

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013810    20/04/2020 23:51:59              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

Simulating the error

Remember that this new incremental contains all changes from SCN 2013573 up to 2013810.

But, for some reason, a full backup is taken to another destination (such as disk, or a duplicate for a standby). Note that the channel type is DISK, not ZDLRA:

RMAN> BACKUP INCREMENTAL LEVEL 0 DEVICE TYPE disk format '/tmp/%U' DATAFILE 12 TAG 'BKP-DBF-TO-DISK';




Starting backup at 20/04/2020 23:54:01

released channel: ORA_SBT_TAPE_1

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=65 device type=DISK

channel ORA_DISK_1: starting incremental level 0 datafile backup set

channel ORA_DISK_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_DISK_1: starting piece 1 at 20/04/2020 23:54:01

channel ORA_DISK_1: finished piece 1 at 20/04/2020 23:54:02

piece handle=/tmp/a3uu5e0p_1_1 tag=BKP-DBF-TO-DISK comment=NONE

channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01

Finished backup at 20/04/2020 23:54:02




Starting Control File and SPFILE Autobackup at 20/04/2020 23:54:02

piece handle=/u01/app/oracle/oradata/ORCL18C/autobackup/2020_04_20/o1_mf_s_1038268443_h9w6hwht_.bkp comment=NONE

Finished Control File and SPFILE Autobackup at 20/04/2020 23:54:10




RMAN> list backup tag = 'BKP-DBF-TO-DISK';







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24391   Incr 0  1.18M      DISK        00:00:00     20/04/2020 23:54:01

        BP Key: 24394   Status: AVAILABLE  Compressed: NO  Tag: BKP-DBF-TO-DISK

        Piece Name: /tmp/a3uu5e0p_1_1

  List of Datafiles in backup set 24391

  Container ID: 3, PDB Name: ORCL18P

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013972    20/04/2020 23:54:01              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>
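With the SCNs from these listings, the gap can be computed directly. This Python sketch assumes, as described earlier, that RMAN bases the next level 1 on the most recent level 0/1 backup regardless of device, and that every SBT backup here went to ZDLRA; the list and function names are hypothetical.

```python
from typing import List, Tuple

# (device type, checkpoint SCN) of the datafile 12 backups in this example
backups: List[Tuple[str, int]] = [
    ("SBT", 2013573),   # virtual full already on ZDLRA
    ("SBT", 2013810),   # level 1 ingested, virtual full built
    ("DISK", 2013972),  # level 0 to /tmp, never reaches ZDLRA
]


def next_level1_base(history: List[Tuple[str, int]]) -> int:
    """SCN the next level 1 starts from: newest level 0/1 backup, on any device."""
    return max(scn for _, scn in history)


def last_scn_on_zdlra(history: List[Tuple[str, int]]) -> int:
    """Newest checkpoint SCN that ZDLRA actually holds (SBT backups only)."""
    return max(scn for device, scn in history if device == "SBT")


base = next_level1_base(backups)
stored = last_scn_on_zdlra(backups)
if base > stored:
    print(f"gap: changes between SCN {stored} and {base} are not on ZDLRA")
```

The next level 1 starts at SCN 2013972, while ZDLRA's newest virtual full is at 2013810, so the blocks changed in between exist only in the disk backup.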

Now, if I run an incremental level 1 backup, the ingested backup does not generate a new virtual full backup:

RMAN> BACKUP INCREMENTAL LEVEL 1 DEVICE TYPE SBT FILESPERSET 1 DATAFILE 12;




Starting backup at 20/04/2020 23:55:23

released channel: ORA_DISK_1

allocated channel: ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: SID=65 device type=SBT_TAPE

channel ORA_SBT_TAPE_1: RA Library (ZDLRAS1) SID=A3C0F4C16DAA11FAE053010310ACC1C4

channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set

channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: starting piece 1 at 20/04/2020 23:55:24

channel ORA_SBT_TAPE_1: finished piece 1 at 20/04/2020 23:55:27

piece handle=ORCL18C_a5uu5e3c_1_1 tag=TAG20200420T235524 comment=API Version 2.0,MMS Version 12.2.0.2

channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:00:03

Finished backup at 20/04/2020 23:55:27




Starting Control File and SPFILE Autobackup at 20/04/2020 23:55:27

piece handle=c-558466555-20200420-0d comment=API Version 2.0,MMS Version 12.2.0.2

Finished Control File and SPFILE Autobackup at 20/04/2020 23:55:35




RMAN> list backup of datafile 12 completed after "sysdate - 20/1440";







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24283   Incr 1  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24284   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282I   Media:

  List of Datafiles in backup set 24283

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24287   Incr 0  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24288   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282_12   Media:

  List of Datafiles in backup set 24287

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24355   Incr 1  40.00K     SBT_TAPE    00:00:03     20/04/2020 23:52:02

        BP Key: 24356   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24354I   Media:

  List of Datafiles in backup set 24355

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013810    20/04/2020 23:51:59              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24359   Incr 0  40.00K     SBT_TAPE    00:00:03     20/04/2020 23:52:02

        BP Key: 24360   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24354_12   Media:

  List of Datafiles in backup set 24359

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013810    20/04/2020 23:51:59              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24391   Incr 0  1.18M      DISK        00:00:00     20/04/2020 23:54:01

        BP Key: 24394   Status: AVAILABLE  Compressed: NO  Tag: BKP-DBF-TO-DISK

        Piece Name: /tmp/a3uu5e0p_1_1

  List of Datafiles in backup set 24391

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013972    20/04/2020 23:54:01              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24430   Incr 1  256.00K    SBT_TAPE    00:00:00     20/04/2020 23:55:24

        BP Key: 24431   Status: AVAILABLE  Compressed: NO  Tag: TAG20200420T235524

        Handle: ORCL18C_a5uu5e3c_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24430

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2014089    20/04/2020 23:55:24              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

Note in the listing above that the last incremental backup, backup set 24430 (backup piece 24431), did not generate a new virtual level 0.
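To make the pattern in the listing explicit, the Python sketch below re-reads values copied from the listing above (BS key, level, device type, checkpoint SCN) and flags SBT level 1 backups that have no virtual level 0 at the same checkpoint SCN. The tuple layout and the function name are hypothetical; the sketch only parses what RMAN already printed and does not talk to ZDLRA.

```python
from typing import NamedTuple


class BackupSet(NamedTuple):
    """One entry of the RMAN 'list backup' output (hypothetical layout)."""
    bs_key: int
    level: int
    device: str
    ckp_scn: int


# Values copied from the listing of datafile 12 above.
LISTING = [
    BackupSet(24283, 1, "SBT_TAPE", 2013573),
    BackupSet(24287, 0, "SBT_TAPE", 2013573),
    BackupSet(24355, 1, "SBT_TAPE", 2013810),
    BackupSet(24359, 0, "SBT_TAPE", 2013810),
    BackupSet(24391, 0, "DISK", 2013972),
    BackupSet(24430, 1, "SBT_TAPE", 2014089),
]


def incrementals_without_virtual_full(listing):
    """Return BS keys of SBT level 1 backups with no SBT level 0 at the same SCN."""
    full_scns = {b.ckp_scn for b in listing if b.level == 0 and b.device == "SBT_TAPE"}
    return [
        b.bs_key
        for b in listing
        if b.level == 1 and b.device == "SBT_TAPE" and b.ckp_scn not in full_scns
    ]


print(incrementals_without_virtual_full(LISTING))  # [24430]
```

The disk level 0 (24391) is ignored on purpose: only backups stored on the appliance can have a virtual counterpart.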

Task inside ZDLRA

If we look inside ZDLRA, we can see that the task responsible for indexing backup piece 24431 is in the ORDERING_WAIT state. You can query rasys.ra_task and filter on the STATE column:

SQL> select TASK_ID, TASK_TYPE, STATE, WAITING_ON, DB_KEY, DB_UNIQUE_NAME, CREATION_TIME, ERROR_COUNT, INTERRUPT_COUNT, BP_KEY,BS_KEY,DF_KEY,VB_KEY from rasys.ra_task where db_unique_name = 'ORCL18C' and state = 'ORDERING_WAIT' order by 5,2,7,10,11,12,13;




   TASK_ID TASK_TYPE       STATE                     WAITING_ON     DB_KEY DB_UNIQUE_NAME                 CREATION_TIME                       ERROR_COUNT INTERRUPT_COUNT     BP_KEY     BS_KEY     DF_KEY     VB_KEY

---------- --------------- ------------------------- ---------- ---------- ------------------------------ ----------------------------------- ----------- --------------- ---------- ---------- ---------- ----------

     40203 INDEX_BACKUP    ORDERING_WAIT                             15993 ORCL18C                        20-APR-20 11.55.26.779191 PM +02:00           0               1      24431




SQL>

And there is no incident for this error or state:

SQL> select incident_id, error_text, last_seen from ra_incident_log where db_unique_name = 'ORCL18C' and status not in ('FIXED', 'RESET') order by last_seen desc;




no rows selected




SQL>

Unfortunately, ZDLRA does not report a task in the ORDERING_WAIT state as an error. And if you think about it, this means that the virtual full backup is not generated, so the virtual full backup feature is not in place for these datafiles.

But even with tasks in this state, you are still protected: the backup is stored and can be restored.

ZDLRA Internals

Inside ZDLRA we can check more details. The example below shows one way to navigate the internal tables of the ZDLRA RMAN catalog and find the same information that LIST BACKUPSET shows in RMAN:

SQL> select bs_key, db_key, pdb_key from bp where bp_key = 24431;




    BS_KEY     DB_KEY    PDB_KEY

---------- ---------- ----------

     24430      15993      16000




SQL> select * from rc_database where db_key = 15993;




    DB_KEY  DBINC_KEY       DBID NAME     RESETLOGS_CHANGE# RESETLOGS FINAL_CHANGE#

---------- ---------- ---------- -------- ----------------- --------- -------------

     15993      15994  558466555 ORCL18C            1477662 11-AUG-19




SQL> select df_key from df where dbinc_key = 15994 and  file# = 12;




    DF_KEY

----------

     16026




SQL> select bdf_key, ckp_scn from bdf where bs_key = 24430;




   BDF_KEY    CKP_SCN

---------- ----------

     24432    2014089




SQL>

###################################################################

RMAN> list backupset 24430;







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24430   Incr 1  256.00K    SBT_TAPE    00:00:00     20/04/2020 23:55:24

        BP Key: 24431   Status: AVAILABLE  Compressed: NO  Tag: TAG20200420T235524

        Handle: ORCL18C_a5uu5e3c_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24430

  Container ID: 3, PDB Name: ORCL18P

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2014089    20/04/2020 23:55:24              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

Using the BP table, we can find the DB_KEY, and with that we can go to RC_DATABASE to get the incarnation key (DBINC_KEY). With that, the DF table gives the DF_KEY for datafile 12 in this database incarnation. And with the BDF table (using the BS_KEY from BP), we can find the CKP_SCN for this datafile in that backup set.
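The four lookups above can also be written as a single join. To keep the sketch runnable anywhere, it recreates only the columns queried in this post, filled with the values from the outputs above, in an in-memory SQLite database. This is an illustration of the navigation path, not the real ZDLRA schema (the real tables have many more columns); on the appliance you would run the SELECT against the catalog in Oracle, adapting the bind syntax.

```python
import sqlite3

# In-memory stand-in for the catalog tables used above (mock, partial columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bp (bp_key INTEGER, bs_key INTEGER, db_key INTEGER, pdb_key INTEGER);
    CREATE TABLE rc_database (db_key INTEGER, dbinc_key INTEGER, dbid INTEGER, name TEXT);
    CREATE TABLE df (df_key INTEGER, dbinc_key INTEGER, "file#" INTEGER);
    CREATE TABLE bdf (bdf_key INTEGER, bs_key INTEGER, ckp_scn INTEGER);
    INSERT INTO bp VALUES (24431, 24430, 15993, 16000);
    INSERT INTO rc_database VALUES (15993, 15994, 558466555, 'ORCL18C');
    INSERT INTO df VALUES (16026, 15994, 12);
    INSERT INTO bdf VALUES (24432, 24430, 2014089);
""")

# BP -> RC_DATABASE (incarnation) -> DF (datafile key) and BP -> BDF (checkpoint SCN).
QUERY = """
    SELECT bp.bs_key, df.df_key, bdf.ckp_scn
      FROM bp
      JOIN rc_database rd ON rd.db_key = bp.db_key
      JOIN df ON df.dbinc_key = rd.dbinc_key AND df."file#" = ?
      JOIN bdf ON bdf.bs_key = bp.bs_key
     WHERE bp.bp_key = ?
"""
row = conn.execute(QUERY, (12, 24431)).fetchone()
print(row)  # (24430, 16026, 2014089)
```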

But if we go to the BLOCKS table (the ZDLRA table that stores the index, and “more or less” the virtual full backup), there are no database blocks for this datafile with SCN 2014089 or higher, the SCN of backup set 24430. If you want to understand how it works, you can read my post about this.

SQL> select * from blocks where df_key = 16026 and scn >= 2014089;




no rows selected




SQL>

And if we check from SCN 2013972 (the checkpoint SCN of the backup written to disk), there is nothing either, as expected:

SQL> select * from blocks where df_key = 16026 and scn >= 2013972;




no rows selected




SQL>

But if we check from SCN 2013810, the last SCN known by ZDLRA (the last virtual full backup, backup set 24359), we can see which blocks are there:

SQL> select * from blocks where df_key = 16026 and scn >= 2013810 order by scn, chunkno ;




    DF_KEY    BLOCKNO        SCN     CKP_ID    CHUNKNO    COFFSET       USED  DBINC_KEY     ENDBLK

---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------

     16026          1    2013810    2013810      25601      33121        338      15994

     16026          0    2013810    2013810      25601       8192      24576      15994




SQL>

And PLANS exist only for this virtual backup (nothing was generated for the later backups):

SQL> select VB_KEY, DF_KEY, CKP_SCN, SRCBP_KEY, VCBP_KEY from vbdf where CKP_SCN >= 2013810 and df_key = 16026;




    VB_KEY     DF_KEY    CKP_SCN  SRCBP_KEY   VCBP_KEY

---------- ---------- ---------- ---------- ----------

     24354      16026    2013810      24293      24356




SQL> select * from plans_details where VB_KEY = 24354;




    DF_KEY       TYPE     VB_KEY    BLKRANK    BLOCKNO    CHUNKNO    NUMBLKS    COFFSET   NUMBYTES

---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------

     16026          1      24354          1          0      25601          1       8192      24576

     16026          1      24354          1          2      16385          2      32788        256

     16026          1      24354          1          4      14337        132      33045      14000

     16026          1      24354          1        136      16385          3      33044        553

     16026          1      24354          1        139      23553          1      32788        327

     16026          1      24354          1        140      16385          2      33786        264

     16026          1      24354          1        142      25601          1      32788        333

     16026          1      24354          1        143      16385          1      34182        132

     16026          1      24354          1        191      14337          3      47045        252

     16026          1      24354          1 4294967295      25601          1      33121        338




10 rows selected.




SQL>

As you can see, ZDLRA cannot fill the gap, so it cannot create the virtual full backup.

Recurring error

If you do not solve the problem and keep ingesting backups, each new task will also remain in ORDERING_WAIT:

RMAN> BACKUP INCREMENTAL LEVEL 1 DEVICE TYPE SBT FILESPERSET 1 DATAFILE 12;




Starting backup at 21/04/2020 00:04:00

using channel ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set

channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: starting piece 1 at 21/04/2020 00:04:01

channel ORA_SBT_TAPE_1: finished piece 1 at 21/04/2020 00:04:04

piece handle=ORCL18C_a7uu5ejh_1_1 tag=TAG20200421T000400 comment=API Version 2.0,MMS Version 12.2.0.2

channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:00:03

Finished backup at 21/04/2020 00:04:04




Starting Control File and SPFILE Autobackup at 21/04/2020 00:04:04

piece handle=c-558466555-20200421-00 comment=API Version 2.0,MMS Version 12.2.0.2

Finished Control File and SPFILE Autobackup at 21/04/2020 00:04:13




RMAN> list backup of datafile 12 completed after "sysdate - 5/1440";







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24511   Incr 1  256.00K    SBT_TAPE    00:00:01     21/04/2020 00:04:02

        BP Key: 24512   Status: AVAILABLE  Compressed: NO  Tag: TAG20200421T000400

        Handle: ORCL18C_a7uu5ejh_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24511

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2021757    21/04/2020 00:04:01              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

####################################

SQL> select TASK_ID, TASK_TYPE, STATE, WAITING_ON, DB_KEY, DB_UNIQUE_NAME, CREATION_TIME, ERROR_COUNT, INTERRUPT_COUNT, BP_KEY,BS_KEY,DF_KEY,VB_KEY from rasys.ra_task where db_unique_name = 'ORCL18C' and state = 'ORDERING_WAIT' order by 5,2,7,10,11,12,13;




   TASK_ID TASK_TYPE       STATE                     WAITING_ON     DB_KEY DB_UNIQUE_NAME                 CREATION_TIME                       ERROR_COUNT INTERRUPT_COUNT     BP_KEY     BS_KEY     DF_KEY     VB_KEY

---------- --------------- ------------------------- ---------- ---------- ------------------------------ ----------------------------------- ----------- --------------- ---------- ---------- ---------- ----------

     40203 INDEX_BACKUP    ORDERING_WAIT                             15993 ORCL18C                        20-APR-20 11.55.26.779191 PM +02:00           0               1      24431

     40210 INDEX_BACKUP    ORDERING_WAIT                             15993 ORCL18C                        21-APR-20 12.04.05.207342 AM +02:00           0               1      24512




SQL>

Even if you try to do a cumulative incremental backup, the problem will be the same:

RMAN> BACKUP CUMULATIVE INCREMENTAL LEVEL 1 DEVICE TYPE SBT FILESPERSET 1 DATAFILE 12;




Starting backup at 21/04/2020 00:07:13

using channel ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set

channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: starting piece 1 at 21/04/2020 00:07:13

channel ORA_SBT_TAPE_1: finished piece 1 at 21/04/2020 00:07:20

piece handle=ORCL18C_a9uu5eph_1_1 tag=TAG20200421T000713 comment=API Version 2.0,MMS Version 12.2.0.2

channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:00:07

Finished backup at 21/04/2020 00:07:20




Starting Control File and SPFILE Autobackup at 21/04/2020 00:07:20

piece handle=c-558466555-20200421-01 comment=API Version 2.0,MMS Version 12.2.0.2

Finished Control File and SPFILE Autobackup at 21/04/2020 00:07:28




RMAN> list backup of datafile 12 completed after "sysdate - 2/1440";







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24604   Incr 1  256.00K    SBT_TAPE    00:00:02     21/04/2020 00:07:15

        BP Key: 24605   Status: AVAILABLE  Compressed: NO  Tag: TAG20200421T000713

        Handle: ORCL18C_a9uu5eph_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24604

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2021959    21/04/2020 00:07:13              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

#############################################

SQL> select TASK_ID, TASK_TYPE, STATE, WAITING_ON, DB_KEY, DB_UNIQUE_NAME, CREATION_TIME, ERROR_COUNT, INTERRUPT_COUNT, BP_KEY,BS_KEY,DF_KEY,VB_KEY from rasys.ra_task where db_unique_name = 'ORCL18C' and state = 'ORDERING_WAIT' order by 5,2,7,10,11,12,13;




   TASK_ID TASK_TYPE       STATE                     WAITING_ON     DB_KEY DB_UNIQUE_NAME                 CREATION_TIME                       ERROR_COUNT INTERRUPT_COUNT     BP_KEY     BS_KEY     DF_KEY     VB_KEY

---------- --------------- ------------------------- ---------- ---------- ------------------------------ ----------------------------------- ----------- --------------- ---------- ---------- ---------- ----------

     40203 INDEX_BACKUP    ORDERING_WAIT                             15993 ORCL18C                        20-APR-20 11.55.26.779191 PM +02:00           0               1      24431

     40210 INDEX_BACKUP    ORDERING_WAIT                             15993 ORCL18C                        21-APR-20 12.04.05.207342 AM +02:00           0               1      24512

     40215 INDEX_BACKUP    ORDERING_WAIT                             15993 ORCL18C                        21-APR-20 12.07.18.140618 AM +02:00           0               1      24605




SQL>

Solving ORDERING_WAIT

There are two ways to solve the issue, and both follow the same idea: ingest the missing database blocks into ZDLRA by performing a new backup.
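A simplified way to picture why the tasks pile up, and why a new backup releases them: assume an incremental can only be indexed when its base SCN is not newer than the last SCN the appliance already holds for the datafile; otherwise it waits and is retried later. The Python sketch below models that rule. It is a mental model, not ZDLRA's internal algorithm, and the `from_scn` values are assumptions inferred from this scenario (for example, 2013972 as the base of piece 24431, inherited from the level 0 written to disk). BP key 24706 stands for the fixing incremental based on SCN 2013810; it is the SRCBP_KEY that VBDF reports later in this post for the new virtual backup.

```python
from dataclasses import dataclass


@dataclass
class Incremental:
    bp_key: int
    from_scn: int  # ASSUMED base SCN of the incremental (not shown by ZDLRA above)
    ckp_scn: int   # checkpoint SCN, from the RMAN listings


def index_backups(known_scn, arrivals):
    """Simplified ordering model: index an incremental only when its base SCN
    is not newer than the SCN already known; otherwise keep it waiting (the
    ORDERING_WAIT analogue) and retry waiting ones whenever known_scn moves."""
    waiting, completed = [], []
    for inc in arrivals:
        waiting.append(inc)
        progress = True
        while progress:
            progress = False
            for task in list(waiting):  # iterate over a copy while removing
                if task.from_scn <= known_scn:
                    known_scn = max(known_scn, task.ckp_scn)
                    waiting.remove(task)
                    completed.append(task.bp_key)
                    progress = True
    return known_scn, [t.bp_key for t in waiting], completed


LAST_VIRTUAL_FULL_SCN = 2013810  # backup set 24359

stuck = [
    Incremental(24431, from_scn=2013972, ckp_scn=2014089),  # assumed base: disk level 0
    Incremental(24512, from_scn=2014089, ckp_scn=2021757),  # assumed base: previous level 1
    Incremental(24605, from_scn=2013972, ckp_scn=2021959),  # cumulative, assumed base
]
print(index_backups(LAST_VIRTUAL_FULL_SCN, stuck))
# (2013810, [24431, 24512, 24605], [])

fix = Incremental(24706, from_scn=2013810, ckp_scn=2025613)
print(index_backups(LAST_VIRTUAL_FULL_SCN, stuck + [fix]))
# (2025613, [], [24706, 24431, 24512, 24605])
```

In the model, as in the task outputs later in this post, the three waiting tasks complete only after a backup based on the last SCN known by the appliance arrives.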

FOR RECOVER OF TAG

The first is to use the command “BACKUP [CUMULATIVE] INCREMENTAL LEVEL 1 … FOR RECOVER OF TAG ‘<TAG>’”. The idea here is to create an incremental backup relative to the backup that has a specific tag. This option has existed since Oracle 10g; check the documentation if you want more details.

As you can imagine, the critical point here is choosing the correct TAG to use as the reference: it must be the tag of the last full backup (virtual or not) that is inside ZDLRA. This way, we ingest all changed blocks and fill the gap that is holding the task.
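The tag choice can be expressed as a rule over the RMAN listing: take the newest level 0 stored on the appliance (device SBT_TAPE, virtual or not) and ignore backups on other devices, such as the disk level 0 tagged BKP-DBF-TO-DISK, because ZDLRA never saw them. The Python sketch below applies that rule to the level 0 entries from the listing of datafile 12 and builds the command; the data layout and helper names are hypothetical.

```python
from typing import NamedTuple, Optional


class Level0Backup(NamedTuple):
    bs_key: int
    level: int
    device: str
    ckp_scn: int
    tag: str


# Level 0 entries for datafile 12, copied from the RMAN listing above.
LISTING = [
    Level0Backup(24287, 0, "SBT_TAPE", 2013573, "TAG20200420T234921"),
    Level0Backup(24359, 0, "SBT_TAPE", 2013810, "TAG20200420T235158"),
    Level0Backup(24391, 0, "DISK", 2013972, "BKP-DBF-TO-DISK"),
]


def reference_tag(listing) -> Optional[str]:
    """Tag of the newest level 0 stored on the appliance (SBT), or None."""
    on_appliance = [b for b in listing if b.level == 0 and b.device == "SBT_TAPE"]
    if not on_appliance:
        return None
    return max(on_appliance, key=lambda b: b.ckp_scn).tag


def fix_command(tag: str, datafile: int, cumulative: bool = False) -> str:
    """Build the FOR RECOVER OF TAG backup command used in this post."""
    kind = "CUMULATIVE INCREMENTAL" if cumulative else "INCREMENTAL"
    return (f"BACKUP {kind} LEVEL 1 DEVICE TYPE SBT "
            f"FOR RECOVER OF TAG '{tag}' DATAFILE {datafile};")


print(fix_command(reference_tag(LISTING), 12))
# BACKUP INCREMENTAL LEVEL 1 DEVICE TYPE SBT FOR RECOVER OF TAG 'TAG20200420T235158' DATAFILE 12;
```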

In this case, I used a normal (differential) incremental. Note that the tag is from the last virtual full backup inside ZDLRA for this datafile:

RMAN> BACKUP INCREMENTAL LEVEL 1 DEVICE TYPE SBT FOR RECOVER OF TAG 'TAG20200420T235158' DATAFILE 12;




Starting backup at 21/04/2020 00:12:17

using channel ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set

channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: starting piece 1 at 21/04/2020 00:12:17

channel ORA_SBT_TAPE_1: finished piece 1 at 21/04/2020 00:12:20

piece handle=ORCL18C_abuu5f31_1_1 tag=TAG20200420T235158 comment=API Version 2.0,MMS Version 12.2.0.2

channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:00:03

Finished backup at 21/04/2020 00:12:20




Starting Control File and SPFILE Autobackup at 21/04/2020 00:12:20

piece handle=c-558466555-20200421-02 comment=API Version 2.0,MMS Version 12.2.0.2

Finished Control File and SPFILE Autobackup at 21/04/2020 00:12:28




RMAN>

And we can see that a new virtual full backup was created from this incremental backup (look at the last five backup sets: the three stuck incrementals, followed by the new virtual backup 24814/24818):

RMAN> list backup of datafile 12 completed after "sysdate - 40/1440";







List of Backup Sets

===================







BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24283   Incr 1  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24284   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282I   Media:

  List of Datafiles in backup set 24283

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24287   Incr 0  40.00K     SBT_TAPE    00:00:04     20/04/2020 23:49:26

        BP Key: 24288   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T234921

        Handle: VB$_1891149551_24282_12   Media:

  List of Datafiles in backup set 24287

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013573    20/04/2020 23:49:22              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24355   Incr 1  40.00K     SBT_TAPE    00:00:03     20/04/2020 23:52:02

        BP Key: 24356   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24354I   Media:

  List of Datafiles in backup set 24355

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2013810    20/04/2020 23:51:59              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24359   Incr 0  40.00K     SBT_TAPE    00:00:03     20/04/2020 23:52:02

        BP Key: 24360   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24354_12   Media:

  List of Datafiles in backup set 24359

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013810    20/04/2020 23:51:59              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24391   Incr 0  1.18M      DISK        00:00:00     20/04/2020 23:54:01

        BP Key: 24394   Status: AVAILABLE  Compressed: NO  Tag: BKP-DBF-TO-DISK

        Piece Name: /tmp/a3uu5e0p_1_1

  List of Datafiles in backup set 24391

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2013972    20/04/2020 23:54:01              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24430   Incr 1  256.00K    SBT_TAPE    00:00:00     20/04/2020 23:55:24

        BP Key: 24431   Status: AVAILABLE  Compressed: NO  Tag: TAG20200420T235524

        Handle: ORCL18C_a5uu5e3c_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24430

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2014089    20/04/2020 23:55:24              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24511   Incr 1  256.00K    SBT_TAPE    00:00:01     21/04/2020 00:04:02

        BP Key: 24512   Status: AVAILABLE  Compressed: NO  Tag: TAG20200421T000400

        Handle: ORCL18C_a7uu5ejh_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24511

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2021757    21/04/2020 00:04:01              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24604   Incr 1  256.00K    SBT_TAPE    00:00:02     21/04/2020 00:07:15

        BP Key: 24605   Status: AVAILABLE  Compressed: NO  Tag: TAG20200421T000713

        Handle: ORCL18C_a9uu5eph_1_1   Media: Recovery Appliance (ZDLRAS1)

  List of Datafiles in backup set 24604

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2021959    21/04/2020 00:07:13              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24814   Incr 1  40.00K     SBT_TAPE    00:00:03     21/04/2020 00:12:20

        BP Key: 24815   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24813I   Media:

  List of Datafiles in backup set 24814

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   1  Incr 2025613    21/04/2020 00:12:17              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




BS Key  Type LV Size       Device Type Elapsed Time Completion Time

------- ---- -- ---------- ----------- ------------ -------------------

24818   Incr 0  40.00K     SBT_TAPE    00:00:03     21/04/2020 00:12:20

        BP Key: 24819   Status: AVAILABLE  Compressed: YES  Tag: TAG20200420T235158

        Handle: VB$_1891149551_24813_12   Media:

  List of Datafiles in backup set 24818

  File LV Type Ckp SCN    Ckp Time            Abs Fuz SCN Sparse Name

  ---- -- ---- ---------- ------------------- ----------- ------ ----

  12   0  Incr 2025613    21/04/2020 00:12:17              NO    /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf




RMAN>

And inside ZDLRA, we can see that the tasks previously in ORDERING_WAIT have completed (check the BP_KEY column):

SQL> select TASK_ID, TASK_TYPE, STATE, WAITING_ON, DB_KEY, DB_UNIQUE_NAME, CREATION_TIME, COMPLETION_TIME, ERROR_COUNT, INTERRUPT_COUNT, BP_KEY,BS_KEY,DF_KEY,VB_KEY from rasys.ra_task where task_id IN (40203,40210,40215);




   TASK_ID TASK_TYPE       STATE                     WAITING_ON     DB_KEY DB_UNIQUE_NAME                 CREATION_TIME                       COMPLETION_TIME                     ERROR_COUNT INTERRUPT_COUNT     BP_KEY     BS_KEY     DF_KEY     VB_KEY

---------- --------------- ------------------------- ---------- ---------- ------------------------------ ----------------------------------- ----------------------------------- ----------- --------------- ---------- ---------- ---------- ----------

     40203 INDEX_BACKUP    COMPLETED                                 15993 ORCL18C                        20-APR-20 11.55.26.779191 PM +02:00 21-APR-20 12.13.09.253916 AM +02:00           0               1      24431

     40210 INDEX_BACKUP    COMPLETED                                 15993 ORCL18C                        21-APR-20 12.04.05.207342 AM +02:00 21-APR-20 12.13.21.663869 AM +02:00           0               1      24512

     40215 INDEX_BACKUP    COMPLETED                                 15993 ORCL18C                        21-APR-20 12.07.18.140618 AM +02:00 21-APR-20 12.13.42.179087 AM +02:00           0               1      24605




SQL>

And if we check BLOCKS for this datafile, we can see that new blocks were registered with SCNs higher than the last full backup before the error (SCN 2013810), up to the last backup made (SCN 2025613):

SQL> select * from blocks where df_key = 16026 and scn >= 2013810 order by scn, chunkno ;




    DF_KEY    BLOCKNO        SCN     CKP_ID    CHUNKNO    COFFSET       USED  DBINC_KEY     ENDBLK

---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------

     16026          0    2013810    2013810      25601       8192      24576      15994

     16026          1    2013810    2013810      25601      33121        338      15994

     16026        142    2021942    2025613      26625      32788        414      15994

     16026          1    2025613    2025613      26625      33202        336      15994

     16026          0    2025613    2025613      26625       8192      24576      15994




SQL>

And we can see that PLANS now exist for both the virtual full backup from before the error (SCN 2013810) and the one created after the fix (SCN 2025613):

SQL> select VB_KEY, DF_KEY, CKP_SCN, SRCBP_KEY, VCBP_KEY from vbdf where CKP_SCN >= 2013810 and DF_KEY = 16026;




    VB_KEY     DF_KEY    CKP_SCN  SRCBP_KEY   VCBP_KEY

---------- ---------- ---------- ---------- ----------

     24354      16026    2013810      24293      24356

     24813      16026    2025613      24706      24815




SQL>

SQL> select * from plans_details where VB_KEY IN (24354,24813) order by VB_KEY,BLOCKNO;




    DF_KEY       TYPE     VB_KEY    BLKRANK    BLOCKNO    CHUNKNO    NUMBLKS    COFFSET   NUMBYTES

---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------

     16026          1      24354          1          0      25601          1       8192      24576

     16026          1      24354          1          2      16385          2      32788        256

     16026          1      24354          1          4      14337        132      33045      14000

     16026          1      24354          1        136      16385          3      33044        553

     16026          1      24354          1        139      23553          1      32788        327

     16026          1      24354          1        140      16385          2      33786        264

     16026          1      24354          1        142      25601          1      32788        333

     16026          1      24354          1        143      16385          1      34182        132

     16026          1      24354          1        191      14337          3      47045        252

     16026          1      24354          1 4294967295      25601          1      33121        338

     16026          1      24813          1          0      26625          1       8192      24576

     16026          1      24813          1          2      16385          2      32788        256

     16026          1      24813          1          4      14337        132      33045      14000

     16026          1      24813          1        136      16385          3      33044        553

     16026          1      24813          1        139      23553          1      32788        327

     16026          1      24813          1        140      16385          2      33786        264

     16026          1      24813          1        142      26625          1      32788        414

     16026          1      24813          1        143      16385          1      34182        132

     16026          1      24813          1        191      14337          3      47045        252

     16026          1      24813          1 4294967295      26625          1      33202        336




20 rows selected.




SQL>

So, the incremental backup that we made with FOR RECOVER OF TAG was ingested and used to fill the gap.
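Comparing the two plans shows what the fix actually changed. The Python sketch below takes the (BLOCKNO, CHUNKNO) pairs copied from the PLANS_DETAILS output above and lists the plan entries whose chunk differs between virtual backups 24354 and 24813; the other entries point to chunks that were already stored. It is only a reading aid for the query output, and the function name is hypothetical.

```python
# (BLOCKNO, CHUNKNO) pairs copied from PLANS_DETAILS above.
PLAN_24354 = {0: 25601, 2: 16385, 4: 14337, 136: 16385, 139: 23553,
              140: 16385, 142: 25601, 143: 16385, 191: 14337, 4294967295: 25601}
PLAN_24813 = {0: 26625, 2: 16385, 4: 14337, 136: 16385, 139: 23553,
              140: 16385, 142: 26625, 143: 16385, 191: 14337, 4294967295: 26625}


def changed_blocks(old_plan, new_plan):
    """Block numbers whose data now comes from a different chunk."""
    return sorted(b for b, chunk in new_plan.items() if old_plan.get(b) != chunk)


changed = changed_blocks(PLAN_24354, PLAN_24813)
print(changed)  # [0, 142, 4294967295]
reused = len(PLAN_24813) - len(changed)
print(f"{reused} of {len(PLAN_24813)} plan entries reuse existing chunks")
# 7 of 10 plan entries reuse existing chunks
```

All changed entries point to chunk 26625, the same chunk that holds the new rows in BLOCKS.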

And if we restore and recover datafile 12, it works without a problem. Note that the backup used for the restore was the last virtual full backup, generated from the “FOR RECOVER OF TAG” command:

RMAN> run{

2> ALTER PLUGGABLE DATABASE ORCL18P CLOSE IMMEDIATE INSTANCES=ALL;

3> RESTORE DATAFILE 12;

4> RECOVER DATAFILE 12;

5> ALTER PLUGGABLE DATABASE ORCL18P OPEN INSTANCES=ALL;

6> }




Statement processed

starting full resync of recovery catalog

full resync complete




Starting restore at 21/04/2020 00:57:53

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=88 device type=DISK

allocated channel: ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: SID=70 device type=SBT_TAPE

channel ORA_SBT_TAPE_1: RA Library (ZDLRAS1) SID=A3C1D44670B428B0E053010310AC5DA9




channel ORA_SBT_TAPE_1: starting datafile backup set restore

channel ORA_SBT_TAPE_1: specifying datafile(s) to restore from backup set

channel ORA_SBT_TAPE_1: restoring datafile 00012 to /u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: reading from backup piece VB$_1891149551_24813_12

channel ORA_SBT_TAPE_1: piece handle=VB$_1891149551_24813_12 tag=TAG20200420T235158

channel ORA_SBT_TAPE_1: restored backup piece 1

channel ORA_SBT_TAPE_1: restore complete, elapsed time: 00:00:25

Finished restore at 21/04/2020 00:58:21




Starting recover at 21/04/2020 00:58:21

using channel ORA_DISK_1

using channel ORA_SBT_TAPE_1




starting media recovery

media recovery complete, elapsed time: 00:00:00




Finished recover at 21/04/2020 00:58:22




Statement processed

starting full resync of recovery catalog

full resync complete




RMAN>

Unfortunately, backup tags on ZDLRA can be tricky when you specify them directly during the backup: several backups can end up with the same tag, which makes it harder to define which one to use with FOR RECOVER OF TAG. One option is to combine both approaches and run BACKUP CUMULATIVE INCREMENTAL LEVEL 1 DEVICE TYPE SBT FOR RECOVER OF TAG '<TAG>' DATAFILE XX.

This way, the command picks up all the blocks changed since the backup that carries the tag you defined. The result is the same:

RMAN> BACKUP CUMULATIVE INCREMENTAL LEVEL 1 DEVICE TYPE SBT FOR RECOVER OF TAG 'TAG20200419T232006' DATAFILE 12;




Starting backup at 19/04/2020 23:38:44

using channel ORA_SBT_TAPE_1

channel ORA_SBT_TAPE_1: starting incremental level 1 datafile backup set

channel ORA_SBT_TAPE_1: specifying datafile(s) in backup set

input datafile file number=00012 name=/u01/app/oracle/oradata/ORCL18C/ORCL18P/users01.dbf

channel ORA_SBT_TAPE_1: starting piece 1 at 19/04/2020 23:38:44

channel ORA_SBT_TAPE_1: finished piece 1 at 19/04/2020 23:38:59

piece handle=ORCL18C_97uu2oo4_1_1 tag=TAG20200419T232006 comment=API Version 2.0,MMS Version 12.2.0.2

channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:00:15

Finished backup at 19/04/2020 23:38:59




Starting Control File and SPFILE Autobackup at 19/04/2020 23:39:00

piece handle=c-558466555-20200419-04 comment=API Version 2.0,MMS Version 12.2.0.2

Finished Control File and SPFILE Autobackup at 19/04/2020 23:39:16




RMAN>
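Since a typo in the tag or the datafile number is easy to make here, the command can also be generated by a small shell helper and fed to RMAN. This is a minimal sketch, not part of the original procedure: the function name `build_recover_of_tag_cmd` is hypothetical, it assumes tags made only of letters, digits, and underscores (like the automatic TAGYYYYMMDDTHHMMSS tags above), and it assumes the RMAN session already has the ZDLRA SBT channel configured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the RMAN command that takes a cumulative level 1
# backup of one datafile FOR RECOVER OF a given tag (syntax as used above).
build_recover_of_tag_cmd() {
  local tag="$1" datafile="$2"
  local tag_re='^[A-Za-z0-9_]+$' file_re='^[1-9][0-9]*$'
  # Assumption: tags contain only letters, digits and underscores.
  if ! [[ "$tag" =~ $tag_re ]]; then
    echo "invalid tag: $tag" >&2
    return 1
  fi
  # RMAN datafile numbers are positive integers.
  if ! [[ "$datafile" =~ $file_re ]]; then
    echo "invalid datafile number: $datafile" >&2
    return 1
  fi
  printf "BACKUP CUMULATIVE INCREMENTAL LEVEL 1 DEVICE TYPE SBT FOR RECOVER OF TAG '%s' DATAFILE %s;\n" \
    "$tag" "$datafile"
}

# Example (not executed here; add your own target/catalog connection):
#   build_recover_of_tag_cmd TAG20200419T232006 12 | rman target /
```

Validating the inputs before building the string keeps a malformed tag from silently producing a command that backs up the wrong thing or fails halfway through a maintenance window.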

BACKUP FULL

The other option to solve ORDERING_WAIT is to take a full backup of the datafile. This way, all the blocks are read and ingested by ZDLRA.

The procedure is the same as above, and so is the result. The only point is that, for huge files, a full backup can take a long time, so the incremental approach above can be more suitable.

Monitoring the Tasks

So, as you can see above, the ORDERING_WAIT state can have a lot of collateral effects on your ZDLRA. Unfortunately, it does not generate reported incidents, so you need to write a query that checks the RA_TASK view directly.

Whatever method you choose to solve the problem (FOR RECOVER OF TAG or a full backup), always verify that the new virtual full backup was generated. It is good practice to avoid errors and to double-check the tasks.

It is a simple query and a simple thing to monitor, but it can avoid a huge problem.
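A minimal sketch of such a check in shell, assuming it runs where SQL*Plus can connect as a user allowed to read RA_TASK (on ZDLRA this is usually RASYS) and that stuck tasks show STATE = 'ORDERING_WAIT', as discussed above. The function names and the connection string are placeholders; adapt the query and the alerting to your own monitoring.

```shell
#!/usr/bin/env bash
# Hypothetical SQL*Plus script that counts ZDLRA tasks stuck in ORDERING_WAIT.
ordering_wait_sql() {
  cat <<'EOF'
set heading off feedback off pagesize 0
select count(*) from ra_task where state = 'ORDERING_WAIT';
exit
EOF
}

# Interprets the count returned by the query.
# Returns 0 when nothing is waiting, 1 when tasks are stuck, 2 on unexpected output.
evaluate_ordering_wait() {
  local count
  count=$(printf '%s' "$1" | tr -d '[:space:]')
  if ! [[ "$count" =~ ^[0-9]+$ ]]; then
    echo "UNKNOWN: unexpected query output: $1"
    return 2
  fi
  if (( 10#$count > 0 )); then
    echo "ALERT: $count task(s) in ORDERING_WAIT"
    return 1
  fi
  echo "OK: no tasks in ORDERING_WAIT"
  return 0
}

# Example (not executed here; the credentials are placeholders):
#   evaluate_ordering_wait "$(ordering_wait_sql | sqlplus -s rasys/<password>)"
```

Returning distinct exit codes lets a scheduler or monitoring agent tell "stuck tasks" apart from "the query itself failed" (for example, an ORA- error instead of a number).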

Reference:

Implementing a Dual Backup Strategy with Backups to Tape and Recovery Appliance (Doc ID 2154471.1)

DB_UNIQUE_NAME, PDB, and Data Guard

Category: Database Author: Fernando Simon (Board Member) Date: 6 years ago Comments: 0


When you change database parameters, it is possible to specify the db_unique_name and gain more control over where the value is applied. This is very useful to limit the scope, but you need to be aware of some collateral effects. Even though it is not present in the official documentation, you can use it. Here are some details that you need to take care of.

Unplug and Plug

The environment below is a simple 19c single-instance database. It has three PDBs (including PDB$SEED), and you can see the db_unique_name of this CDB:

SQL> show pdbs




    CON_ID CON_NAME                       OPEN MODE  RESTRICTED

---------- ------------------------------ ---------- ----------

         2 PDB$SEED                       READ ONLY  NO

         3 PDBDG                          READ WRITE NO

         5 PDBMOVE                        READ WRITE NO

SQL> show parameter db_unique_name;




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

db_unique_name                       string      or19dg

SQL>

We can connect to PDBMOVE and change parameters. First, as usual, a normal set without any special clause:

SQL> alter system set open_cursors = 300 scope = spfile;




System altered.




SQL>

But we can also specify the db_unique_name in ALTER SYSTEM: ALTER SYSTEM … DB_UNIQUE_NAME = 'VALUE'. Here you can see that I used the same value as defined at the root level:

SQL> alter system set sessions = 100  scope = spfile db_unique_name = 'or19dg';




System altered.




SQL> alter system set cursor_sharing = FORCE scope = spfile db_unique_name = 'or19dg';




System altered.




SQL>

Detail #1: if you specify DB_UNIQUE_NAME, the scope must be SPFILE only. Otherwise, you will receive an error:

SQL> alter system set cursor_sharing = FORCE scope = both db_unique_name = 'or19dg';

alter system set cursor_sharing = FORCE scope = both db_unique_name = 'or19dg'

*

ERROR at line 1:

ORA-65147: DB_UNIQUE_NAME specified without SPFILE scope

SQL>
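Because of this restriction, if you script these changes it can help to generate the statement so that SCOPE = SPFILE is always used whenever a db_unique_name is given. A minimal sketch: the function name `build_alter_system` is hypothetical, and it only prints the statement, which you would then run in SQL*Plus connected to the right PDB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an ALTER SYSTEM SET statement for a PDB parameter.
# Args: PARAM VALUE [DB_UNIQUE_NAME] [SCOPE, default spfile]
# With a db_unique_name, the scope is forced to spfile (otherwise ORA-65147).
build_alter_system() {
  local param="$1" value="$2" dbun="${3:-}" scope="${4:-spfile}"
  if [ -n "$dbun" ]; then
    if [ "$scope" != "spfile" ]; then
      echo "note: scope '$scope' replaced by spfile because db_unique_name is set" >&2
    fi
    printf "alter system set %s = %s scope = spfile db_unique_name = '%s';\n" \
      "$param" "$value" "$dbun"
  else
    printf "alter system set %s = %s scope = %s;\n" "$param" "$value" "$scope"
  fi
}

# Example:
#   build_alter_system cursor_sharing FORCE or19dg
```

The warning goes to stderr, so piping stdout into SQL*Plus still sends only the statement.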

In this example, I set two parameters at the PDB level with db_unique_name, and PDB_SPFILE$ in the root container shows that the values were stored:

SQL> alter session set container = cdb$root;




Session altered.




SQL> select DB_UNIQ_NAME, NAME, VALUE$, PDB_UID from PDB_SPFILE$ where PDB_UID = 2549618825;




DB_UNIQ_NAME                   NAME                           VALUE$                                      PDB_UID

------------------------------ ------------------------------ ---------------------------------------- ----------

*                              open_cursors                   300                                      2549618825

or19dg                         cursor_sharing                 'FORCE'                                  2549618825

or19dg                         sessions                       100                                      2549618825




SQL>
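The queries in this post filter PDB_SPFILE$ by a hard-coded PDB_UID. In the plug example later in this post, PDB_UID matches the CON_UID column of V$PDBS (not the DBID), so the value can be resolved from the PDB name with a join. This is a sketch based on that observation rather than on documented behavior (PDB_SPFILE$ is an internal dictionary table). The function name is hypothetical, and PDB names are assumed to contain only letters, digits, and underscores.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a query listing the PDB_SPFILE$ entries of one PDB
# by name, joining PDB_SPFILE$.PDB_UID to V$PDBS.CON_UID.
# Run the printed query from CDB$ROOT as SYSDBA.
pdb_spfile_sql() {
  local pdb name_re='^[A-Z][A-Z0-9_]*$'
  pdb=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  if ! [[ "$pdb" =~ $name_re ]]; then
    echo "invalid PDB name: $1" >&2
    return 1
  fi
  cat <<EOF
select s.db_uniq_name, s.name, s.value\$, s.pdb_uid
  from pdb_spfile\$ s
  join v\$pdbs p on p.con_uid = s.pdb_uid
 where p.name = '$pdb'
 order by s.name, s.db_uniq_name;
EOF
}

# Example (not executed here):
#   pdb_spfile_sql pdbmove | sqlplus -s / as sysdba
```

Resolving the UID at query time matters because, as shown in the unplug/plug test, the CON_UID can change when a PDB is plugged back in.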

And after restarting the database, the values are in effect:

[oracle@orcl19p ~]$ srvctl stop database -d or19dg

[oracle@orcl19p ~]$ srvctl start database -d or19dg

[oracle@orcl19p ~]$

[oracle@orcl19p ~]$ sqlplus / as sysdba




SQL*Plus: Release 19.0.0.0.0 - Production on Mon Jun 1 17:33:29 2020

Version 19.5.0.0.0




Copyright (c) 1982, 2019, Oracle.  All rights reserved.







Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.5.0.0.0




SQL> show pdbs




    CON_ID CON_NAME                       OPEN MODE  RESTRICTED

---------- ------------------------------ ---------- ----------

         2 PDB$SEED                       READ ONLY  NO

         3 PDBDG                          READ WRITE NO

         5 PDBMOVE                        READ WRITE NO

SQL> alter session set container = PDBMOVE;




Session altered.




SQL> show parameter cursor_sharing;




NAME                                 TYPE                             VALUE

------------------------------------ -------------------------------- ------------------------------

cursor_sharing                       string                           FORCE

SQL>

UNPLUG

But if I need to unplug the PDB, the values will be exported too, right? Yes, but not as you might imagine.

So, doing a simple unplug to XML:

SQL> alter pluggable database PDBMOVE close immediate;




Pluggable database altered.




SQL> alter pluggable database PDBMOVE unplug into '/tmp/pdbmove_par.xml';




Pluggable database altered.




SQL> drop pluggable database PDBMOVE keep datafiles;




Pluggable database dropped.




SQL>

Checking the generated XML for these parameters:

[oracle@orcl19p ~]$ cat /tmp/pdbmove_par.xml |grep open_cursors

      <spfile>*.open_cursors=300#HWM:300,</spfile>

[oracle@orcl19p ~]$ cat /tmp/pdbmove_par.xml |grep cursor_sharing

      <spfile>*.cursor_sharing='FORCE'</spfile>

[oracle@orcl19p ~]$ cat /tmp/pdbmove_par.xml |grep sessions

      <spfile>*.sessions=100#HWM:100,flag:1,</spfile>

[oracle@orcl19p ~]$

As you can see, the parameters were exported, but their prefix changed to “*”: the definition tied to a specific db_unique_name was cleared.
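To review every parameter stored in an unplug XML in one pass (instead of one grep per parameter), a small sketch like the one below can help. It assumes each entry sits in its own `<spfile>...</spfile>` element on one line, as in the output above; the function names are hypothetical. The second function prints only entries whose prefix is not “*”; in this test it would print nothing, since every entry came back as “*”.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists the <spfile> entries stored in a PDB unplug XML.
# Each output line is the raw entry, e.g. *.open_cursors=300#HWM:300,
list_xml_spfile_entries() {
  local xml="$1"
  [ -r "$xml" ] || { echo "cannot read $xml" >&2; return 1; }
  grep -o '<spfile>[^<]*</spfile>' "$xml" | sed -e 's/^<spfile>//' -e 's/<\/spfile>$//'
}

# Prints only the entries whose prefix is not "*" (empty output means all
# exported parameters are global).
list_non_global_entries() {
  list_xml_spfile_entries "$1" | grep -v '^\*\.' || true
}

# Example (not executed here):
#   list_xml_spfile_entries /tmp/pdbmove_par.xml
```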

And if we plug the same PDB again, we can see that the parameters were loaded as “*”:

SQL> create  pluggable database PDBMOVE USING '/tmp/pdbmove_par.xml' NOCOPY TEMPFILE REUSE;




Pluggable database created.




SQL> col name format a30

SQL> col VALUE$ format a40

SQL> set linesize 250

SQL> select CON_ID,DBID,CON_UID,GUID,NAME from v$pdbs;




    CON_ID       DBID    CON_UID GUID                             NAME

---------- ---------- ---------- -------------------------------- ------------------------------

         2  826464235  826464235 9B151A78B6DB533AE0533205A00AFC30 PDB$SEED

         3 1340021208 1340021208 9B155010BEAC661BE0533205A00AF21B PDBDG

         4 2549618825 3364812106 A7082B6C610C5E1DE0533205A00AF7FE PDBMOVE




SQL> show pdbs




    CON_ID CON_NAME                       OPEN MODE  RESTRICTED

---------- ------------------------------ ---------- ----------

         2 PDB$SEED                       READ ONLY  NO

         3 PDBDG                          READ WRITE NO

         4 PDBMOVE                        MOUNTED

SQL> alter pluggable database PDBMOVE open;




Pluggable database altered.




SQL> select DB_UNIQ_NAME, NAME, VALUE$, PDB_UID from PDB_SPFILE$ where PDB_UID = 3364812106;




DB_UNIQ_NAME                   NAME                           VALUE$                                      PDB_UID

------------------------------ ------------------------------ ---------------------------------------- ----------

*                              cursor_sharing                 'FORCE'                                  3364812106

*                              open_cursors                   300                                      3364812106

*                              sessions                       100                                      3364812106




SQL>

DATAGUARD

As you can imagine, using db_unique_name with ALTER SYSTEM can affect Data Guard environments. As you know, the db_unique_name is different on the primary and the standby. So, depending on how you defined a parameter, after a switchover or failover that parameter can return to its default value.

This is not necessarily wrong: maybe you need to set some parameters for one specific server or side, because of less memory, less CPU power, or whatever the reason.

In this example, gold19c is the primary and golds19c is the standby, and GOLD19P is the PDB used here:

[oracle@goldpn1 ~]$ dgmgrl sys/oracle@gold19c

DGMGRL for Linux: Release 19.0.0.0.0 - Production on Mon Jun 1 19:13:39 2020

Version 19.7.0.0.0




Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.




Welcome to DGMGRL, type "help" for information.

Connected to "GOLD19C"

Connected as SYSDBA.

DGMGRL> show configuration;




Configuration - gold19c




  Protection Mode: MaxAvailability

  Members:

  gold19c  - Primary database

    golds19c - Physical standby database




Fast-Start Failover:  Disabled




Configuration Status:

SUCCESS   (status updated 72 seconds ago)




DGMGRL> exit

[oracle@goldpn1 ~]$

[oracle@goldpn1 ~]$ sqlplus / as sysdba




SQL*Plus: Release 19.0.0.0.0 - Production on Mon Jun 1 19:16:50 2020

Version 19.7.0.0.0




Copyright (c) 1982, 2020, Oracle.  All rights reserved.







Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.7.0.0.0




SQL> show parameter db_name




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

db_name                              string      gold19c

SQL> show parameter db_unique_name




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

db_unique_name                       string      GOLD19C

SQL>

SQL> show pdbs




    CON_ID CON_NAME                       OPEN MODE  RESTRICTED

---------- ------------------------------ ---------- ----------

         2 PDB$SEED                       READ ONLY  NO

         3 GOLD19P                        READ WRITE NO

SQL> col name format a30

SQL> col VALUE$ format a40

SQL> set linesize 250

SQL> select CON_ID,DBID,CON_UID,GUID,NAME from v$pdbs;




    CON_ID       DBID    CON_UID GUID                             NAME

---------- ---------- ---------- -------------------------------- ------------------------------

         2 2661745537 2661745537 A6B13C68753F63D3E0530A07A00A6303 PDB$SEED

         3 2141275600 2141275600 A6B19FBF38306073E0530C07A00ACE20 GOLD19P




SQL>

To set the parameters, I used the same commands as before, with and without db_unique_name:

SQL> alter session set container = GOLD19P;




Session altered.




SQL> alter system set open_cursors = 300 scope = spfile;




System altered.




SQL> alter system set cursor_sharing = FORCE scope = spfile db_unique_name = 'GOLD19C';




System altered.




SQL>

And as you can see, the parameters were stored:

SQL> alter session set container = cdb$root;




Session altered.




SQL> select DB_UNIQ_NAME, NAME, VALUE$, PDB_UID from PDB_SPFILE$ where PDB_UID = 2141275600;




DB_UNIQ_NAME                   NAME                           VALUE$                                      PDB_UID

------------------------------ ------------------------------ ---------------------------------------- ----------

*                              open_cursors                   300                                      2141275600

GOLD19C                        cursor_sharing                 'FORCE'                                  2141275600




SQL>

But if I perform a switchover:

[oracle@goldpn1 ~]$ dgmgrl sys/oracle@gold19c

DGMGRL for Linux: Release 19.0.0.0.0 - Production on Mon Jun 1 19:24:15 2020

Version 19.7.0.0.0




Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.




Welcome to DGMGRL, type "help" for information.

Connected to "GOLD19C"

Connected as SYSDBA.

DGMGRL> switchover to golds19c;

Performing switchover NOW, please wait...

Operation requires a connection to database "golds19c"

Connecting ...

Connected to "GOLDS19C"

Connected as SYSDBA.

New primary database "golds19c" is opening...

Oracle Clusterware is restarting database "gold19c" ...

Connected to "GOLD19C"

Connected to "GOLD19C"

Switchover succeeded, new primary is "golds19c"

DGMGRL>

Now, on the new primary, cursor_sharing has its default value, because the current db_unique_name (GOLDS19C) does not match the one used in the definition:

[oracle@goldsn1 ~]$ sqlplus / as sysdba




SQL*Plus: Release 19.0.0.0.0 - Production on Mon Jun 1 19:35:01 2020

Version 19.7.0.0.0




Copyright (c) 1982, 2020, Oracle.  All rights reserved.







Connected to:

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Version 19.7.0.0.0




SQL> show parameter db_name




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

db_name                              string      gold19c

SQL> show parameter db_unique_name




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

db_unique_name                       string      GOLDS19C

SQL>

SQL> alter session set container = GOLD19P;




Session altered.




SQL> show parameter cursor_sharing;




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cursor_sharing                       string      EXACT

SQL> show parameter open_cursors;




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

open_cursors                         integer     300

SQL>

But you can see that the parameter is still defined in PDB_SPFILE$:

SQL> alter session set container = cdb$root;




Session altered.




SQL>

SQL> col name format a30

SQL> col VALUE$ format a40

SQL> set linesize 250

SQL> select CON_ID,DBID,CON_UID,GUID,NAME from v$pdbs;




    CON_ID       DBID    CON_UID GUID                             NAME

---------- ---------- ---------- -------------------------------- ------------------------------

         2 2661745537 2661745537 A6B13C68753F63D3E0530A07A00A6303 PDB$SEED

         3 2141275600 2141275600 A6B19FBF38306073E0530C07A00ACE20 GOLD19P




SQL> select DB_UNIQ_NAME, NAME, VALUE$, PDB_UID from PDB_SPFILE$ where PDB_UID = 2141275600;




DB_UNIQ_NAME                   NAME                           VALUE$                                      PDB_UID

------------------------------ ------------------------------ ---------------------------------------- ----------

*                              open_cursors                   300                                      2141275600

GOLD19C                        cursor_sharing                 'FORCE'                                  2141275600




SQL>

As shown, the value is not defined for the former standby, which is now the primary. As said before, this can be the expected behavior, or not. If you want the parameter to take effect on both primary and standby, don't specify db_unique_name; otherwise, when a switchover or failover occurs, the value will not be there. This can lead to unexpected behavior (on the application side, for example) and to troubleshooting that can take a long time until you find the parameter definition.
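Before a switchover, it may be worth listing which PDB_SPFILE$ entries will stop applying on the new primary. A minimal sketch, assuming you spool the DB_UNIQ_NAME, NAME, and VALUE$ columns of the query used above as whitespace-separated lines without headings (for example with heading and feedback off); the function name and this input format are assumptions, and the comparison is case-insensitive as a simplification.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "DB_UNIQ_NAME NAME VALUE" lines (from PDB_SPFILE$)
# on stdin and prints the entries that will NOT apply on the given member,
# i.e. entries pinned to another db_unique_name. "*" entries apply everywhere.
entries_not_applying_to() {
  local target
  target=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  awk -v target="$target" '
    NF >= 2 {
      scope = toupper($1)
      if (scope != "*" && scope != target) print
    }'
}

# Example: before "switchover to golds19c", check what golds19c would not get:
#   entries_not_applying_to GOLDS19C < /tmp/pdb_spfile_rows.txt
```

With the rows shown above, checking GOLDS19C would report the cursor_sharing entry pinned to GOLD19C, which is exactly the value that disappeared after the switchover.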

Other points

The db_unique_name option does not validate what you specify, so take care to set the correct value. As you can see below, I set it to SIMON, and it was accepted and saved:

SQL> alter session set container = GOLD19P;

Session altered.

SQL> alter system set cursor_sharing = FORCE scope = spfile db_unique_name = 'SIMON';

System altered.

SQL> alter session set container = cdb$root;

Session altered.

SQL> select DB_UNIQ_NAME, NAME, VALUE$, PDB_UID from PDB_SPFILE$ where PDB_UID = 2141275600;

DB_UNIQ_NAME                   NAME                           VALUE$                                      PDB_UID

------------------------------ ------------------------------ ---------------------------------------- ----------

*                              open_cursors                   300                                      2141275600

GOLD19C                        cursor_sharing                 'FORCE'                                  2141275600

SIMON                          cursor_sharing                 'FORCE'                                  2141275600

SQL>
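Since nothing checks the value, a script can compare it against the members of the Data Guard configuration before running ALTER SYSTEM. A minimal sketch, assuming you pass the valid names yourself (for example, the members listed by `show configuration` in DGMGRL); the function name is hypothetical and the comparison is case-insensitive as a simplification.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeeds only if the requested db_unique_name is one of
# the known members. Usage: is_known_member NAME MEMBER [MEMBER...]
is_known_member() {
  local wanted member
  wanted=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  shift
  for member in "$@"; do
    if [ "$wanted" = "$(printf '%s' "$member" | tr '[:lower:]' '[:upper:]')" ]; then
      return 0
    fi
  done
  echo "db_unique_name '$wanted' is not a known member: $*" >&2
  return 1
}

# Example: is_known_member SIMON GOLD19C GOLDS19C || echo "refusing to set it"
```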

Resetting works in a similar way. As you can see below, you can specify the db_unique_name to delete a specific entry. But if you specify none, the reset is done for all entries of that parameter.

SQL> alter system reset cursor_sharing scope = spfile db_unique_name = 'GOLD19C';

System altered.

SQL> alter system reset cursor_sharing scope = spfile db_unique_name = 'GOLD19C';

alter system reset cursor_sharing scope = spfile db_unique_name = 'GOLD19C'

*

ERROR at line 1:

ORA-32010: cannot find entry to delete in SPFILE

SQL> alter system set cursor_sharing = FORCE scope = spfile db_unique_name = 'GOLD19C';

System altered.

SQL> alter system reset cursor_sharing scope = spfile;

System altered.

SQL> alter system reset cursor_sharing scope = spfile db_unique_name = 'GOLD19C';

alter system reset cursor_sharing scope = spfile db_unique_name = 'GOLD19C'

*

ERROR at line 1:

ORA-32010: cannot find entry to delete in SPFILE

SQL>

Conclusion

Sometimes a simple definition can lead to strange behavior. Imagine that your application is running fine and, after a switchover, it completely changes the way it works (as with cursor_sharing above). Digging out the cause can take time.

The db_unique_name clause for setting parameters needs to be used carefully to avoid these cases. This does not mean you should never use it; sometimes this definition is needed. You can use it to prepare everything BEFORE the switchover, to avoid errors, or to tune the database from the beginning.

References

Initialization parameters in a Multitenant database – FAQ and Examples (Doc ID 2101638.1)

Initialization parameters in a Multitenant database – Facts and additional information (Doc ID 2101596.1)

2.7 Managing Initialization Parameters Using a Server Parameter File

Exadata and ZDLRA, Disable HAIP

Category: Engineer System Author: Fernando Simon (Board Member) Date: 6 years ago Comments: 0


HAIP (Highly Available IP) is not supported in Exadata environments, but it can happen (if you did not create the cluster using OEDA) that HAIP ends up in use. This is particularly true for ZDLRA. So, during the upgrade from the previous version (12.2) to a higher version, HAIP needs to be removed.

Usually, when upgrading from 12.2 to 18c, HAIP is removed from Exadata. If the upgrade starts from 12.1 and HAIP is there, it stays, because the upgrade process does not remove it. If you are using HAIP and your GI is 12.1, the procedure described here cannot be used as-is (it needs some adaptation) because of some ASM+ACFS+DB requirements. Since this is a preliminary step for a GI upgrade, the focus is to disable HAIP and remove it from GI.

HAIP is not needed on Exadata because, by architecture, the InfiniBand network already provides two IPs per server to avoid a single point of failure. So, there is no need for an additional layer (HAIP and its virtual IPs) that does what the network design already does.

*Image taken from the Oracle presentation: Oracle Real Application Clusters 19c - Best Practices and Internals - EMEA Tour Oct 2019

This procedure can be used in non-Exadata environments too. But before continuing, it is important to be aware of some details about ACFS. If you use ACFS, be careful: the HAIP IP is used by the ASM Proxy and needs to be updated, and with ACFS in place this procedure is valid only for GI 12.2+.

Environment

In this scenario, I am removing HAIP from the ZDLRA GI. HAIP was configured automatically, and ACFS is in use.

The focus is to disable HAIP and remove it from CRS. We also need to remove the dependencies between CRS resources (such as ASM, which depends on HAIP).

One important detail: since we are removing HAIP, the HAIP IP used by services (such as the ASM Proxy) must be swapped for an IP that will still exist. So, you need highly available IPs to use instead. On Exadata we use the interconnect IPs, but in other environments make sure your network meets the requirements (throughput, failover, and others).

GI Upgrade

During the GI upgrade from 12.2 to 19c, we need to run runcluvfy, and it detects whether HAIP is enabled:

....

....

Checks did not pass for the following nodes:

        zeroserv02,zeroserv01







Failures were encountered during execution of CVU verification request "stage -pre crsinst".




Verifying Node Connectivity ...FAILED

zeroserv02: PRVG-11068 : Highly Available IP (HAIP) is enabled on the nodes

            "zeroserv01,zeroserv02".




zeroserv01: PRVG-11068 : Highly Available IP (HAIP) is enabled on the nodes

            "zeroserv01,zeroserv02".




Verifying RPM Package Manager database ...INFORMATION

PRVG-11250 : The check "RPM Package Manager database" was not performed because

it needs 'root' user privileges.







CVU operation performed:      stage -pre crsinst

Date:                         Dec 9, 2019 1:38:46 PM

CVU home:                     /u01/app/19.0.0.0/grid/

User:                         oracle

[root@zeroserv01 ~]#
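If you save the runcluvfy output to a file (for example, by piping it through tee), a quick grep tells you whether HAIP is still flagged and on which nodes. A minimal sketch: the function name and log path are hypothetical, and it only looks for the PRVG-11068 message code in the "node: PRVG-11068 : ..." layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical check on a saved cluvfy log.
# Prints the nodes reporting PRVG-11068 (HAIP enabled).
# Returns 0 if HAIP was flagged, 1 if not, 2 if the log cannot be read.
haip_flagged_in_log() {
  local log="$1"
  [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
  if grep -q 'PRVG-11068' "$log"; then
    # The node name is the text before the first colon.
    grep 'PRVG-11068' "$log" | cut -d: -f1 | tr -d ' ' | sort -u
    return 0
  fi
  return 1
}

# Example (hypothetical path): haip_flagged_in_log /tmp/cluvfy_pre.log
```

Running it again after the removal procedure gives a quick confirmation that the pre-check no longer reports HAIP.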

Remove HAIP

The steps need to be executed in order to avoid errors during the procedure. Unfortunately, a maintenance window is needed because services will be unavailable; for example, the running databases need to be shut down because CRS is restarted.

Below, the steps are summarized and explained. Please be aware of the IPs involved (they will be different in your case).

1 – Shutdown services

The first step is to shut down all databases running in this cluster:

[oracle@zeroserv01 ~]$ srvctl stop database -d zdlras -o immediate

[oracle@zeroserv01 ~]$

2 – Check HAIP, ASM Proxy, and ACFS

Check whether HAIP is enabled. Also check the ASM Proxy and ACFS resources:

[oracle@zeroserv01 ~]$ $ORACLE_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -t -init

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.cluster_interconnect.haip

      1        ONLINE  ONLINE       zeroserv01               STABLE

--------------------------------------------------------------------------------

[oracle@zeroserv01 ~]$

[oracle@zeroserv01 ~]$ $ORACLE_HOME/bin/crsctl stat res ora.proxy_advm -t

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.proxy_advm

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

--------------------------------------------------------------------------------

[oracle@zeroserv01 ~]$

[oracle@zeroserv01 ~]$ $ORACLE_HOME/bin/crsctl stat res -w "TYPE = ora.acfs.type" -t

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.catalog.raadmin.acfs

               ONLINE  ONLINE       zeroserv01               mounted on /raacfs/r

                                                             aadmin,STABLE

               ONLINE  ONLINE       zeroserv02               mounted on /raacfs/r

                                                             aadmin,STABLE

ora.catalog.raosbadmin.acfs

               ONLINE  ONLINE       zeroserv01               mounted on /osbcat,S

                                                             TABLE

               ONLINE  ONLINE       zeroserv02               mounted on /osbcat,S

                                                             TABLE

--------------------------------------------------------------------------------

[oracle@zeroserv01 ~]$
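To script this check (for example, to confirm later that HAIP is really gone), the tabular output can be parsed. A minimal sketch, assuming the `crsctl stat res ora.cluster_interconnect.haip -t -init` layout shown above; the function name is hypothetical, and the parsing is tied to that layout (instance number, target, state, server).

```shell
#!/usr/bin/env bash
# Hypothetical parser: reads `crsctl stat res ora.cluster_interconnect.haip -t -init`
# output on stdin and prints the State column (e.g. ONLINE or OFFLINE) of the
# first instance line under the resource name. Prints nothing if not listed.
haip_state() {
  awk '
    $1 == "ora.cluster_interconnect.haip" { found = 1; next }
    found && $1 ~ /^[0-9]+$/ { print $3; exit }
  '
}

# Example (not executed here):
#   $ORACLE_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -t -init | haip_state
```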

3 – Check New IP

The next step is to check the IPs that will be used instead. Since ASM does not use HAIP, on Exadata we can pick up the same IPs used by ASM:

[oracle@zeroserv01 ~]$ export ORACLE_SID=+ASM1

[oracle@zeroserv01 ~]$ export ORACLE_HOME=/u01/app/12.2.0.1/grid

[oracle@zeroserv01 ~]$ export PATH=$ORACLE_HOME/bin:$PATH

[oracle@zeroserv01 ~]$

[oracle@zeroserv01 ~]$ sqlplus / as sysasm




SQL*Plus: Release 12.2.0.1.0 Production on Mon Dec 9 15:00:55 2019




Copyright (c) 1982, 2016, Oracle.  All rights reserved.







Connected to:

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production




SQL> show parameter cluster_interconnects




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cluster_interconnects                string      192.168.10.1:192.168.10.2

SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

[oracle@zeroserv01 ~]$

[oracle@zeroserv01 ~]$ $ORACLE_HOME/bin/oifcfg getif

bondeth0  10.208.68.0  global  public

ib0  192.168.8.0  global  cluster_interconnect,asm

ib1  192.168.8.0  global  cluster_interconnect,asm

[oracle@zeroserv01 ~]$

In this case, the IPs that will replace the HAIP IP are 192.168.10.1 and 192.168.10.2. As you can see above, they belong to the interconnect network.
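The value can also be read non-interactively and split into its IPs, which is handy when scripting the next step. A minimal sketch that parses the `show parameter cluster_interconnects` line shown above, assuming IPv4 addresses separated by colons; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical parser: reads SQL*Plus `show parameter cluster_interconnects`
# output on stdin and prints each interconnect IP on its own line.
# An empty value (as seen later on the APX instance) prints nothing.
interconnect_ips() {
  awk '$1 == "cluster_interconnects" && $2 == "string" { print $3 }' |
    tr ':' '\n' | sed '/^$/d'
}

# Example (not executed here):
#   printf 'show parameter cluster_interconnects\nexit\n' | sqlplus -s / as sysasm | interconnect_ips
```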

4 – Fixing IP for ASM Proxy

The next step is to fix the IP used by the ASM Proxy instance. By default, it picks up the HAIP IP during startup. The idea here is to force its cluster_interconnects parameter to the same IPs used by the ASM instance.

Connecting to the APX instance

Connect to the APX instance and check the parameters (note that cluster_interconnects has no value). After that, create a pfile from memory as a backup:

[oracle@zeroserv01 ~]$ export ORACLE_HOME=/u01/app/12.2.0.1/grid

[oracle@zeroserv01 ~]$ export PATH=$ORACLE_HOME/bin:$PATH

[oracle@zeroserv01 ~]$ export ORACLE_SID=+APX1

[oracle@zeroserv01 ~]$

[oracle@zeroserv01 ~]$ sqlplus / as sysasm




SQL*Plus: Release 12.2.0.1.0 Production on Tue Dec 10 10:20:59 2019




Copyright (c) 1982, 2016, Oracle.  All rights reserved.







Connected to:

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production




SQL>  show parameter instance_name




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

instance_name                        string      +APX1

SQL>  show parameter instance_type




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

instance_type                        string      ASMPROXY

SQL> show parameter cluster_interconnects




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cluster_interconnects                string

SQL> create pfile = '/tmp/pfileapx1' from memory;




File created.




SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

[oracle@zeroserv01 ~]$

Changing the IP

For the ASM Proxy instance, we need to set the cluster_interconnects parameter to the same IPs used by the ASM instance.

To do that, we set it in an init file for the ASM Proxy. Since the ASM Proxy instance does not have one, we create one in $GI_HOME/dbs named after the instance (see the folder and instance name below). I set only the cluster_interconnects parameter, dedicated to the instance running on this node:

[oracle@zeroserv01 ~]$ echo "+APX1.cluster_interconnects='192.168.10.1:192.168.10.2'" > $ORACLE_HOME/dbs/init+APX1.ora

[oracle@zeroserv01 ~]$

[oracle@zeroserv01 ~]$ cat $ORACLE_HOME/dbs/init+APX1.ora

+APX1.cluster_interconnects='192.168.10.1:192.168.10.2'

[oracle@zeroserv01 ~]$
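If you script this step, a small helper can write the file and keep a copy of any existing one. A minimal sketch, assuming the same file naming as above ($ORACLE_HOME/dbs/init<SID>.ora) and IPv4 addresses; the function name is hypothetical. On the other node you would presumably use that node's APX SID and the IPs reported by its own ASM instance.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes the one-line init file for an ASM Proxy instance.
# Args: DBS_DIR SID IPS   (IPS colon-separated, as in cluster_interconnects)
write_apx_init() {
  local dbs_dir="$1" sid="$2" ips="$3"
  local file="$dbs_dir/init${sid}.ora"
  local sid_re='^\+APX[0-9]+$' ip_re='^[0-9.]+(:[0-9.]+)*$'
  # Expect an +APX<n> SID and a colon-separated IPv4 list.
  [[ "$sid" =~ $sid_re ]] || { echo "unexpected SID: $sid" >&2; return 1; }
  [[ "$ips" =~ $ip_re ]] || { echo "unexpected IP list: $ips" >&2; return 1; }
  # Keep a timestamped backup if the file already exists.
  if [ -e "$file" ]; then
    cp -p "$file" "$file.bak.$(date +%Y%m%d%H%M%S)" || return 1
  fi
  printf "%s.cluster_interconnects='%s'\n" "$sid" "$ips" > "$file"
}

# Example (node 1, values from this post):
#   write_apx_init "$ORACLE_HOME/dbs" '+APX1' '192.168.10.1:192.168.10.2'
```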

Restart the ASM Proxy

After creating the init file, restart the ASM Proxy on this node:

[root@zeroserv01 ~]# export ORACLE_HOME=/u01/app/12.2.0.1/grid

[root@zeroserv01 ~]# export PATH=$ORACLE_HOME/bin:$PATH

[root@zeroserv01 ~]#

[root@zeroserv01 ~]# $ORACLE_HOME/bin/srvctl stop asm -proxy -node zeroserv01 -force

[root@zeroserv01 ~]#

[root@zeroserv01 ~]# $ORACLE_HOME/bin/srvctl start asm -proxy -node zeroserv01

[root@zeroserv01 ~]#

[root@zeroserv01 ~]# ps -ef |grep APX

oracle   267811      1  0 10:43 ?        00:00:00 apx_pmon_+APX1

oracle   267813      1  0 10:43 ?        00:00:00 apx_clmn_+APX1

oracle   267815      1  0 10:43 ?        00:00:00 apx_psp0_+APX1

oracle   267820      1  1 10:43 ?        00:00:00 apx_vktm_+APX1

oracle   267824      1  0 10:43 ?        00:00:00 apx_gen0_+APX1

oracle   267826      1  0 10:43 ?        00:00:00 apx_mman_+APX1

oracle   267830      1  0 10:43 ?        00:00:00 apx_gen1_+APX1

oracle   267834      1  0 10:43 ?        00:00:00 apx_diag_+APX1

oracle   267836      1  0 10:43 ?        00:00:00 apx_dskm_+APX1

oracle   267838      1  0 10:43 ?        00:00:00 apx_pman_+APX1

oracle   267840      1  0 10:43 ?        00:00:00 apx_dia0_+APX1

oracle   267842      1  0 10:43 ?        00:00:00 apx_lreg_+APX1

oracle   267845      1  0 10:43 ?        00:00:00 apx_pxmn_+APX1

oracle   267847      1  0 10:43 ?        00:00:00 apx_rbal_+APX1

oracle   267849      1  0 10:43 ?        00:00:00 apx_vbg0_+APX1

oracle   267851      1  0 10:43 ?        00:00:00 apx_vdbg_+APX1

oracle   267853      1  0 10:43 ?        00:00:00 apx_vubg_+APX1

root     267979  32720  0 10:43 pts/0    00:00:00 grep --color=auto APX

[root@zeroserv01 ~]#

Note that the instance restarted correctly and is up.
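Instead of reading the whole `ps` listing, you can test only for the PMON background process of the expected instance. A minimal sketch, assuming the process name follows the `apx_pmon_<SID>` pattern seen above; `apx_pmon_running` is a hypothetical name. It reads a process list on stdin and uses a fixed-string, whole-line match, because the `+` in `+APX1` would otherwise be treated as a regex operator.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the PMON process of the given ASM Proxy SID
# appears in a process list read from stdin (one command line per line).
#   $1 = ASM Proxy SID, e.g. +APX1
apx_pmon_running() {
  local sid=$1
  # -x: whole-line match, -F: fixed string (no regex interpretation of '+')
  grep -qxF -- "apx_pmon_${sid}"
}
```

Usage on the node: `ps -eo args= | apx_pmon_running +APX1 && echo "+APX1 is up"`.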

We can also check that the parameter holds the correct IPs:

[oracle@zeroserv01 ~]$ sqlplus / as sysasm




SQL*Plus: Release 12.2.0.1.0 Production on Tue Dec 10 10:51:59 2019




Copyright (c) 1982, 2016, Oracle.  All rights reserved.







Connected to:

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production




SQL> show parameter cluster_interconnects




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cluster_interconnects                string      192.168.10.1:192.168.10.2

SQL> show parameter instance_name




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

instance_name                        string      +APX1

SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

[oracle@zeroserv01 ~]$

If the instance does not start (or the IPs are wrong), check the init file that you created in the $GI_HOME/dbs folder.

Restart ACFS

Because we restarted the ASM Proxy instance, the ACFS mount points went down and need to be brought back up:

[root@zeroserv01 ~]# $ORACLE_HOME/bin/crsctl start res -w "TYPE = ora.acfs.type" -n zeroserv01

CRS-2672: Attempting to start 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv01'

CRS-2672: Attempting to start 'ora.CATALOG.RAADMIN.advm' on 'zeroserv01'

CRS-2676: Start of 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv01' succeeded

CRS-2672: Attempting to start 'ora.catalog.raosbadmin.acfs' on 'zeroserv01'

CRS-2676: Start of 'ora.CATALOG.RAADMIN.advm' on 'zeroserv01' succeeded

CRS-2672: Attempting to start 'ora.catalog.raadmin.acfs' on 'zeroserv01'

CRS-2676: Start of 'ora.catalog.raosbadmin.acfs' on 'zeroserv01' succeeded

CRS-2676: Start of 'ora.catalog.raadmin.acfs' on 'zeroserv01' succeeded

[root@zeroserv01 ~]#

I started them through their CRS resources, selecting every resource of type ora.acfs.type on this node.
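The restart sequence for one node (ASM Proxy stop/start, then the ACFS resources) can be collected in one function. A minimal sketch, assuming it runs as root with `ORACLE_HOME` pointing to the Grid home; `restart_apx_node` and the `DRY_RUN` switch are hypothetical (with `DRY_RUN=1` the commands are only printed), while the commands themselves are the ones used above.

```bash
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: restart the ASM Proxy on one node and bring its
# ACFS mount points back. Stops at the first failing command.
#   $1 = node name, e.g. zeroserv01
restart_apx_node() {
  local node=$1
  if [ -z "${ORACLE_HOME:-}" ]; then
    echo "ORACLE_HOME is not set (expected the Grid home)" >&2
    return 1
  fi
  local bin="$ORACLE_HOME/bin"
  run "$bin/srvctl" stop asm -proxy -node "$node" -force || return 1
  run "$bin/srvctl" start asm -proxy -node "$node" || return 1
  # Restarting the proxy takes the ACFS file systems down with it.
  run "$bin/crsctl" start res -w "TYPE = ora.acfs.type" -n "$node"
}
```

Running it first with `DRY_RUN=1 restart_apx_node zeroserv01` lets you review the exact commands before touching the node.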

Other Nodes

Once node 1 is OK, we can repeat the steps on the other nodes. Since this is a ZDLRA, only the second node is left to fix for ASM Proxy. Be careful with the IPs: they are specific to each node, so remember to use the correct init file.
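Because the IPs differ per node, it is safer to read them from the local ASM instance than to type them by hand. A minimal sketch, assuming it runs as the Grid owner with `ORACLE_HOME` set to the Grid home, and that the local ASM SID is `+ASM<n>` with the matching proxy `+APX<n>`, as in this cluster; both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map the local ASM SID to the ASM Proxy SID on the same
# node, assuming the +ASM<n> / +APX<n> naming used in this cluster.
apx_sid_for() {
  local asm_sid=$1
  case $asm_sid in
    +ASM[0-9]*) echo "+APX${asm_sid#+ASM}" ;;
    *) echo "unexpected ASM SID: $asm_sid" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print cluster_interconnects of the local ASM instance.
#   $1 = local ASM SID, e.g. +ASM2
asm_interconnects() {
  ORACLE_SID=$1 "$ORACLE_HOME/bin/sqlplus" -s "/ as sysasm" <<'EOF'
set heading off feedback off pagesize 0
select value from v$parameter where name = 'cluster_interconnects';
exit
EOF
}
```

On node 2, `apx_sid_for +ASM2` gives `+APX2`, and `asm_interconnects +ASM2 | tr -d '[:space:]'` should return the same value shown by `show parameter` below.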

[root@zeroserv02 ~]# su - oracle

Last login: Tue Dec 10 10:53:02 CET 2019

[oracle@zeroserv02 ~]$ export ORACLE_SID=+ASM2

[oracle@zeroserv02 ~]$ export ORACLE_HOME=/u01/app/12.2.0.1/grid

[oracle@zeroserv02 ~]$ export PATH=$ORACLE_HOME/bin:$PATH

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ sqlplus / as sysasm




SQL*Plus: Release 12.2.0.1.0 Production on Tue Dec 10 10:56:24 2019




Copyright (c) 1982, 2016, Oracle.  All rights reserved.







Connected to:

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production




SQL> show parameter cluster_interconnects




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cluster_interconnects                string      192.168.10.3:192.168.10.4

SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ export ORACLE_HOME=/u01/app/12.2.0.1/grid

[oracle@zeroserv02 ~]$ export PATH=$ORACLE_HOME/bin:$PATH

[oracle@zeroserv02 ~]$ export ORACLE_SID=+APX2

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ sqlplus / as sysasm




SQL*Plus: Release 12.2.0.1.0 Production on Tue Dec 10 10:57:15 2019




Copyright (c) 1982, 2016, Oracle.  All rights reserved.







Connected to:

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production




SQL>

SQL> show parameter instance_name




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

instance_name                        string      +APX2

SQL> show parameter cluster_interconnects




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cluster_interconnects                string

SQL> create pfile = '/tmp/pfileapx2' from memory;




File created.




SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ ls -l $ORACLE_HOME/dbs/init?APX*

ls: cannot access /u01/app/12.2.0.1/grid/dbs/init?APX*: No such file or directory

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ echo "+APX2.cluster_interconnects='192.168.10.3:192.168.10.4'" > $ORACLE_HOME/dbs/init+APX2.ora

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ cat $ORACLE_HOME/dbs/init+APX2.ora

+APX2.cluster_interconnects='192.168.10.3:192.168.10.4'

[oracle@zeroserv02 ~]$

[oracle@zeroserv02 ~]$ exit   

[root@zeroserv02 ~]# export ORACLE_HOME=/u01/app/12.2.0.1/grid

[root@zeroserv02 ~]# export PATH=$ORACLE_HOME/bin:$PATH

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# $ORACLE_HOME/bin/srvctl stop asm -proxy -node zeroserv02 -force

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# $ORACLE_HOME/bin/srvctl start asm -proxy -node zeroserv02

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# $ORACLE_HOME/bin/crsctl start res -w "TYPE = ora.acfs.type" -n zeroserv02

CRS-2672: Attempting to start 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv02'

CRS-2672: Attempting to start 'ora.CATALOG.RAADMIN.advm' on 'zeroserv02'

CRS-2676: Start of 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.catalog.raosbadmin.acfs' on 'zeroserv02'

CRS-2676: Start of 'ora.CATALOG.RAADMIN.advm' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.catalog.raadmin.acfs' on 'zeroserv02'

CRS-2676: Start of 'ora.catalog.raosbadmin.acfs' on 'zeroserv02' succeeded

CRS-2676: Start of 'ora.catalog.raadmin.acfs' on 'zeroserv02' succeeded

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# su - oracle

Last login: Tue Dec 10 10:54:10 CET 2019

[oracle@zeroserv02 ~]$ export ORACLE_HOME=/u01/app/12.2.0.1/grid

[oracle@zeroserv02 ~]$ export PATH=$ORACLE_HOME/bin:$PATH

[oracle@zeroserv02 ~]$ export ORACLE_SID=+APX2




[oracle@zeroserv02 ~]$ sqlplus / as sysasm




SQL*Plus: Release 12.2.0.1.0 Production on Tue Dec 10 11:00:46 2019




Copyright (c) 1982, 2016, Oracle.  All rights reserved.







Connected to:

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production




SQL> show parameter cluster_interconnects




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

cluster_interconnects                string      192.168.10.3:192.168.10.4

SQL> show parameter instance_name




NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

instance_name                        string      +APX2

SQL> exit

Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

[oracle@zeroserv02 ~]$

5 – ASM Dependency

Since ASM depends on HAIP in the CRS init stack, we need to remove this dependency. This is the tricky part: we must completely change the ASM dependencies, pointing them directly to CSSD/CTSSD (ora.cssd and ora.ctssd).

First, on the first node, we check the current START and STOP dependencies:

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res ora.asm -p |grep START_DEPENDENCIES=

START_DEPENDENCIES=hard(ora.ASMNET1LSNR_ASM.lsnr,ora.ASMNET2LSNR_ASM.lsnr) weak(ora.LISTENER.lsnr) pullup(ora.ASMNET1LSNR_ASM.lsnr,ora.ASMNET2LSNR_ASM.lsnr) dispersion:active(site:type:ora.asm.type)

[root@zeroserv01 ~]#

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res ora.asm -p |grep STOP_DEPENDENCIES=

STOP_DEPENDENCIES=hard(intermediate:ora.ASMNET1LSNR_ASM.lsnr,intermediate:ora.ASMNET2LSNR_ASM.lsnr)

[root@zeroserv01 ~]#
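Before changing anything, it is worth saving the current profile so the dependencies can be restored later. Save the init-level profile (`-init`), since that is the resource the `modify` command below changes. A minimal sketch; the function names and the backup file name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only the dependency attributes of a resource
# profile read from stdin (output of "crsctl stat res <res> -p [-init]").
dependency_lines() {
  grep -E '^(START|STOP)_DEPENDENCIES='
}

# Hypothetical helper: succeed if those attributes still reference HAIP.
mentions_haip() {
  dependency_lines | grep -qF 'ora.cluster_interconnect.haip'
}

# Save the init-level ora.asm profile before modifying it.
#   $1 = backup file, e.g. /root/ora.asm.init.before_haip
backup_asm_init_profile() {
  "$ORACLE_HOME/bin/crsctl" stat res ora.asm -p -init > "$1"
}
```

For example, `backup_asm_init_profile /root/ora.asm.init.before_haip` followed by `mentions_haip < /root/ora.asm.init.before_haip && echo "HAIP dependency present"`.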

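Now we change them. Look at the attribute values: they are completely different from the current ones and must be set exactly as below. The command uses -init, so it changes the init-level ora.asm resource, whose profile can be displayed with crsctl stat res ora.asm -p -init (the check above queried the resource without -init):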

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl modify res ora.asm -attr "START_DEPENDENCIES='hard(ora.cssd,ora.ctssd) pullup(ora.cssd,ora.ctssd) weak(ora.drivers.acfs)',STOP_DEPENDENCIES='hard(intermediate:ora.cssd)'" -init

[root@zeroserv01 ~]#

If you look closely, HAIP is not listed as a dependency: by setting these values we completely remove the HAIP dependency and its inheritance.
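After running the `modify` command, you can confirm that the init-level profile holds exactly the values set above. A minimal sketch; `asm_deps_match` is a hypothetical name, and the expected strings are copied from the command above.

```bash
#!/usr/bin/env bash
# Values set by the crsctl modify command above.
EXPECTED_START="hard(ora.cssd,ora.ctssd) pullup(ora.cssd,ora.ctssd) weak(ora.drivers.acfs)"
EXPECTED_STOP="hard(intermediate:ora.cssd)"

# Hypothetical helper: read "crsctl stat res ora.asm -p -init" output on stdin
# and succeed only if both dependency attributes match exactly.
asm_deps_match() {
  local line start="" stop=""
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      START_DEPENDENCIES=*) start=${line#START_DEPENDENCIES=} ;;
      STOP_DEPENDENCIES=*)  stop=${line#STOP_DEPENDENCIES=} ;;
    esac
  done
  [ "$start" = "$EXPECTED_START" ] && [ "$stop" = "$EXPECTED_STOP" ]
}
```

On each node: `/u01/app/12.2.0.1/grid/bin/crsctl stat res ora.asm -p -init | asm_deps_match && echo "ora.asm dependencies OK"`.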

We need to do the same on the other node:

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl modify res ora.asm -attr "START_DEPENDENCIES='hard(ora.cssd,ora.ctssd) pullup(ora.cssd,ora.ctssd) weak(ora.drivers.acfs)',STOP_DEPENDENCIES='hard(intermediate:ora.cssd)'" -init

[root@zeroserv02 ~]#

6 – Disable HAIP Resource

The next step is to prevent the HAIP resource from starting with CRS. We do this on both nodes.

Node 1:

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=0" -init

[root@zeroserv01 ~]#

Node 2:

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=0" -init

[root@zeroserv02 ~]#
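To confirm that the change took effect on each node, you can read the `ENABLED` attribute back from the init-level profile. A minimal sketch; `haip_enabled` and `haip_disabled` are hypothetical names and expect the output of `crsctl stat res ora.cluster_interconnect.haip -p -init` on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ENABLED value from a resource profile
# read on stdin; prints nothing if the attribute is absent.
haip_enabled() {
  awk -F= '$1 == "ENABLED" { print $2; exit }'
}

# Hypothetical helper: succeed only if the profile says ENABLED=0.
haip_disabled() {
  [ "$(haip_enabled)" = "0" ]
}
```

On each node: `/u01/app/12.2.0.1/grid/bin/crsctl stat res ora.cluster_interconnect.haip -p -init | haip_disabled && echo "HAIP disabled"`.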

Note that we do not stop the HAIP resource yet: it needs to remain up at this point.

If we check the CRS init resources, HAIP is still online on node 1:

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res -t -init

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.asm

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.cluster_interconnect.haip

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.crf

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.crsd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.cssd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.cssdmonitor

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.ctssd

      1        ONLINE  ONLINE       zeroserv01               OBSERVER,STABLE

ora.diskmon

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.drivers.acfs

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.drivers.oka

      1        OFFLINE OFFLINE                               STABLE

ora.evmd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.gipcd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.gpnpd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.mdnsd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.storage

      1        ONLINE  ONLINE       zeroserv01               STABLE

--------------------------------------------------------------------------------

[root@zeroserv01 ~]#

And on the other node too:

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res -t -init

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.asm

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.cluster_interconnect.haip

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.crf

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.crsd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.cssd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.cssdmonitor

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.ctssd

      1        ONLINE  ONLINE       zeroserv02               ACTIVE:0,STABLE

ora.diskmon

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.drivers.acfs

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.drivers.oka

      1        OFFLINE OFFLINE                               STABLE

ora.evmd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.gipcd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.gpnpd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.mdnsd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.storage

      1        ONLINE  ONLINE       zeroserv02               STABLE

--------------------------------------------------------------------------------

[root@zeroserv02 ~]#
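Rather than scanning the whole table, you can extract the target and state of a single init resource. A minimal sketch; `init_res_state` is hypothetical and assumes the `crsctl stat res -t -init` layout shown above, with the resource name on its own line followed by a line holding the instance number, target and state.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "crsctl stat res -t -init" output on stdin and
# print "<TARGET> <STATE>" for the named resource (first instance only).
#   $1 = resource name, e.g. ora.cluster_interconnect.haip
init_res_state() {
  awk -v res="$1" '
    found && NF > 0 { print $2, $3; exit }  # first non-empty line after the name
    NF == 1 && $1 == res { found = 1 }      # the resource name line
  '
}
```

At this point, `/u01/app/12.2.0.1/grid/bin/crsctl stat res -t -init | init_res_state ora.cluster_interconnect.haip` is expected to print `ONLINE ONLINE` on both nodes.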

7 – Restart CRS

After configuring the ASM dependencies, we can restart CRS to shut down HAIP (and test the changes we made). I prefer to run this on one node first and, if everything goes well, on the others.
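The per-node sequence used in this step (stop the cluster, stop CRS, start CRS) can be kept in one function, so it runs the same way on each node, one node at a time. A minimal sketch, assuming it runs as root on the node being restarted; `bounce_local_crs`, the `GRID_HOME` default and the `DRY_RUN` switch are hypothetical, and only the `crsctl` commands come from this post.

```bash
#!/usr/bin/env bash
# Grid home used in this post; override GRID_HOME if yours differs.
GRID_HOME=${GRID_HOME:-/u01/app/12.2.0.1/grid}

# Print the command when DRY_RUN=1 (hypothetical switch), otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: restart the whole local stack so that the disabled
# HAIP init resource does not come back. Stops at the first failure.
bounce_local_crs() {
  local crsctl="$GRID_HOME/bin/crsctl"
  run "$crsctl" stop cluster -f || return 1
  run "$crsctl" stop crs -f || return 1
  run "$crsctl" start crs
}
```

Stopping at the first failure matches the approach described above: if `stop cluster` does not complete cleanly on the first node, stop there and investigate before going further.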

Stop Cluster

First, we stop the cluster on the first node:

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stop cluster -f

CRS-2673: Attempting to stop 'ora.crsd' on 'zeroserv01'

CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'zeroserv01'

CRS-2673: Attempting to stop 'ora.catalog.raadmin.acfs' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.catalog.raosbadmin.acfs' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.DELTA.dg' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.CATALOG.dg' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.LISTENER_SCAN2.lsnr' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.cvu' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.chad' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.chad' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.LISTENER_SCAN3.lsnr' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.qosmserver' on 'zeroserv01'

CRS-2677: Stop of 'ora.DELTA.dg' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.CATALOG.dg' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.asm' on 'zeroserv01'

CRS-2677: Stop of 'ora.cvu' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.asm' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.ASMNET2LSNR_ASM.lsnr' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.ASMNET1LSNR_ASM.lsnr' on 'zeroserv01'

CRS-2677: Stop of 'ora.LISTENER_SCAN2.lsnr' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.scan2.vip' on 'zeroserv01'

CRS-2677: Stop of 'ora.LISTENER_SCAN3.lsnr' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.scan3.vip' on 'zeroserv01'

CRS-2677: Stop of 'ora.catalog.raosbadmin.acfs' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv01'

CRS-2677: Stop of 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.catalog.raadmin.acfs' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.CATALOG.RAADMIN.advm' on 'zeroserv01'

CRS-2677: Stop of 'ora.CATALOG.RAADMIN.advm' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.proxy_advm' on 'zeroserv01'

CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.zeroserv01.vip' on 'zeroserv01'

CRS-2677: Stop of 'ora.scan3.vip' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.qosmserver' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.scan2.vip' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.zeroserv01.vip' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.ASMNET1LSNR_ASM.lsnr' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.ASMNET2LSNR_ASM.lsnr' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.proxy_advm' on 'zeroserv01' succeeded

CRS-2675: Stop of 'ora.chad' on 'zeroserv02' failed

CRS-2679: Attempting to clean 'ora.chad' on 'zeroserv02'

CRS-2675: Stop of 'ora.chad' on 'zeroserv01' failed

CRS-2679: Attempting to clean 'ora.chad' on 'zeroserv01'

CRS-2681: Clean of 'ora.chad' on 'zeroserv02' succeeded

CRS-2681: Clean of 'ora.chad' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.mgmtdb' on 'zeroserv01'

CRS-2677: Stop of 'ora.mgmtdb' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.MGMTLSNR' on 'zeroserv01'

CRS-2677: Stop of 'ora.MGMTLSNR' on 'zeroserv01' succeeded

CRS-2672: Attempting to start 'ora.MGMTLSNR' on 'zeroserv02'

CRS-2672: Attempting to start 'ora.qosmserver' on 'zeroserv02'

CRS-2672: Attempting to start 'ora.scan2.vip' on 'zeroserv02'

CRS-2672: Attempting to start 'ora.scan3.vip' on 'zeroserv02'

CRS-2672: Attempting to start 'ora.cvu' on 'zeroserv02'

CRS-2672: Attempting to start 'ora.zeroserv01.vip' on 'zeroserv02'

CRS-2676: Start of 'ora.cvu' on 'zeroserv02' succeeded

CRS-2676: Start of 'ora.scan2.vip' on 'zeroserv02' succeeded

CRS-2676: Start of 'ora.zeroserv01.vip' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.LISTENER_SCAN2.lsnr' on 'zeroserv02'

CRS-2676: Start of 'ora.scan3.vip' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.LISTENER_SCAN3.lsnr' on 'zeroserv02'

CRS-2676: Start of 'ora.MGMTLSNR' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.mgmtdb' on 'zeroserv02'

CRS-2676: Start of 'ora.qosmserver' on 'zeroserv02' succeeded

CRS-2676: Start of 'ora.LISTENER_SCAN2.lsnr' on 'zeroserv02' succeeded

CRS-2676: Start of 'ora.LISTENER_SCAN3.lsnr' on 'zeroserv02' succeeded

CRS-2676: Start of 'ora.mgmtdb' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.chad' on 'zeroserv02'

CRS-2676: Start of 'ora.chad' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.ons' on 'zeroserv01'

CRS-2677: Stop of 'ora.ons' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.net1.network' on 'zeroserv01'

CRS-2677: Stop of 'ora.net1.network' on 'zeroserv01' succeeded

CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'zeroserv01' has completed

CRS-2677: Stop of 'ora.crsd' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.ctssd' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.evmd' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.storage' on 'zeroserv01'

CRS-2677: Stop of 'ora.storage' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.asm' on 'zeroserv01'

CRS-2677: Stop of 'ora.ctssd' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.evmd' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.asm' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.cssd' on 'zeroserv01'

CRS-2677: Stop of 'ora.cssd' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.diskmon' on 'zeroserv01'

CRS-2677: Stop of 'ora.diskmon' on 'zeroserv01' succeeded

[root@zeroserv01 ~]#

Stop/Start CRS

If everything was successful, we can stop CRS. This is needed because HAIP is a CRS init resource (started by Oracle High Availability Services), so the whole stack has to be restarted:

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stop crs -f

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.gpnpd' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.crf' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'zeroserv01'

CRS-2673: Attempting to stop 'ora.mdnsd' on 'zeroserv01'

CRS-2677: Stop of 'ora.drivers.acfs' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.gpnpd' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.crf' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.gipcd' on 'zeroserv01'

CRS-2677: Stop of 'ora.mdnsd' on 'zeroserv01' succeeded

CRS-2677: Stop of 'ora.gipcd' on 'zeroserv01' succeeded

CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'zeroserv01' has completed

CRS-4133: Oracle High Availability Services has been stopped.

[root@zeroserv01 ~]#

After that, we can start CRS again on this node:

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

[root@zeroserv01 ~]#

Check CRS Init

After some time, we can confirm that HAIP did not restart (as expected):

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res -t -init

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.asm

      1        ONLINE  ONLINE       zeroserv01               Started,STABLE

ora.cluster_interconnect.haip

      1        OFFLINE OFFLINE                               STABLE

ora.crf

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.crsd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.cssd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.cssdmonitor

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.ctssd

      1        ONLINE  ONLINE       zeroserv01               ACTIVE:0,STABLE

ora.diskmon

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.drivers.acfs

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.drivers.oka

      1        OFFLINE OFFLINE                               STABLE

ora.evmd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.gipcd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.gpnpd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.mdnsd

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.storage

      1        ONLINE  ONLINE       zeroserv01               STABLE

--------------------------------------------------------------------------------

[root@zeroserv01 ~]#

As you can see above, HAIP did not start during the CRS startup. If HAIP is still up, review step 5 (ASM Dependency) and step 6 (Disable HAIP Resource).
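Because the stack takes a while to come up, this check can be turned into a small wait loop that polls until `ora.asm` is ONLINE and HAIP is OFFLINE, or gives up after a timeout. A minimal sketch; the function names, the `POLL_INTERVAL` variable and its 10-second default are hypothetical, and the status command is passed as arguments so the logic can be tried without a cluster.

```bash
#!/usr/bin/env bash
# Print "<TARGET> <STATE>" for a resource from "crsctl stat res -t -init" text.
res_state() {
  awk -v res="$1" '
    found && NF > 0 { print $2, $3; exit }
    NF == 1 && $1 == res { found = 1 }
  '
}

# Hypothetical helper: poll until ASM is up and HAIP is down.
#   $1 = timeout in seconds; remaining args = command printing the status table
wait_for_no_haip() {
  local timeout=$1; shift
  local waited=0 out
  while [ "$waited" -le "$timeout" ]; do
    out=$("$@")
    if [ "$(printf '%s\n' "$out" | res_state ora.asm)" = "ONLINE ONLINE" ] &&
       [ "$(printf '%s\n' "$out" | res_state ora.cluster_interconnect.haip)" = "OFFLINE OFFLINE" ]; then
      return 0
    fi
    sleep "${POLL_INTERVAL:-10}"
    waited=$((waited + ${POLL_INTERVAL:-10}))
  done
  return 1
}
```

On a node: `wait_for_no_haip 600 /u01/app/12.2.0.1/grid/bin/crsctl stat res -t -init || echo "check steps 5 and 6"`.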

Other Nodes

After that, we can do the same on the other nodes.

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stop cluster -f

CRS-2673: Attempting to stop 'ora.crsd' on 'zeroserv02'

CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'zeroserv02'

CRS-2673: Attempting to stop 'ora.DELTA.dg' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.catalog.raadmin.acfs' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.chad' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.catalog.raosbadmin.acfs' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'zeroserv02'

CRS-2677: Stop of 'ora.DELTA.dg' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.CATALOG.dg' on 'zeroserv02'

CRS-2677: Stop of 'ora.CATALOG.dg' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.asm' on 'zeroserv02'

CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.zeroserv02.vip' on 'zeroserv02'

CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.scan1.vip' on 'zeroserv02'

CRS-2677: Stop of 'ora.catalog.raadmin.acfs' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.CATALOG.RAADMIN.advm' on 'zeroserv02'

CRS-2677: Stop of 'ora.CATALOG.RAADMIN.advm' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.catalog.raosbadmin.acfs' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv02'

CRS-2677: Stop of 'ora.CATALOG.RAOSBADMIN.advm' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.proxy_advm' on 'zeroserv02'

CRS-2677: Stop of 'ora.asm' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.ASMNET2LSNR_ASM.lsnr' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.ASMNET1LSNR_ASM.lsnr' on 'zeroserv02'

CRS-2677: Stop of 'ora.zeroserv02.vip' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.scan1.vip' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.ASMNET1LSNR_ASM.lsnr' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.ASMNET2LSNR_ASM.lsnr' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.proxy_advm' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.chad' on 'zeroserv02' succeeded

CRS-2672: Attempting to start 'ora.scan1.vip' on 'zeroserv01'

CRS-2672: Attempting to start 'ora.zeroserv02.vip' on 'zeroserv01'

CRS-2676: Start of 'ora.zeroserv02.vip' on 'zeroserv01' succeeded

CRS-2676: Start of 'ora.scan1.vip' on 'zeroserv01' succeeded

CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'zeroserv01'

CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'zeroserv01' succeeded

CRS-2673: Attempting to stop 'ora.ons' on 'zeroserv02'

CRS-2677: Stop of 'ora.ons' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.net1.network' on 'zeroserv02'

CRS-2677: Stop of 'ora.net1.network' on 'zeroserv02' succeeded

CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'zeroserv02' has completed

CRS-2677: Stop of 'ora.crsd' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.ctssd' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.evmd' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.storage' on 'zeroserv02'

CRS-2677: Stop of 'ora.storage' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.asm' on 'zeroserv02'

CRS-2677: Stop of 'ora.ctssd' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.evmd' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.asm' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.cssd' on 'zeroserv02'

CRS-2677: Stop of 'ora.cssd' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.diskmon' on 'zeroserv02'

CRS-2677: Stop of 'ora.diskmon' on 'zeroserv02' succeeded

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stop crs -f

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.mdnsd' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.crf' on 'zeroserv02'

CRS-2673: Attempting to stop 'ora.gpnpd' on 'zeroserv02'

CRS-2677: Stop of 'ora.drivers.acfs' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.crf' on 'zeroserv02' succeeded

CRS-2673: Attempting to stop 'ora.gipcd' on 'zeroserv02'

CRS-2677: Stop of 'ora.mdnsd' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.gpnpd' on 'zeroserv02' succeeded

CRS-2677: Stop of 'ora.gipcd' on 'zeroserv02' succeeded

CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'zeroserv02' has completed

CRS-4133: Oracle High Availability Services has been stopped.

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

[root@zeroserv02 ~]#

[root@zeroserv02 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res -t -init

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.asm

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.cluster_interconnect.haip

      1        OFFLINE OFFLINE                               STABLE

ora.crf

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.crsd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.cssd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.cssdmonitor

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.ctssd

      1        ONLINE  ONLINE       zeroserv02               ACTIVE:0,STABLE

ora.diskmon

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.drivers.acfs

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.drivers.oka

      1        OFFLINE OFFLINE                               STABLE

ora.evmd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.gipcd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.gpnpd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.mdnsd

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.storage

      1        ONLINE  ONLINE       zeroserv02               STABLE

--------------------------------------------------------------------------------

[root@zeroserv02 ~]#

8 – Check if everything is UP

After making the changes on both nodes, we can check that everything is up and running:
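On a cluster with many resources, it is easier to list only the resources whose state does not match their target. A minimal sketch, assuming the `crsctl stat res -t` layout shown below; `not_online` is hypothetical and reports every instance whose target is ONLINE but whose state is something else (for example OFFLINE or INTERMEDIATE).

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "crsctl stat res -t" output on stdin and print
# "<resource> <target> <state>" for every instance whose target is ONLINE
# but whose state is not. Prints nothing when everything is up.
not_online() {
  awk '
    /^[[:space:]]*$/ || /^-+$/ { next }            # blank and ruler lines
    /^[^[:space:]]/ && NF == 1 { res = $1; next }   # resource name line
    {
      t = ($1 ~ /^[0-9]+$/) ? 2 : 1                 # cluster resources start with an instance number
      if ($t == "ONLINE" && $(t + 1) != "ONLINE") print res, $t, $(t + 1)
    }
  '
}
```

An empty result from `/u01/app/12.2.0.1/grid/bin/crsctl stat res -t | not_online` means every resource with an ONLINE target is running; resources such as ob_dbfs and rep_dbfs, whose target is OFFLINE, are ignored.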

[root@zeroserv01 ~]# /u01/app/12.2.0.1/grid/bin/crsctl stat res -t

--------------------------------------------------------------------------------

Name           Target  State        Server                   State details

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ob_dbfs

               OFFLINE OFFLINE      zeroserv01               STABLE

               OFFLINE OFFLINE      zeroserv02               STABLE

ora.ASMNET1LSNR_ASM.lsnr

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.ASMNET2LSNR_ASM.lsnr

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.CATALOG.RAADMIN.advm

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.CATALOG.RAOSBADMIN.advm

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.CATALOG.dg

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.DELTA.dg

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.LISTENER.lsnr

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.catalog.raadmin.acfs

               ONLINE  ONLINE       zeroserv01               mounted on /raacfs/r

                                                             aadmin,STABLE

               ONLINE  ONLINE       zeroserv02               mounted on /raacfs/r

                                                             aadmin,STABLE

ora.catalog.raosbadmin.acfs

               ONLINE  ONLINE       zeroserv01               mounted on /osbcat,S

                                                             TABLE

               ONLINE  ONLINE       zeroserv02               mounted on /osbcat,S

                                                             TABLE

ora.chad

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.net1.network

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.ons

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

ora.proxy_advm

               ONLINE  ONLINE       zeroserv01               STABLE

               ONLINE  ONLINE       zeroserv02               STABLE

rep_dbfs

               OFFLINE OFFLINE      zeroserv01               STABLE

               OFFLINE OFFLINE      zeroserv02               STABLE

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.LISTENER_SCAN1.lsnr

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.LISTENER_SCAN2.lsnr

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.LISTENER_SCAN3.lsnr

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.MGMTLSNR

      1        ONLINE  ONLINE       zeroserv01               192.168.10.1 192.168

                                                             .10.2,STABLE

ora.asm

      1        ONLINE  ONLINE       zeroserv01               Started,STABLE

      2        ONLINE  ONLINE       zeroserv02               Started,STABLE

ora.cvu

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.mgmtdb

      1        ONLINE  ONLINE       zeroserv01               Open,STABLE

ora.qosmserver

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.scan1.vip

      1        ONLINE  ONLINE       zeroserv02               STABLE

ora.scan2.vip

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.scan3.vip

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.zdlras.db

      1        OFFLINE OFFLINE                               STABLE

      2        OFFLINE OFFLINE                               Instance Shutdown,ST

                                                             ABLE

ora.zeroserv01.vip

      1        ONLINE  ONLINE       zeroserv01               STABLE

ora.zeroserv02.vip

      1        ONLINE  ONLINE       zeroserv02               STABLE

--------------------------------------------------------------------------------

[root@zeroserv01 ~]#

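As you can see, the Grid Infrastructure stack is up: ASM is online on both nodes, and so are the ASM listeners. The database (ora.zdlras.db) and the ob_dbfs and rep_dbfs resources are still OFFLINE at this point; the databases are restarted only in the next step.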
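Scanning a long crsctl stat res -t listing by eye is error-prone. The shell function below is a minimal sketch (list_offline_resources is a hypothetical name, not an Oracle tool) that prints every resource with at least one entry whose STATE column is OFFLINE. It assumes the default tabular layout shown above: resource names start in column 1, local resources list TARGET and STATE first, cluster resources prefix them with an instance number, and wrapped "State details" lines carry neither keyword.

```shell
# Hypothetical helper: read `crsctl stat res -t` output on stdin and print
# each resource that has at least one entry whose STATE column is OFFLINE.
list_offline_resources() {
  awk '
    # Resource names (and the table headers) start in column 1.
    /^[^ \t]/ { res = $1; next }
    {
      # Cluster resources prefix TARGET/STATE with an instance number.
      i = ($1 ~ /^[0-9]+$/) ? 2 : 1
      # Skip blank lines and wrapped "State details" continuation lines.
      if ($i !~ /^(ONLINE|OFFLINE)$/) next
      # Field i is TARGET, field i+1 is STATE; report each resource only once.
      if ($(i + 1) == "OFFLINE" && !(res in seen)) {
        seen[res] = 1
        print res
      }
    }
  '
}
```

On a GI node it can be used as crsctl stat res -t | list_offline_resources. For the resources visible in the listing above, it would report ob_dbfs, rep_dbfs and ora.zdlras.db.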

9 – Start Databases

To finish, we need to restart all the databases that were stopped in topic 1.

[oracle@zeroserv01 ~]$ srvctl start database -d zdlras

[oracle@zeroserv01 ~]$

Clean up

After you complete the GI upgrade to 19c, you can remove the ASM Proxy init file that was created earlier in $GI_HOME/dbs. It is no longer needed because the upgrade completely removes HAIP from CRS, since it is not in use.
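HAIP places link-local addresses (169.254.0.0/16) on the interconnect interfaces, so one quick sanity check after the upgrade is to look for such addresses on each node. The function below is a minimal sketch under that assumption: find_link_local is a hypothetical name, and it only parses the one-line-per-address output of ip -o -4 addr show. An address in this range is not proof that HAIP is still active (on some servers, for example, the host-to-ILOM interconnect also uses link-local addressing), so cross-check any hit against the interconnect interfaces reported by oifcfg getif.

```shell
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# "interface address" for every IPv4 address in 169.254.0.0/16.
find_link_local() {
  awk '
    # One address per line: "<idx>: <ifname> inet <addr>/<prefix> ..."
    $3 == "inet" && $4 ~ /^169\.254\./ {
      addr = $4
      sub(/\/.*/, "", addr)   # drop the /prefix length
      print $2, addr
    }
  '
}
```

Run it on each node as ip -o -4 addr show | find_link_local; no output means no link-local IPv4 address is configured on that node.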

Conclusion

As mentioned before, HAIP is not supported in Exadata environments. If you compare the purpose of HAIP with the hardware design of Exadata/Engineered Systems, HAIP is completely redundant there: the InfiniBand network with its dual ports/channels has the same effect (or even better).

Unfortunately, HAIP can still be up for several reasons (as on ZDLRA), and because of the way the ASM Proxy starts and picks up the interconnect IP, the HAIP address can end up being selected. Removing HAIP is a little tricky because of the ASM dependency, and some specific parameters need to be set.

As you saw, there are several steps, but you can adapt them to your environment. On Exadata/Engineered Systems (like ZDLRA) they can be used almost as-is (just check the IPs). On other environments, read them carefully and adapt them.

