Fast-Start Failover, Observe-Only Mode and Health Conditions
Oracle Data Guard Broker lets database administrators automate some tasks and gives an easy way to properly configure many features and details of Data Guard environments. Fast-Start Failover (FSFO) allows the broker to fail over automatically to a standby database when the primary fails. Until 19c, an enabled FSFO always triggered the failover. This changed in 19c with a nice new feature that lets us put FSFO in Observe-Only Mode.
In this post, I focus on just two FSFO features: the new Observe-Only Mode and the configurable Health Conditions. Lag and other details are not covered here.
Observe-Only Mode
Observe-Only Mode is a simple change: FSFO only observes and monitors the DG environment and, if a failure occurs, it does not change the roles of the primary and standby. As simple as that. As the Broker documentation for Observe-Only Mode says:
The observe-only mode enables you to test the impact of using fast-start failover in your configuration, without making any actual changes to the configuration.
The mode details can also be checked in the documentation.
Enable Observe-Only
Enabling it is very simple; we just need to call “ENABLE FAST_START FAILOVER OBSERVE ONLY”:
DGMGRL> ENABLE FAST_START FAILOVER OBSERVE ONLY;
Enabled in Observe-Only Mode.
DGMGRL>
In the drc* trace file on the primary side we can see:
2020-06-11T23:45:19.329+02:00
ENABLE FAST_START FAILOVER OBSERVE ONLY
FSFO SetState(st=47 "ENABLE OBONLY", fl=0x0 "", ob=0x2b621d39, tgt=2, v=0)
Setup log_archive_dest_n of GROUP=0 PRIORITY=0 with 'golds19c' as FSFO target
Fast-Start Failover (FSFO) has been enabled under observe-only mode between:
Primary = "gold19c"
Standby = "golds19c"
2020-06-11T23:45:20.527+02:00
ENABLE FAST_START FAILOVER OBSERVE ONLY completed successfully
The result is FSFO in Observe-Only Mode:
DGMGRL> show fast_start failover;
Fast-Start Failover: Enabled in Observe-Only Mode
Protection Mode: MaxAvailability
Lag Limit: 0 seconds
Threshold: 30 seconds
Active Target: golds19c
Potential Targets: "golds19c"
golds19c valid
Observer: goldsn1.oralocal
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Write Errors YES
Oracle Error Conditions:
(none)
DGMGRL>
Even after we force the shutdown of the primary database, the roles do not change:
[oracle@goldpn1 ~]$ srvctl stop database -d gold19c -o abort
[oracle@goldpn1 ~]$
In the observer log file we can see that the problem with the primary was detected, but nothing is done because FSFO is in observe-only mode:
…
Unable to connect to database using gold19c
[W000 2020-06-12T00:13:22.248+02:00] Primary database cannot be reached.
[W000 2020-06-12T00:13:22.248+02:00] Fast-Start Failover threshold has expired.
[W000 2020-06-12T00:13:22.248+02:00] Try to connect to the standby.
[W000 2020-06-12T00:13:22.248+02:00] Making a last connection attempt to primary database before proceeding with Fast-Start Failover.
[W000 2020-06-12T00:13:22.248+02:00] Check if the standby is ready for failover.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using gold19c
[W000 2020-06-12T00:13:22.261+02:00] A fast-start failover would have been initiated...
[W000 2020-06-12T00:13:22.261+02:00] Unable to failover since this observer is in observe-only mode
[W000 2020-06-12T00:13:22.261+02:00] Fast-Start Failover is not possible because observe-only mode.
[W000 2020-06-12T00:13:22.261+02:00] Try to connect to the primary.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using gold19c
[W000 2020-06-12T00:13:22.269+02:00] Primary database cannot be reached.
[W000 2020-06-12T00:13:22.269+02:00] Fast-Start Failover observe-only mode enabled.
[W000 2020-06-12T00:13:22.269+02:00] Will not attempt a Fast-Start Failover.
[W000 2020-06-12T00:13:22.269+02:00] Retry connecting to primary.
[W000 2020-06-12T00:13:23.270+02:00] Try to connect to the primary.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using gold19c
[W000 2020-06-12T00:13:23.277+02:00] Primary database cannot be reached.
[W000 2020-06-12T00:13:24.278+02:00] Try to connect to the primary.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
…
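Since the whole point of Observe-Only Mode is to see when a failover would have happened, it can help to pull those moments out of the observer log automatically. The following is only a minimal sketch: it assumes the log lines look like the excerpt above, and the names SAMPLE_LOG, LINE_RE, MARKER and would_be_failovers are invented for this illustration and are not part of any Oracle tool.

```python
import re

# A few lines copied from the observer log excerpt above.
SAMPLE_LOG = """\
[W000 2020-06-12T00:13:22.248+02:00] Primary database cannot be reached.
[W000 2020-06-12T00:13:22.248+02:00] Fast-Start Failover threshold has expired.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
[W000 2020-06-12T00:13:22.261+02:00] A fast-start failover would have been initiated...
[W000 2020-06-12T00:13:22.261+02:00] Unable to failover since this observer is in observe-only mode
[W000 2020-06-12T00:13:23.277+02:00] Primary database cannot be reached.
"""

# Assumed line layout: "[<worker> <timestamp>] <message>", as seen in the excerpt.
LINE_RE = re.compile(r"^\[(?P<worker>\w+) (?P<ts>\S+)\] (?P<msg>.*)$")

# Message text taken from the excerpt; other versions may word it differently.
MARKER = "A fast-start failover would have been initiated"


def would_be_failovers(log_text):
    """Return the timestamps where the observer reports a would-be failover."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("msg").startswith(MARKER):
            hits.append(match.group("ts"))
    return hits


print(would_be_failovers(SAMPLE_LOG))
# prints ['2020-06-12T00:13:22.261+02:00']
```

In a real environment the text would come from the observer log file instead of the sample string, and the marker text should be checked against your own log first.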
In the drc* trace file on the standby side we can see:
2020-06-12T00:13:21.103+02:00
Fast-Start Failover cannot proceed because: "observe-only mode"
So far, this means that the error with the primary was detected and logged, but no action was taken. The roles remain the same. The show commands confirm this too:
DGMGRL> show configuration verbose;
Configuration - gold19c
Protection Mode: MaxAvailability
Members:
gold19c - Primary database
golds19c - (*) Physical standby database
(*) Fast-Start Failover target
Properties:
FastStartFailoverThreshold = '30'
OperationTimeout = '30'
TraceLevel = 'USER'
FastStartFailoverLagLimit = '0'
CommunicationTimeout = '180'
ObserverReconnect = '0'
FastStartFailoverAutoReinstate = 'TRUE'
FastStartFailoverPmyShutdown = 'TRUE'
BystandersFollowRoleChange = 'ALL'
ObserverOverride = 'FALSE'
ExternalDestination1 = ''
ExternalDestination2 = ''
PrimaryLostWriteAction = 'CONTINUE'
ConfigurationWideServiceName = 'gold19c_CFG'
Fast-Start Failover: Enabled in Observe-Only Mode
Lag Limit: 0 seconds
Threshold: 30 seconds
Active Target: golds19c
Potential Targets: "golds19c"
golds19c valid
Observer: goldsn1.oralocal
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configuration Status:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-16625: cannot reach member "gold19c"
DGM-17017: unable to determine configuration status
DGMGRL> show fast_start failover;
Fast-Start Failover: Enabled in Observe-Only Mode
Protection Mode: MaxAvailability
Lag Limit: 0 seconds
Threshold: 30 seconds
Active Target: golds19c
Potential Targets: "golds19c"
golds19c valid
Observer: goldsn1.oralocal
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Write Errors YES
Oracle Error Conditions:
(none)
DGMGRL>
In some scenarios this is useful, because it lets us fix the problem without going through a failover event (with a manual reinstate and so on in some cases). Another option is to use Observe-Only mode to do exactly what the name says: just observe. Think of an environment where you want to test some conditions and the health of the environment (network and others) before you really enable FSFO.
When the primary database comes back, FSFO returns to normal:
[oracle@goldpn1 ~]$ srvctl start database -d gold19c
[oracle@goldpn1 ~]$
In the drc* file on the standby:
2020-06-12T00:16:52.837+02:00
Primary connected to this instance.
2020-06-12T00:17:00.186+02:00
FSFO SetState(st=2 "UNSYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=11)
2020-06-12T00:17:06.951+02:00
FSFO SetState(st=1 "SYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=12)
In the Broker:
DGMGRL> show configuration;
Configuration - gold19c
Protection Mode: MaxAvailability
Members:
gold19c - Primary database
golds19c - (*) Physical standby database
Fast-Start Failover: Enabled in Observe-Only Mode
Configuration Status:
SUCCESS (status updated 51 seconds ago)
DGMGRL> show fast_start failover;
Fast-Start Failover: Enabled in Observe-Only Mode
Protection Mode: MaxAvailability
Lag Limit: 0 seconds
Threshold: 30 seconds
Active Target: golds19c
Potential Targets: "golds19c"
golds19c valid
Observer: goldsn1.oralocal
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Write Errors YES
Oracle Error Conditions:
(none)
DGMGRL>
Upgrade and Downgrade modes
If FSFO is operating in Observe-Only mode, it is impossible to “upgrade” it directly to normal mode:
DGMGRL> ENABLE FAST_START FAILOVER
Error: ORA-16889: Fast-start failover mode cannot be changed between normal and observe-only modes.
Failed.
DGMGRL>
To do that, we need to disable FSFO first and then enable it in normal mode:
DGMGRL> DISABLE FAST_START FAILOVER ;
Disabled.
DGMGRL> ENABLE FAST_START FAILOVER ;
Enabled in Zero Data Loss Mode.
DGMGRL>
The downgrade works the same way: we cannot switch directly and need to disable FSFO first and then enable it in Observe-Only mode:
DGMGRL> ENABLE FAST_START FAILOVER OBSERVE ONLY;
Error: ORA-16889: Fast-start failover mode cannot be changed between normal and observe-only modes.
Failed.
DGMGRL>
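The rule shown by these outputs (no direct switch between normal and observe-only, always go through a disable) can be written down as a tiny planner. This is just a sketch of the rule as observed in this post: the mode labels and the function plan_mode_change are made up for the illustration, and it only builds the DGMGRL command text, it does not run anything.

```python
# Mode labels used only inside this sketch.
DISABLED = "disabled"
NORMAL = "normal"
OBSERVE_ONLY = "observe-only"
MODES = (DISABLED, NORMAL, OBSERVE_ONLY)

# DGMGRL commands exactly as used in this post.
ENABLE_CMD = {
    NORMAL: "ENABLE FAST_START FAILOVER;",
    OBSERVE_ONLY: "ENABLE FAST_START FAILOVER OBSERVE ONLY;",
}
DISABLE_CMD = "DISABLE FAST_START FAILOVER;"


def plan_mode_change(current, target):
    """Return the DGMGRL commands needed to move FSFO from current to target."""
    if current not in MODES or target not in MODES:
        raise ValueError("unknown FSFO mode")
    if current == target:
        return []
    steps = []
    # A direct normal <-> observe-only switch fails with ORA-16889,
    # so any enabled mode is disabled first.
    if current != DISABLED:
        steps.append(DISABLE_CMD)
    if target != DISABLED:
        steps.append(ENABLE_CMD[target])
    return steps


for command in plan_mode_change(OBSERVE_ONLY, NORMAL):
    print(command)
# prints:
# DISABLE FAST_START FAILOVER;
# ENABLE FAST_START FAILOVER;
```

The same call with the arguments swapped gives the downgrade path, which is the disable followed by the OBSERVE ONLY enable.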
Health Conditions
This is not a new feature in 19c, but it helps to control the scenarios in which FSFO is triggered. It is possible to enable or disable individual Health Conditions, such as a corrupted controlfile or a stuck archiver. All options can be checked in the documentation.
Look below at “Configurable Failover Conditions”; everything listed there can be set:
DGMGRL> show fast_start failover;
Fast-Start Failover: Enabled in Observe-Only Mode
Protection Mode: MaxAvailability
Lag Limit: 0 seconds
Threshold: 30 seconds
Active Target: golds19c
Potential Targets: "golds19c"
golds19c valid
Observer: goldsn1.oralocal
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Write Errors YES
Oracle Error Conditions:
(none)
DGMGRL>
Some examples:
DGMGRL> ENABLE FAST_START FAILOVER CONDITION "Inaccessible Logfile";
Succeeded.
DGMGRL> show fast_start failover;
Fast-Start Failover: Enabled in Observe-Only Mode
Protection Mode: MaxAvailability
Lag Limit: 0 seconds
Threshold: 30 seconds
Active Target: golds19c
Potential Targets: "golds19c"
golds19c valid
Observer: goldsn1.oralocal
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile YES
Stuck Archiver NO
Datafile Write Errors YES
Oracle Error Conditions:
(none)
DGMGRL>
DGMGRL> ENABLE FAST_START FAILOVER CONDITION "Corrupted Dictionary";
Succeeded.
DGMGRL> DISABLE FAST_START FAILOVER CONDITION "Inaccessible Logfile";
Succeeded.
DGMGRL> show fast_start failover;
Fast-Start Failover: Enabled in Observe-Only Mode
Protection Mode: MaxAvailability
Lag Limit: 0 seconds
Threshold: 30 seconds
Active Target: golds19c
Potential Targets: "golds19c"
golds19c valid
Observer: goldsn1.oralocal
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Write Errors YES
Oracle Error Conditions:
(none)
DGMGRL>
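Comparing these long outputs by eye is error prone, so a small script can extract the Health Conditions block and show what changed between two runs. This is a rough sketch that assumes the output layout shown above (a "Health Conditions:" header, one condition per line ending in YES or NO, then "Oracle Error Conditions:"); the names parse_health_conditions, changed_conditions, BEFORE and AFTER exist only for this example.

```python
# Health Conditions block as printed by "show fast_start failover" above.
BEFORE = """\
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Write Errors YES
Oracle Error Conditions:
(none)
"""

# Same block after enabling the "Inaccessible Logfile" condition.
AFTER = BEFORE.replace("Inaccessible Logfile NO", "Inaccessible Logfile YES")


def parse_health_conditions(show_output):
    """Map each health condition name to True (YES) or False (NO)."""
    conditions = {}
    in_section = False
    for raw in show_output.splitlines():
        line = raw.strip()
        if line == "Health Conditions:":
            in_section = True
            continue
        if line == "Oracle Error Conditions:":
            break
        if in_section and line:
            # The flag is the last word; everything before it is the name.
            name, _, flag = line.rpartition(" ")
            if flag in ("YES", "NO"):
                conditions[name.strip()] = flag == "YES"
    return conditions


def changed_conditions(before, after):
    """Return {name: (old, new)} for conditions whose flag differs."""
    return {
        name: (old, after.get(name))
        for name, old in before.items()
        if after.get(name) != old
    }


print(changed_conditions(parse_health_conditions(BEFORE),
                         parse_health_conditions(AFTER)))
# prints {'Inaccessible Logfile': (False, True)}
```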
Another option is to enable (or disable) failover for a specific Oracle error. ORA-240, a control file error, can be set as a trigger:
DGMGRL> ENABLE FAST_START FAILOVER CONDITION 240;
Succeeded.
DGMGRL> show fast_start failover;
Fast-Start Failover: Enabled in Observe-Only Mode
Protection Mode: MaxAvailability
Lag Limit: 0 seconds
Threshold: 30 seconds
Active Target: golds19c
Potential Targets: "golds19c"
golds19c valid
Observer: goldsn1.oralocal
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Write Errors YES
Oracle Error Conditions:
ORA-240: control file enqueue held for more than %s seconds
DGMGRL> DISABLE FAST_START FAILOVER CONDITION 240;
Succeeded.
DGMGRL>
But in my tests this worked only for ORA-240; other errors, such as ORA-600, were rejected:
DGMGRL> ENABLE FAST_START FAILOVER CONDITION 600;
Error: ORA-16524: unsupported command, option, or argument
Failed.
DGMGRL>
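The examples above use two argument styles: named health conditions go in double quotes, while Oracle error numbers are passed bare. If you script these changes, a small helper can keep the quoting consistent. This is only a sketch: condition_command and HEALTH_CONDITIONS are hypothetical names, the list of conditions is the one printed in this post, and the helper only builds the command text (whether the broker accepts a given error number is still up to the broker, as the ORA-600 attempt shows).

```python
# Health condition names as printed by "show fast_start failover" in this post.
HEALTH_CONDITIONS = {
    "Corrupted Controlfile",
    "Corrupted Dictionary",
    "Inaccessible Logfile",
    "Stuck Archiver",
    "Datafile Write Errors",
}


def condition_command(action, condition):
    """Build an ENABLE/DISABLE FAST_START FAILOVER CONDITION command string."""
    action = action.upper()
    if action not in ("ENABLE", "DISABLE"):
        raise ValueError("action must be ENABLE or DISABLE")
    if isinstance(condition, int) and not isinstance(condition, bool):
        # Oracle error numbers are passed without quotes, e.g. 240 for ORA-240.
        if condition <= 0:
            raise ValueError("error number must be positive")
        argument = str(condition)
    elif condition in HEALTH_CONDITIONS:
        # Named health conditions are passed in double quotes.
        argument = f'"{condition}"'
    else:
        raise ValueError(f"unknown condition: {condition!r}")
    return f"{action} FAST_START FAILOVER CONDITION {argument};"


print(condition_command("enable", "Inaccessible Logfile"))
# prints ENABLE FAST_START FAILOVER CONDITION "Inaccessible Logfile";
print(condition_command("disable", 240))
# prints DISABLE FAST_START FAILOVER CONDITION 240;
```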
Observe-Only and Conditions
The new Observe-Only mode in 19c is a good feature because it gives more control over where and when FSFO is triggered. Before it, the only options were ON or OFF, so testing or validating the environment before enabling FSFO for real was impossible.
And if we combine this with the Health Conditions check, we get powerful control over the DG environment and can tune it better.