Fast-Start Failover, Observe-Only Mode and Health Conditions

Oracle Data Guard Broker lets database administrators automate some tasks and gives them an easy way to correctly configure many features and details of Data Guard environments. Fast-Start Failover (FSFO) allows the broker to fail over automatically to a standby database when the primary fails. Until 19c, though, the only available behavior was to actually trigger the failover. That changed in 19c with a nice new feature that lets us put FSFO in Observe-Only Mode.
In this post, I will focus on just two FSFO features: Observe-Only Mode and Health Conditions. Lag and other details will not be covered here.

 

Observe-Only Mode

Observe-Only Mode is a simple change: FSFO just observes and monitors the DG environment, and in case of a failure it does not change the roles between primary and standby. It is as simple as that. As the Broker documentation for Observe-Only Mode says:
The observe-only mode enables you to test the impact of using fast-start failover in your configuration, without making any actual changes to the configuration.
The details of this mode can also be checked in the documentation.

 

 

Enable Observe-Only

Enabling it is very simple; we just need to call “ENABLE FAST_START FAILOVER OBSERVE ONLY”:

 

DGMGRL> ENABLE FAST_START FAILOVER OBSERVE ONLY;

Enabled in Observe-Only Mode.

DGMGRL>

 

In the drc* trace file on the primary side we can see:

 

2020-06-11T23:45:19.329+02:00

ENABLE FAST_START FAILOVER OBSERVE ONLY

FSFO SetState(st=47 "ENABLE OBONLY", fl=0x0 "", ob=0x2b621d39, tgt=2, v=0)

Setup log_archive_dest_n of GROUP=0 PRIORITY=0 with 'golds19c' as FSFO target

Fast-Start Failover (FSFO) has been enabled under observe-only mode between:

  Primary = "gold19c"

  Standby = "golds19c"

2020-06-11T23:45:20.527+02:00

ENABLE FAST_START FAILOVER OBSERVE ONLY completed successfully

 

The result is FSFO in Observe-Only Mode:

 

DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>
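When the same report has to be checked repeatedly (for example before and after a test), it can help to turn the text into a structure. The following is only a minimal sketch in Python, assuming the output has already been captured as a string and that it follows the layout shown above; the function name `parse_fsfo_status` and the dictionary keys are my own choices for illustration, not part of any Oracle tool.

```python
import re


def parse_fsfo_status(text):
    """Parse text shaped like the SHOW FAST_START FAILOVER report above.

    Hypothetical helper: it only understands the layout shown in this post.
    """
    status = {"mode": None, "properties": {}, "health_conditions": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("DGMGRL>"):
            continue  # skip blank lines and the prompt
        if line.startswith("Fast-Start Failover:"):
            status["mode"] = line.split(":", 1)[1].strip()
        elif line == "Health Conditions:":
            section = "health"
        elif line == "Oracle Error Conditions:":
            section = "errors"  # not parsed in this sketch
        elif section == "health":
            match = re.match(r"(.+?)\s+(YES|NO)$", line)
            if match:
                status["health_conditions"][match.group(1)] = match.group(2) == "YES"
        elif section is None and ":" in line:
            key, value = line.split(":", 1)
            status["properties"][key.strip()] = value.strip()
    return status


# Sample copied (and shortened) from the report shown in this post.
SAMPLE = """\
Fast-Start Failover: Enabled in Observe-Only Mode
  Protection Mode:    MaxAvailability
  Threshold:          30 seconds
  Active Target:      golds19c
Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES
  Oracle Error Conditions:
    (none)
"""

status = parse_fsfo_status(SAMPLE)
print(status["mode"])
print(status["health_conditions"])
```

With a dictionary like this, a simple comparison of `status["mode"]` before and after an outage is enough to confirm that FSFO stayed in Observe-Only Mode.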

 

After we force the shutdown of the primary database, we can see that the roles have not changed.

 

[oracle@goldpn1 ~]$ srvctl stop database -d gold19c -o abort

[oracle@goldpn1 ~]$

 

In the observer log file we can see that the problem with the primary was detected, but nothing was done since FSFO is in observe-only mode:

 



Unable to connect to database using gold19c

[W000 2020-06-12T00:13:22.248+02:00] Primary database cannot be reached.

[W000 2020-06-12T00:13:22.248+02:00] Fast-Start Failover threshold has expired.

[W000 2020-06-12T00:13:22.248+02:00] Try to connect to the standby.

[W000 2020-06-12T00:13:22.248+02:00] Making a last connection attempt to primary database before proceeding with Fast-Start Failover.

[W000 2020-06-12T00:13:22.248+02:00] Check if the standby is ready for failover.

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor




Unable to connect to database using gold19c

[W000 2020-06-12T00:13:22.261+02:00] A fast-start failover would have been initiated...

[W000 2020-06-12T00:13:22.261+02:00] Unable to failover since this observer is in observe-only mode

[W000 2020-06-12T00:13:22.261+02:00] Fast-Start Failover is not possible because observe-only mode.

[W000 2020-06-12T00:13:22.261+02:00] Try to connect to the primary.

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor




Unable to connect to database using gold19c

[W000 2020-06-12T00:13:22.269+02:00] Primary database cannot be reached.

[W000 2020-06-12T00:13:22.269+02:00] Fast-Start Failover observe-only mode enabled.

[W000 2020-06-12T00:13:22.269+02:00] Will not attempt a Fast-Start Failover.

[W000 2020-06-12T00:13:22.269+02:00] Retry connecting to primary.

[W000 2020-06-12T00:13:23.270+02:00] Try to connect to the primary.

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor




Unable to connect to database using gold19c

[W000 2020-06-12T00:13:23.277+02:00] Primary database cannot be reached.

[W000 2020-06-12T00:13:24.278+02:00] Try to connect to the primary.

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
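Since the whole point of this mode is to learn when a failover would have happened, it can be useful to pull those moments out of the observer log. Below is a small Python sketch, assuming the log lines keep the `[W000 timestamp] message` shape shown above; the function name `would_be_failovers` and the regular expression are assumptions of mine, based only on this excerpt.

```python
import re

# Shape of the observer log lines shown in this post: "[W000 <timestamp>] <message>"
LOG_LINE = re.compile(r"^\[(?P<worker>W\d+) (?P<ts>[^\]]+)\] (?P<msg>.*)$")

# Message text taken from the excerpt above.
MARKER = "A fast-start failover would have been initiated"


def would_be_failovers(log_text):
    """Return the timestamps of lines reporting a suppressed failover."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and MARKER in match.group("msg"):
            hits.append(match.group("ts"))
    return hits


# Sample lines copied from the observer log above.
SAMPLE = """\
[W000 2020-06-12T00:13:22.248+02:00] Fast-Start Failover threshold has expired.
[W000 2020-06-12T00:13:22.261+02:00] A fast-start failover would have been initiated...
[W000 2020-06-12T00:13:22.261+02:00] Unable to failover since this observer is in observe-only mode
"""

print(would_be_failovers(SAMPLE))
```

Counting these entries over a few weeks gives a rough idea of how often a real FSFO configuration would have changed roles in this environment.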

 

In the drc* trace file on the standby side we can see:

 

2020-06-12T00:13:21.103+02:00

Fast-Start Failover cannot proceed because: "observe-only mode"

 

So far, this means that the error with the primary was detected and logged, but no action was taken. The roles remain the same. The show commands confirm this too:

 

DGMGRL> show configuration verbose;




Configuration - gold19c




  Protection Mode: MaxAvailability

  Members:

  gold19c  - Primary database

    golds19c - (*) Physical standby database




  (*) Fast-Start Failover target




  Properties:

    FastStartFailoverThreshold      = '30'

    OperationTimeout                = '30'

    TraceLevel                      = 'USER'

    FastStartFailoverLagLimit       = '0'

    CommunicationTimeout            = '180'

    ObserverReconnect               = '0'

    FastStartFailoverAutoReinstate  = 'TRUE'

    FastStartFailoverPmyShutdown    = 'TRUE'

    BystandersFollowRoleChange      = 'ALL'

    ObserverOverride                = 'FALSE'

    ExternalDestination1            = ''

    ExternalDestination2            = ''

    PrimaryLostWriteAction          = 'CONTINUE'

    ConfigurationWideServiceName    = 'gold19c_CFG'




Fast-Start Failover: Enabled in Observe-Only Mode

  Lag Limit:          0 seconds

  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configuration Status:

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

ORA-16625: cannot reach member "gold19c"

DGM-17017: unable to determine configuration status




DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>

 

In some scenarios this can be good, because it allows us to fix the problem without going through a failover event (which in some cases means a manual reinstate and so on). Another use of Observe-Only mode is what the name says: just observing. Think of an environment where you want to test some conditions and the health of the environment (network and others) before you really enable FSFO.

So, if the primary database comes back, FSFO returns to normal:

 

[oracle@goldpn1 ~]$ srvctl start database -d gold19c

[oracle@goldpn1 ~]$

 

In the drc* trace file on the standby:

 

2020-06-12T00:16:52.837+02:00

Primary connected to this instance.

2020-06-12T00:17:00.186+02:00

FSFO SetState(st=2 "UNSYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=11)

2020-06-12T00:17:06.951+02:00

FSFO SetState(st=1 "SYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=12)
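To follow how the FSFO state moves over time (here from UNSYNC back to SYNC), the SetState lines can be extracted from the trace. This is just a sketch in Python: the broker trace format is internal, so the regular expression below is an assumption built only from the lines shown in this post, and `fsfo_states` is a hypothetical helper name.

```python
import re

# Pattern inferred from the trace lines in this post, e.g.
#   FSFO SetState(st=2 "UNSYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=11)
SETSTATE = re.compile(
    r'FSFO SetState\(st=(?P<st>\d+) "(?P<state>[^"]*)", '
    r'fl=(?P<fl>0x[0-9a-fA-F]+) "(?P<flags>[^"]*)"'
)


def fsfo_states(trace_text):
    """Return the state names, in order, from FSFO SetState trace lines."""
    return [match.group("state") for match in SETSTATE.finditer(trace_text)]


# Sample copied from the standby trace above.
SAMPLE = """\
2020-06-12T00:17:00.186+02:00
FSFO SetState(st=2 "UNSYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=11)
2020-06-12T00:17:06.951+02:00
FSFO SetState(st=1 "SYNC", fl=0x1 "AVAIL", ob=0x0, tgt=2, v=12)
"""

print(fsfo_states(SAMPLE))
```

The same pattern also matches the "ENABLE OBONLY" line from the primary trace shown earlier, where the flags string is empty.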

 

In the Broker:

 

DGMGRL> show configuration;




Configuration - gold19c




  Protection Mode: MaxAvailability

  Members:

  gold19c  - Primary database

    golds19c - (*) Physical standby database




Fast-Start Failover: Enabled in Observe-Only Mode




Configuration Status:

SUCCESS   (status updated 51 seconds ago)




DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>

 

Upgrade and Downgrade modes

If FSFO is operating in Observe-Only mode, it is not possible to “upgrade” it directly to normal mode:

 

DGMGRL>  ENABLE FAST_START FAILOVER

Error: ORA-16889: Fast-start failover mode cannot be changed between normal and observe-only modes.




Failed.

DGMGRL>

 

To do that, we need to disable FSFO first and then enable it in normal mode:

 

DGMGRL> DISABLE FAST_START FAILOVER ;

Disabled.

DGMGRL> ENABLE FAST_START FAILOVER ;

Enabled in Zero Data Loss Mode.

DGMGRL>

 

Downgrading works the same way: we can’t switch directly, and need to disable FSFO first and then enable it in Observe-Only mode:

 

DGMGRL> ENABLE FAST_START FAILOVER OBSERVE ONLY;

Error: ORA-16889: Fast-start failover mode cannot be changed between normal and observe-only modes.




Failed.

DGMGRL>
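The rule seen in these sessions (no direct change between normal and observe-only, always go through disabled) is easy to encode if the switch is scripted. The Python sketch below only builds the list of DGMGRL commands to run; it does not call the broker. The mode constants and the function `commands_to_switch` are hypothetical names of mine, and the logic reflects only the behavior shown in this post.

```python
DISABLED = "DISABLED"
NORMAL = "NORMAL"
OBSERVE_ONLY = "OBSERVE_ONLY"

# DGMGRL commands used in this post for each enabled mode.
ENABLE_COMMAND = {
    NORMAL: "ENABLE FAST_START FAILOVER;",
    OBSERVE_ONLY: "ENABLE FAST_START FAILOVER OBSERVE ONLY;",
}


def commands_to_switch(current, target):
    """Return the DGMGRL commands to move FSFO from `current` to `target`.

    A direct normal <-> observe-only change fails with ORA-16889 (as seen
    above), so any change from an enabled mode starts with a DISABLE.
    """
    valid = (DISABLED, NORMAL, OBSERVE_ONLY)
    if current not in valid or target not in valid:
        raise ValueError("unknown FSFO mode")
    if current == target:
        return []  # nothing to do
    commands = []
    if current != DISABLED:
        commands.append("DISABLE FAST_START FAILOVER;")
    if target != DISABLED:
        commands.append(ENABLE_COMMAND[target])
    return commands


for command in commands_to_switch(OBSERVE_ONLY, NORMAL):
    print(command)
```

Keeping the sequence in one place avoids the ORA-16889 error when the change is part of a larger maintenance script.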

 

Health Conditions

This is not a new feature in 19c, but it helps to control the scenarios in which FSFO is triggered. It is possible to enable or disable individual Health Conditions, such as a corrupted controlfile or a stuck archiver. All options can be checked in the documentation.
Look below at “Configurable Failover Conditions”; everything listed there can be set:

 

DGMGRL>  show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>

 

Some examples

 

DGMGRL> ENABLE FAST_START FAILOVER CONDITION "Inaccessible Logfile";

Succeeded.

DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile           YES

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>




DGMGRL> ENABLE FAST_START FAILOVER CONDITION "Corrupted Dictionary";

Succeeded.

DGMGRL> DISABLE FAST_START FAILOVER CONDITION "Inaccessible Logfile";

Succeeded.

DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    (none)




DGMGRL>
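If the same set of Health Conditions has to be applied to several configurations, the ENABLE/DISABLE commands can be generated from a comparison between the current and the desired settings. The following is a minimal Python sketch, assuming both are given as dictionaries of condition name to True/False; `condition_commands` is a hypothetical helper, and it only emits the two command forms used in the examples above.

```python
def condition_commands(current, desired):
    """Return DGMGRL commands that bring `current` in line with `desired`.

    Both arguments map a health condition name (as printed by
    SHOW FAST_START FAILOVER) to True (YES) or False (NO).
    """
    commands = []
    for name, wanted in desired.items():
        if current.get(name) != wanted:
            verb = "ENABLE" if wanted else "DISABLE"
            commands.append(f'{verb} FAST_START FAILOVER CONDITION "{name}";')
    return commands


# Current values as shown in the report above.
current = {
    "Corrupted Controlfile": True,
    "Corrupted Dictionary": True,
    "Inaccessible Logfile": False,
    "Stuck Archiver": False,
    "Datafile Write Errors": True,
}

# Example target: also fail over on an inaccessible logfile.
desired = dict(current, **{"Inaccessible Logfile": True})

for command in condition_commands(current, desired):
    print(command)
```

Only the conditions that differ produce a command, so running the generated script twice does not change anything the second time.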

 

Another option is to enable (or disable) failover for a specific Oracle error. For example, the controlfile-related error ORA-240 can be set as a trigger:

 

DGMGRL> ENABLE FAST_START FAILOVER CONDITION 240;

Succeeded.

DGMGRL> show fast_start failover;




Fast-Start Failover: Enabled in Observe-Only Mode




  Protection Mode:    MaxAvailability

  Lag Limit:          0 seconds




  Threshold:          30 seconds

  Active Target:      golds19c

  Potential Targets:  "golds19c"

    golds19c   valid

  Observer:           goldsn1.oralocal

  Shutdown Primary:   TRUE

  Auto-reinstate:     TRUE

  Observer Reconnect: (none)

  Observer Override:  FALSE




Configurable Failover Conditions

  Health Conditions:

    Corrupted Controlfile          YES

    Corrupted Dictionary           YES

    Inaccessible Logfile            NO

    Stuck Archiver                  NO

    Datafile Write Errors          YES




  Oracle Error Conditions:

    ORA-240: control file enqueue held for more than %s seconds




DGMGRL> DISABLE FAST_START FAILOVER CONDITION 240;

Succeeded.

DGMGRL>

 

But in my test this worked only for ORA-240; other errors, such as ORA-600, were not accepted:

 

DGMGRL> ENABLE FAST_START FAILOVER CONDITION 600;

Error: ORA-16524: unsupported command, option, or argument




Failed.

DGMGRL>

 

Observe-Only and Conditions

Observe-Only mode, new in 19c, is a good feature because it gives more control over where and when FSFO is triggered. Before it, the only options were ON or OFF, so testing or validating the environment before enabling FSFO for real was impossible.
Combined with the Health Conditions, it gives powerful control over the DG environment and allows finer tuning.
 

Disclaimer: “The postings on this site are my own and don’t necessarily represent my actual employer’s positions, strategies or opinions. The information here was edited to be useful for general purposes; specific data and identifications were removed to reach a generic audience and to be useful for the community.”