Quickstart
When an Oracle Database becomes unavailable in a production (PRD) environment, troubleshooting should follow a structured and non-destructive approach to preserve evidence and avoid data loss.
1. Capture the Current State Before Any Restart
Before restarting Oracle services or rebooting the server, record the current state of the system and database components. This information may be lost after a restart.
- Identify running Oracle and cluster-related processes:
  ps -ef | grep pmon
  ps -ef | grep d.bin   # Clusterware environments
  systemctl status dbora.service
- In high-availability setups, verify the Data Guard configuration:
  dgmgrl sys/<passwd>
  SHOW CONFIGURATION;
  SHOW DATABASE <dbname>;
- In Clusterware environments, check the status of Oracle resources:
  crsctl status resource -t
- For ASM-based storage, confirm disk group availability:
  asmcmd lsdg
  asmcmd lsdsk
2. Attempt Database Connectivity
Try connecting to the database using SQL*Plus to capture any error messages:
sqlplus / as sysdba
Record all error outputs for further analysis.
3. Review Oracle Alert Logs
Perform an initial analysis of the Oracle alert log to understand the nature of the failure.
- Check the most recent entries in: alert_<SID>.log
- Search backward from the end of the file for termination or Oracle errors:
  Instance terminated
  ORA-
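The backward search can be sketched as a small shell snippet. The alert-log excerpt below is illustrative sample data (written to a temp file) so the commands can be demonstrated without a live database; on a real host you would point the commands at the actual alert log.

```shell
# Sketch: locate the most recent 'Instance terminated' entry and the ORA-
# errors leading up to it. ALERT_LOG stands in for the real alert log path.
ALERT_LOG=$(mktemp)
cat > "$ALERT_LOG" <<'EOF'
2022-02-01T16:40:02 Thread 1 advanced to log sequence 8701
2022-02-01T16:42:19 ORA-00471: DBWR process terminated with error
2022-02-01T16:42:19 PMON (ospid: 12382): terminating the instance due to ORA error 471
2022-02-01T16:42:20 Instance terminated by PMON, pid = 12382
EOF

# tac reads the file bottom-up, so grep -m1 returns the LAST occurrence
last_term=$(tac "$ALERT_LOG" | grep -m1 'Instance terminated')
echo "last termination: $last_term"

# list the ORA- errors recorded before the termination
grep 'ORA-' "$ALERT_LOG"
rm -f "$ALERT_LOG"
```

On a long-running instance the alert log can be large; `tac | grep -m1` avoids scanning the whole file from the top.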
4. Check Operating System Logs
If the database instance terminated unexpectedly, inspect OS-level logs around the same timestamp to identify external causes such as resource exhaustion or storage failures.
- System logs: /var/log/messages
- Kernel messages: dmesg -T
5. Verify Backup Availability
Before attempting recovery or restart actions, confirm that valid database backups are available.
- Check RMAN backup status: RMAN> LIST BACKUP;
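A quick sanity check on captured RMAN output is to count the AVAILABLE backups. The excerpt below is an illustrative sample shaped like `LIST BACKUP SUMMARY` output; the suggested capture command in the comment is an assumption about how you would obtain it.

```shell
# Sketch: check that at least one AVAILABLE backup exists in RMAN output.
# On a real host the output could be captured with something like:
#   rman target / <<< "LIST BACKUP SUMMARY;" > /tmp/rman.out
RMAN_OUT=$(mktemp)
cat > "$RMAN_OUT" <<'EOF'
List of Backups
===============
Key     TY LV S Device Type Completion Time
------- -- -- - ----------- ---------------
1201    B  F  A DISK        25-JAN-22
1202    B  A  A DISK        26-JAN-22
EOF

# column 4 (S) is the status: A = available, X = expired
available=$(awk '$4 == "A"' "$RMAN_OUT" | wc -l)
echo "available backups: $available"
rm -f "$RMAN_OUT"
```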
6. Restore or Start Services in a Controlled Manner
- Attempt to start the primary database and verify availability.
- If the primary database starts successfully, validate:
- Listener status
- Application connectivity
- In Data Guard environments, ensure redo log transport and apply services are functioning correctly on the standby database.
7. Perform Root Cause Analysis
Analyze collected logs, system outputs, and configuration states to determine the root cause of the database outage. This analysis should consider:
- Oracle internal errors
- OS-level resource constraints
- Storage or ASM issues
- Clusterware or Data Guard events
Analyzing Oracle Errors and Log Files
Effective troubleshooting of Oracle Database issues requires a systematic review of Oracle and operating system logs. Oracle often reports multiple errors for a single failure; therefore, identifying the first error in the sequence is critical, as it usually indicates the root cause.
1. Oracle Log Files to Review
General Oracle Logs
- Database Alert Log
  $ORACLE_BASE/diag/rdbms/<dbname>/<SID>/trace/alert_<SID>.log
  The alert log provides high-level database events and typically references additional trace files that should also be reviewed.
- Listener Log
  $ORACLE_BASE/diag/tnslsnr/<hostname>/listener/trace/listener.log
  This log is useful for diagnosing connection-related issues and listener failures.
Note: Oracle may log multiple related errors. Always focus on the earliest ORA error reported to determine the underlying cause.
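The "earliest error first" rule can be applied mechanically. The log excerpt below is an illustrative sample of a typical error burst, where the later errors are consequences of the first one:

```shell
# Sketch: extract the FIRST ORA- error from an error burst, since later
# errors are usually consequences of it.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2022-01-25T15:57:29 ORA-00020: maximum number of processes (300) exceeded
2022-01-25T15:57:30 ORA-00604: error occurred at recursive SQL level 1
2022-01-25T15:57:31 ORA-01089: immediate shutdown in progress
EOF

# grep -o isolates the error codes; head -1 keeps the earliest one
first_err=$(grep -o 'ORA-[0-9]*' "$LOG" | head -1)
echo "root-cause candidate: $first_err"
# prints: root-cause candidate: ORA-00020
rm -f "$LOG"
```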
2. High Availability (Data Guard) Logs
- Data Guard Broker Log
  $ORACLE_BASE/diag/rdbms/<dbname>/<SID>/trace/drc<SID>.log
  Review this log when issues involve role transitions, redo transport, or Data Guard configuration problems.
3. Clusterware Logs
- Oracle High Availability Services Daemon (OHASD) Log
  $GRID_HOME/log/<hostname>/ohasd
  /var/log/oracleohasd
- Clusterware Alert Log
  $ORACLE_BASE/diag/crs/<hostname>/crs/trace/alert.log
These logs are essential for identifying Clusterware startup failures, resource fencing, or node-level issues.
4. ASM Logs
- ASM Alert Log
  $ORACLE_BASE/diag/asm/+asm/<instance>/trace/alert_<instance>.log
- Oracle ASM System Logs
  /var/log/oracleasm
Review these logs when database issues may be related to storage availability, disk groups, or ASM instance failures.
5. Correlate with Operating System Logs
If Oracle processes terminated unexpectedly (for example, due to memory exhaustion or I/O failures), correlate Oracle logs with OS-level diagnostics:
- System logs: /var/log/messages
- Kernel messages: dmesg -T
- System monitoring and resource checks:
  free -m
  top
  df -h
  mount
- Historical monitoring data (e.g., Munin graphs), if available
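Correlation can be as simple as grepping both logs for the same minute. The timestamp and the sample messages file below are illustrative; on a real host you would grep /var/log/messages or use journalctl --since/--until instead.

```shell
# Sketch: pull OS log lines from the minute in which the instance died.
MESSAGES=$(mktemp)
cat > "$MESSAGES" <<'EOF'
Feb  1 16:41:58 dbhost kernel: Out of memory: Kill process 12382 (oracle)
Feb  1 16:42:19 dbhost kernel: oom_reaper: reaped process 12382 (oracle)
Feb  1 16:50:01 dbhost CRON[999]: (root) CMD (run-parts /etc/cron.hourly)
EOF

# instance terminated at 16:42 => look at the surrounding minutes only
matches=$(grep -E '16:4[12]:' "$MESSAGES")
echo "$matches"
rm -f "$MESSAGES"
```

In this sample the OOM killer line immediately before the termination timestamp would point to memory exhaustion as the external cause.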
6. Oracle Error Code Reference
Oracle maintains official documentation for error codes, including possible causes and recommended actions:
Oracle Error Messages Documentation:
https://docs.oracle.com/en/database/oracle/oracle-database/19/errmg/index.html
Examples:
| log entry | Error Message Documentation |
| alert log: 2022-02-01T16:42:19.462373+00:00 PMON (ospid: 12382): terminating the instance due to ORA error 471 | ORA-00471: DBWR process terminated with error Cause: The database writer process died Action: Warm start instance |
| alert log: 2022-01-25T15:57:29.050729+00:00 ORA-00020: maximum number of processes (300) exceeded ORA-20 errors will not be written to the alert log for … | ORA-00020: maximum number of processes (string) exceeded Cause: All process state objects are in use. Action: Increase the value of the PROCESSES initialization parameter. |
| listener log: 25-JAN-2022 16:38:04 * (CONNECT_DATA=(SERVICE_NAME=ZONOS.NPT.COM)… TNS-12528: TNS:listener: all appropriate instances are blocking new connections | TNS-12528: TNS:listener: all appropriate instances are blocking new connections Cause: All instances supporting the service requested by the client reported that they were blocking the new connections. This condition may be temporary, such as at instance startup. Action: Attempt the connection again. If error persists, then contact the administrator to check the status of the instances. |
Oracle Knowledge Base
Requires Oracle Support Login
https://support.oracle.com/: go to the Knowledge tab, select the Oracle product, and enter search patterns.
Interactive Quick Reference
Oracle provides an interactive quick reference for 12c (not available for 19c, but largely still applicable), which can be helpful when trying to find
- appropriate DBA/performance views to analyse an issue,
- short descriptions of Oracle background processes, …
- https://www.oracle.com/webfolder/technetwork/tutorials/obe/db/12c/r1/poster/OUTPUT_poster/poster.html#
DB/Cluster Status (Is Oracle up and accepts connections?)
Check systemd status:
systemctl status dbora.service (standalone)
systemctl status oracle-ohasd.service (clusterware)
Check if pmon (Oracle Process Monitor, which monitors all other Oracle background processes) is running:
$ ps -ef | grep pmon
oracle 5502 1 0 Jan25 ? 00:01:47 ora_pmon_ACPDB
Clusterware (grid layer):
$ ps -ef | grep pmon
grid 82263 1 0 Jan05 ? 00:03:48 asm_pmon_+ASM
oracle 82362 1 0 Jan05 ? 00:06:37 ora_pmon_DB2
Try to connect & check DB state and open-mode:
$ sqlplus /nolog => if this fails, $ORACLE_HOME may be unavailable, e.g. due to a lost or read-only mounted disk, ...
SQL> conn sys as sysdba
Enter password:
Connected.
or:
$ sqlplus / as sysdba
SQL> select user from dual;
USER
--------------------------------------------------------------------------------
SYS
SQL> SELECT INSTANCE_NAME, STATUS, DATABASE_STATUS FROM V$INSTANCE;
INSTANCE_NAME STATUS DATABASE_STATUS
---------------- ------------ -----------------
ACPDB OPEN ACTIVE
SQL> select name, open_mode from v$database;
NAME OPEN_MODE
--------- --------------------
ACPDB READ WRITE
Clusterware only: check the cluster status. The “Target” state should match the “State” column in the output!
crsctl status resource -t
Example:
[grid@myhost_db1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DGDATA.dg
ONLINE ONLINE myhost_db1 STABLE
ora.DGFRA.dg
ONLINE ONLINE myhost_db1 STABLE
ora.DGREDO.dg
ONLINE ONLINE myhost_db1 STABLE
ora.LISTENER.lsnr
ONLINE ONLINE myhost_db1 STABLE
ora.asm
ONLINE ONLINE myhost_db1 Started,STABLE
ora.ons
OFFLINE OFFLINE myhost_db1 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
1 ONLINE ONLINE myhost_db1 STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.mydbr.db
1 ONLINE ONLINE myhost_db1 Open,Readonly,HOME=/
u01/app/oracle/produ
ct/19.0.0/mydb,STAB
LE
ora.evmd
1 ONLINE ONLINE myhost_db1 STABLE
--------------------------------------------------------------------------------
Check the Oracle alert log for records indicating that Oracle was terminated and not restarted (e.g. search from the last line for ‘Instance terminated’ and check the records before and after it)
Check listener status:
$ lsnrctl status
Clean shutdown & restart all Oracle/grid infrastructure processes
Clusterware
Stop/start grid + Oracle High Availability Service (as user grid):
$ crsctl stop res -all
$ crsctl stop has
$ crsctl start has
$ crsctl start res -all
Stop/start database/cluster services using srvctl
When the database is stopped or started with SQL*Plus shutdown/startup commands, Clusterware might not pick up the current status of the database. Use srvctl commands to stop/start (as user oracle!):
Examples:
Status (stop/start) ora.itsdbtst2.db database service:
[oracle@it1dba2 ~]$ srvctl status database -d itsdbtst2
Database is running.
Status (stop/start) ora.itsdbtst2.itsdbtst.cucunet.svc service (for failover zonos applications, expected to be ONLINE on primary and OFFLINE on standby):
[oracle@it1dba2 ~]$ srvctl status service -d itsdbtst2 -s itsdbtst.cucunet
Service itsdbtst.cucunet is not running.
Issue: Services connecting to read-only standby DB
Examples:
Jan 23 17:37:09 it1coa1 zonos-process-engine_10121[1206]: SQL Error: 16000, SQLState: 72000
Jan 23 17:37:09 it1coa1 zonos-process-engine_10121[1206]: ORA-16000: database or pluggable database open for read-only access
Jan 23 17:41:25 it1apa1 zonos-control-panel_10121[92800]: LEFT JOIN DEVICE_TEMPLATE ON DEVICE_TEMPLATE.ID = FILTERED_DEVICE.DEVICE_TEMPLATE_ID]; SQL state [72000]; error code [16000]; ORA-16000: database or pluggable database open for read-only access
Jan 23 17:41:25 it1apa1 zonos-control-panel_10121[92800]: ; nested exception is java.sql.SQLException: ORA-16000: database or pluggable database open for read-only access
Check current status of custom ora.service cluster service (ora.*.svc) as user grid.
The expected current state is OFFLINE on standby. During failover this service is relocated (started on new primary, stopped on old primary=new standby).
$ crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
...
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
1 ONLINE ONLINE it1dba1 STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.evmd
1 ONLINE ONLINE it1dba1 STABLE
ora.itsdbtst.db
1 ONLINE ONLINE it1dba1 Open,Readonly,HOME=/
u01/app/oracle/produ
ct/19.0.0/dbhome_1,S
TABLE
ora.itsdbtst.itsdbtst.cucunet.svc
1 ONLINE OFFLINE STABLE
--------------------------------------------------------------------------------
$ crsctl stat res -v
...
NAME=ora.itsdbtst2.itsdbtst.cucunet.svc
TYPE=ora.service.type
LAST_SERVER=it1dba2
STATE=OFFLINE
TARGET=...
Stop service manually:
$ srvctl status service -d itsdbtst -s itsdbtst.cucunet
Service itsdbtst.cucunet is running
$ srvctl stop service -d itsdbtst -s itsdbtst.cucunet
Related logs: $ORACLE_BASE/diag/crs/…/crs/trace/…
RAC
OHAS RAC: crsctl enable crs => the cluster manager (Cluster Ready Services) needs to be enabled in RAC
As user grid:
$ crsctl stop crs (stop the entire CRS (GI) stack on each node)
$ crsctl stop cluster ("-all" to stop crs on all nodes)
$ crsctl start crs
$ crsctl start cluster ("-all" to start crs on all nodes)
Start terminated Oracle primary instance step by step
to track at which point Oracle fails and returns an error
SQL> shutdown immediate; (roll back open TX, close + shutdown. Last resort: "shutdown abort" => uncommitted TX are killed, not rolled back; they are rolled back during crash recovery on the next startup)
SQL> startup nomount; (allocate SGA, read init file, start background processes)
SQL> ALTER DATABASE MOUNT; (open controlfile, find redo logs + datafiles)
SQL> ALTER DATABASE OPEN; (open datafiles, redo logs)
check DB state and open-mode:
SQL> SELECT instance_name, status FROM v$instance;
INSTANCE_NAME STATUS
---------------- ------------
ACPDB OPEN
Start standby database(s) and check/start redo apply
Startup:
SQL> STARTUP MOUNT;
SQL> ALTER DATABASE OPEN READ ONLY;
Database altered.
Start Redo apply (MRP process):
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
Check recovery mode, MRP process and applied archives on Standby:
SQL> select RECOVERY_MODE from V$ARCHIVE_DEST_STATUS ;
RECOVERY_MODE
----------------------------------
MANAGED REAL TIME APPLY
...
SQL> select process,status,sequence#, group#, thread# from v$managed_standby;
"PROCESS","STATUS","SEQUENCE#","GROUP#","THREAD#"
...
"MRP0","APPLYING_LOG",8744,"N/A",1
SQL> select sequence#, first_time, next_time, STATUS from v$archived_log order by first_time desc,sequence# FETCH FIRST 5 ROWS ONLY;
"SEQUENCE#","FIRST_TIME","NEXT_TIME","STATUS"
8743,"2022-01-26 12:40:00","2022-01-26 14:15:06","A"
Check if Primary is in sync with Standby:
see Verify log shipping from Primary to Standby
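A quick sync check is to compare the highest sequence received on the standby with the highest sequence applied. The sketch below only does the arithmetic on two captured values, which are illustrative; the SQL in the comments shows where they would come from on a real standby.

```shell
# Sketch: compare last-received vs last-applied log sequence on a standby.
# On a real standby the two values would come from:
#   select max(sequence#) from v$archived_log;                     -- received
#   select max(sequence#) from v$archived_log where applied='YES'; -- applied
last_received=8744
last_applied=8743

gap=$((last_received - last_applied))
if [ "$gap" -le 1 ]; then
    echo "standby in sync (gap=$gap)"
else
    echo "standby lagging by $gap sequences"
fi
```

A gap of one sequence is normal with real-time apply, since the current log is still being applied.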
ASM
Commands to check availability of ASM disks:
oracleasm listdisks:
[grid@myhost_db1 ~]$ oracleasm listdisks
SDB1
SDB2
SDD1
asmcmd lsdsk/lsdg (list disks/diskgroups):
[grid@myhost_db1 ~]$ asmcmd lsdsk
Path
/dev/oracleasm/disks/SDB1
/dev/oracleasm/disks/SDB2
/dev/oracleasm/disks/SDD1
[grid@myhost_db1 ~]$ asmcmd lsdg
State Type Rebal Sector Logical_Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 512 4096 1048576 962560 954575 0 954575 0 N DGDATA/
MOUNTED EXTERN N 512 512 4096 1048576 295729 283793 0 283793 0 N DGFRA/
MOUNTED EXTERN N 512 512 4096 1048576 30718 30666 0 30666 0 N DGREDO/
oracleasm querydisk command (Oracle ASMLib driver) lists major and minor numbers that can be used to match the physical device:
[grid@myhost_db1 ~]$ oracleasm querydisk -d SDD1
Disk "SDD1" is a valid ASM disk on device [8,65]
[grid@myhost_db1 ~]$ oracleasm querydisk -d SDB2
Disk "SDB2" is a valid ASM disk on device [8,18]
[grid@myhost_db1 ~]$ oracleasm querydisk -d SDB1
Disk "SDB1" is a valid ASM disk on device [8,17]
[grid@myhost_db1 ~]$ ls -ltr /dev/sd*
brw-rw---- 1 root disk 8, 64 Dec 23 21:01 /dev/sde
brw-rw---- 1 root disk 8, 65 Dec 23 21:01 /dev/sde1
brw-rw---- 1 root disk 8, 16 Jan 3 21:28 /dev/sdb
brw-rw---- 1 root disk 8, 17 Jan 3 21:28 /dev/sdb1
brw-rw---- 1 root disk 8, 18 Jan 3 21:28 /dev/sdb2
...
SDB1 => /dev/sdb1
SDB2 => /dev/sdb2
SDD1 => /dev/sde1
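The [major,minor] matching above can be scripted instead of done by eye. The device list below is an illustrative sample mirroring the `ls -ltr /dev/sd*` output shown above:

```shell
# Sketch: map an ASM disk's [major,minor] pair to its block device by
# scanning ls -l style output. Sample data mirrors the listing above.
DEVLIST=$(mktemp)
cat > "$DEVLIST" <<'EOF'
brw-rw---- 1 root disk 8, 64 Dec 23 21:01 /dev/sde
brw-rw---- 1 root disk 8, 65 Dec 23 21:01 /dev/sde1
brw-rw---- 1 root disk 8, 17 Jan  3 21:28 /dev/sdb1
EOF

# from: Disk "SDD1" is a valid ASM disk on device [8,65]
major=8; minor=65

# for block devices, ls -l prints "major," in field 5 and minor in field 6
device=$(awk -v ma="$major," -v mi="$minor" '$5 == ma && $6 == mi {print $NF}' "$DEVLIST")
echo "SDD1 => $device"
rm -f "$DEVLIST"
```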
Alternative ways to identify ASM devices by major:minor numbers:
# ls -ltr /dev/oracleasm/disks/
# cat /proc/partitions
# lsblk
Check which devices ASMLib identifies as ASM devices:
# blkid | grep asm
Query ASM tables in database:
SQL> SELECT * FROM V$ASM_DISKGROUP;
"GROUP_NUMBER","NAME","SECTOR_SIZE","LOGICAL_SECTOR_SIZE","BLOCK_SIZE","ALLOCATION_UNIT_SIZE","STATE","TYPE","TOTAL_MB","FREE_MB","HOT_USED_MB","COLD_USED_MB","REQUIRED_MIRROR_FREE_MB","USABLE_FILE_MB","OFFLINE_DISKS","COMPATIBILITY","DATABASE_COMPATIBILITY","VOTING_FILES","CON_ID"
0,"DGREDO",0,0,0,0,"DISMOUNTED",,0,0,0,0,0,0,0,"0.0.0.0.0","0.0.0.0.0","N",0
1,"DGDATA",512,512,4096,1048576,"CONNECTED","EXTERN",962560,954585,0,7975,0,954585,0,"12.1.0.0.0","10.1.0.0.0","N",0
2,"DGFRA",512,512,4096,1048576,"CONNECTED","EXTERN",295729,283736,0,11993,0,283736,0,"11.2.0.2.0","10.1.0.0.0","N",0
SQL> select GROUP_NUMBER, DISK_NUMBER, NAME, LABEL, PATH, CREATE_DATE, MOUNT_DATE, MOUNT_STATUS, STATE, HEADER_STATUS, MODE_STATUS, OS_MB,TOTAL_MB from V$ASM_DISK;
"GROUP_NUMBER","DISK_NUMBER","NAME","LABEL","PATH","CREATE_DATE","MOUNT_DATE","MOUNT_STATUS","STATE","HEADER_STATUS","MODE_STATUS","OS_MB","TOTAL_MB"
3,0,"DGREDO_0000","SDD1","/dev/oracleasm/disks/SDD1","24-APR-19","03-JAN-22","CACHED","NORMAL","MEMBER","ONLINE",30718,30718
2,0,"DGFRA_0000","SDB2","/dev/oracleasm/disks/SDB2","24-APR-19","23-DEC-21","CACHED","NORMAL","MEMBER","ONLINE",295729,295729
1,0,"DGDATA_0000","SDB1","/dev/oracleasm/disks/SDB1","24-APR-19","23-DEC-21","CACHED","NORMAL","MEMBER","ONLINE",962560,962560
show ASM config:
$ oracleasm configure
DataGuard
Summary + Status
DGMGRL> SHOW CONFIGURATION [LAG] [VERBOSE];
DGMGRL> SHOW DATABASE [VERBOSE] <dbname>
DGMGRL> VALIDATE DATABASE [VERBOSE] <dbname> (e.g. before performing switchover)
Transport and Apply Lag Information
DGMGRL> show configuration lag
Configuration - acpdb_dg_config
Protection Mode: MaxPerformance
Members:
mydba1 - Primary database
mydba2 - (*) Physical standby database
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 0 seconds (computed 1 second ago)
Fast-Start Failover: Enabled in Potential Data Loss Mode
Configuration Status:
SUCCESS (status updated 51 seconds ago)
Information about all registered observers in DG broker
DGMGRL> show observer
Configuration - acpdb_dg_config
Primary: mydba1
Active Target: mydba2
Observer "mykaa3"(19.5.0.0.0) - Master
Host Name: mykaa3
Last Ping to Primary: 1 second ago
Last Ping to Target: 4 seconds ago
Stop/start observer
On observer host as user oracle:
[oracle@mykaa3 ~]$ cd /home/oracle/scripts/
[oracle@mykaa3 scripts]$ ./observer.sh start|stop|restart
Observer issue: Observer fails to start due to additionally created observers
(created by the “start observer” command in Data Guard Manager on a DB host)
Example:
## 2 additional observers ("myhost_db1" + "myhost_db1") running. "myhost_adm1" fails to start.
DGMGRL> show observer
Configuration - dst_dg_config
Primary: mydb
Active Target: mydb2
Observer "myhost_db1"(19.5.0.0.0) - Master
Host Name: myhost_db1
Last Ping to Primary: 5346 seconds ago
Last Ping to Target: 4467 seconds ago
Observer "myhost_adm1"(19.3.0.0.0) - Backup
Host Name: myhost_adm1
Last Ping to Primary: (unknown)
Last Ping to Target: (unknown)
Observer "myhost_db1" - Backup
Host Name: myhost_db1
Last Ping to Primary: (unknown)
Last Ping to Target: (unknown)
## trying to get rid of the manually created observers
DGMGRL> stop observer "myhost_db1"
Error: ORA-16878: master observer cannot be stopped
Failed.
DGMGRL> stop observer "myhost_db1"
Observer stopped.
DGMGRL> SET MASTEROBSERVER TO "myhost_adm1"
Succeeded.
DGMGRL> show observer
Configuration - dst_dg_config
Primary: mydb
Active Target: mydb2
Observer "myhost_adm1"(19.3.0.0.0) - Master
Host Name: myhost_adm1
Last Ping to Primary: (unknown)
Last Ping to Target: (unknown)
Observer "myhost_db1"(19.5.0.0.0) - Backup
Host Name: myhost_db1
Last Ping to Primary: 5581 seconds ago
Last Ping to Target: 3 seconds ago
DGMGRL> stop observer "myhost_db1"
Observer stopped.
DGMGRL> show observer
Configuration - dst_dg_config
Primary: mydb
Active Target: mydb2
Observer "myhost_adm1"(19.3.0.0.0) - Master
Host Name: myhost_adm1
Last Ping to Primary: (unknown)
Last Ping to Target: (unknown)
## Start original observer on adm host
# su - oracle
$ cd scripts/
$ ./observer.sh start
$ tail ../log/observer.log
...
Observer 'myhost_adm1' started
[W000 2022-06-21T16:57:33.131+08:00] Observer trace level is set to USER
[W000 2022-06-21T16:57:33.131+08:00] Try to connect to the primary.
[W000 2022-06-21T16:57:33.131+08:00] Try to connect to the primary mydb_dgb.
[W000 2022-06-21T16:57:33.147+08:00] The standby mydb2 is ready to be a FSFO target
[W000 2022-06-21T16:57:34.147+08:00] Connection to the primary restored!
[W000 2022-06-21T16:57:38.157+08:00] Disconnecting from database mydb_dgb.
## Check observer state on DB hosts
DGMGRL> show observer
Configuration - dst_dg_config
Primary: mydb
Active Target: mydb2
Observer "myhost_adm1"(19.3.0.0.0) - Master
Host Name: myhost_adm1
Last Ping to Primary: 2 seconds ago
Last Ping to Target: 1 second ago
FSFO config
DGMGRL> SHOW FAST_START FAILOVER
Fast-Start Failover: Enabled in Potential Data Loss Mode
Protection Mode: MaxPerformance
Lag Limit: 30 seconds
Threshold: 30 seconds
Active Target: mydba2
Potential Targets: "mydba2"
mydba2 valid
Observer: mykaa3
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Write Errors YES
Oracle Error Conditions:
(none)
DGMGRL> DISABLE FAST_START FAILOVER ## e.g. to failover manually to DR system
DGMGRL> ENABLE FAST_START FAILOVER ## e.g. to switch back to PRD
Is Oracle supposed to switch to a standby DB if primary DB fails? Why did that fail?
Check cluster cfg in DataGuard (as user oracle: ‘dgmgrl sys/<passwd>’) for
- ‘show configuration’ output shows an asterisk before the standby db entry, and the Fast-Start Failover settings
- ‘show database verbose primary_dbname’ reports a failover target database in FastStartFailoverTarget property
Example 1: Failover enabled
Oracle is supposed to check state and switch to the target db defined in FastStartFailoverTarget in case of primary failure
DGMGRL> show configuration
...
Members:
mydba1 - Primary database
mydba2 - (*) Physical standby database
Fast-Start Failover: Enabled in Potential Data Loss Mode
DGMGRL> show database verbose mydba1
...
Properties:
...
FastStartFailoverTarget = 'mydba2'
Example 2: Failover disabled
The standby DB is just a DR db. If the primary fails, a manual failover needs to be performed.
DGMGRL> show configuration;
...
Members:
cl1db1 - Primary database
cl2db1 - Physical standby database
Fast-Start Failover: Disabled
DGMGRL> show database verbose cl1db1;
...
Properties:
...
FastStartFailoverTarget = ''
Switchover & Failover
Switchover => role reversal between the primary database and one of its standby databases – primary db transitions to standby role, standby db transitions to primary
- Guarantees no data loss
- Switchover requires the primary db to be available!
Switchover for scheduled (quick) maintenance (OS/HW updates, …):
see Switch Oracle DB for maintenance
When planning to switch the db for longer period:
Disable Fast-Start Failover before switching the db (to prevent the observer from initiating a failover back to the original primary DB):
DGMGRL> show FAST_START FAILOVER;
DGMGRL> disable FAST_START FAILOVER;
The configuration change is propagated to the primary and all standby databases present in the Data Guard configuration. If the network connection is broken, try ‘disable fast_start failover force’.
Failover => role transition in which one of the standby databases is transitioned to the primary role after the primary db crashed (standby database is opened read/write).
Failover is usually initiated by Observer/DataGuard in case of emergency (see FastStartFailoverTarget property) but can be also initiated manually.
- Failover to a standby cannot be reversed, so the decision to fail over manually should only be made during a real disaster or severe outage
- May or may not result in data loss, depending on the protection mode in effect at the time of the failover
- Failover does not consider the state or availability of the primary db; it simply transforms a previous standby into the primary db
- As long as the original primary database can be mounted and its files are intact, it can be quickly reinstated and resynchronized as a standby database using Flashback
Failover when the primary database is unavailable/corrupted/… :
Network connectivity check
DGMGRL> VALIDATE NETWORK CONFIGURATION FOR all;
Connecting to instance "ACPDB" on database "mydba1" ...
Connected to "mydba1"
Checking connectivity from instance "ACPDB" on database "mydba1" to instance "ACPDB" on database "mydba2"...
Succeeded.
Connecting to instance "ACPDB" on database "mydba2" ...
Connected to "mydba2"
Checking connectivity from instance "ACPDB" on database "mydba2" to instance "ACPDB" on database "mydba1"...
Succeeded.
Misc
Query v$resource_limit to check whether the database has reached a configured limit:
SQL> col RESOURCE_NAME format A30;
SQL> col INITIAL_ALLOCATION format A20;
SQL> col LIMIT_VALUE format A20;
SQL> set linesize 200;
SQL> select * from v$resource_limit; (in RAC, use select * from gv$resource_limit to see all instances)
RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE CON_ID
------------------------------ ------------------- --------------- -------------------- -------------------- ----------
processes 76 80 400 400 0
sessions 92 97 624 624 0
enqueue_locks 39 113 7420 7420 0
enqueue_resources 30 30 2956 UNLIMITED 0
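The interesting rows are those where MAX_UTILIZATION approaches LIMIT_VALUE. A quick filter over captured query output might look like the sketch below; the sample rows are illustrative (the processes value is deliberately set close to its limit to show a hit), and UNLIMITED rows are skipped since they cannot be exhausted:

```shell
# Sketch: flag resources whose MAX_UTILIZATION exceeds 90% of LIMIT_VALUE.
# Sample rows follow the column order of the v$resource_limit output above.
OUT=$(mktemp)
cat > "$OUT" <<'EOF'
processes 76 395 400 400 0
sessions 92 97 624 624 0
enqueue_resources 30 30 2956 UNLIMITED 0
EOF

# columns: name, current, max, initial, limit, con_id;
# the UNLIMITED check must come first to avoid dividing by a non-number
near=$(awk '$5 != "UNLIMITED" && $3 / $5 > 0.9 {
    printf "%s near limit: max %s of %s\n", $1, $3, $5
}' "$OUT")
echo "$near"
rm -f "$OUT"
```

A resource whose MAX_UTILIZATION has reached LIMIT_VALUE (as in the ORA-00020 example earlier) is a strong root-cause candidate.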