A technical troubleshooting blog about Oracle with other Databases & Cloud Technologies.

Oracle Clusterware Startup Sequence

What is Oracle Clusterware – RAC?

Oracle Clusterware enables servers to communicate with each other, so that they appear to function as a collective unit. This combination of servers is commonly known as a "cluster". 

Oracle Real Application Clusters, known as Oracle RAC, uses Oracle Clusterware as the infrastructure that binds multiple nodes so that they operate as a single server. In an Oracle RAC environment, Oracle Clusterware monitors all Oracle components, such as instances and listeners. If a failure occurs, Oracle Clusterware automatically attempts to restart the failed component and redirects operations to a surviving component.

The benefits of using a cluster include:

1. Scalability of applications (including Oracle RAC and Oracle RAC One databases)

2. Reduce total cost of ownership for the infrastructure by providing a scalable system with low-cost commodity hardware

3. Ability to fail over

4. Increase throughput on demand for cluster-aware applications, by adding servers to a cluster to increase cluster resources

5. Increase throughput for cluster-aware applications by enabling the applications to run on all of the nodes in a cluster

6. Ability to program the startup of applications in a planned order that ensures dependent processes are started in the correct sequence

7. Ability to monitor processes and restart them if they stop

8. Eliminate unplanned downtime due to hardware or software malfunctions
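
Benefit 6 in the list above, starting applications in a planned order so that dependencies come up first, is essentially a dependency-ordering problem. The sketch below is not how Clusterware implements it; it only illustrates the idea with Python's standard `graphlib` module (Python 3.9 or later). The application names are hypothetical and are not Clusterware resource names.

```python
from graphlib import TopologicalSorter

# Hypothetical applications mapped to the things they depend on.
# These names are illustrative only, not real Clusterware resources.
dependencies = {
    "app_server": {"database", "listener"},
    "database": {"storage"},
    "listener": {"network"},
    "storage": set(),
    "network": set(),
}


def planned_start_order(deps):
    """Return a start order in which every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = planned_start_order(dependencies)
print(order)
```

Any order the function returns places `storage` before `database` and `network` before `listener`, with `app_server` last. The exact position of unrelated items may vary.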

The following command displays the status of all cluster resources:
$ ./crsctl status resource -t
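
The tabular `-t` output is meant for reading. For scripting, it can be easier to work with the non-tabular form, which (as an assumption here) prints each resource as a block of `NAME=`, `TYPE=`, `TARGET=` and `STATE=` lines separated by blank lines. The following is a minimal sketch of parsing that form; the sample text is made up for illustration and is not captured from a real cluster.

```python
# Made-up sample in the NAME=/TYPE=/TARGET=/STATE= block form
# (assumed layout of "crsctl status resource" without -t).
SAMPLE_OUTPUT = """\
NAME=ora.LISTENER.lsnr
TYPE=ora.listener.type
TARGET=ONLINE
STATE=ONLINE on node1

NAME=ora.orcl.db
TYPE=ora.database.type
TARGET=ONLINE
STATE=OFFLINE
"""


def parse_resources(text):
    """Split the output into one dict per resource block."""
    resources = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current resource block.
            if current:
                resources.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key] = value.strip()
    if current:
        resources.append(current)
    return resources


def offline_resources(resources):
    """Return the names of resources whose STATE does not start with ONLINE."""
    return [r["NAME"] for r in resources
            if not r.get("STATE", "").startswith("ONLINE")]


resources = parse_resources(SAMPLE_OUTPUT)
print(offline_resources(resources))  # ['ora.orcl.db']
```

On a real cluster the text would come from running the command, for example through `subprocess.run`, instead of the sample string.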
List of Processes and Services Associated with Oracle Clusterware Components

Oracle Clusterware Component | Linux/UNIX Process | Windows Processes
CRS | crsd.bin (r) | crsd.exe
CSS | ocssd.bin, cssdmonitor, cssdagent | cssdagent.exe, cssdmonitor.exe, ocssd.exe
CTSS | octssd.bin (r) | octssd.exe
GNS | gnsd (r) | gnsd.exe
Grid Plug and Play | gpnpd.bin | gpnpd.exe
LOGGER | ologgerd.bin (r) | ologgerd.exe
Master Diskmon | diskmon.bin |
Oracle agent | oraagent.bin (Oracle Clusterware 12c release 1 (12.1) and later releases) | oraagent.exe
Oracle High Availability Services | ohasd.bin (r) | ohasd.exe
Oracle root agent | orarootagent (r) | orarootagent.exe
SYSMON | osysmond.bin (r) | osysmond.exe

(r) indicates that the process runs as the root user.
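
The process names in the table make a quick health check possible: compare the process list of a node against the daemons you expect to see. Below is a minimal sketch using a subset of the Linux/UNIX names from the table. The sample process lines and the `/u01/app/grid` path are hypothetical; on a real node the lines would come from something like `ps -ef`.

```python
# Subset of the Linux/UNIX process names from the table above.
EXPECTED_DAEMONS = {
    "CRS": "crsd.bin",
    "CSS": "ocssd.bin",
    "CTSS": "octssd.bin",
    "Grid Plug and Play": "gpnpd.bin",
    "Oracle High Availability Services": "ohasd.bin",
}


def missing_components(process_lines, expected=EXPECTED_DAEMONS):
    """Return the components whose process name appears in no process line."""
    return sorted(
        component
        for component, process_name in expected.items()
        if not any(process_name in line for line in process_lines)
    )


# Hypothetical process listing; the path and PIDs are illustrative only.
sample_ps = [
    "root  2101  /u01/app/grid/bin/ohasd.bin reboot",
    "grid  2345  /u01/app/grid/bin/gpnpd.bin",
    "grid  2410  /u01/app/grid/bin/ocssd.bin",
]

print(missing_components(sample_ps))  # ['CRS', 'CTSS']
```

A plain substring match is enough for a sketch, but a stricter check would compare the executable name exactly.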
The Oracle High Availability Services Technology Stack
The following list describes the processes in the Oracle High Availability Services technology stack:

appagent: Protects any resources of the application resource type used in previous versions of Oracle Clusterware.

Cluster Logger Service (ologgerd): Receives information from all the nodes in the cluster and persists it in an Oracle Grid Infrastructure Management Repository-based database. This service runs on only two nodes in a cluster.

Grid Interprocess Communication (GIPC): A support daemon that enables Redundant Interconnect Usage.

Grid Plug and Play (GPNPD): Provides access to the Grid Plug and Play profile, and coordinates updates to the profile among the nodes of the cluster to ensure that all of the nodes have the most recent profile.

Multicast Domain Name Service (mDNS): Used by Grid Plug and Play to locate profiles in the cluster, and by GNS to perform name resolution. The mDNS process is a background process on Linux, UNIX, and Windows.

Oracle Agent (oraagent): Extends clusterware to support Oracle-specific requirements and complex resources. This process manages daemons that run as the Oracle Clusterware owner, such as the GIPC and GPNPD daemons.

Oracle Root Agent (orarootagent): A specialized oraagent process that helps the CRSD manage resources owned by root, such as the Cluster Health Monitor (CHM).

System Monitor Service (osysmond): The monitoring and operating system metric collection service that sends the data to the cluster logger service. This service runs on every node in a cluster.

The Oracle Clusterware Technology Stack

The Cluster Ready Services (CRS) technology stack leverages several processes to manage various services.

The following list describes these processes:

Cluster Ready Services (CRS): The primary program for managing high availability operations in a cluster.

The CRSD manages cluster resources based on the configuration information that is stored in OCR for each resource. This includes start, stop, monitor, and failover operations. The CRSD process generates events when the status of a resource changes. When you have Oracle RAC installed, the CRSD process monitors the Oracle database instance, listener, and so on, and automatically restarts these components when a failure occurs.

Cluster Synchronization Services (CSS): Manages the cluster configuration by controlling which nodes are members of the cluster and by notifying members when a node joins or leaves the cluster. If you are using certified third-party clusterware, then CSS processes interface with your clusterware to manage node membership information.

The cssdagent process monitors the cluster and provides I/O fencing. This service formerly was provided by Oracle Process Monitor Daemon (oprocd), also known as OraFenceService on Windows. A cssdagent failure may result in Oracle Clusterware restarting the node.

Oracle ASM: Provides disk management for Oracle Clusterware and Oracle Database.

Cluster Time Synchronization Service (CTSS): Provides time management in a cluster for Oracle Clusterware.

Event Management (EVM): A background process that publishes events that Oracle Clusterware creates.

Grid Naming Service (GNS): Handles requests sent by external DNS servers, performing name resolution for names defined by the cluster.

Oracle Agent (oraagent): Extends clusterware to support Oracle-specific requirements and complex resources. This process runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g release 1 (11.1).

Oracle Notification Service (ONS): A publish and subscribe service for communicating Fast Application Notification (FAN) events.

Oracle Root Agent (orarootagent): A specialized oraagent process that helps the CRSD manage resources owned by root, such as the network and the Grid virtual IP address.
Oracle Cluster Registry

Oracle Clusterware uses the Oracle Cluster Registry (OCR) to store and manage information about the components that Oracle Clusterware controls, such as Oracle RAC databases, listeners, virtual IP addresses (VIPs), services, and any applications. OCR stores configuration information as a series of key-value pairs in a tree structure. To ensure cluster high availability, Oracle recommends that you define multiple OCR locations. In addition:

You can have up to five OCR locations

Each OCR location must reside on shared storage that is accessible by all of the nodes in the cluster

You can replace a failed OCR location online if it is not the only OCR location

You must update OCR through supported utilities such as Oracle Enterprise Manager, the Oracle Clusterware Control Utility (CRSCTL), the Server Control Utility (SRVCTL), the OCR configuration utility (OCRCONFIG), or the Oracle Database Configuration Assistant (Oracle DBCA)
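
To picture what "key-value pairs in a tree structure" means, here is a small conceptual sketch using nested dictionaries and dotted keys. It is only a mental model: it says nothing about the real OCR on-disk format, the key names are invented for illustration, and OCR itself must only be changed through the supported utilities listed above.

```python
def set_key(tree, dotted_key, value):
    """Store a value under a dotted key, creating intermediate nodes as needed."""
    parts = dotted_key.split(".")
    node = tree
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def get_key(tree, dotted_key):
    """Look up a dotted key; raises KeyError if any part is missing."""
    node = tree
    for part in dotted_key.split("."):
        node = node[part]
    return node


# Invented keys for illustration; these are not real OCR key names.
registry = {}
set_key(registry, "resources.listener.port", 1521)
set_key(registry, "resources.listener.state", "ONLINE")
set_key(registry, "resources.database.instances", 2)

print(get_key(registry, "resources.listener.port"))  # 1521
print(sorted(get_key(registry, "resources")))        # ['database', 'listener']
```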
Voting Files

Oracle Clusterware uses voting files to determine which nodes are members of a cluster. You can configure voting files on Oracle ASM, or you can configure voting files on shared storage.

If you configure voting files on Oracle ASM, then you do not need to manually configure the voting files. Depending on the redundancy of your disk group, an appropriate number of voting files is created.

If you do not configure voting files on Oracle ASM, then for high availability, Oracle recommends that you have a minimum of three voting files on physically separate storage. This avoids having a single point of failure. If you configure a single voting file, then you must use external mirroring to provide redundancy.

Oracle recommends that you do not use more than five voting files, even though Oracle supports a maximum number of 15 voting files.
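
The recommendation of three voting files is easier to see with a little arithmetic. The sketch below assumes the rule from Oracle's documentation that a node must be able to access more than half of the configured voting files to stay in the cluster. The function names are my own.

```python
def required_majority(voting_files):
    """Smallest number of voting files that is more than half of the total."""
    return voting_files // 2 + 1


def tolerated_failures(voting_files):
    """How many voting files can be lost while a majority is still reachable."""
    return voting_files - required_majority(voting_files)


for count in (1, 3, 4, 5):
    print(count, required_majority(count), tolerated_failures(count))
# 1 1 0
# 3 2 1
# 4 3 1
# 5 3 2
```

Under this assumption a single voting file tolerates no failure, three tolerate one, and four tolerate no more than three do, which is why odd counts are the usual choice.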
As per the Oracle documentation, the high-level steps for Clusterware initialization are as follows.

INIT spawns init.ohasd (with respawn) which in turn starts the OHASD process (Oracle High Availability Services Daemon). This daemon spawns 4 processes.
Level 1: OHASD Spawns:

• cssdagent – Agent responsible for spawning CSSD.
• orarootagent – Agent responsible for managing all root owned ohasd resources.
• oraagent – Agent responsible for managing all oracle owned ohasd resources.
• cssdmonitor – Monitors CSSD and node health (along with the cssdagent).
Level 2: OHASD rootagent spawns:

• CRSD – Primary daemon responsible for managing cluster resources.
• CTSSD – Cluster Time Synchronization Services Daemon
• Diskmon
• ACFS (ASM Cluster File System) Drivers
Level 2: OHASD oraagent spawns:

• MDNSD – Used for DNS lookup
• GIPCD – Used for inter-process and inter-node communication
• GPNPD – Grid Plug & Play Profile Daemon
• EVMD – Event Monitor Daemon
• ASM – Resource for monitoring ASM instances
Level 3: CRSD spawns:

• orarootagent – Agent responsible for managing all root owned crsd resources.
• oraagent – Agent responsible for managing all oracle owned crsd resources.

Level 4: CRSD rootagent spawns:

• Network resource – To monitor the public network
• SCAN VIP(s) – Single Client Access Name Virtual IPs
• Node VIPs – One per node
• ACFS Registry – For mounting ASM Cluster File System
• GNS VIP (optional) – VIP for GNS
Level 4: CRSD oraagent spawns:

• ASM Resource – ASM Instance(s) resource
• Diskgroup – Used for managing/monitoring ASM diskgroups.
• DB Resource – Used for monitoring and managing the DB and instances
• SCAN Listener – Listener for single client access name, listening on SCAN VIP
• Listener – Node listener listening on the Node VIP
• Services – Used for monitoring and managing services
• ONS – Oracle Notification Service
• eONS – Enhanced Oracle Notification Service
• GSD – For 9i backward compatibility
• GNS (optional) – Grid Naming Service – Performs name resolution
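
The spawn hierarchy described above can be written down as a small tree, which makes it easy to print or to query when troubleshooting a stack that stops partway. This is only a sketch of who spawns whom, taken from the lists above. It does not model timing or the dependencies between daemons, and the "(ohasd)" and "(crsd)" suffixes are my own labels to tell the two sets of agents apart.

```python
# Parent -> children, following the lists above.
STARTUP_TREE = {
    "init": ["init.ohasd"],
    "init.ohasd": ["ohasd"],
    "ohasd": ["cssdagent", "orarootagent (ohasd)",
              "oraagent (ohasd)", "cssdmonitor"],
    "cssdagent": ["cssd"],
    "orarootagent (ohasd)": ["crsd", "ctssd", "diskmon", "ACFS drivers"],
    "oraagent (ohasd)": ["mdnsd", "gipcd", "gpnpd", "evmd", "ASM"],
    "crsd": ["orarootagent (crsd)", "oraagent (crsd)"],
    "orarootagent (crsd)": ["network resource", "SCAN VIP", "node VIP",
                            "ACFS registry", "GNS VIP (optional)"],
    "oraagent (crsd)": ["ASM resource", "diskgroup", "DB resource",
                        "SCAN listener", "listener", "services",
                        "ONS", "eONS", "GSD", "GNS (optional)"],
}


def walk(tree, root, depth=0):
    """Yield (depth, name) pairs in depth-first order starting at root."""
    yield depth, root
    for child in tree.get(root, []):
        yield from walk(tree, child, depth + 1)


# Print the hierarchy as an indented outline.
for depth, name in walk(STARTUP_TREE, "init"):
    print("  " * depth + name)

# Depth of every entry, counted from init.
depths = {name: depth for depth, name in walk(STARTUP_TREE, "init")}
```

Looking up an entry in `depths` shows how many parents sit above it, which is a quick way to decide which log to check first when a resource never comes up.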
Clusterware Important Log File Locations


 ASM logs live under $ORACLE_BASE/diag/asm/+asm//trace