Troubleshoot Network Connectivity
Description: CPU offline scaling fails with the following error:
** CPU Scale Update ** An error occurred during module execution. Please refer to the log file for more information.
Cause: After provisioning a VM cluster, the /var/opt/oracle/cprops/cprops.ini file, which is automatically generated by the database as a service (DBaaS) tooling, is not updated with the common_dcs_agent_bindHost and common_dcs_agent_port parameters, which causes CPU offline scaling to fail.
Action: As the root user, manually add the following entries to the /var/opt/oracle/cprops/cprops.ini file.
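The specific entries are environment-dependent and not reproduced in this extract; the two parameters named in the Cause are added in the general form sketched below, where the bind host and port values are placeholders that must be replaced with the DCS agent bind address and port for your VM cluster:

```
common_dcs_agent_bindHost=<agent_bind_host>
common_dcs_agent_port=<agent_port>
```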
Description: When adding a VM to a VM cluster, you might encounter the following issue:
[FATAL] [INS-32156] Installer has detected that there are non-readable files in oracle home.
CAUSE: Following files are non-readable, due to insufficient permission oracle.ahf/data/scaqak03dv0104/diag/tfa/tfactl/user_root/tfa_client.trc
ACTION: Ensure the above files are readable by grid.
Cause: The installer detected a non-readable trace file, oracle.ahf/data/scaqak03dv0104/diag/tfa/tfactl/user_root/tfa_client.trc, created by Autonomous Health Framework (AHF) in the Oracle home, which causes adding a cluster VM to fail. AHF, running as root, created a .trc file with root ownership, which the grid user is not able to read.
Action: Ensure that the AHF trace files are readable by the grid user before you add VMs to a VM cluster. To fix the permission issue, run the following commands as root on all the existing VM cluster VMs:
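The commands themselves are not reproduced in this extract. A minimal sketch of the permission fix is shown below, demonstrated on a scratch file because the real trace path varies by system; on a real VM you would target the tfa_client.trc path reported by the installer (for example, <oracle_home>/oracle.ahf/data/<node>/diag/tfa/tfactl/user_root/tfa_client.trc).

```shell
# Scratch-file demonstration of the fix; substitute the real AHF trace path.
trc=$(mktemp)
chmod 600 "$trc"     # simulate a root-only trace file that grid cannot read
chmod a+r "$trc"     # grant read access so the grid user can read it
stat -c '%a' "$trc"  # prints 644
```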
To determine if a VM Cluster is properly configured to access the Oracle Cloud Infrastructure (OCI) Services Network, you need to perform the following steps on each virtual machine in the VM Cluster.
Validation check for Identity and Access Management connectivity:
SSH to a virtual machine on your ExaDB-D VM Cluster as the opc user.
Execute the command curl https://identity.<region>.oci.oraclecloud.com, where <region> corresponds to the OCI region where your VM Cluster is deployed. If your VM Cluster is deployed in the Ashburn region, use "us-ashburn-1" for <region>. The curl command will then look like curl https://identity.us-ashburn-1.oci.oraclecloud.com.
If your Virtual Cloud Network (VCN) is properly configured for accessing the OCI Services Network, you will get an immediate response that looks like:
{
"code" : "NotAuthorizedOrNotFound",
"message" : "Authorization failed or requested resource not found."
}
The curl command will hang and eventually time out if your network is not configured for accessing the OCI Services Network.
Depending on your VCN setup, you will need to follow the steps outlined in the
action section below to configure access to the OCI Services Network.
Validation check for Object Storage Service (OSS) connectivity:
SSH to a virtual machine on your ExaDB-D VM Cluster as the opc user.
Execute the command curl https://objectstorage.<region>.oraclecloud.com, where <region> corresponds to the OCI region where your VM Cluster is deployed. If your VM Cluster is deployed in the Ashburn region, use "us-ashburn-1" for <region>. The curl command will then look like curl https://objectstorage.us-ashburn-1.oraclecloud.com.
If your Virtual Cloud Network (VCN) is properly configured for accessing the OCI Services Network, you will get an immediate response that looks like:
{
"code" : "NotAuthorizedOrNotFound",
"message" : "Authorization failed or requested resource not found."
}
The curl command will hang and eventually time out if your network is not configured for accessing the OCI Services Network.
Depending on your VCN setup, you will need to follow the steps outlined in the
action section below to configure access to the OCI Services Network.
Action:
This action is applicable to customers who have deployed their VM Cluster on a
private subnet.
Once you configure your VCN to reach the OCI Services Network, execute the steps in both Validation check sections to ensure that you have established connectivity to the OCI Services Network from your VM Cluster.
Backup Failures in Exadata Database Service on Dedicated Infrastructure
If your Exadata managed backup does not successfully complete, you can use
the procedures in this topic to troubleshoot and fix the issue.
The most common causes of backup failure are the following:
The host cannot access Object Storage
The database configuration on the host is not correct
The information that follows is organized by the error condition. If you already know the
cause, you can skip to the section with the suggested solution. Otherwise, use the
procedure in Determining the Problem to get started.
Determining the Problem In the Console, a failed database backup either displays a status of Failed or hangs in the Backup in Progress or Creating state.
Database Service Agent Issues Your Oracle Cloud Infrastructure database makes use of an agent framework to allow you to manage your database through the cloud platform. Use the following to check and restart the dbcsagent.
Object Store Connectivity Issues Backing up your database to Oracle Cloud Infrastructure Object Storage requires that the host can connect to the applicable Swift endpoint.
Host Issues One or more of the following conditions on the database host can cause backups to fail:
Database Issues An improper database state or configuration can lead to failed backups.
In the Console, a failed database backup either displays a status of Failed or hangs in the Backup in Progress or Creating state.
If the error message does not contain enough information to point you to a solution, you can gather more information by using dbaascli and by viewing the log files. Then, refer to the applicable section in this topic for a solution.
Database backups can fail during the RMAN configuration stage or during a running RMAN backup job. RMAN configuration tasks include validating backup destination connectivity, backup module installation, and RMAN configuration changes. The log files you examine depend on which stage the failure occurs.
Log on to the host as the oracle user.
Check the applicable log file:
To identify the job ID of an automated backup, use the dbaascli database backup --dbname <dbname> --showHistory command. This displays the history of all backup jobs, including their corresponding job IDs.
Job logs are available at /var/opt/oracle/log/dtrs/jobs/, named using the format <job_id>.log. If a job fails, a corresponding debug log <job_id>.debug is also generated in the same location.
You can find the corresponding RMAN command execution logs for backup, recovery, and configuration operations in the /var/opt/oracle/log/<dbname>/dtrs/rman/bkup directory.
Note
Make sure to review the log files on all compute nodes of the Exadata DB system.
Your Oracle Cloud Infrastructure database makes use of an agent framework to allow you to manage your database through the cloud platform. Use the following to check and restart the dbcsagent.
Occasionally you might need to restart the dbcsagent program if it has the status of stop/waiting to resolve a backup failure. View the /opt/oracle/dcs/log/dcs-agent.log file to identify issues with the agent.
From a command prompt, check the status of the
agent:
systemctl status dbcsagent.service
If the agent is in the stop/waiting state, try to restart
the agent:
systemctl start dbcsagent.service
Check the status of the agent again to confirm that it is in the active (running) state:
systemctl status dbcsagent.service
Backing up your database to Oracle Cloud Infrastructure Object Storage
requires that the host can connect to the applicable Swift endpoint.
Though Oracle controls the actual Swift user credentials for the storage bucket for
managed backups, verifying general connectivity to Object Storage in your region is a
good indicator that object store connectivity is not the issue. You can test this
connectivity by using another Swift user.
One or more of the following conditions on the database host can cause
backups to fail:
If an interactive command such as oraenv, or any command that might
return an error or warning message, was added to the .bash_profile
file for the grid or oracle user, Database service operations like automatic backups
can be interrupted and fail to complete. Check the .bash_profile
file for these commands, and remove them.
Backup operations require space in the /u01 directory on the host
file system. Use the df -h command on the host to check the space
available for backups. If the file system has insufficient space, you can remove old
log or trace files to free up space.
Your system might not have the required version of the backup module
(opc_installer.jar). See Unable to use Managed Backups in your DB System for
details about this known issue. To fix the problem, you can follow the procedure in
that section or simply update your DB system and database with the latest bundle
patch.
Customizing the site profile file ($ORACLE_HOME/sqlplus/admin/glogin.sql) can cause managed backups to fail in Oracle Cloud Infrastructure. In particular, interactive commands can lead to backup failures. Oracle recommends that you not modify this file for databases hosted in Oracle Cloud Infrastructure.
An improper database state or configuration can lead to failed
backups.
The database must be active and running (ideally on all nodes) while the backup is in
progress.
Use the following command to check the state of your database, and
ensure that any problems that might have put the database in an improper state are
resolved:
srvctl status database -d <db_unique_name> -verbose
The system returns a message including the database's instance status. The
instance status must be Open for the backup to succeed. If the
database is not running, use the following command to start it:
srvctl start database -d <db_unique_name> -o open
If the database is mounted but does not have the Open status, use
the following commands to access the SQL*Plus command prompt and set the status to
Open:
sqlplus / as sysdba
alter database open;
When you provision a new database, the archiving mode is set to
ARCHIVELOG by default. This is the required archiving mode for
backup operations. Check the archiving mode setting for the database and change it
to ARCHIVELOG, if applicable.
Open an SQL*Plus command prompt and enter the following command:
select log_mode from v$database;
If you need to set the archiving mode to ARCHIVELOG, start the database
in MOUNT status (and not OPEN status), and use the
following command at the SQL*Plus command prompt:
alter database archivelog;
Confirm that the db_recovery_file_dest parameter points to
+RECO, and that the log_archive_dest_1 parameter
is set to USE_DB_RECOVERY_FILE_DEST.
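These settings can be confirmed from a SQL*Plus session; a quick check (the expected values, per the preceding paragraph, are noted in the comments):

```sql
-- Expect db_recovery_file_dest to show +RECO, and
-- log_archive_dest_1 to reference USE_DB_RECOVERY_FILE_DEST.
show parameter db_recovery_file_dest
show parameter log_archive_dest_1
```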
For RAC databases, one instance must have the MOUNT status when enabling
archivelog mode. To enable archivelog mode for a RAC database, perform the following
steps:
Shut down all database instances:
srvctl stop database -d <db_unique_name>
Start one of the database instances in mount state:
srvctl start instance -d <db_unique_name> -i <instance_name> -o mount
At the SQL*Plus command prompt, confirm that the archiving mode is set to ARCHIVELOG:
select log_mode from v$database;
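Note that the steps listed above do not show the statement that actually changes the mode; a consolidated sketch of the full RAC flow, with placeholder names, is:

```sql
-- Run after: srvctl stop database -d <db_unique_name>
--           srvctl start instance -d <db_unique_name> -i <instance_name> -o mount
alter database archivelog;
-- Then restart the database normally (srvctl start database -d <db_unique_name>)
-- and confirm the mode:
select log_mode from v$database;  -- expect ARCHIVELOG
```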
Backups can fail when the database instance has a stuck archiver
process. For example, this can happen when the flash recovery area (FRA) is full.
You can check for this condition using the srvctl status database -db
<db_unique_name> -v command. If the
command returns the following output, you must resolve the stuck archiver process
issue before backups can succeed:
Instance <instance_identifier> is running on node <node_identifier>. Instance status: Stuck Archiver
Refer to ORA-00257:Archiver Error (Doc ID 2014425.1) for information on resolving a
stuck archiver process.
After resolving the stuck process, the command should return the
following output:
Instance <instance_identifier> is running on node <node_identifier>. Instance status: Open
If the instance status does not change after you resolve the underlying issue with
the device or resource being full or unavailable, try restarting the database using
the srvctl command to update the status of the database in the
clusterware.
Editing certain RMAN configuration parameters can lead to backup failures in Oracle
Cloud Infrastructure. To check your RMAN configuration, use the show
all command at the RMAN command line prompt.
See the following list of parameters for details about the RMAN configuration settings that should not be altered for databases in Oracle Cloud Infrastructure.
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 30 DAYS;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE DEVICE TYPE 'SBT_TAPE' PARALLELISM 5 BACKUP TYPE TO COMPRESSED BACKUPSET;
CONFIGURE CHANNEL DEVICE TYPE DISK MAXPIECESIZE 2 G;
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS 'SBT_LIBRARY=/var/opt/oracle/dbaas_acfs/<db_name>/opc/libopc.so, ENV=(OPC_PFILE=/var/opt/oracle/dbaas_acfs/<db_name>/opc/opc<db_name>.ora)';
CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 1 TIMES TO 'SBT_TAPE';
CONFIGURE ENCRYPTION FOR DATABASE ON;
RMAN backups fail when an object store wallet file is lost. The wallet file is
necessary to enable connectivity to the object store.
Get the name of the database with the backup failure using SQL*Plus:
show parameter db_name
Determine the file path of the backup config parameter file that contains the
RMAN wallet information at the Linux command line:
Find the file path to the wallet file in the backup config parameter
file by inspecting the value stored in the OPC_WALLET
parameter. To do this, navigate to the directory containing the backup config
parameter file and use the following cat command:
Confirm that the cwallet.sso file exists in the directory
specified in the OPC_WALLET parameter, and confirm that the
file has the correct permissions. The file permissions should have the octal
value of "600" (-rw-------). Use the following command:
ls -ltr /var/opt/oracle/dbaas_acfs/<database_name>/opc/opc_wallet
For example:
ls -altr /var/opt/oracle/dbaas_acfs/testdb30/opc/opc_wallet
-rw------- 1 oracle oinstall 0 Oct 29 01:59 cwallet.sso.lck
-rw------- 1 oracle oinstall 111231 Oct 29 01:59 cwallet.sso
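The permission requirement can be verified, or restored with chmod, as in the following scratch-file sketch; on a real system the target is the cwallet.sso path shown above:

```shell
# Scratch-file demonstration: a wallet file must be mode 600 (-rw-------).
w=$(mktemp)
chmod 644 "$w"       # simulate incorrect, too-open permissions
chmod 600 "$w"       # restore the required owner-only permissions
stat -c '%a' "$w"    # prints 600
```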
Learn to identify the root cause of TDE wallet and backup
failures.
For backup operations to work, the
$ORACLE_HOME/network/admin/sqlnet.ora file must contain the
ENCRYPTION_WALLET_LOCATION parameter formatted exactly as
follows:
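The required formatting itself is not reproduced in this extract. The standard form of the parameter is sketched below; the wallet directory follows the /var/opt/oracle/dbaas_acfs/.../tde_wallet convention seen elsewhere in this topic, but verify the exact path and formatting required for your environment:

```
ENCRYPTION_WALLET_LOCATION=(SOURCE=(METHOD=FILE)(METHOD_DATA=
  (DIRECTORY=/var/opt/oracle/dbaas_acfs/$ORACLE_UNQNAME/tde_wallet)))
```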
Database backups fail if the TDE wallet is not in the proper state. The following
scenarios can cause this problem:
If the database was started using SQL*Plus, and the ORACLE_UNQNAME
environment variable was not set, the wallet is not opened correctly.
To fix the problem, start the database using the srvctl
utility:
srvctl start database -d <db_unique_name>
In a multitenant environment for Oracle Database versions that support PDB-level
keystore, each PDB has its own master encryption key. For Oracle 18c databases, this
encryption key is stored in a single keystore used by all containers. (Oracle
Database 19c does not support a keystore at the PDB level.) After you create or plug
in a new PDB, you must create and activate a master encryption key for it. If you do
not do so, the STATUS column in the
v$encryption_wallet view shows the value
OPEN_NO_MASTER_KEY.
To check the master encryption key status and create a master key, do the
following:
Review the STATUS column in the v$encryption_wallet view, as shown in the following example:
SQL> alter session set container=pdb2;
Session altered.
SQL> select WRL_TYPE,WRL_PARAMETER,STATUS,WALLET_TYPE from v$encryption_wallet;
WRL_TYPE WRL_PARAMETER STATUS WALLET_TYPE
---------- ----------------------------------------------- ------------------ -----------
FILE /var/opt/oracle/dbaas_acfs/testdb30/tde_wallet/ OPEN_NO_MASTER_KEY AUTOLOGIN
Confirm that the PDB is in READ WRITE open mode and is not restricted, as
shown in the following example:
SQL> show pdbs
CON_ID CON_NAME OPEN MODE RESTRICTED
------ ------------ ---------------------- ---------------
2 PDB$SEED READ ONLY NO
3 PDB1 READ WRITE NO
4 PDB2 READ WRITE NO
The PDB cannot be open in restricted mode (the
RESTRICTED column must show NO). If
the PDB is currently in restricted mode, review the information in the
PDB_PLUG_IN_VIOLATIONS view and resolve the issue
before continuing. For more information on the
PDB_PLUG_IN_VIOLATIONS view and the restricted status,
review the Oracle Multitenant Administrator's Guide chapter on pluggable databases for your Oracle Database version.
Create and activate a master encryption key for the PDB:
Set the container to the PDB:
ALTER SESSION SET CONTAINER = <pdb>;
Create and activate a master encryption key in the PDB by
executing the following command:
ADMINISTER KEY MANAGEMENT SET KEY USING TAG '<tag>'
FORCE KEYSTORE IDENTIFIED BY <keystore-password> WITH BACKUP USING '<backup_identifier>';
Note the following:
The USING TAG clause is optional and can
be used to associate a tag with the new master encryption key.
The WITH BACKUP clause is optional and
can be used to create a backup of the keystore before the new master
encryption key is created.
You can also use the dbaascli commands
dbaascli tde status and dbaascli tde rotate
masterkey to investigate and manage your keys.
Confirm that the status of the wallet has changed from
OPEN_NO_MASTER_KEY to OPEN by querying the
v$encryption_wallet view as shown in step 1.
Configuration parameters related to the TDE wallet can cause backups to fail.
Confirm that the wallet status is open and the wallet type is
auto login by checking the v$encryption_wallet
view. For example:
SQL> select status, wrl_parameter,wallet_type from v$encryption_wallet;
STATUS WRL_PARAMETER WALLET_TYPE
------- ---------------------------------------------- --------------
OPEN /var/opt/oracle/dbaas_acfs/testdb30/tde_wallet/ AUTOLOGIN
For pluggable databases (PDBs), ensure that you switch to the appropriate container before querying the v$encryption_wallet view. For example:
$ sqlplus / as sysdba
SQL> alter session set container=pdb1;
Session altered.
SQL> select WRL_TYPE,WRL_PARAMETER,STATUS,WALLET_TYPE from v$encryption_wallet;
WRL_TYPE WRL_PARAMETER STATUS WALLET_TYPE
--------- ----------------------------------------------- -------- -----------
FILE /var/opt/oracle/dbaas_acfs/testdb30/tde_wallet/ OPEN AUTOLOGIN
The TDE wallet file (ewallet.p12) can cause backups to fail if it is
missing, or if it has incompatible file system permissions or ownership. Check the
file as shown in the following example as the root user:
# ls -altr /var/opt/oracle/dbaas_acfs/<database_name>/tde_wallet/ewallet.p12
total 76
-rw------- 1 oracle oinstall 5467 Oct 1 20:17 ewallet.p12
The TDE wallet file should have file permissions with the octal value "600"
(-rw-------), and the owner of this file should be a part of
the oinstall operating system group.
The auto login wallet file (cwallet.sso) can cause
backups to fail if it is missing, or if it has incompatible file system permissions
or ownership. Check the file as shown in the following example as the
root user:
# ls -altr /var/opt/oracle/dbaas_acfs/<database_name>/tde_wallet/cwallet.sso
total 76
-rw------- 1 oracle oinstall 5512 Oct 1 20:18 cwallet.sso
The auto login wallet file should have file permissions with the octal
value "600" (-rw-------), and the owner of this file should be a
part of the oinstall operating system group.
Learn to identify and resolve Oracle Data Guard issues.
When troubleshooting Oracle Data Guard, you must first determine whether the problem
occurs during the Data Guard setup and initialization or during Data Guard operation,
when lifecycle commands are entered. The steps to identify and resolve the issues are
different, depending on the scenario in which they are used.
There are three lifecycle operations: switchover, failover, and reinstate. The Data Guard
broker is used for all of these commands. The broker command line interface
(dgmgrl) is the main tool used to identify and troubleshoot the
issues. Although you can use logfiles to identify root causes, dgmgrl
is faster and easier to use to check and identify an issue.
Setting up and enabling Data Guard involves multiple steps. Log files are
created for each step. If any of the steps fail, review the relevant log file to
identify and fix the problem.
Validation of the primary cloud VM Cluster and database
Validation of the standby cloud VM Cluster
Recreating and copying files to the standby database (password file and wallets)
Creating Data Guard through Network (RMAN Duplicate command)
Configuring Data Guard broker
Finalizing the setup
Troubleshooting Data Guard using logfiles The tools used to identify the issue and the locations of relevant logfiles are different, depending on the scenario in which they are used.
Troubleshooting the Data Guard Setup Process The following errors might occur in the different steps of the Data Guard setup process. While some errors are displayed within the Console, most of the root causes can be found in the logfiles
The tools used to identify the issue and the locations of relevant logfiles
are different, depending on the scenario in which they are used.
Use the following procedures to collect relevant log files to investigate issues. If you
are unable to resolve the problem after investigating the log files, contact My Oracle
Support.
Note
When preparing collected files for
Oracle Support, bundle them into a compressed archive, such as a ZIP file.
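For example, the collected files can be bundled with tar (the paths here are illustrative; zip -r produces a ZIP archive instead, if preferred):

```shell
# Gather the collected logs under one directory, then create a compressed archive.
mkdir -p /tmp/dg_logs && echo sample > /tmp/dg_logs/dgdeployer.log
tar -czf /tmp/dg_logs.tar.gz -C /tmp dg_logs
tar -tzf /tmp/dg_logs.tar.gz    # lists dg_logs/ and dg_logs/dgdeployer.log
```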
On each compute node associated with the Data Guard configuration, gather log files
pertaining to the problem you experienced.
Enablement stage log files (such as those documenting the Create Standby
Database operation) and the logs for the corresponding primary or standby
system.
Enablement job ID logfiles. For example: 23.
Locations of enablement log files by enablement stage and Exadata system
(primary or standby).
Database name logfiles (db_name or
db_unique_name, depending on the file path).
Note
Check all nodes of the
corresponding primary and standby Exadata systems. Commands executed on a system may
have been run on any of its nodes.
Data Guard Deployer (DGdeployer) is the process that
performs the configuration. When configuring the primary database, it creates the
/var/opt/oracle/log/<dbname>/dgdeployer/dgdeployer.log
file.
This log should contain the root cause of a failure to configure the primary
database.
The primary log from the dbaasapi command-line
utility is:
/var/opt/oracle/log/dbaasapi/db/dg/<job_ID>.log.
Look for entries that contain dg_api.
One standby log from the dbaasapi command-line
utility is:
/var/opt/oracle/log/dbaasapi/db/dg/<job_ID>.log.
In this log, look for entries that contain dg_api.
The other standby log is:
/var/opt/oracle/log/<dbname>/dgcc/dgcc.log.
This log is the Data Guard configuration log.
The Oracle Cloud Deployment Engine (OCDE) creates the /var/opt/oracle/log/<dbname>/ocde/ocde.log file. This log should contain the cause of a failure to create the standby database.
The dbaasapi command-line utility creates the /var/opt/oracle/log/dbaasapi/db/dg/<job_ID>.log file. Look for entries that contain dg_api.
The Data Guard configuration log file is
/var/opt/oracle/log/<dbname>/dgcc/dgcc.log.
DGdeployer is the process that performs the configuration. It creates the /var/opt/oracle/log/<dbname>/dgdeployer/dgdeployer.log file. This log should contain the root cause of a failure to configure the standby database.
The dbaasapi command-line utility creates the
/var/opt/oracle/log/dbaasapi/db/dg/<job_ID>.log
file. Look for entries that contain dg_api.
The Data Guard configuration log is
/var/opt/oracle/log/<dbname>/dgcc/dgcc.log.
DGdeployer is the process that performs the
configuration. While configuring Data Guard, it creates the
/var/opt/oracle/log/<dbname>/dgdeployer/dgdeployer.log
file. This log should contain the root cause of a failure to configure the primary
database.
On each node of the primary and standby sites, gather log files for the
related database name (db_name).
Note
Check all nodes on both primary and standby Exadata systems. A lifecycle
management operation may impact both primary and standby systems.
The following errors might occur in the different steps of the Data Guard setup process. While some errors are displayed within the Console, most of the root causes can be found in the log files.
The password entered for enabling Data Guard didn't match the primary admin password
for the SYS user. This error occurs during the Validate Primary stage of
enablement.
The database may not be running. This error occurs during the Validate Primary stage of enablement. Check with srvctl and SQL*Plus on the host to verify that the database is up and running on all nodes.
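In addition to srvctl status database, a quick SQL*Plus check across the RAC instances can confirm this; a sketch using the standard gv$instance view:

```sql
-- Every instance should report STATUS = 'OPEN'.
select inst_id, instance_name, status from gv$instance;
```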
The primary database could not be configured. Invalid Data Guard commands or failed
listener reconfiguration can cause this error.
The TDE wallet could not be created. The Oracle Transparent Database Encryption (TDE)
keystore (wallet) files could not be prepared for transportation to the standby
site. This error occurs during the create TDE Wallet stage of enablement. Either of
the following items can cause failure at this stage:
The TDE wallet files could not be accessed
The enablement commands could not create an archive containing the wallet
files
Troubleshooting procedure:
Ensure that the cluster is accessible. To check the status of a cluster, run the
following command:
crsctl check cluster -all
If the cluster is down, run the following command to restart it:
crsctl start crs -wait
If this error occurs when the cluster is accessible, check the logs for the create TDE Wallet enablement stage to determine the cause of the error and its resolution.
The archive containing the TDE wallet was likely not transmitted to the standby site.
Retrying usually solves the problem.
The primary and standby sites may not be able to communicate with each other to configure the standby database. These errors occur during the configure standby database stage of enablement. In this stage, configurations are performed on the standby database, including the RMAN duplicate of the primary database. To resolve this issue:
Verify the connectivity status for the primary and standby sites.
Ensure that the hosts can communicate from port 1521 to all ports. Check the network setup, including Network Security Groups (NSGs), network security lists, and the remote VCN peering setup (if applicable). The best way to test communication between the nodes is to access the databases using SQL*Plus from the primary to the standby and from the standby to the primary.
The SCAN VIPs or listeners may not be running. Use the test above to help
identify the issue.
Possible causes:
SCAN VIPs or listeners may not be running. You can confirm this issue by using
the following commands on any cluster node.
[grid@exa1-****** ~]$ srvctl status scan
[grid@exa1-****** ~]$ srvctl status scan_listener
Databases may not be reachable. You can confirm this issue by attempting to
connect using an existing Oracle Net alias.
Troubleshooting procedure:
As the oracle OS user, check for the existence of an Oracle Net alias for the container database (CDB). Look for an alias in $ORACLE_HOME/network/admin/<dbname>/tnsnames.ora. The following example shows an entry for a container database named db12c:
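The tnsnames.ora entry itself is missing from this extract; a typical entry for a CDB named db12c would look like the following, where the host and service name are placeholders for your SCAN address and database service:

```
db12c =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = <scan_hostname>)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = <service_name>)
    )
  )
```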
Verify that you can use the alias to connect to the database. For example, as
sysdba, enter the following command:
sqlplus sys@db12c
A possible cause for this error is that the Oracle Database sys or system user
passwords for the database and the TDE wallet may not be the same. To compare the
passwords:
Connect to the database as the sys user and check the TDE status in V$ENCRYPTION_WALLET.
Connect to the database as the system user and check the TDE status in V$ENCRYPTION_WALLET.
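The check itself can be run from each session; a sketch (run once while connected as sys, then again while connected as system, and compare the results):

```sql
-- STATUS should be the same (e.g., OPEN) in both sessions if the
-- passwords are consistent with the wallet.
select status, wrl_parameter, wallet_type from v$encryption_wallet;
```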
Update the applicable passwords to match. Log on to the system host as opc
and run the following commands:
When the switchover, failover, and reinstate commands are run, multiple error
messages may occur. Refer to the Oracle Database documentation for these error
messages.
Note
Oracle recommends using the Data Guard broker command line interface (dgmgrl) to
validate the configurations.
As the Oracle User, connect to the primary or standby database with
dgmgrl and verify the configuration and the
database:
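The dgmgrl commands are not shown in this extract; a typical validation session looks like the following, where the connect string and database name are placeholders:

```
$ dgmgrl sys@<db_unique_name>
DGMGRL> show configuration;
DGMGRL> validate database '<db_unique_name>';
```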
Patching Failures on Exadata Cloud Infrastructure Systems
Patching operations can fail for various reasons. Typically, an operation fails
because a database node is down, there is insufficient space on the file
system, or the virtual machine cannot access the object store.
Determining the Problem In the Console, you can identify a failed patching operation by viewing the patch history of an Exadata Cloud Infrastructure system or an individual database.
Troubleshooting and Diagnosis Diagnose the most common issues that can occur during the patching process of any of the Exadata Cloud Infrastructure components.
In the Console, you can identify a failed patching operation by viewing the
patch history of an Exadata Cloud Infrastructure system or an individual database.
A patch that was not successfully applied displays a status of
Failed and includes a brief description of the
error that caused the failure. If the error message does not contain enough
information to point you to a solution, you can use the database CLI and log
files to gather more data. Then, refer to the applicable section in this
topic for a solution.
One or more of the following conditions on the database server VM can cause
patching operations to fail.
Database Server VM Connectivity Problems
Cloud tooling relies on proper networking and connectivity configuration between the virtual machines of a given VM cluster. If the configuration is not set up properly, any operation that requires cross-node processing can fail; for example, the files required to apply a given patch might not be downloadable.
In this case, you can perform the following actions:
Verify that your DNS configuration is correct so that the relevant
virtual machine addresses are resolvable within the VM cluster.
Refer to the relevant Cloud Tooling logs as instructed in the Obtaining
Further Assistance section and contact Oracle Support for further
assistance.
One or more of the following conditions on Oracle Grid Infrastructure can cause patching operations to fail.
Oracle Grid Infrastructure is Down
Oracle Clusterware enables servers to communicate with each other so that they can
function as a collective unit. The cluster software program must be up and running
on the VM Cluster for patching operations to complete. Occasionally you might need
to restart the Oracle Clusterware to resolve a patching failure.
In such cases, verify the status of the Oracle Grid Infrastructure as
follows:
./crsctl check cluster
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
If Oracle Grid Infrastructure is down, restart it by running the following command as the root user:
crsctl start crs -wait
An improper database state can lead to patching failures.
Oracle Database is Down
The database must be active and running on all the active nodes so the patching
operations can be completed successfully across the cluster.
Use the following command to check the state of your database, and ensure that any
problems that might have put the database in an improper state are
resolved:
srvctl status database -d <db_unique_name> -verbose
The system returns a message including the database instance status. The instance
status must be Open for the patching operation to succeed.
If the database is not running, use the following command to start it:
srvctl start database -d <db_unique_name> -o open
If you were unable to resolve the problem using the information in this topic, follow the
procedures below to collect relevant database and diagnostic information. After you have
collected this information, contact Oracle Support.
Collecting Cloud Tooling Logs Use the relevant log files that could assist Oracle Support for further investigation and resolution of a given issue.
To collect the relevant Oracle diagnostic information and logs, run the dbaascli
diag collect command.
For more information about the usage of this utility, see DBAAS Tooling: Using dbaascli to Collect Cloud Tooling Logs and Perform a Cloud
Tooling Health Check.