The topics in this section provide troubleshooting information to identify and address common issues that may occur while working with Stack Monitoring.
New permissions in resource-types are not propagated
This happens because IAM does not recompile a policy unless there is a change to the policy statement.
For any existing policies that use resource-types, when new permissions are added to the resource-type, edit the policy by adding a blank space. Then, save the policy.
This happens when a Tag Key Definition with a Value Type=List includes a tag variable as an element. Assigning such a tag to a resource works initially. However, validation fails during actions like refresh or when assigning a new tag, resulting in the error Invalid tags.
Correct Usage:
Tag variables can be used in default tags, but they are not supported in defined tags with predefined values (lists).
A Tag Key Definition cannot include tag variables as predefined list values.
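A quick pre-check can catch this before a Tag Key Definition is saved. The following is a minimal sketch, assuming tag variables are written in the ${...} form (for example ${iam.principal.name}); the function name and the sample values are hypothetical and not part of any Oracle tool.

```python
import re

# Assumption: a tag variable is any ${...} expression inside a value.
TAG_VARIABLE = re.compile(r"\$\{[^}]+\}")


def find_tag_variables(list_values):
    """Return the predefined list values that contain a tag variable."""
    return [value for value in list_values if TAG_VARIABLE.search(value)]


# Hypothetical predefined values for a Tag Key Definition with Value Type=List.
values = ["dev", "prod", "${iam.principal.name}"]
print(find_tag_variables(values))
```

Any value returned by the check should be removed from the predefined list before the tag is assigned to resources.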
Ensure that new policies can be created in the tenancy, or use existing policies (the policy should exist in the current compartment and in the root compartment). To allow creation of new policies, clean up outdated policies in the tenancy or work with Oracle to increase the policy limits. Once new policies can be created, retry the setup.
Compute auto activation cannot be created
Policy Manager expects Stack Monitoring configurations to be in ACTIVE status, because a compartment can have only one such configuration. If a configuration is in an unexpected status, creation of the new configuration fails.
Clean up configurations that are in an invalid state in the current compartment. To clean up, use the public SDK or CLI, for example with the delete command.
Troubleshoot EBS
EBS Database with Edition-Based Redefinition (EBR)
For EBS instances with EBR enabled, the management agent must be restarted after every new edition is created in the database. The restart resets the connection pool so that metric collection can continue. If the agent is not restarted, metric collection stops.
Troubleshoot PeopleSoft
Discovery Job Behavior
This is an example of logs for two Process Scheduler Domain work items: one successful and saved, and the other reporting a domain down error. Each detailed log entry carries its respective work item ID.
Discovery Error Messages
Database validation failed error
The example below is the output from a failed discovery job. Using the Work Item (WI) ID, search throughout the entire log for additional details to determine the cause of the failed discovery.
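When the log is long, filtering it by work item ID makes the relevant entries easier to read. The following is a minimal sketch, assuming the log is available as plain text and that each relevant line contains the work item ID as a substring. The function name, the placeholder ID, and the placeholder log lines are hypothetical and do not reproduce the real log format.

```python
def lines_for_work_item(log_text, work_item_id):
    """Return every log line that mentions the given work item ID."""
    return [line for line in log_text.splitlines() if work_item_id in line]


# Placeholder text only; real discovery logs look different.
sample_log = """\
placeholder entry for work item id-0001
placeholder entry for work item id-0002
another placeholder entry for work item id-0001
"""

for line in lines_for_work_item(sample_log, "id-0001"):
    print(line)
```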
Validate the username and password entered in the Database Credentials section to make sure the discovery job uses the correct credentials.
The message displays IO Error: The Network Adapter could not establish a connection, caused by an UnknownHostException.
It then displays the host that was entered and the message Name or service not known. This indicates that the host used in the discovery job is incorrect or was mistyped in the PSFT Database section.
Validate the host and then retry the discovery job.
An error caused by a connection failure with the PSFT Database displays Connection refused, socket connect lapse.
The host name and address are then displayed along with the port.
This error is triggered when the Database Port is incorrect. Retry the discovery with the correct port.
The error message contains a fetchlet exception, and the log displays Listener refused the connection with the following error: ORA-12514, TNS: listener does not currently know of service requested in connect descriptor.
Retry the discovery job and enter the correct Database Service Name in the PSFT Database section.
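The database validation errors described above can be summarized as a lookup from message fragment to the field that needs to be re-checked. The following is a minimal sketch; the fragments are taken from the messages quoted in this section, while the substring matching, the function name, and the hint wording are assumptions made for illustration.

```python
# Message fragment -> discovery job input to re-check.
DB_ERROR_HINTS = [
    ("ORA-12514", "Check the Database Service Name in the PSFT Database section."),
    ("UnknownHostException", "Check the host in the PSFT Database section."),
    ("Name or service not known", "Check the host in the PSFT Database section."),
    ("Connection refused", "Check the Database Port."),
]


def hint_for(message):
    """Return the first matching hint, or None if no fragment matches."""
    for fragment, hint in DB_ERROR_HINTS:
        if fragment in message:
            return hint
    return None


print(hint_for("Connection refused, socket connect lapse"))
```

A message that matches none of the fragments returns None, in which case the full work item log should be reviewed.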
Resource families validation failed error
PeopleSoft has the following resource families:
Application Server Domain
Process Scheduler Domain
PeopleSoft Internet Architecture (PIA)
There can be several resources of each family in a discovery job. A discovery job will be marked as successful if at least one resource of each type is successful. Therefore, a job can be successful even if there are some work items failing for some child resources.
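The success rule can be expressed in a few lines. The following is a minimal sketch, assuming each work item is represented as a (family, succeeded) pair and that only the families present in the job are evaluated; both the data shape and the function name are hypothetical.

```python
def job_succeeded(work_items):
    """Return True when every family in the job has at least one success.

    work_items is a list of (family, succeeded) pairs.
    """
    families = {family for family, _ in work_items}
    successful = {family for family, succeeded in work_items if succeeded}
    return bool(families) and families == successful


work_items = [
    ("Application Server Domain", True),
    ("Application Server Domain", False),  # a failing item does not fail the job
    ("Process Scheduler Domain", True),
    ("PeopleSoft Internet Architecture (PIA)", True),
]
print(job_succeeded(work_items))
```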
In case of error, it will show the following logs:
General error message
This indicates that none of the resource families in the discovery job met the requirement of having at least one successful resource for each family. It then provides a list of resource families and shows the following summary logs (one for each failing family):
Summary of failing work items
This log example provides a list of failed work items for App Server family resources. Use the provided work item IDs to find the remaining logs with more details about the failures. Each work item can fail for different reasons, so it is important to look up each work item ID in the logs to see the specifics. The following are possible errors for each work item and their solutions.
Message Error
Troubleshooting
This type of error appears when discovery is provided with invalid credentials.
The example in the left column shows a description for an Application Server Domain work item: "Discovery failed for oracle_psft_appserv", but this error also applies to the Process Scheduler Domain (oracle_psft_pcrs).
To fix this error, enter the correct credentials in that section.
This error indicates a domain is down for the resource that failed in discovery.
To fix it, verify that the application is running in PeopleSoft console, and turn the domain back on.
This type of error can occur for the Process Scheduler Domain and the Application Server Domain, with the failed to retrieve message, NameNotFoundException.
The beginning of the log shows which work item failed, along with the work item ID, to easily identify the failing resource.
This error occurs when there is a misconfiguration for a PIA domain (down status).
Elasticsearch errors
If Elasticsearch is discovered together with PeopleSoft, this work item determines the success or failure of the PeopleSoft discovery. If an error occurs while discovering Elasticsearch and the work item fails, the PeopleSoft discovery job is not successful either.
The following message is shown when an Elasticsearch error occurs. It provides a work item ID to find detailed logs about what is causing the failure.
General Error
| Message Error | Troubleshooting |
| --- | --- |
| Failed to collect data, 500 SERVER ERROR. | There was an error trying to connect and collect data from the specified host. This error is logged when an invalid username was provided in the discovery. |
| Failed to collect data, status 401. | Unauthorized access due to invalid credentials. Make sure to enter the correct password when performing the discovery. |
| FileNotFoundException. | The TrustStore path location provided is incorrect. This could be due to a mistyped value in the TrustStore path field, or the file does not exist in the specified location. Also ensure that the file is accessible on the agent host. |
| Password verification failed. | The TrustStore password provided is incorrect. |
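For the FileNotFoundException case, the TrustStore path can be checked directly on the agent host before retrying the discovery. The following is a minimal sketch, assuming it is run on the agent host as the operating system user that runs the agent; the function name and the example path are hypothetical.

```python
import os


def check_truststore_path(path):
    """Return a short diagnosis for a TrustStore path."""
    if not os.path.isfile(path):
        return "missing"       # mistyped path, or the file is not there
    if not os.access(path, os.R_OK):
        return "not readable"  # file exists but the current user cannot read it
    return "ok"


# Hypothetical path; replace it with the value entered in the TrustStore path field.
print(check_truststore_path("/path/to/truststore.jks"))
```

The readability check only reflects the permissions of the user running the script, so a result of ok under a different account does not guarantee that the agent can read the file.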
Troubleshoot SOA
Monitoring SOA applications created from Marketplace images:
When a SOA application is provisioned using a Marketplace image, SOA-related metrics are not populated. The Marketplace image places the SOA and WebLogic configuration files in two separate locations. To populate the SOA metrics, copy the configuration files from the SOA Suite directory to the WebLogic directory.
Copy the files as indicated and restart WebLogic.
SOA Infra metrics start appearing a few minutes after the WebLogic restart.
The Marketplace image installs SOA Suite in a different location than the WebLogic stack:
WebLogic: /u01/app/oracle/middleware
SOA Suite: /u01/app/oracle/suite/
Please copy the following files:
From: /u01/app/oracle/suite/em/adml
-rwxrwxr-x. 1 oracle oracle 21156 May 18 2011 server-scheduler_service.xml
-rwxrwxr-x. 1 oracle oracle 15788 May 18 2011 domain-scheduler_service.xml
-rwxrwxr-x. 1 oracle oracle 2929 Nov 11 2013 server-bea_alsb.xml
-rwxrwxr-x. 1 oracle oracle 242238 Feb 28 2016 server-oracle_soainfra.xml
-rwxrwxr-x. 1 oracle oracle 2992 Aug 15 2016 server-oracle_soa_composite-11.0.xml
-rwxrwxr-x. 1 oracle oracle 95241 Jan 16 2017 domain-oracle_soainfra.xml
To: /u01/app/oracle/middleware/em/adml
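The copy step can be scripted. The following is a minimal sketch using only the Python standard library, assuming it is run on the SOA host by a user with read access to the source directory and write access to the destination directory (for example the oracle user); the function name is hypothetical. Files that already exist in the destination are overwritten.

```python
import shutil
from pathlib import Path

# File names from the listing above.
ADML_FILES = [
    "server-scheduler_service.xml",
    "domain-scheduler_service.xml",
    "server-bea_alsb.xml",
    "server-oracle_soainfra.xml",
    "server-oracle_soa_composite-11.0.xml",
    "domain-oracle_soainfra.xml",
]


def copy_adml_files(src_dir, dst_dir, names=ADML_FILES):
    """Copy the listed files from src_dir to dst_dir; return names not found."""
    src, dst = Path(src_dir), Path(dst_dir)
    missing = []
    for name in names:
        source = src / name
        if source.is_file():
            # copy2 also preserves timestamps and permission bits.
            shutil.copy2(source, dst / name)
        else:
            missing.append(name)
    return missing


if __name__ == "__main__":
    not_found = copy_adml_files(
        "/u01/app/oracle/suite/em/adml",
        "/u01/app/oracle/middleware/em/adml",
    )
    print("Not found in source directory:", not_found)
```

After the copy completes with no missing files, restart WebLogic as described above.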
Troubleshoot a Maintenance Window
Retry a Maintenance Window
A retry can be performed only for Active Maintenance Windows, after an operation is marked as Partial Success.
Open the actions menu of the Maintenance Window to access the Retry option.
Updated topology
When a resource changes its topology, such as a cluster adding or removing one or more of its servers, the Maintenance Window is not automatically updated. To update the resources included in the Maintenance Window after a topology change, edit the Maintenance Window according to the resource's new topology.
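Before editing the Maintenance Window, it can help to list exactly which resources need to be added or removed. The following is a minimal sketch, assuming the resources in the Maintenance Window and the resources in the current topology are both available as lists of names; the function name and the sample resource names are hypothetical.

```python
def maintenance_window_changes(window_resources, current_topology):
    """Compare a Maintenance Window's resources with the current topology.

    Returns (to_add, to_remove) as sorted lists of resource names.
    """
    window, current = set(window_resources), set(current_topology)
    return sorted(current - window), sorted(window - current)


# Hypothetical cluster that lost server1 and gained server3.
in_window = ["cluster1", "server1", "server2"]
in_topology = ["cluster1", "server2", "server3"]
print(maintenance_window_changes(in_window, in_topology))
# prints (['server3'], ['server1'])
```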