Sending a Diagnostic Interrupt

Caution

This feature is for advanced users. Sending a diagnostic interrupt to a live system can cause data corruption or system failure.

You can send a diagnostic interrupt to troubleshoot an unresponsive or unreachable compute virtual machine (VM) instance.

A diagnostic interrupt causes the instance's OS to crash and reboot. Before you send a diagnostic interrupt, you must configure the OS to generate a crash dump (also called a memory dump file) when it crashes. The crash dump captures information about the state of the OS at the time of the crash. After the OS restarts, you can analyze the crash dump to identify and debug the issue.

Required IAM Policy

To use Oracle Cloud Infrastructure, you must be granted security access in a policy by an administrator. This access is required whether you're using the Console or the REST API with an SDK, CLI, or other tool. If you get a message that you don't have permission or are unauthorized, verify with your administrator what type of access you have and which compartment to work in.

For administrators: The policy in Let users launch compute instances includes the ability to send a diagnostic interrupt to an instance. If the specified group doesn't need to launch instances or attach volumes, you could simplify that policy to include only manage instance-family, and remove the statements involving volume-family and virtual-network-family.

If you're new to policies, see Managing Identity Domains and Common Policies. For reference material about writing policies for instances, cloud networks, or other Core Services API resources, see Details for Core Services.

Configuring the OS to Generate a Crash Dump

Before you send a diagnostic interrupt to an instance, you must configure the OS to generate a crash dump when it crashes. The diagnostic interrupt is received as a non-maskable interrupt (NMI) on the target instance.

The steps depend on the OS.

Linux

Note

On Oracle Linux platform images, the OS is either fully configured or partially configured to generate a crash dump, depending on the image release date.

Oracle Linux 8
  • Images released in August 2020 or later: The image is fully configured to generate a crash dump.
  • Earlier images: The dump-capture kernel is installed and configured, but you must perform the other configuration steps.
Oracle Linux 7
  • Images released in August 2020 or later: The image is fully configured to generate a crash dump.
  • Earlier images: The dump-capture kernel is installed and configured, but you must perform the other configuration steps.

To configure a Linux instance to generate a crash dump, do the following:

  1. Connect to the instance.
  2. Install and configure the dump-capture kernel:
    1. Install kdump and kexec by running the following command:
      sudo yum install kexec-tools
    2. Reserve memory for the dump-capture kernel, which saves the crash dump. Do the following:
      1. Open the /etc/default/grub file in a text editor.
      2. In the line that starts with GRUB_CMDLINE_LINUX_DEFAULT, add the parameter crashkernel=<memory-to-reserve>. For example, to reserve 100 MB, add crashkernel=100M.
      3. Save the changes and close the file.
      4. Regenerate the GRUB configuration file by running the following command:
        sudo grub2-mkconfig -o /boot/grub2/grub.cfg
  3. Configure the kernel to crash when it receives a diagnostic interrupt. To do this, open the /etc/sysctl.conf file in a text editor and add the following line:
    kernel.unknown_nmi_panic=1
  4. Apply the change to /etc/sysctl.conf by running the following command:
    sudo sysctl -p
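
After you complete the steps above, a quick read-only check can confirm that the settings are in place. The following is a minimal sketch, not an Oracle-provided tool. The function names are hypothetical. It assumes that the service unit is named kdump and that systemctl is available, which is typical on Oracle Linux but worth confirming on your image. A newly added crashkernel parameter appears in /proc/cmdline only after the instance has restarted.

```shell
#!/bin/sh
# Read-only checks for the crash dump settings described above.
# All function names are illustrative (hypothetical), not part of any Oracle tool.

# Succeeds if the given kernel command line contains a crashkernel= parameter.
has_crashkernel() {
  case " $1 " in
    *" crashkernel="*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if the given sysctl value is 1 (panic on an unknown NMI).
nmi_panic_enabled() {
  [ "$1" = "1" ]
}

# Prints one line per check for the running system. Never modifies anything.
check_crash_dump_config() {
  if has_crashkernel "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "crashkernel: present on the running kernel command line"
  else
    echo "crashkernel: not present on the running kernel command line"
  fi

  if nmi_panic_enabled "$(cat /proc/sys/kernel/unknown_nmi_panic 2>/dev/null)"; then
    echo "kernel.unknown_nmi_panic: enabled"
  else
    echo "kernel.unknown_nmi_panic: disabled"
  fi

  # Assumption: the service unit is called "kdump".
  if command -v systemctl >/dev/null 2>&1 && systemctl is-active --quiet kdump; then
    echo "kdump service: active"
  else
    echo "kdump service: not active (or systemctl unavailable)"
  fi
}
```

To run the checks on an instance, load the functions into a shell session and call check_crash_dump_config. If any line reports a missing setting, revisit the corresponding step before you send a diagnostic interrupt.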

Windows Server - Platform Image

If you use a Windows Server platform image that was released in April 2020 or later, the image is already configured to generate a crash dump.

If you use an image that was released before April 2020, do the following:

  1. Connect to the instance.
  2. Download the Oracle VirtIO Drivers for Microsoft Windows.
  3. Install the drivers and then restart the instance.

Windows Server - Customer-Provided Image

For a customer-provided image, refer to the third-party documentation for your operating system for instructions on generating a crash dump when the OS receives an NMI.

Sending a Diagnostic Interrupt

After you configure the instance's OS to generate a crash dump when it crashes, use the following procedures to send a diagnostic interrupt.

To send a diagnostic interrupt using the Console

  1. Open the navigation menu and select Compute. Under Compute, select Instances.
  2. Click the instance that you're interested in.
  3. Click More Actions, and then click Send diagnostic interrupt.

    Caution

    Sending a diagnostic interrupt to a live system can cause data corruption or system failure.
  4. Review the confirmation message and then click Send diagnostic interrupt.

    The lifecycle state that appears in the Console remains Running while the instance's OS crashes and restarts. Do not send multiple diagnostic interrupts.

  5. Wait several minutes for the instance's OS to restart, and then connect to the instance. You can now retrieve and analyze the crash dump.

To send a diagnostic interrupt using the API

Use the InstanceAction operation, passing the value SENDDIAGNOSTICINTERRUPT as the action to perform.
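
As an illustration, the same action can be sent from the OCI CLI, which wraps the InstanceAction operation. The sketch below assumes that the OCI CLI is installed and configured for your tenancy. The function name, the DRY_RUN variable, and the instance OCID that you pass in are placeholders introduced for this example. When DRY_RUN is set to 1, the command is only printed, which is a sensible first step given the caution above.

```shell
#!/bin/sh
# Sketch: send a diagnostic interrupt through the OCI CLI.
# send_diagnostic_interrupt and DRY_RUN are hypothetical names for this example.

send_diagnostic_interrupt() {
  instance_id="$1"
  if [ -z "$instance_id" ]; then
    echo "usage: send_diagnostic_interrupt <instance-ocid>" >&2
    return 2
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "oci compute instance action --instance-id $instance_id --action SENDDIAGNOSTICINTERRUPT"
    return 0
  fi

  # Send the interrupt once; do not repeat it while the OS is restarting.
  oci compute instance action \
    --instance-id "$instance_id" \
    --action SENDDIAGNOSTICINTERRUPT
}
```

For example, after loading the function, run it with DRY_RUN=1 and the OCID of your instance to preview the command, and then run it again without DRY_RUN to send the interrupt.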

Analyzing a Crash Dump

The crash dump is saved locally on the instance's OS.

  • Linux instances: The default location where the crash dump is saved depends on the operating system.

    • Oracle Linux 8: Saved in /var/oled/crash.
    • Oracle Linux 7: For platform images released in March 2021 or later, saved in /var/crash. For older platform images, saved in /var/oled/crash.
    • Other Linux and UNIX-like operating systems: Saved in /var/crash/.

    To change the location, modify the /etc/kdump.conf file.

  • Windows instances: The crash dump is saved in %SystemRoot%\memory.dmp. On most Windows systems, this is C:\Windows\memory.dmp.

To analyze the crash dump, use a third-party tool such as the crash utility on Linux instances or WinDbg on Windows instances.
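
For Linux instances, the following sketch shows one way to locate the newest dump before opening it. It assumes that kdump names the dump file vmcore and writes it into a per-crash subdirectory of the dump location. This is common, but confirm it on your image. latest_vmcore is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: find the most recent kdump output file.
# latest_vmcore is a hypothetical helper; "vmcore" as the file name is an assumption.

# Prints the path of the most recently modified vmcore file under the given
# directory (default: /var/crash). Prints nothing if no dump is found.
latest_vmcore() {
  crash_dir="${1:-/var/crash}"
  find "$crash_dir" -type f -name 'vmcore' -exec ls -t {} + 2>/dev/null | head -n 1
}
```

On an image that saves dumps in /var/oled/crash, call latest_vmcore /var/oled/crash and pass the resulting path to the crash utility, together with a vmlinux file that contains debugging symbols for the kernel that crashed.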