NMI interrupt trigger issue

Preface

The customer is using the STM32G491MCT6, but during trial production some units hang while reading Flash. The issue never appeared in regular testing, so we had to troubleshoot from experience. By checking the program counter (PC), we eventually found that every stuck unit was sitting in the NMI handler. There is little information available on this problem, so it is documented here for reference.

NMI Interrupt Introduction

In computing, a non-maskable interrupt (NMI) is a hardware interrupt that the system's standard interrupt-masking techniques cannot ignore. It is typically used to signal unrecoverable hardware errors. Some NMI sources can be masked individually, in which case the method specific to that source must be used. NMIs are typically used when response time is critical or when an interrupt must never be disabled during normal system operation. Such uses include reporting unrecoverable hardware errors, system debugging and analysis, and handling special situations such as system resets.

The chart indicates that an NMI can be triggered by an SRAM parity error, a Flash ECC error, or the HSE clock security system (CSS).
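As a minimal sketch of how an NMI handler might tell these three sources apart, the function below inspects snapshots of the three status registers. The function name, the enum, and the mask constants are hypothetical, and the bit positions are assumptions that should be checked against the reference manual; on the target, the device header macros (for example RCC_CIFR_CSSF, SYSCFG_CFGR2_SPF, and FLASH_ECCR_ECCD) would normally be used instead.

```c
#include <stdint.h>

/* Assumed bit positions; verify against the device reference manual. */
#define NMI_RCC_CIFR_CSSF     (UINT32_C(1) << 8)   /* HSE clock security flag    */
#define NMI_SYSCFG_CFGR2_SPF  (UINT32_C(1) << 8)   /* SRAM parity error flag     */
#define NMI_FLASH_ECCR_ECCD   (UINT32_C(1) << 31)  /* Flash ECC double-error flag */

typedef enum {
    NMI_SRC_UNKNOWN = 0,
    NMI_SRC_HSE_CSS,
    NMI_SRC_SRAM_PARITY,
    NMI_SRC_FLASH_ECC
} nmi_source_t;

/* Classify the NMI source from register values read inside the handler. */
nmi_source_t nmi_classify(uint32_t rcc_cifr,
                          uint32_t syscfg_cfgr2,
                          uint32_t flash_eccr)
{
    if ((flash_eccr & NMI_FLASH_ECCR_ECCD) != 0u) {
        return NMI_SRC_FLASH_ECC;
    }
    if ((syscfg_cfgr2 & NMI_SYSCFG_CFGR2_SPF) != 0u) {
        return NMI_SRC_SRAM_PARITY;
    }
    if ((rcc_cifr & NMI_RCC_CIFR_CSSF) != 0u) {
        return NMI_SRC_HSE_CSS;
    }
    return NMI_SRC_UNKNOWN;
}
```

Inside NMI_Handler the call might look like nmi_classify(RCC->CIFR, SYSCFG->CFGR2, FLASH->ECCR), with each case then handled as described in the sections that follow.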

Case 1: Clock security system (CSS)

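An NMI from this source can be reproduced by configuring HSE and CSS in the RCC and then shorting one of the HSE pins to something else, but this might void your warranty. An NMI can also be set pending from software:

SCB->ICSR |= SCB_ICSR_NMIPENDSET_Msk;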

The basic solution here is to investigate hardware problems, for example in the HSE circuit.
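While the hardware is being investigated, the NMI handler still has to acknowledge the CSS event, otherwise the NMI is expected to stay pending. The sketch below shows that step on a stand-in register structure so it can be compiled on a PC. The structure, the function name, and the hse_failed flag are hypothetical, and the bit position of CSSF/CSSC is an assumption to check against the reference manual. On the target the same logic would read RCC->CIFR and write RCC->CICR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of CSSF (in CIFR) and CSSC (in CICR). */
#define CSS_FLAG_MASK   (UINT32_C(1) << 8)
#define CSS_CLEAR_MASK  (UINT32_C(1) << 8)

/* Stand-in for the two RCC interrupt registers used here. */
typedef struct {
    volatile uint32_t CIFR;  /* interrupt flag register  */
    volatile uint32_t CICR;  /* interrupt clear register */
} rcc_irq_regs_t;

/* Set in the handler, polled by the main loop to react to the HSE failure. */
volatile bool hse_failed = false;

/* Returns true if the NMI was caused by the clock security system. */
bool handle_css_nmi(rcc_irq_regs_t *rcc)
{
    if ((rcc->CIFR & CSS_FLAG_MASK) == 0u) {
        return false;               /* some other NMI source */
    }
    rcc->CICR = CSS_CLEAR_MASK;     /* acknowledge the CSS event */
    hse_failed = true;              /* let the main loop pick a fallback */
    return true;
}
```

Keeping the handler this short and deferring the reaction (logging, switching to a safe state) to the main loop is a design choice of the sketch, not a requirement.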

Case 2: SRAM parity error

The parity bits are computed and stored when writing into the SRAM. Then, they are automatically checked when reading. If one bit fails, an NMI is generated.
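To illustrate the idea only, here is a small software model of a memory cell with one parity bit per byte. It is not the hardware implementation, and the type and function names are hypothetical. It shows why a single flipped bit is caught on read, and why two flipped bits in the same byte are not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns 1 if the byte has an odd number of set bits, else 0. */
uint8_t parity_bit(uint8_t value)
{
    value ^= (uint8_t)(value >> 4);
    value ^= (uint8_t)(value >> 2);
    value ^= (uint8_t)(value >> 1);
    return (uint8_t)(value & 1u);
}

/* Software model of one byte of parity-protected memory. */
typedef struct {
    uint8_t data;
    uint8_t parity;
} sram_cell_t;

/* On write, the parity bit is computed and stored next to the data. */
void cell_write(sram_cell_t *cell, uint8_t value)
{
    cell->data = value;
    cell->parity = parity_bit(value);
}

/* On read, the parity is recomputed and compared; false means a parity error. */
bool cell_read(const sram_cell_t *cell, uint8_t *out)
{
    *out = cell->data;
    return parity_bit(cell->data) == cell->parity;
}
```

In this model, writing 0x5A and then flipping one data bit makes cell_read return false, which corresponds to the point where the hardware would raise the NMI.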

The solution here is to reset the MCU or check for hardware issues. (No specific fix for this case has been found yet; more information will be added once available.)

Case 3: Flash ECC

When user code erases and rewrites a specific Flash region, this error is triggered intermittently. Once the ECCD (ECC double-error detection) error is triggered, clearing the ECCD flag inside the NMI handler does not help: the NMI is triggered again immediately on exit from the handler, leading to a deadlock. If the ECCD error is not triggered, there is a high probability of triggering the ECCC (ECC single-error correction) error instead. In another scenario, the ECCC error is not triggered (the flag is 0), but the ADDR_ECC value changes and points to the problematic area.

  • The error is not triggered if the region is only erased and not rewritten.
  • Operating on other regions does not trigger it; the problematic area spans roughly 10 to 20 bytes.
  • The issue is specific to this PCBA; other PCBAs do not trigger it.
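When narrowing down the problematic area, it can help to log the ECC status in decoded form. The sketch below splits a raw FLASH_ECCR value into the ECCD flag, the ECCC flag, and the ADDR_ECC field. The structure and function names are hypothetical, and the bit positions and field width are assumptions to check against the reference manual, as is the unit of the ADDR_ECC value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of FLASH_ECCR; verify against the reference manual. */
#define ECCR_ECCD_MASK      (UINT32_C(1) << 31)    /* double error detected  */
#define ECCR_ECCC_MASK      (UINT32_C(1) << 30)    /* single error corrected */
#define ECCR_ADDR_ECC_MASK  UINT32_C(0x0007FFFF)   /* failing address field  */

typedef struct {
    bool     double_error;     /* ECCD set */
    bool     corrected_error;  /* ECCC set */
    uint32_t addr_field;       /* raw ADDR_ECC value */
} flash_ecc_status_t;

/* Decode a raw FLASH_ECCR snapshot into its three parts. */
flash_ecc_status_t flash_ecc_decode(uint32_t eccr)
{
    flash_ecc_status_t status;
    status.double_error    = (eccr & ECCR_ECCD_MASK) != 0u;
    status.corrected_error = (eccr & ECCR_ECCC_MASK) != 0u;
    status.addr_field      = eccr & ECCR_ADDR_ECC_MASK;
    return status;
}
```

The third scenario described above (flag at 0 but ADDR_ECC changed) would show up here as both booleans false with a non-zero addr_field.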

Solution

  • Minimize reliance on Flash data. When an ECCC/ECCD error occurs during a Flash read and the NMI handler is entered, clear the interrupt flag and execute the Flash erase command.
  • Building on the first point, restore the Flash data from default parameters. When an NMI occurs during a Flash read, clear the interrupt flag inside the handler and set a software flag there. After returning to the main program, check that flag; if it is set, an ECC error occurred and the Flash data needs to be restored from the default parameters (the defaults can be kept in C source code at fixed addresses).
  • If an NMI occurs while writing to Flash, erase the Flash again inside the NMI handler until the data is written correctly, or implement other logic based on product requirements.
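The flag-and-restore approach from the second point can be sketched as follows. This is a PC-compilable illustration under stated assumptions, not the customer's code: the parameter structure, its default values, and all function names are hypothetical, and the actual clearing of the ECC flag and rewriting of Flash are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parameter block that the product stores in Flash. */
typedef struct {
    uint32_t baud_rate;
    uint16_t threshold;
    uint16_t mode;
} app_params_t;

/* Defaults compiled into the firmware image (values are placeholders). */
static const app_params_t default_params = { 115200u, 100u, 1u };

/* Set by the NMI handler, consumed by the main program. */
static volatile bool flash_ecc_fault = false;

/* To be called from NMI_Handler after the ECC flag has been cleared. */
void nmi_note_flash_ecc(void)
{
    flash_ecc_fault = true;
}

/*
 * To be called in the main program right after reading the parameters
 * from Flash into *params. Returns true if the data can be trusted;
 * otherwise overwrites *params with the defaults and returns false.
 */
bool params_check_after_read(app_params_t *params)
{
    if (!flash_ecc_fault) {
        return true;
    }
    flash_ecc_fault = false;      /* consume the flag */
    *params = default_params;     /* fall back to known-good values */
    return false;
}
```

When params_check_after_read returns false, the main program would also erase the affected Flash area and write the defaults back, so that the next read does not hit the same ECC error.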
