[ESP32] How to deal with stack overflow error after MQTT communication that occurred after migration to ESP-IDF

esp32-how-to-deal-with-stack-overflow-error-after-mqtt-communication-that-occurred-after-migration-to-esp-idf
This article can be read in about 25 minutes.

Introduction.

Currently developing “Mia,” a talking cat-shaped robot that speaks dialect.

ミーア
おしゃべり猫型ロボット「ミーア」は、100以上の種類の豊かな表情で、悲しみや喜びを共有します。様々な性格(皮肉・おせっかい・ロマンチスト・天然)や方言(大阪弁・博多弁・鹿児島弁・広島弁)も話してくれます。ミーアとの暮らしで、毎日の生活をもっ...

In a previous article here, I wrote that the PlatformIO IDE on VS Code now supports the ESP-IDF framework as well as Arduino.

I was relieved to finally be able to build, but after establishing MQTT communication after connecting to Wifi via SmartConfig, I got the following error and had to restart endlessly.

ShellScript<span role="button" tabindex="0" data-code="SyncShadow is already started MQTTPubSubClient::onMessage: $aws/things/XXXXXXX ***ERROR*** A stack overflow in task loopTask has been detected. Backtrace: 0x400829fa:0x3ffbdec0 0x401684e5:0x3ffbdee0 0x4008a3d2:0x3ffbdf00 0x4008c622:0x3ffbdf80 0x4008a548:0x3ffbdfa0 0x4008a4fa:0x3ffbe020 0x4008a548:0x0000523a |
SyncShadow is already started 
 MQTTPubSubClient::onMessage: $aws/things/XXXXXXX 
 ***ERROR*** A stack overflow in task loopTask has been detected. 
 Backtrace:  0x400829fa :0x3ffbdec0  0x401684e5 :0x3ffbdee0  0x4008a3d2 :0x3ffbdf00  0x4008c622 :0x3ffbdf80  0x4008a548 :0x3ffbdfa0  0x4008a4 fa :0x3 ffbe020  0x4008a548 :0x0000523a |<-CORRUPTED 
 
 ELF file SHA256:  a98d6ccbac74a ac1
 Rebooting...

The above error response is described in this issue.

What is loopTask?

loopTask refers to the main loop task of the ESP32 program. This corresponds to the loop() function in the Arduino framework, an infinite loop task that runs on the ESP32. So it is most likely caused by the recursive process in void loop() described in main.cpp.

Causes of stack overflow

Stack overflow is a phenomenon in which a task runs out of allocated stack memory. This has the following causes

  1. Excessive recursive calls: functions are called recursively, exhausting the stack.
  2. Large local variables: Too many or too large temporary local variables.
  3. Infinite loop: Stack-consuming processing is taking place in an infinite loop.

For stacks, see below.

Stack overflow may occur due to the receiving and parsing process of MQTT messages. Be especially careful when processing large messages.

Why does this occur with ESP-IDF when it did not occur with Arduino?

Differences between Arduino and ESP-IDF frameworks can cause stack overflow due to differences in stack size and task memory management.

Default stack size: 1

  • Arduino: The task stack size is relatively large, so stack overflow is unlikely to occur.
  • ESP-IDF: By default, task stack size may be small, causing stack overflow in the same code as the Arduino framework

Task Management:.

  • Arduino Framework: High level of abstraction and simple task management.
  • ESP-IDF framework: lower-level task management, finer granularity, but this can lead to problems if misconfigured.

Add debug messages

To identify the cause, add a debug message in the void loop() function where the cause may be.

The current void loop() function and the functions called within loop() are as follows.

C++

 // Execute the process during Wi-Fi connection 
 void executeWiFiConnectedRoutines()  {
   static unsigned long lastOneMinutesEventUnixTime = getNowUnixTime(); // Record event execution time every minute 
 
  // device shadow state monitoring 
  SyncShadow::getInstance().loop(); 
 
  // apply device configuration changes 
   applyAndReportConfigUpdates(); 
 
  // run once per minute 
   if  ( (getNowUnixTime()  -  lastOneMinutesEventUnixTime) > 60)  {
     executeOneMinutesAction(); 
    lastOneMinutesEventUnixTime = getNowUnixTime(); 
   } 
 } 
 
 void loop()  {
   if (inSafeMode)  {
     safeModeLoop(); 
     return; 
   } 
 
   monitorWiFiConnectionChange(); 
 
   if (isWiFiConnected())  {
     executeWiFiConnectedRoutines(); }
   } 
 
   buttonManager.handleButtonPress(); }
 
  ExpressionService::getInstance().render(); }
 
   delay(10); 
 }

Use the uxTaskGetStackHighWaterMark(NULL) function to get how much stack space the current task is not using (amount of unused stack) and output that value to the serial port.

uxTaskGetStackHighWaterMark(NULL)

  • This function obtains the “High Water Mark” of the stack of the specified task. The “High Water Mark” is the maximum amount of stack used by the task so far. In other words, it returns the amount of stack remaining when the stack is at its lowest. This can be used to determine how much stack a task has actually used.
  • Passing NULL as an argument retrieves the high water mark for the currently executing task

Therefore, add the above debug messages before and after each process that is likely to reduce the amount of unused stack.

C++

 // Execute the process during Wi-Fi connection 
 void executeWiFiConnectedRoutines()  {
   static unsigned long lastOneMinutesEventUnixTime = getNowUnixTime(); // Record event execution time every minute 
  // add debug message 
   uint32_t freeStack =  uxTaskGetStackHighWaterMark (NULL); 
   Serial.print("Free stack at executeWiFiConnectedRoutines start: "); 
   Serial.println(freeStack); 
  // Device shadow state monitoring 
  SyncShadow::getInstance().loop(); 
  // add debug message 
  freeStack =  uxTaskGetStackHighWaterMark (NULL); 
   Serial.print("Free stack after SyncShadow loop: "); 
   Serial.println(freeStack); 
 

  // Apply device configuration changes 
   applyAndReportConfigUpdates(); 
  // add debug messages 
  freeStack =  uxTaskGetStackHighWaterMark (NULL); 
   Serial.print("Free stack after applyAndReportConfigUpdates: "); 
   Serial.println(freeStack); 
 
  // Run once per minute 
   if  ( (getNowUnixTime()  -  lastOneMinutesEventUnixTime) > 60)  {
     executeOneMinutesAction(); 
    lastOneMinutesEventUnixTime = getNowUnixTime(); 
   } 
  // add debug message 
  freeStack =  uxTaskGetStackHighWaterMark (NULL); 
   Serial.print("Free stack at executeWiFiConnectedRoutines end: "); 
   Serial.println(freeStack); 
 } 
 
 void loop()  {
  // Add debug message to check stack usage 
   uint32_t freeStack =  uxTaskGetStackHighWaterMark (NULL); 
   Serial.print("Free stack at loop start: "); 
   Serial.println(freeStack); 
 
   if (inSafeMode)  {
     safeModeLoop(); 
     return; 
   } 
 
   monitorWiFiConnectionChange(); 
 
   if (isWiFiConnected())  {
    // add debug message 
    freeStack =  uxTaskGetStackHighWaterMark (NULL); 
     Serial.print("Free stack before executeWiFiConnectedRoutines: "); 
     Serial.println(freeStack); 
 
     executeWiFiConnectedRoutines(); 
 
    // Add a debug message 
    freeStack =  uxTaskGetStackHighWaterMark (NULL); 
     Serial.print("Free stack after executeWiFiConnectedRoutines: "); 
     Serial.println(freeStack); 
   } 
 
   buttonManager.handleButtonPress(); 
 
  // add debug message 
  freeStack =  uxTaskGetStackHighWaterMark (NULL); 
   Serial.print("Free stack before ExpressionService: "); 
   Serial.println(freeStack); 
 
  ExpressionService::getInstance().render(); 
 
  // add debug message 
  freeStack =  uxTaskGetStackHighWaterMark (NULL); 
   Serial.print("Free stack after ExpressionService: "); 
   Serial.println(freeStack); 
 
   delay(10); 
 
  // add debug message 
  freeStack =  uxTaskGetStackHighWaterMark (NULL); 
   Serial.print("Free stack at loop end: "); 
   Serial.println(freeStack); 
 } 
 

Build again and look at the log.

Results of debug messages

The debug message indicates that only 1916 bytes of stack remain. Perhaps the stack ran out when the MQTTPubSubClient::onMessage function was used.

The large stack usage for MQTT message subscribe and publish operations and message processing ( onMessage function) was the direct cause of the stack overflow.

ShellScript
MQTTPubSubClient::subscribe: $aws/things/XXXXXXXXXXXXX/shadow/get/accepted 
 [  6987][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 71 bytes...
 MQTTPubSubClient::subscribe: $aws/things/XXXXXXXXXXXXXXX/shadow/update/delta 
 [  7064][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 71 bytes...
 MQTTPubSubClient::publish: $aws/things/XXXXXXXXXXXXX/shadow/get
[   7135 ][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 59 bytes...
 setup done 
 Free stack at loop start: 1916 
 start initialize service after  wifi  setup 
 SyncShadow is already started 
 Free stack before executeWiFiConnectedRoutines: 1916 
 Free  stack  at executeWiFiConnectedRoutines start: 1916 
 Free  stack  after SyncShadow loop: 1916 
 Free  stack  after applyAndReportConfigUpdates: 1916 
 Free  stack  at executeWiFiConnectedRoutines end: 1916 
 Free  stack  after executeWiFiConnectedRoutines: 1916 
 Free  stack  before ExpressionService: 1916 
 Free  stack  after ExpressionService:  1916 
 Free stack at loop end: 1916 
 Free  stack  at  loop  start: 1916 
 Free  stack  before executeWiFiConnectedRoutines: 1916 
 Free stack at executeWiFiConnectedRoutines start: 1916 
 MQTTPubSubClient::onMessage: $aws/things/XXXXXXXXXXXXXXX/shadow/get/accepted

 ***ERROR*** A stack overflow in task loopTask has been detected. 
 

 Backtrace:  0x400829fe :0x3ffbf190  0x40168fb5 :0x3ffbf1b0  0x4008a3d6 :0x3ffbf1d0  0x4008c626 :0x3ffbf250  0x4008a54c :0x3ffbf270  0x4008a 4fe :0x 000000ff |<-CORRUPTED

Change stack size setting in sdkconfig file

So, try increasing the stack size as a solution: open the sdkconfig file in the project root of ESP-IDF and find the following setting to increase the stack size from 8192 to 16384.

C++
// sdkconfig.default 
 // before 
 CONFIG_ARDUINO_LOOP_STACK_SIZE=8192 
 // after 
 CONFIG_ARDUINO_LOOP_STACK_SIZE=16384

By the way, someone else encountered the same stack overflow after enabling MQTT.

https://esp32.com/viewtopic.php?t=17104

In the answer in the URL above, the Main Task Size (CONFIG_MAIN_TASK_STACK_SIZE ) is increased, but this time CONFIG_ARDUINO_LOOP_STACK_SIZE is changed.

Because in this case, the following platform.ini settings are made to use both Arduino and ESP-IDF libraries and functions, and CONFIG_AUTOSTART_ARDUINO is defined so that the Arduino environment is automatically started and the setup() and loop() functions are automatically called. So the main loop task of Arduino ( loop() function ) is running separately from the main task of ESP-IDF. And this time, there is a stack overflow in the Arduino’s loop() function.

C++
[ env] 
 platform = espressif32 @ 6.7.0 
 framework = arduino, espidf

The result of the re-build,,

The stack overflow error after MQTT communication was successfully resolved!

The log shows that even at the end of the loop() function, as shown below, the remaining stack had increased to 7356 bytes, confirming that there was sufficient room. With the original 8192 bytes, 16384-7356=9028 > 8192, indicating that there was a stack overflow.

ShellScript
Free stack at loop start:  7356 
 Free stack before executeWiFiConnectedRoutines:  7356
 Free  stack  at executeWiFiConnectedRoutines start:  7356
 Free  stack  after SyncShadow loop:  7356
 Free  stack  after applyAndReportConfigUpdates:  7356
 Free  stack  at executeWiFiConnectedRoutines end:  7356
 Free  stack  after executeWiFiConnectedRoutines:  7356
 Free  stack  before ExpressionService:  7356
 Free  stack  after ExpressionService:   7356
 Free stack at loop end:  7356

This successfully solved the problem, but to confirm the cause, we will add debugging code to check the stack usage in the onMessage function of MQTT.

C++<span role="button" tabindex="0" data-code="void MqttClient::onMessage(const std::function
void MqttClient::onMessage(const std::function<void(const String &topic, const String &payload)> &callback)  { 
   client->onMessage([callback](const String &topic, const String &payload)  {
    // Get the high water mark of the stack before the callback 
     uint32_t freeStackBefore =  uxTaskGetStackHighWaterMark (NULL); 
     Serial.print("Free stack before onMessage: "); 
     Serial.println(freeStackBefore); 
 
     Serial.println("MQTTPubSubClient::onMessage: "  +  topic + " "  +  payload); 
 
    // Execute callback function 
     callback(topic, payload); 
 
    // Get high water mark on stack after callback 
     uint32_t freeStackAfter =  uxTaskGetStackHighWaterMark (NULL); 
     Serial.print("Free stack after onMessage: "); 
     Serial.println(freeStackAfter); 
 
    // Calculate consumed stack size 
     uint32_t stackUsed = freeStackBefore - freeStackAfter; 
     Serial.print("Stack used by onMessage: "); 
     Serial.println(stackUsed); 
   }); 
 }

The following results were obtained from the build: 2736 bytes were consumed by the onMessage function in the MQTT communication.

So, again, CONFIG_ARDUINO_LOOP_STACK_SIZE=8192 was not sufficient.

ShellScript
Free stack before onMessage: 10100 
 MQTTPubSubClient::onMessage: $aws/things/XXXXXXXX/shadow/get/accepted 
 preload phrases after initial sync 
 SELECT  *  FROM phrase WHERE phrase_type  =  ?  AND kind  =  'other' ORDER BY talk_count,  RANDOM () LIMIT ? ;, Args: natural, 10,
Matched: 10, Time taken:276963 
 
 Free stack after onMessage: 7364 
 Stack used by on Message:  2736 
 MQTTPubSubClient::publish: $aws/things/XXXXXXXX/shadow/update  { "state":{"reported":{"config":{"talk_frequency":60,"weather_announcement_time":{"hour":8,"minute":0},"birth_date&quot;:{},"phrase_type":"natural","talk_start_time":{"hour":7,"minute":0},"talk_end_time":{"hour":22,"minute":0},"work_start_time":{},"work_end_time":{},"volume":50,"firmware_version":"1.0.0"},"wakeup_time":25,"last_action_time":1717276989,&quot;downloading":{"status":"NORMAL","progress":0,"start_time":1717234673,"end_time& quot; :1717235067,"type":"voice"},"connected":true,"last_periodic_talk_time":1716968737}} 
 [ 59654][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 63 bytes...
 [  59667 ][V] [ ssl_client.cpp:36 9 ] send_ssl_data(): Writing HTTP request with 512 bytes...
 Free stack at loop end: 7364

I’m glad we found the cause and solved the problem.

Copied title and URL