Out of memory: Kill process (snips-hotword)

Hi, having just released our software with Snips updated core v0.60.12 I start getting reports of memory leak problems. So far two systems crashed within 36h up-time. Previously release with 0.60.1 was stable and had no memory issues
Please advice how to troubleshoot this issue!

System: revPI Core 3, Processor BCM2837, 1.2 GHz, quad-core RAM 1 GByte

kern.log:
Feb 7 05:23:45 RevPi7849 kernel: [145742.843784] NOHZ: local_softirq_pending 80
Feb 7 05:53:19 RevPi7849 kernel: [147517.139942] NOHZ: local_softirq_pending 80
Feb 7 06:45:27 RevPi7849 kernel: [150645.480896] NOHZ: local_softirq_pending 80
Feb 7 10:08:52 RevPi7849 kernel: [162849.633129] NOHZ: local_softirq_pending 80
Feb 7 14:13:37 RevPi7849 kernel: [177535.212320] smsc95xx 1-1.1:1.0 eth0: link down
Feb 7 14:13:42 RevPi7849 kernel: [177537.236654] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
Feb 7 14:18:29 RevPi7849 kernel: [177827.166335] apt-key invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), nodemask=0, order=2, oom_score_adj=0
Feb 7 14:18:30 RevPi7849 kernel: [177827.166343] apt-key cpuset=/ mems_allowed=0
Feb 7 14:18:30 RevPi7849 kernel: [177827.166360] CPU: 0 PID: 26310 Comm: apt-key Tainted: G O 4.9.76-rt60-v7+ #1
Feb 7 14:18:30 RevPi7849 kernel: [177827.166361] Hardware name: BCM2835
Feb 7 14:18:30 RevPi7849 kernel: [177827.166381] [<8010fe8c>] (unwind_backtrace) from [<8010bf30>] (show_stack+0x20/0x24)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166389] [<8010bf30>] (show_stack) from [<804697b8>] (dump_stack+0xc8/0x114)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166399] [<804697b8>] (dump_stack) from [<8027906c>] (dump_header+0x94/0x1f8)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166409] [<8027906c>] (dump_header) from [<8021cd48>] (oom_kill_process+0x3a4/0x4d4)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166417] [<8021cd48>] (oom_kill_process) from [<8021d1a8>] (out_of_memory+0x110/0x33c)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166425] [<8021d1a8>] (out_of_memory) from [<80222734>] (__alloc_pages_nodemask+0xda4/0xe9c)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166433] [<80222734>] (__alloc_pages_nodemask) from [<8022284c>] (__get_free_pages+0x20/0x70)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166441] [<8022284c>] (__get_free_pages) from [<80117078>] (pgd_alloc+0x24/0x11c)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166449] [<80117078>] (pgd_alloc) from [<80119ec0>] (mm_init+0x124/0x1d4)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166457] [<80119ec0>] (mm_init) from [<8011a630>] (mm_alloc+0x54/0x68)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166466] [<8011a630>] (mm_alloc) from [<80285f44>] (do_execveat_common+0x240/0x718)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166475] [<80285f44>] (do_execveat_common) from [<80286454>] (do_execve+0x38/0x40)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166482] [<80286454>] (do_execve) from [<8028669c>] (SyS_execve+0x2c/0x30)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166491] [<8028669c>] (SyS_execve) from [<80107d80>] (ret_fast_syscall+0x0/0x1c)
Feb 7 14:18:30 RevPi7849 kernel: [177827.166493] Mem-Info:
Feb 7 14:18:30 RevPi7849 kernel: [177827.166507] active_anon:210936 inactive_anon:9997 isolated_anon:0
Feb 7 14:18:30 RevPi7849 kernel: [177827.166507] active_file:473 inactive_file:495 isolated_file:32
Feb 7 14:18:30 RevPi7849 kernel: [177827.166507] unevictable:0 dirty:2 writeback:0 unstable:0
Feb 7 14:18:30 RevPi7849 kernel: [177827.166507] slab_reclaimable:1989 slab_unreclaimable:2786
Feb 7 14:18:30 RevPi7849 kernel: [177827.166507] mapped:4433 shmem:10986 pagetables:1231 bounce:0
Feb 7 14:18:30 RevPi7849 kernel: [177827.166507] free:4901 free_pcp:334 free_cma:69
Feb 7 14:18:30 RevPi7849 kernel: [177827.166518] Node 0 active_anon:843744kB inactive_anon:39988kB active_file:1992kB inactive_file:1980kB unevictable:0kB isolated(anon):0kB isolated(file):128kB mapped:17732kB dirty:8kB writeback:0kB shmem:43944kB writeback_tmp:0kB unstable:0kB pages_scanned:13550 all_unreclaimable? yes
Feb 7 14:18:30 RevPi7849 kernel: [177827.166531] Normal free:19604kB min:16384kB low:20480kB high:24576kB active_anon:843744kB inactive_anon:39988kB active_file:1864kB inactive_file:1980kB unevictable:0kB writepending:8kB present:970752kB managed:949036kB mlocked:0kB slab_reclaimable:7956kB slab_unreclaimable:11144kB kernel_stack:2424kB pagetables:4924kB bounce:0kB free_pcp:1360kB local_pcp:224kB free_cma:276kB
Feb 7 14:18:30 RevPi7849 kernel: lowmem_reserve[]: 0 0
Feb 7 14:18:30 RevPi7849 kernel: [177827.166545] Normal: 1934kB (MEHC) 2168kB (UMEHC) 26916kB (UMEHC) 10832kB (UMEHC) 5064kB (UMEH) 20128kB (UMEHC) 8256kB (UMEH) 1512kB (M) 11024kB (H) 02048kB 0*4096kB = 19604kB
Feb 7 14:18:30 RevPi7849 kernel: 12033 total pagecache pages
Feb 7 14:18:30 RevPi7849 kernel: [177827.166654] 0 pages in swap cache
Feb 7 14:18:30 RevPi7849 kernel: [177827.166659] Swap cache stats: add 0, delete 0, find 0/0
Feb 7 14:18:30 RevPi7849 kernel: [177827.166662] Free swap = 0kB
Feb 7 14:18:30 RevPi7849 kernel: [177827.166665] Total swap = 0kB
Feb 7 14:18:30 RevPi7849 kernel: [177827.166669] 242688 pages RAM
Feb 7 14:18:30 RevPi7849 kernel: [177827.166672] 0 pages HighMem/MovableOnly
Feb 7 14:18:30 RevPi7849 kernel: [177827.166675] 5429 pages reserved
Feb 7 14:18:30 RevPi7849 kernel: [177827.166678] 2048 pages cma reserved
Feb 7 14:18:30 RevPi7849 kernel: [177827.166682] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Feb 7 14:18:30 RevPi7849 kernel: [177827.166722] [ 141] 0 141 8778 2209 16 0 0 0 systemd-journal
Feb 7 14:18:30 RevPi7849 kernel: [177827.166732] [ 172] 0 172 3711 277 6 0 0 -1000 systemd-udevd
Feb 7 14:18:30 RevPi7849 kernel: [177827.166741] [ 1260] 100 1260 4321 110 9 0 0 0 systemd-timesyn
Feb 7 14:18:30 RevPi7849 kernel: [177827.166749] [ 1311] 108 1311 1633 102 8 0 0 0 avahi-daemon
Feb 7 14:18:30 RevPi7849 kernel: [177827.166757] [ 1312] 65534 1312 1324 60 7 0 0 0 thd
Feb 7 14:18:30 RevPi7849 kernel: [177827.166765] [ 1316] 0 1316 1331 61 6 0 0 0 cron
Feb 7 14:18:30 RevPi7849 kernel: [177827.166773] [ 1317] 0 1317 5937 204 10 0 0 0 rsyslogd
Feb 7 14:18:30 RevPi7849 kernel: [177827.166781] [ 1321] 0 1321 1854 106 7 0 0 0 systemd-logind
Feb 7 14:18:30 RevPi7849 kernel: [177827.166789] [ 1322] 105 1322 1628 102 8 0 0 -900 dbus-daemon
Feb 7 14:18:30 RevPi7849 kernel: [177827.166797] [ 1327] 108 1327 1601 77 6 0 0 0 avahi-daemon
Feb 7 14:18:30 RevPi7849 kernel: [177827.166805] [ 1355] 0 1355 721 86 6 0 0 0 dhcpcd
Feb 7 14:18:30 RevPi7849 kernel: [177827.166812] [ 1365] 112 1365 2298 809 8 0 0 0 mosquitto
Feb 7 14:18:30 RevPi7849 kernel: [177827.166820] [ 1422] 0 1422 475 16 4 0 0 0 boot.sh
Feb 7 14:18:30 RevPi7849 kernel: [177827.166828] [ 1466] 0 1466 8797 5507 22 0 0 0 python
Feb 7 14:18:30 RevPi7849 kernel: [177827.166836] [ 1489] 0 1489 2553 137 8 0 0 -1000 sshd
Feb 7 14:18:30 RevPi7849 kernel: [177827.166844] [ 1506] 0 1506 30721 2795 29 0 0 0 apache2
Feb 7 14:18:30 RevPi7849 kernel: [177827.166852] [ 1529] 0 1529 42758 13936 128 0 0 0 node-red
Feb 7 14:18:30 RevPi7849 kernel: [177827.166860] [ 1538] 0 1538 1049 31 6 0 0 0 agetty
Feb 7 14:18:30 RevPi7849 kernel: [177827.166867] [ 1628] 0 1628 5220 3616 13 0 0 0 miniserv.pl
Feb 7 14:18:30 RevPi7849 kernel: [177827.166875] [16795] 0 16795 3178 61 6 0 0 0 gpg-agent
Feb 7 14:18:30 RevPi7849 kernel: [177827.166883] [ 3787] 33 3787 30737 822 23 0 0 0 apache2
Feb 7 14:18:30 RevPi7849 kernel: [177827.166890] [ 3788] 33 3788 30737 822 23 0 0 0 apache2
Feb 7 14:18:30 RevPi7849 kernel: [177827.166898] [ 3789] 33 3789 30727 816 23 0 0 0 apache2
Feb 7 14:18:30 RevPi7849 kernel: [177827.166906] [ 3790] 33 3790 30727 816 23 0 0 0 apache2
Feb 7 14:18:30 RevPi7849 kernel: [177827.166914] [ 3791] 33 3791 30727 816 23 0 0 0 apache2
Feb 7 14:18:30 RevPi7849 kernel: [177827.166922] [ 9642] 0 9642 104818 28248 91 0 0 0 java
Feb 7 14:18:30 RevPi7849 kernel: [177827.166930] [ 9876] 0 9876 7885 4483 20 0 0 0 python3
Feb 7 14:18:30 RevPi7849 kernel: [177827.166938] [ 9878] 111 9878 6486 711 15 0 0 0 snips-dialogue
Feb 7 14:18:30 RevPi7849 kernel: [177827.166945] [ 9881] 111 9881 30189 21713 61 0 0 0 snips-nlu
Feb 7 14:18:30 RevPi7849 kernel: [177827.166953] [ 9884] 111 9884 17363 5350 36 0 0 0 snips-asr
Feb 7 14:18:30 RevPi7849 kernel: [177827.166961] [ 9885] 111 9885 114317 102267 227 0 0 0 snips-hotword
Feb 7 14:18:30 RevPi7849 kernel: [177827.166968] [24798] 0 24798 35739 4022 67 0 0 0 homebridge
Feb 7 14:18:30 RevPi7849 kernel: [177827.166986] [26208] 0 26208 7951 6303 18 0 0 0 /usr/share/webm
Feb 7 14:18:30 RevPi7849 kernel: [177827.166994] [26229] 0 26229 475 16 5 0 0 0 sh
Feb 7 14:18:30 RevPi7849 kernel: [177827.167002] [26230] 0 26230 1876 150 8 0 0 0 apt-get
Feb 7 14:18:30 RevPi7849 kernel: [177827.167010] [26234] 104 26234 9307 3178 20 0 0 0 https
Feb 7 14:18:30 RevPi7849 kernel: [177827.167017] [26235] 104 26235 1797 94 9 0 0 0 http
Feb 7 14:18:30 RevPi7849 kernel: [177827.167025] [26236] 104 26236 1797 92 7 0 0 0 http
Feb 7 14:18:30 RevPi7849 kernel: [177827.167033] [26237] 104 26237 9306 3176 20 0 0 0 https
Feb 7 14:18:30 RevPi7849 kernel: [177827.167041] [26238] 104 26238 9307 3181 20 0 0 0 https
Feb 7 14:18:30 RevPi7849 kernel: [177827.167049] [26239] 104 26239 1797 102 7 0 0 0 http
Feb 7 14:18:30 RevPi7849 kernel: [177827.167056] [26244] 104 26244 1715 82 8 0 0 0 gpgv
Feb 7 14:18:30 RevPi7849 kernel: [177827.167065] [26245] 104 26245 1715 82 8 0 0 0 gpgv
Feb 7 14:18:30 RevPi7849 kernel: [177827.167073] [26246] 104 26246 475 36 6 0 0 0 apt-key
Feb 7 14:18:30 RevPi7849 kernel: [177827.167080] [26276] 0 26276 5220 3617 13 0 0 0 /usr/share/webm
Feb 7 14:18:30 RevPi7849 kernel: [177827.167088] [26283] 0 26283 5220 3614 13 0 0 0 /usr/share/webm
Feb 7 14:18:30 RevPi7849 kernel: [177827.167112] [26286] 0 26286 1592 77 7 0 0 0 cron
Feb 7 14:18:30 RevPi7849 kernel: [177827.167120] [26290] 33 26290 30727 815 23 0 0 0 apache2
Feb 7 14:18:30 RevPi7849 kernel: [177827.167128] [26291] 0 26291 5220 3614 13 0 0 0 /usr/share/webm
Feb 7 14:18:30 RevPi7849 kernel: [177827.167136] [26292] 0 26292 2454 75 7 0 0 0 sshd
Feb 7 14:18:30 RevPi7849 kernel: [177827.167144] [26295] 0 26295 475 36 4 0 0 0 dhcpcd-run-hook
Feb 7 14:18:30 RevPi7849 kernel: [177827.167152] [26296] 33 26296 30727 816 23 0 0 0 apache2
Feb 7 14:18:30 RevPi7849 kernel: [177827.167160] [26297] 0 26297 2454 78 8 0 0 0 sshd
Feb 7 14:18:30 RevPi7849 kernel: [177827.167167] [26298] 0 26298 475 31 4 0 0 0 resolvconf
Feb 7 14:18:30 RevPi7849 kernel: [177827.167175] [26303] 0 26303 1450 101 6 0 0 0 cron
Feb 7 14:18:30 RevPi7849 kernel: [177827.167183] [26304] 0 26304 681 56 5 0 0 0 mkdir
Feb 7 14:18:30 RevPi7849 kernel: [177827.167192] [26309] 0 26309 253 1 3 0 0 0 sshd
Feb 7 14:18:30 RevPi7849 kernel: [177827.167199] [26310] 104 26310 475 37 4 0 0 0 apt-key
Feb 7 14:18:30 RevPi7849 kernel: [177827.167205] Out of memory: Kill process 9885 (snips-hotword) score 431 or sacrifice child
Feb 7 14:18:30 RevPi7849 kernel: [177827.167249] Killed process 9885 (snips-hotword) total-vm:457268kB, anon-rss:409068kB, file-rss:0kB, shmem-rss:0kB

hello @jens , you are running out of memory.

The OOM Killer is a function of the linux kernel that kill rogue processes that are requesting more memory that the OS can allocate so that the system can survive. The function applies some heuristics (it gives each process a score) to decide which process to kill when the system is in such state.

with apache and node running … I am not that surprised .

if you install only the snips-hotword on your pi , you will see that there is no leak …

hello, @marc. I solved the problem by restarting hotword regularly (every hour).
I can’t see how this is related to apache and node, but maybe you know something I don’t?
As I said, the same setup used to run for weeks with 0.60.1

Hey @jens, we had a closer look with @marc at the OOM output, and it looks that snips-hotword may indeed be misbehaving.

Feb 7 14:18:30 RevPi7849 kernel: [177827.166682] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Feb 7 14:18:30 RevPi7849 kernel: [177827.166961] [ 9885] 111 9885 114317 102267 227 0 0 0 snips-hotword

rss at 102267 is ~102MB, that’s not expected. Could you give us a few details about your setup ? Are you using a Snips model or a personal hotword ? Is your hotword listening to more than one input ?

Could you grep your syslog for snips-hotword output ? Maybe run in with verbose flag ? Could you check if the memory consumption grows up continuously from the startup instant to the OOM kill or if it jumps somehow at some point ?

Hello and thank you for the feedback. I will get back ASAP.

Update: I have run automated testing all weekend with memory logging, but I cannot reproduce the issue. I’m running the same SW versions and the same HW, so I assume it is a different configuration.

I’ll keep an eye open, and get back to you when it happens again.

Could you grep your syslog for snips-hotword output ? Maybe run in with verbose flag ?
-> cannot reproduce, system has been erased.

Could you check if the memory consumption grows up continuously from the startup instant to the OOM kill or if it jumps somehow at some point ?

-> I was watching the memory usage growing continuously (using htop), as I can recall, the resident memory after restart was at 1M and grew with ~10M / h. There’s was no obvious link to wake-up or other activity, it just grew.

With the release of 0.61.1 the hotword memory issue that I saw from time to time seems to be solved.

There’s undoubtedly a memory usage problem, so I’ve filed an issue https://github.com/snipsco/snips-issues/issues/126