When I run my tensorflow app, it just outputs "killed". How do I debug this?
source code
root@8e4a3a65184e:~/tensorflow# python sample_cnn.py
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_tf_random_seed': 1, '_keep_checkpoint_every_n_hours': 10000, '_save_checkpoints_steps': None, '_model_dir': 'data/convnet_model', '_save_summary_steps': 100}
INFO:tensorflow:Create CheckpointSaverHook.
2017-08-17 12:56:53.160481: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-17 12:56:53.160536: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-17 12:56:53.160545: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-17 12:56:53.160550: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-17 12:56:53.160555: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Killed
When I run your code I get the same behavior. If you then run dmesg,
you'll see a trace like the one below, which confirms what gdelab was hinting at:
[38607.234089] python3 invoked oom-killer: gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=0, order=0, oom_score_adj=0
[38607.234090] python3 cpuset=/ mems_allowed=0
[38607.234094] CPU: 3 PID: 1420 Comm: python3 Tainted: G O 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
[38607.234094] Hardware name: Dell Inc. XPS 15 9560/05FFDN, BIOS 1.2.4 03/29/2017
[38607.234096] 0000000000000000 ffffffffa9f28414 ffffa50090317cf8 ffff940effa5f040
[38607.234097] ffffffffa9dfe050 0000000000000000 0000000000000000 0101ffffa9d82dd0
[38607.234098] e09c7db7f06d0ac2 00000000ffffffff 0000000000000000 0000000000000000
[38607.234100] Call Trace:
[38607.234104] [<ffffffffa9f28414>] ? dump_stack+0x5c/0x78
[38607.234106] [<ffffffffa9dfe050>] ? dump_header+0x78/0x1fd
[38607.234108] [<ffffffffa9d8047a>] ? oom_kill_process+0x21a/0x3e0
[38607.234109] [<ffffffffa9d800fd>] ? oom_badness+0xed/0x170
[38607.234110] [<ffffffffa9d80911>] ? out_of_memory+0x111/0x470
[38607.234111] [<ffffffffa9d85b4f>] ? __alloc_pages_slowpath+0xb7f/0xbc0
[38607.234112] [<ffffffffa9d85d8e>] ? __alloc_pages_nodemask+0x1fe/0x260
[38607.234113] [<ffffffffa9dd7c3e>] ? alloc_pages_vma+0xae/0x260
[38607.234115] [<ffffffffa9db39ba>] ? handle_mm_fault+0x111a/0x1350
[38607.234117] [<ffffffffa9c5fd84>] ? __do_page_fault+0x2a4/0x510
[38607.234118] [<ffffffffaa207658>] ? page_fault+0x28/0x30
...
[38607.234158] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
...
[38607.234332] [ 1396] 1000 1396 4810969 3464995 6959 21 0 0 python3
[38607.234332] Out of memory: Kill process 1396 (python3) score 568 or sacrifice child
[38607.234357] Killed process 1396 (python3) total-vm:19243876kB, anon-rss:13859980kB, file-rss:0kB, shmem-rss:0kB
[38607.720757] oom_reaper: reaped process 1396 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
This basically means that Python was starting to consume too much memory, so the kernel's OOM killer decided to kill the process (the "Killed process" line reports anon-rss:13859980kB, roughly 13 GB of resident memory). If you add some prints to your code, you'll see that mnist_classifier.train()
is the function that is active when this happens. However, some simple tests (such as removing the logging and lowering the number of steps) did not seem to help here.
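As a rough illustration of the "add some prints" approach, here is a minimal sketch that prints the process's peak memory at chosen points, using only Python's standard resource module. The helper names peak_rss_mib and log_memory are made up for this example, and it assumes Linux (as in the container above), where ru_maxrss is reported in kilobytes.

```python
import resource


def peak_rss_mib():
    """Return this process's peak resident set size in MiB.

    Assumption: Linux, where ru_maxrss is expressed in kilobytes
    (on macOS the same field is in bytes).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def log_memory(label):
    """Print the peak memory seen so far, tagged with a label, and return it."""
    peak = peak_rss_mib()
    print("[mem] %s: peak RSS %.1f MiB" % (label, peak))
    return peak


if __name__ == "__main__":
    log_memory("start")
    # Stand-in for loading a large dataset: 64 MiB of non-zero bytes.
    blob = b"x" * (64 * 1024 * 1024)
    log_memory("after allocating 64 MiB")
```

Calling log_memory() right before mnist_classifier.train() and after each data-loading step should show which step makes the number jump. Anything printed inside the step that triggers the kill may never appear, so the last label you see marks the last point the script reached.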