
Python node2vec (Gensim Word2Vec) "Process finished with exit code 134 (interrupted by signal 6: SIGABRT)"

I am working on node2vec in Python, which uses Gensim's Word2Vec internally.

When I am using a small dataset, the code works well. But as soon as I try to run the same code on a large dataset, the code crashes:

Error: Process finished with exit code 134 (interrupted by signal 6: SIGABRT).

The line that raises the error is:

model = Word2Vec(walks, size=args.dimensions,
                 window=args.window_size, min_count=0, sg=1,
                 workers=args.workers, iter=args.iter)

I am using PyCharm and Python 3.5.

What is happening? I could not find any post which could solve my problem.

Zohaib Brohi asked Jan 16 '18 21:01


1 Answer

You are almost certainly running out of memory, which causes your memory-hungry process to be aborted with SIGABRT.

In general, solving this means examining how your code uses memory, both leading up to and at the moment of failure. (The excessive bulk memory usage may, however, occur arbitrarily earlier, with only a final small and otherwise legitimate allocation triggering the error.)

Specifically for Python and the node2vec tool, which uses the Gensim Word2Vec class, some things to try include:

Watch a readout of the Python process size during your attempts.
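One way to get such a readout from inside the script is sketched below. It assumes a Unix-like system, where the standard-library resource module is available, and it reports peak resident memory rather than current usage. The helper names peak_rss_mib and report are hypothetical, and the list comprehension is only a stand-in for the real walk generation.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def report(label):
    """Print a labelled memory readout (hypothetical helper)."""
    print("{}: peak RSS {:.1f} MiB".format(label, peak_rss_mib()))


report("before building walks")
walks = [[str(i), str(i + 1)] for i in range(100000)]  # stand-in for real walks
report("after building walks")
```

Calling report() before and after each major step (graph loading, walk generation, the Word2Vec call) should show which step accounts for most of the growth. An external monitor such as top gives the same information without touching the code.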

Enable Python logging to at least the INFO level to see more about what's happening leading up to the crash.
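A minimal logging setup, using only the standard library, might look like the following. The format string is just one reasonable choice. The setup needs to run before the Word2Vec call so that Gensim's progress messages are shown.

```python
import logging

# Send log records of level INFO and above to the console with timestamps.
logging.basicConfig(
    format="%(asctime)s : %(levelname)s : %(message)s",
    level=logging.INFO,
)

# Make sure the "gensim" logger hierarchy is not filtered more strictly.
logging.getLogger("gensim").setLevel(logging.INFO)

logging.info("logging configured; Word2Vec progress should now be visible")
```

With this in place, the last few messages printed before the crash indicate how far training setup got.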

Further, be sure to:

  1. Optimize your walks iterable so that it does not build a large in-memory list. (Gensim's Word2Vec can work on a corpus of any length, including those far larger than RAM, as long as (a) the corpus is streamed from disk via a re-iterable Python sequence; and (b) the model's number of unique word/node tokens can be modeled within RAM.)
  2. Ensure the number of unique words (tokens/nodes) in your model doesn't require a model larger than RAM allows. Logging output, once enabled, will show the raw sizes involved just before the main model allocation (which is likely what is failing) happens. (If it fails, either (a) use a system with more RAM to accommodate your full set of nodes; or (b) use a higher min_count value to discard more of the less-important nodes.)
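The two points above can be sketched as follows, assuming the walks are first written to a text file with one walk per line. WalkCorpus, write_walks and surviving_nodes are hypothetical names, not part of node2vec or Gensim. The key property is that the corpus object can be iterated more than once, because Word2Vec makes one pass to build the vocabulary and further passes to train, whereas a plain generator is exhausted after the first pass.

```python
import collections
import os
import tempfile


class WalkCorpus(object):
    """Re-iterable corpus: every pass re-reads the walks file from disk."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, "r") as handle:
            for line in handle:
                tokens = line.split()
                if tokens:  # skip blank lines
                    yield tokens


def write_walks(path, walks):
    """Write one walk per line, with nodes separated by spaces."""
    with open(path, "w") as handle:
        for walk in walks:
            handle.write(" ".join(str(node) for node in walk) + "\n")


def surviving_nodes(corpus, min_count):
    """Count the unique nodes that occur at least min_count times."""
    counts = collections.Counter(node for walk in corpus for node in walk)
    return sum(1 for count in counts.values() if count >= min_count)


# Demo with a tiny stand-in for the real random walks.
walks_path = os.path.join(tempfile.mkdtemp(), "walks.txt")
write_walks(walks_path, ([i, i + 1, i + 2] for i in range(5)))

corpus = WalkCorpus(walks_path)
first_pass = list(corpus)
second_pass = list(corpus)  # a plain generator would yield nothing here

print(surviving_nodes(corpus, 1))  # all unique nodes
print(surviving_nodes(corpus, 3))  # nodes seen at least 3 times

# The corpus object could then replace the in-memory list in the question's
# call, e.g. Word2Vec(corpus, size=args.dimensions, ...).
```

Gensim also ships a LineSentence class in gensim.models.word2vec that streams a one-sentence-per-line file in a similar way. Counting surviving nodes for a few candidate min_count values gives a quick sense of how much the vocabulary, and so the model, would shrink.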

If your Process finished with exit code 134 (interrupted by signal 6: SIGABRT) error does not involve Python, Gensim, & Word2Vec, you should instead:

  1. Search for occurrences of that error combined with more specific details of your triggering situations - the tools/libraries and lines-of-code that create your error.
  2. Look into general memory-profiling tools for your situation, to identify where (even long before the final error) your code might be consuming almost-all of the available RAM.
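For Python code, one such general tool is the standard-library tracemalloc module. The sketch below compares two snapshots to show which source lines allocated the most memory in between. The list comprehension is only a stand-in for whatever step is suspected. tracemalloc only sees allocations that go through Python's memory allocator, so memory allocated directly by native extension code may not appear.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for the suspected memory-hungry step.
walks = [[str(i), str(i + 1), str(i + 2)] for i in range(50000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Rank source lines by how much their allocated memory changed.
top_stats = after.compare_to(before, "lineno")
for stat in top_stats[:5]:
    print(stat)
```

Each printed line names a file and line number together with the size and count difference between the two snapshots, which helps locate growth that happens long before the final failure.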
gojomo answered Oct 24 '22 09:10