I am working on node2vec in Python, which uses Gensim's Word2Vec internally.
When I am using a small dataset, the code works well. But as soon as I try to run the same code on a large dataset, the code crashes:
Error: Process finished with exit code 134 (interrupted by signal 6: SIGABRT).
The line that triggers the error is:
model = Word2Vec(walks, size=args.dimensions,
window=args.window_size, min_count=0, sg=1,
workers=args.workers, iter=args.iter)
I am using PyCharm and Python 3.5.
What is happening? I could not find any post which could solve my problem.
You are almost certainly running out of memory, which causes your memory-using process to be aborted with SIGABRT.
In general, solving this means looking at how your code is using memory, leading up to and at the moment of failure. (The actual 'leak' of excessive bulk memory usage might, however, be arbitrarily earlier - with only the last small/proper increment triggering the error.)
Specifically with the usage of Python, and the node2vec tool which makes use of the Gensim Word2Vec class, some things to try include:
Watch a readout of the Python process size during your attempts.
Enable Python logging to at least the INFO level to see more about what's happening leading up to the crash.
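As a rough sketch of those two suggestions, the snippet below turns on INFO-level logging and reports the peak memory of the current process from inside Python. The helper names peak_memory_mb and log_memory and the logger name are made up for illustration, and the resource module is only available on Unix-like systems, so treat this as one possible approach rather than the required one.

```python
import logging
import resource  # Unix-only standard-library module (assumption: Linux or macOS)
import sys

# Gensim reports its progress through the standard logging module,
# so configuring the root logger at INFO level is enough to see it.
logging.basicConfig(
    format="%(asctime)s : %(levelname)s : %(message)s",
    level=logging.INFO,
)
logger = logging.getLogger("memory_probe")  # hypothetical logger name


def peak_memory_mb():
    """Return the peak resident memory of this process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(stage):
    """Log the peak memory reached so far, labelled with a stage name."""
    used = peak_memory_mb()
    logger.info("%s: peak memory so far %.1f MB", stage, used)
    return used
```

Calling log_memory("after generating walks") just before the Word2Vec(...) line, for example, would show whether most of the memory is already consumed before training starts.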
Further, be sure to:
Set up your walks iterable so that it does not compose a large in-memory list. (Gensim's Word2Vec can work on a corpus of any length, including those far larger than RAM, as long as (a) the corpus is streamed from disk via a re-iterable Python sequence; and (b) the model's number of unique word/node tokens can be modeled within RAM.)
If the set of unique nodes is still too large for RAM, use a higher min_count value to discard more of the less-important nodes.
If your Process finished with exit code 134 (interrupted by signal 6: SIGABRT) error does not involve Python, Gensim, & Word2Vec, you should instead:
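Coming back to the Gensim case: the advice about streaming the walks from disk through a re-iterable sequence could look roughly like the sketch below. WalkCorpus, write_walks, and the one-walk-per-line file layout are illustrative assumptions, not part of node2vec or Gensim. Gensim's own LineSentence class reads the same kind of whitespace-separated file and may be a simpler choice.

```python
import os
import tempfile


def write_walks(walk_iter, path):
    """Write walks to disk one at a time, so the full list is never held in RAM.

    Each walk becomes one line of space-separated node ids.
    Returns the number of walks written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for walk in walk_iter:
            handle.write(" ".join(str(node) for node in walk) + "\n")
            count += 1
    return count


class WalkCorpus:
    """Re-iterable view over a walks file (hypothetical helper)."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Re-open the file on every pass: training needs one pass to build
        # the vocabulary and further passes for the training iterations.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()
                if tokens:  # skip blank lines
                    yield tokens


# Small demonstration with a generator standing in for the walk simulation.
demo_path = os.path.join(tempfile.mkdtemp(), "walks.txt")
write_walks(([n, n + 1, n + 2] for n in range(3)), demo_path)
demo_corpus = WalkCorpus(demo_path)
```

An object like demo_corpus could then be passed in place of walks in the Word2Vec(...) call from the question. Note that node ids read back from the file are strings, and that this only addresses the size of the corpus; the set of unique nodes still has to fit in RAM.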