Tried running RVA on a single processor: it processed 0.67% of the corpus in 25 seconds, but it runs out of memory as the corpus size grows.
I am using the Python module shelve to store and access the index dictionary on disk. RVA_single.py uses only one processor and takes 180 seconds, which is faster than Word2Vec, which takes 200 seconds when run on 8 processors.
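
A minimal sketch of keeping an index dictionary on disk with shelve. This is not the code from RVA_single.py: the function names build_index and lookup, the word-to-integer mapping, and the file path are all hypothetical and only illustrate the idea.

```python
import os
import shelve
import tempfile


def build_index(tokens, path):
    """Give each distinct token an integer id and store the mapping on disk.

    Hypothetical helper: the real index layout in RVA_single.py may differ.
    Returns the number of distinct tokens stored.
    """
    next_id = 0
    with shelve.open(path) as index:  # creates the database file if missing
        for token in tokens:
            if token not in index:    # shelve keys must be strings
                index[token] = next_id
                next_id += 1
    return next_id


def lookup(token, path):
    """Read one entry back from disk; returns None for an unknown token."""
    with shelve.open(path) as index:
        return index.get(token)


if __name__ == "__main__":
    # Throwaway location so the demo does not leave files in the project.
    path = os.path.join(tempfile.mkdtemp(), "index")
    count = build_index("the cat sat on the mat".split(), path)
    print(count, lookup("cat", path))
```

Only the entries being read or written are loaded, so memory use stays roughly flat as the corpus grows, at the cost of a disk access per lookup.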
The code is currently processing the whole corpus. I think the difference in time between RVA_single.py and RVA.py is due to the multiprocessing proxy dict being slower than a regular Python dictionary.
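
One way to check this guess is to time the same inserts against both kinds of dictionary. The sketch below assumes the proxy dict in RVA.py comes from multiprocessing.Manager().dict(); the helper time_inserts and the value of n are made up for illustration and are not taken from either script.

```python
import time
from multiprocessing import Manager


def time_inserts(mapping, n):
    """Insert n integer keys into mapping and return the elapsed seconds."""
    start = time.perf_counter()
    for i in range(n):
        mapping[i] = i
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 20_000  # arbitrary size, chosen only for the demo

    plain_seconds = time_inserts({}, n)

    # Each operation on a Manager dict is forwarded to a separate server
    # process, so every insert pays for inter-process communication.
    with Manager() as manager:
        proxy_seconds = time_inserts(manager.dict(), n)

    print(f"plain dict: {plain_seconds:.4f} s")
    print(f"proxy dict: {proxy_seconds:.4f} s")
```

If the proxy timing is much larger, that would support the explanation above, though it does not rule out other differences between the two scripts.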