- Giving a weight to the title-entity index vector results in better clustering of similar concepts. Compared the first 1000 embeddings with weight=0 and weight=3.
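The weighting above can be sketched roughly as follows — a minimal illustration, assuming a random-indexing setup where an embedding is the sum of context index vectors plus the title-entity index vector counted `weight` times. All names here are illustrative, not the project's actual API.

```python
import numpy as np

def build_embedding(context_vectors, title_vector, weight=3):
    """Sum the context index vectors, then add the title-entity
    index vector `weight` times so the title entity pulls similar
    concepts closer together (weight=0 ignores the title entirely)."""
    emb = np.sum(context_vectors, axis=0)
    emb = emb + weight * title_vector
    return emb
```

With weight=0 the title contributes nothing; raising the weight makes embeddings of pages sharing a title entity more similar, which would explain the tighter clusters.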
- Using a faster random number generation method, 5000-dimensional vectors with window size 2, running on 8 cores, took 1623 seconds on 12,487,761 words (0.67% of the corpus). This is the plot of the first 4000 entities.
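One way the random number generation can be sped up — a sketch, not necessarily the method used here — is to draw the positions and signs of a sparse ternary index vector in a couple of vectorized NumPy calls instead of per-element Python-level random draws:

```python
import numpy as np

def index_vector(dim=5000, nonzeros=10, rng=None):
    """Sparse ternary index vector: `nonzeros` entries set to +1 or -1,
    the rest zero. Vectorized generation avoids the Python-loop overhead
    of drawing one random number at a time."""
    rng = rng if rng is not None else np.random.default_rng()
    vec = np.zeros(dim, dtype=np.int8)
    pos = rng.choice(dim, size=nonzeros, replace=False)  # distinct positions
    vec[pos] = rng.choice(np.array([-1, 1], dtype=np.int8), size=nonzeros)
    return vec
```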
- Zooming in on a few clusters.
- Still haven’t been able to solve the issue of CPU usage.
- It takes word2vec 18 seconds to generate 2000-dimensional embeddings on 110,922 words, while the RVA takes 10 seconds, both running on 8 cores. At least with respect to embedding dimension, the RVA's processing time scales better.
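A comparison like the one above can be reproduced with a small wall-clock harness; `train_word2vec` and `train_rva` below are placeholders for the two training calls, not real functions from this project.

```python
import time

def time_it(fn, *args, **kwargs):
    """Return the wall-clock seconds taken by one call to fn."""
    t0 = time.perf_counter()
    fn(*args, **kwargs)
    return time.perf_counter() - t0

# Hypothetical usage, with placeholder training functions:
#   w2v_secs = time_it(train_word2vec, corpus, dim=2000, workers=8)
#   rva_secs = time_it(train_rva, corpus, dim=2000, workers=8)
```

`time.perf_counter()` is preferable to `time.time()` for benchmarking since it is monotonic and has higher resolution.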
- Trying to get the code to execute using Cython or PyPy to improve the run time.
- When trying to execute in Cython I always get "[Errno 21] Is a directory", though the code runs fine using plain Python.
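"[Errno 21] Is a directory" is the OS error raised when a directory path is opened where a file is expected, so one likely cause is that a directory rather than a `.pyx` file is being passed to the compile step. A minimal build-script sketch (the module name `rva.pyx` is a placeholder, not the project's actual file):

```python
# setup.py -- minimal Cython build config (sketch).
# Point cythonize at the .pyx file itself, not at its containing
# folder; passing a directory is one way to trigger Errno 21.
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("rva.pyx"))  # placeholder filename
```

Built with `python setup.py build_ext --inplace`, after which the extension imports like a normal module.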