Random Vector Accumulator

  • Giving a weight to the title-entity index vector results in better clustering of similar concepts. Comparison of the first 1000 embeddings with weight=0 and weight=3: [Figures: figure_1, figure_2]
  • With a faster random-number generation method, building 5000-dimensional vectors with a window size of 2 on 8 cores took 1623 seconds for 12,487,761 words (0.67% of the corpus). This is the plot of the first 4000 entities: [Figure: figure_1-2]
  • Zooming in on a few clusters: [Figures: figure_1-1, figure_1-9, figure_1-6]
  • I still haven’t been able to solve the CPU usage issue.
  • word2vec takes 18 seconds to generate 2000-dimensional embeddings for 110,922 words, while the RVA takes 10 seconds, both running on 8 cores. So, at least with respect to embedding dimensionality, the RVA seems to scale better in processing time.
  • Trying to get the code running under Cython or PyPy to improve the run time.
    • When trying to run it with Cython, I always get "[Errno 21] Is a directory", though the code runs fine with plain Python.
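To make the title-entity weighting above concrete, here is a minimal sketch of a random vector accumulator, assuming the usual random-indexing setup: every entity gets a sparse ternary index vector, and an entity's context vector is the sum of the index vectors of its neighbours within the window. The title entity's index vector is added to every entity in the document, scaled by `title_weight` (0 switches it off). The input format (a list of `(title_entity, entity_sequence)` pairs), the `nonzeros` parameter and the function names are hypothetical; only the window size of 2, the 5000 dimensions and weight=3 come from the experiments above. This is not the actual RVA code.

```python
import random
from collections import defaultdict


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: `nonzeros` random positions set to +1 or -1."""
    positions = rng.sample(range(dim), nonzeros)
    return {p: rng.choice((1, -1)) for p in positions}


def accumulate(documents, dim=5000, nonzeros=10, window=2, title_weight=3.0, seed=0):
    """Sum neighbours' index vectors (plus a weighted title vector) into context vectors.

    `documents` is a hypothetical list of (title_entity, entity_sequence) pairs.
    Returns (context_vectors, index_vectors), both stored as sparse dicts.
    """
    rng = random.Random(seed)
    index = {}

    def idx(entity):
        # Lazily assign each entity a fixed random index vector.
        if entity not in index:
            index[entity] = make_index_vector(dim, nonzeros, rng)
        return index[entity]

    context = defaultdict(lambda: defaultdict(float))
    for title, tokens in documents:
        for i, tok in enumerate(tokens):
            ctx = context[tok]
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue  # an entity is not its own context
                for p, s in idx(tokens[j]).items():
                    ctx[p] += s
            if title_weight:
                # Pull every entity in the document towards its title entity.
                for p, s in idx(title).items():
                    ctx[p] += title_weight * s
    return context, index
```

Since the title vector is shared by every entity in a document, a larger `title_weight` pushes entities from the same article closer together, which is one plausible reading of the tighter clusters seen with weight=3.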
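The timing figures above can be turned into throughput numbers with a few lines of arithmetic. The full-corpus projection assumes run time grows linearly with word count, which is an assumption, and since the 0.67% fraction is itself rounded the estimate is only approximate.

```python
# Figures reported above: 5000 dimensions, window size 2, 8 cores.
rva_words, rva_seconds, corpus_fraction = 12_487_761, 1623, 0.0067

throughput = rva_words / rva_seconds               # words per second
full_corpus_words = rva_words / corpus_fraction    # implied corpus size
# Assumption: run time scales linearly with the number of words processed.
full_corpus_hours = rva_seconds / corpus_fraction / 3600

# Smaller benchmark: 2000 dimensions, 110,922 words, 8 cores.
bench_words = 110_922
w2v_rate = bench_words / 18   # word2vec, words per second
rva_rate = bench_words / 10   # RVA, words per second

print(f"RVA throughput: {throughput:,.0f} words/s")
print(f"Implied corpus size: {full_corpus_words:,.0f} words")
print(f"Projected full run: {full_corpus_hours:.1f} h")
print(f"word2vec: {w2v_rate:,.0f} words/s, RVA: {rva_rate:,.0f} words/s")
```

This works out to roughly 7,700 words per second, a corpus of about 1.86 billion words, and a projected full run of around 67 hours on the same setup. In the smaller benchmark the RVA processes words about 1.8 times faster than word2vec.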