Word2Vec: Number of Dimensions

I am using Word2Vec on a dataset of roughly 11,000,000 tokens to compute word similarity (as part of synonym extraction for a downstream task), but I don't have a good sense of how many dimensions I should use. Does anyone have a good heuristic for the range of dimensions to consider based on the number of tokens/sentences?

asked Oct 26 '14 02:10

Vin Diesel

1 Answer

The typical range is 100-300 dimensions. I would say you need at least 50 dimensions to reach even minimally acceptable accuracy. If you pick fewer dimensions than that, you start to lose the properties of high-dimensional spaces. If training time is not a big concern for your application, I would stick with 200 dimensions, as it gives good features. The highest accuracy is obtained at around 300 dimensions. Beyond 300, the word features won't improve dramatically, and training will become much slower.

I do not know of a theoretical explanation or strict bounds for choosing the dimension in high-dimensional spaces (and there might not be an application-independent explanation), but I would refer you to Pennington et al., Figure 2a, where the x-axis shows the vector dimension and the y-axis shows the accuracy obtained. That should provide empirical justification for the argument above.
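If you would rather check this on your own corpus than rely on a rule of thumb, one option is to sweep a few candidate sizes and keep the smallest one whose score is close to the best. The sketch below is only an illustration of that idea: `sweep_dimensions`, `smallest_good_enough` and `fake_train_and_score` are hypothetical names, and the scores in the stand-in function are placeholders, not measurements. In practice you would replace the stand-in with a function that trains Word2Vec at the given size and evaluates it on your own similarity or synonym test set.

```python
from typing import Callable, Dict, Iterable


def sweep_dimensions(
    train_and_score: Callable[[int], float],
    candidates: Iterable[int] = (50, 100, 200, 300),
) -> Dict[int, float]:
    """Call train_and_score once per candidate size and record each score."""
    return {dim: train_and_score(dim) for dim in candidates}


def smallest_good_enough(scores: Dict[int, float], tolerance: float = 0.01) -> int:
    """Return the smallest size whose score is within `tolerance` of the best."""
    best = max(scores.values())
    return min(dim for dim, score in scores.items() if score >= best - tolerance)


def fake_train_and_score(dim: int) -> float:
    # ASSUMPTION: placeholder scores, invented purely so the sketch runs.
    # A real version would train a model with `dim` dimensions (in gensim 4.x
    # the parameter is `vector_size`, in older releases `size`) and return
    # an accuracy measured on a held-out evaluation set.
    placeholder_scores = {50: 0.50, 100: 0.60, 200: 0.65, 300: 0.655}
    return placeholder_scores[dim]


scores = sweep_dimensions(fake_train_and_score)
chosen = smallest_good_enough(scores, tolerance=0.01)
print(chosen)
```

The tolerance encodes the trade-off described above: if going from 200 to 300 dimensions buys almost nothing on your evaluation set, the smaller model is cheaper to train and store.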

answered Jan 04 '23 01:01

Cylonmath
