 

Comparison between fastText and LDA

Hi, last week Facebook announced fastText, which is a way to categorize words into buckets. Latent Dirichlet Allocation (LDA) is another way to do topic modeling. Has anyone compared the pros and cons of these two?

I haven't tried fastText, but here are a few pros and cons of LDA based on my experience.

Pros

  1. Iterative model, with support for Apache Spark.

  2. Takes in a corpus of documents and does topic modeling.

  3. Not only finds out what a document is talking about, but also finds related documents.

  4. The Apache Spark community is continuously contributing to it: it first worked in MLlib and now in the ML library (see the sketch after this list).
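
For illustration, here is a minimal sketch of such a pipeline on Spark's ML library (PySpark); the file name docs.txt, the number of topics and the other parameter values are assumptions for the example, not part of the original workflow.

# Minimal LDA pipeline sketch with PySpark's ml library (illustrative values)
from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, StopWordsRemover, CountVectorizer
from pyspark.ml.clustering import LDA

spark = SparkSession.builder.appName("lda-example").getOrCreate()

# One document per line in docs.txt (assumed layout)
df = spark.read.text("docs.txt").withColumnRenamed("value", "text")

tokens = Tokenizer(inputCol="text", outputCol="words").transform(df)
filtered = StopWordsRemover(inputCol="words", outputCol="filtered").transform(tokens)
vectorized = CountVectorizer(inputCol="filtered", outputCol="features",
                             vocabSize=10000).fit(filtered).transform(filtered)

lda = LDA(k=10, maxIter=20)                     # 10 topics, illustrative settings
model = lda.fit(vectorized)

model.describeTopics(5).show()                  # top 5 term indices per topic
docs_with_topics = model.transform(vectorized)  # per-document topic distribution,
                                                # usable for finding related documents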

Cons

  1. Stop words need to be defined well, and they have to be related to the context of the documents. For example, "document" is a word with a high frequency of appearance and may top the chart of recommended topics, but it may or may not be relevant, so we need to add it to the stop-word list (a sketch of this follows the example topic below).

  2. Sometimes the classification is irrelevant. In the example below it is hard to infer what this bucket is talking about.

Topic:

  1. Term:discipline

  2. Term:disciplines

  3. Term:notestable

  4. Term:winning

  5. Term:pathways

  6. Term:chapterclosingtable

  7. Term:metaprograms

  8. Term:breakthroughs

  9. Term:distinctions

  10. Term:rescue
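
As a small illustration of the stop-word point above (con 1), Spark's StopWordsRemover accepts a custom list, so corpus-specific terms can be appended to the default English stop words; the extra words below are just examples, not a recommended list.

from pyspark.ml.feature import StopWordsRemover

# Extend the default English stop words with corpus-specific terms (examples only)
custom_stop_words = StopWordsRemover.loadDefaultStopWords("english") + ["document", "chapter"]
remover = StopWordsRemover(inputCol="words", outputCol="filtered",
                           stopWords=custom_stop_words)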

If anyone has done research on fastText, can you please share what you learned?

asked Aug 22 '16 by Nabs



1 Answer

fastText offers more than topic modelling: it is a tool for generating word embeddings and for text classification using a shallow neural network. The authors state that its performance is comparable with much more complex "deep learning" algorithms, while the training time is significantly lower.

Pros:

=> It is extremely easy to train your own fastText model:

$ ./fasttext skipgram -input data.txt -output model

Just provide your input and output files and the architecture to be used, and that's all. If you wish to customize your model a bit, fastText provides the option to change the hyper-parameters as well.
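
For example, assuming the official Python bindings for fastText (pip install fasttext), the same skip-gram training with a few hyper-parameters overridden could look like this; the values shown are only illustrative, not recommendations:

import fasttext

# Skip-gram model with some hyper-parameters overridden (illustrative values)
model = fasttext.train_unsupervised(
    "data.txt",
    model="skipgram",
    dim=100,    # embedding dimension
    epoch=10,   # passes over the training data
    lr=0.05,    # learning rate
    minn=3,     # smallest character n-gram
    maxn=6,     # largest character n-gram
)
model.save_model("model.bin")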

=> While generating word vectors, fastText takes into account sub-parts of words called character n-grams, so that similar words have similar vectors even if they happen to occur in different contexts. For example, "supervised", "supervise" and "supervisor" are all assigned similar vectors.

=> A previously trained model can be used to compute word vectors for out-of-vocabulary words. This one is my favorite. Even if the vocabulary of your corpus is finite, you can get a vector for almost any word that exists in the world.
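
A short sketch of both of the above points, again assuming the Python bindings; the cosine helper and the made-up word are just for illustration:

import numpy as np
import fasttext

model = fasttext.load_model("model.bin")

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Morphologically related words end up close because they share character n-grams
print(cosine(model.get_word_vector("supervised"), model.get_word_vector("supervisor")))

# A vector is still produced for a word never seen in training (built from its n-grams)
oov_vector = model.get_word_vector("supervisedly")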

=> fastText also provides the option to generate vectors for paragraphs or sentences. Similar documents can then be found by comparing their vectors.
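
A hedged sketch of comparing two documents through their sentence vectors, with the same assumptions as above:

import numpy as np
import fasttext

model = fasttext.load_model("model.bin")

d1 = model.get_sentence_vector("fastText trains word embeddings very quickly")
d2 = model.get_sentence_vector("word vectors can be trained fast with fastText")

# Cosine similarity between document vectors; higher means more related
sim = float(np.dot(d1, d2) / (np.linalg.norm(d1) * np.linalg.norm(d2)))
print(sim)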

=> The option to predict likely labels for a piece of text has been included too.
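
For the label-prediction option, a minimal supervised sketch (file name and labels are assumptions; fastText expects training lines prefixed with __label__):

import fasttext

# train.txt lines look like: "__label__sports the match went to penalties"
classifier = fasttext.train_supervised("train.txt", epoch=25, lr=0.5)

# Top-2 most likely labels with their probabilities
print(classifier.predict("the striker scored twice in the final", k=2))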

=> Pre-trained word vectors for about 90 languages trained on Wikipedia are available in the official repo.

Cons:

=> As fastText is command-line based, I struggled while incorporating it into my project; this might not be an issue for others, though.

=> No in-built method to find similar words or paragraphs (a possible workaround is sketched below).
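
If you do need similar-word lookup, one workaround is to compute it yourself over the model's vocabulary; a rough sketch assuming the Python bindings (for large vocabularies an approximate nearest-neighbour index would be more practical):

import numpy as np
import fasttext

model = fasttext.load_model("model.bin")
words = model.get_words()
vectors = np.array([model.get_word_vector(w) for w in words])
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

def most_similar(word, topn=10):
    q = model.get_word_vector(word)
    q /= np.linalg.norm(q)
    scores = vectors @ q
    order = np.argsort(-scores)
    return [(words[i], float(scores[i])) for i in order if words[i] != word][:topn]

print(most_similar("discipline"))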

For those who wish to read more, here are the links to the official research papers:

1) https://arxiv.org/pdf/1607.04606.pdf

2) https://arxiv.org/pdf/1607.01759.pdf

And link to the official repo:

https://github.com/facebookresearch/fastText

answered Oct 19 '22 by Aanchal1103