I have a question about NLP in R. My data set is very large, so I need to reduce it before further analysis, in particular before applying an SVM to it.
I have a Document-Term-Matrix like this:
Document WordY WordZ WordV WordU WordZZ
1 0 0 0 1 0
2 0 2 1 2 0
3 0 0 1 1 0
In this example I would like to drop the columns WordY and WordZZ, because they contain only zeros and so carry no information for this data frame. Is it possible to remove all columns that contain only zero values with a single command? My data frame is too large to delete each such column individually: it has something like 4.0000.0000 columns.
Thank you in advance. Cheers, Tom
Using colSums():
df[, colSums(abs(df)) > 0]
That is, a column contains only zeros if and only if the sum of the absolute values of its entries is zero (assuming the columns are numeric and contain no NA values).
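As a minimal sketch of the colSums() approach, here is the example table from the question rebuilt as a data frame. The variable names keep and reduced are only illustrative, and drop = FALSE is an addition that is not part of the answer above; it keeps the result a data frame even when a single column survives.

```r
# Rebuild the example document-term matrix from the question.
df <- data.frame(
  Document = 1:3,
  WordY    = c(0, 0, 0),
  WordZ    = c(0, 2, 0),
  WordV    = c(0, 1, 1),
  WordU    = c(1, 2, 1),
  WordZZ   = c(0, 0, 0)
)

# TRUE for every column with at least one non-zero entry.
# abs() guards against positive and negative values cancelling out.
keep <- colSums(abs(df)) > 0

# drop = FALSE (an addition here) prevents the result from collapsing
# to a plain vector if only one column is kept.
reduced <- df[, keep, drop = FALSE]

print(names(reduced))
```

Because the whole filter is a single vectorised call, no column has to be named or deleted by hand.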
Here is how I would do it:
dplyr::select_if(YOUR_DATA, ~ any(. != 0))
Returns:
Document WordZ WordV WordU
1 1 0 0 1
2 2 2 1 2
3 3 0 1 1
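For readers who prefer to avoid the dplyr dependency, the same "keep a column if any value is non-zero" predicate can be sketched in base R with Filter(). The helper name has_nonzero is hypothetical and chosen only for this illustration. As a side note, select_if() is marked as superseded in recent dplyr versions in favour of select() combined with where().

```r
# Same example data as in the question.
df <- data.frame(
  Document = 1:3,
  WordY    = c(0, 0, 0),
  WordZ    = c(0, 2, 0),
  WordV    = c(0, 1, 1),
  WordU    = c(1, 2, 1),
  WordZZ   = c(0, 0, 0)
)

# Hypothetical helper: TRUE if the column has at least one non-zero value.
has_nonzero <- function(col) any(col != 0)

# Filter() applies the predicate to each column of the data frame
# and keeps only the columns for which it returns TRUE.
reduced <- Filter(has_nonzero, df)

print(names(reduced))
```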