Remove all columns or rows with only zeros out of a data frame

I have a question about NLP in R. My data set is very large, so I need to reduce it before applying an SVM to it.

I have a Document-Term-Matrix like this:

Document WordY WordZ WordV WordU WordZZ
1        0     0     0     1     0
2        0     2     1     2     0
3        0     0     1     1     0

In this example I would like to drop the columns WordY and WordZZ, because these columns carry no information for this data frame. Is it possible to remove all columns that contain only zeros with a single command? My data frame is far too large to delete each column individually: it has something like 4.0000.0000 columns.

Thank you in advance. Cheers, Tom

Sylababa asked Dec 08 '25 14:12


2 Answers

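Using colSums():

df[, colSums(abs(df)) > 0, drop = FALSE]

This works because a column contains only zeros if and only if the sum of its absolute values is zero. drop = FALSE keeps the result a data frame even when only one column remains.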
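As a rough sketch, the colSums() approach can be tried on the example table from the question. The names keep_cols, keep_rows, term_cols and reduced are made up for illustration, and the code assumes every column is numeric and free of NA values (with missing values, na.rm = TRUE would probably be needed). The last part applies the same idea to rows with rowSums(), skipping Document on the assumption that it is an ID rather than a count.

```r
# Rebuild the example document-term matrix from the question.
df <- data.frame(
  Document = c(1, 2, 3),
  WordY    = c(0, 0, 0),
  WordZ    = c(0, 2, 0),
  WordV    = c(0, 1, 1),
  WordU    = c(1, 2, 1),
  WordZZ   = c(0, 0, 0)
)

# A column is kept when the sum of its absolute values is greater than zero,
# i.e. when it holds at least one non-zero entry.
keep_cols <- colSums(abs(df)) > 0

# drop = FALSE keeps the result a data frame even if a single column survives.
reduced <- df[, keep_cols, drop = FALSE]
names(reduced)
# "Document" "WordZ" "WordV" "WordU"

# Same idea for rows, looking only at the term columns
# (assumption: Document is an identifier, not a count).
term_cols <- setdiff(names(reduced), "Document")
keep_rows <- rowSums(abs(reduced[, term_cols, drop = FALSE])) > 0
reduced <- reduced[keep_rows, , drop = FALSE]
```

In this example every document contains at least one remaining term, so the row filter leaves all three rows in place.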

VitaminB16 answered Dec 10 '25 04:12


Here is how I would do it:

dplyr::select_if(YOUR_DATA, ~ any(. != 0))

Returns:

  Document WordZ WordV WordU
1        1     0     0     1
2        2     2     1     2
3        3     0     1     1
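For reference, select_if() has been superseded since dplyr 1.0.0 in favour of select() combined with where(). A minimal sketch of the equivalent call follows, assuming dplyr 1.0.0 or later is installed; YOUR_DATA is rebuilt here from the question's example and reduced is a made-up name.

```r
library(dplyr)

# Example data taken from the question.
YOUR_DATA <- data.frame(
  Document = c(1, 2, 3),
  WordY    = c(0, 0, 0),
  WordZ    = c(0, 2, 0),
  WordV    = c(0, 1, 1),
  WordU    = c(1, 2, 1),
  WordZZ   = c(0, 0, 0)
)

# Keep every column that has at least one non-zero value.
reduced <- select(YOUR_DATA, where(~ any(.x != 0)))
names(reduced)
# "Document" "WordZ" "WordV" "WordU"
```

Both forms should select the same columns; the where() version is simply the one current dplyr documentation recommends.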

ktiu answered Dec 10 '25 03:12


