How to remove rows from data frame based on subset function?

Tags:

r

I would like to remove some rows from my data frame. I think using subset() will be the easiest way to do that.

Previously I used the code below to remove some rows:

data_selected <- subset(tbl_data, Name.x != "XXX" & Name.y != "YYY")

The question is how to remove the rows of my table in which two cells of the same row contain the same string.

I think that mtcars can be used as an example:

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2

The gear and carb columns can be used. As you can see, the first two rows should be removed from this data because both have the same value, 4, in those two columns. Please take into account that my data contains character strings, not numeric values.

asked Aug 14 '15 13:08

Shaxi Liver



1 Answer

Based on the information in the post, I think a comparison (!=) between the 'gear' and 'carb' columns will be enough to subset the dataset:

df1 <- mtcars[1:5,]
subset(df1, gear!=carb)
#                     mpg cyl disp  hp drat    wt  qsec vs am gear carb
#Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

This should also work for 'non-numeric' (character) columns, but it only tests exact equality, so partial matches are not detected.
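As a minimal sketch of the character-column case, the example below builds a small made-up table. The column names Name.x and Name.y come from the question; the values and the names tbl_factor, kept and kept_factor are assumptions for illustration only. It also shows one caveat: if the columns are factors rather than character vectors, comparing them directly fails when their level sets differ, so converting with as.character() first is a safe option.

```r
# Hypothetical data: column names follow the question, the values are made up.
tbl_data <- data.frame(
  Name.x = c("Mazda", "Datsun", "Hornet", "Mazda"),
  Name.y = c("Mazda", "Toyota", "Hornet", "Mazda RX4"),
  stringsAsFactors = FALSE
)

# Keep only the rows where the two strings differ exactly.
# "Mazda" vs "Mazda RX4" is only a partial match, so that row is kept.
kept <- subset(tbl_data, Name.x != Name.y)
print(kept)

# If the columns are factors, convert them first: comparing two factors
# with different level sets raises an error.
tbl_factor <- data.frame(
  Name.x = factor(tbl_data$Name.x),
  Name.y = factor(tbl_data$Name.y)
)
kept_factor <- subset(tbl_factor, as.character(Name.x) != as.character(Name.y))
print(kept_factor)
```

Rows where either value is NA are dropped by subset(), because it treats a missing condition as false.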

If we need to make an exception and keep the rows in which both columns are 'Unknown', we can add another logical condition, (gear=='Unknown' & carb=='Unknown'), to the original condition with the | operator.

Making some changes in the dataset to show the output (just as an example; I know that assigning a string converts these numeric columns to character):

df1$gear[4] <- 'Unknown'
df1$carb[4] <- 'Unknown'
df1$gear[5] <- 'Unknown'


subset(df1, (gear=='Unknown' & carb=='Unknown') | gear!=carb)
#                   mpg cyl disp  hp drat    wt  qsec vs am    gear    carb
#Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1       4       1
#Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0 Unknown Unknown
#Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0 Unknown       2
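The same filter can also be written with plain bracket indexing, which the subset() help page suggests for non-interactive code. This is only a sketch of an equivalent, not a replacement for the approach above. The names keep and res are made up here, and which() is used so that rows where the condition is NA are dropped, as subset() does.

```r
# Rebuild the modified example data from this answer.
df1 <- mtcars[1:5, ]
df1$gear[4] <- "Unknown"   # assigning a string turns the column into character
df1$carb[4] <- "Unknown"
df1$gear[5] <- "Unknown"

# Logical vector: TRUE for rows to keep.
# Keep rows where both values are "Unknown", or where the two values differ.
keep <- (df1$gear == "Unknown" & df1$carb == "Unknown") | df1$gear != df1$carb

# which() returns the positions of the TRUE values and silently skips NA.
res <- df1[which(keep), ]
print(res)
```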
answered Sep 28 '22 05:09

akrun