I would like to remove some rows from my data frame, and I think `subset()` will be the easiest way to do that.
Previously, I used the code below to remove some of the rows:

```r
data_selected <- subset(tbl_data, Name.x != "XXX" & Name.y != "YYY")
```
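A note on what that call does: by De Morgan's law, `Name.x != "XXX" & Name.y != "YYY"` keeps a row only when neither value matches, so any row with `Name.x == "XXX"` *or* `Name.y == "YYY"` is dropped. The sketch below checks this on a small `tbl_data` whose values are invented for illustration; only the column names come from the question.

```r
# Hypothetical stand-in for the question's tbl_data (values are invented)
tbl_data <- data.frame(
  Name.x = c("XXX", "AAA", "BBB", "XXX"),
  Name.y = c("CCC", "YYY", "DDD", "YYY"),
  stringsAsFactors = FALSE
)

# The original call: keep rows where neither name matches
data_selected <- subset(tbl_data, Name.x != "XXX" & Name.y != "YYY")

# Equivalent formulation: drop rows where either name matches
data_selected2 <- subset(tbl_data, !(Name.x == "XXX" | Name.y == "YYY"))

print(data_selected)
#   Name.x Name.y
# 3    BBB    DDD
```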
My question is how to remove the rows of my table in which two cells of the same row contain the same string.
I think `mtcars` can be used as an example:
```
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
```
The `gear` and `carb` columns can be used for this. As you can see, the first two rows should be removed from this data because both have the same value, 4, in those two columns. Please take into account that my data contain character strings, not numeric values.
For example, we can use the `subset()` function to drop rows based on a condition. If we prefer to work with the tidyverse, we can use the `filter()` function from the dplyr package to remove (or select) rows conditionally on column values, just as with `subset()`.
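A minimal sketch of both routes on the `mtcars` example, assuming dplyr may or may not be installed. The dplyr part is wrapped in `requireNamespace()` so the snippet still runs without it, and it calls `dplyr::filter()` with the package prefix because an unqualified `filter()` would otherwise resolve to `stats::filter()`, an unrelated time-series function.

```r
df1 <- mtcars[1:5, ]

# Base R: keep rows where gear and carb differ
kept_base <- subset(df1, gear != carb)

# dplyr: same rule, only run if the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  kept_dplyr <- dplyr::filter(df1, gear != carb)
  # Compare column values rather than row names
  print(identical(kept_dplyr$mpg, kept_base$mpg))
}
```

Like `subset()`, `filter()` drops rows where the condition evaluates to `NA`.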
To remove rows from a data frame based on multiple conditional statements, we can use square brackets `[ ]` with the data frame and put the conditions, combined with the AND (`&`) or OR (`|`) operator, in the row position, i.e. `df[condition, ]`. This slices the data frame, keeping the rows that satisfy the combined condition and removing all the others.
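A short sketch of bracket indexing with several conditions on the `mtcars` subset. The particular conditions (`cyl == 6`, `hp > 150`) are illustrative choices, not something from the original post.

```r
df1 <- mtcars[1:5, ]

# Keep rows that are (6-cylinder OR above 150 hp) AND have gear != carb;
# every other row is removed. Note the trailing comma: the condition selects rows.
kept <- df1[(df1$cyl == 6 | df1$hp > 150) & df1$gear != df1$carb, ]
print(rownames(kept))
# [1] "Hornet 4 Drive"    "Hornet Sportabout"
```

One difference from `subset()`: if the condition contains `NA`, bracket indexing returns a row of `NA`s for it, whereas `subset()` treats missing values as false and drops the row.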
Based on the information in the post, I think a comparison (`!=`) between the `gear` and `carb` columns will be enough to subset the dataset:
```r
df1 <- mtcars[1:5, ]
subset(df1, gear != carb)
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
```
This should also work for non-numeric (character) columns, but only for exact matches, not partial ones.
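To illustrate the exact-match behaviour on character data like the question's, here is a sketch with a small hypothetical data frame (the column names `col_a`/`col_b` and their values are made up).

```r
# Hypothetical character data: two text columns to compare row by row
df_chr <- data.frame(
  id    = 1:4,
  col_a = c("apple", "pear",  "plum",      "Fig"),
  col_b = c("apple", "peach", "plum tree", "fig"),
  stringsAsFactors = FALSE
)

# Drop rows where the two strings are exactly equal
kept <- subset(df_chr, col_a != col_b)
print(kept$id)
# [1] 2 3 4
# Row 3 is kept: "plum" is only a partial match of "plum tree".
# Row 4 is kept: the comparison is case-sensitive ("Fig" vs "fig").
```

If the columns were read in as factors with different level sets, `!=` stops with the error "level sets of factors are different"; converting them with `as.character()` first avoids that.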
If we need to make an exception and keep the rows where both columns are 'Unknown', we can add another logical condition, `(gear == 'Unknown' & carb == 'Unknown')`, and combine it with the original condition using the `|` operator.
Here I modify the dataset to show the output (just as an example; I know that doing this changes a numeric column to character):
```r
df1$gear[4] <- 'Unknown'
df1$carb[4] <- 'Unknown'
df1$gear[5] <- 'Unknown'
subset(df1, (gear == 'Unknown' & carb == 'Unknown') | gear != carb)
#                    mpg cyl disp  hp drat    wt  qsec vs am    gear    carb
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1       4       1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0 Unknown Unknown
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0 Unknown       2
```
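Since the question's data are all character strings, here is the same exception rule sketched on a hypothetical all-character data frame, written with bracket indexing and generalized to a set of placeholder values via `%in%`. The columns `a`/`b` and the `exceptions` vector are assumptions for illustration; with `exceptions <- "Unknown"` the condition is equivalent to the `subset()` call above.

```r
# Hypothetical all-character data; "Unknown" and "N/A" mark missing information
df_txt <- data.frame(
  a = c("x", "y", "Unknown", "Unknown", "z", "N/A"),
  b = c("x", "w", "Unknown", "v",       "z", "N/A"),
  stringsAsFactors = FALSE
)

exceptions <- c("Unknown", "N/A")  # hypothetical values whose duplicates are kept

# Keep rows where the values differ, plus rows where both equal an exception value
keep <- df_txt$a != df_txt$b | (df_txt$a == df_txt$b & df_txt$a %in% exceptions)
result <- df_txt[keep, ]
print(rownames(result))
# [1] "2" "3" "4" "6"
```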