I have the following data frame:
> str(df)
'data.frame': 3149 obs. of 9 variables:
 $ mkod : int 5029 5035 5036 5042 5048 5050 5065 5071 5072 5075 ...
 $ mad  : Factor w/ 65 levels "Akgün Kasetçilik ",..: 58 29 59 40 56 11 33 34 19 20 ...
 $ yad  : Factor w/ 44 levels "BAKUGAN","BARBIE",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ donem: int 201101 201101 201101 201101 201101 201101 201101 201101 201101 201101 ...
 $ sayi : int 201101 201101 201101 201101 201101 201101 201101 201101 201101 201101 ...
 $ plan : int 2 2 3 2 2 2 7 3 2 7 ...
 $ sevk : int 2 2 3 2 2 2 6 3 2 7 ...
 $ iade : int 0 0 3 1 2 2 6 2 2 3 ...
 $ satis: int 2 2 0 1 0 0 0 1 0 4 ...
I want to remove 21 specific rows from this data frame.
> a <- df[df$plan==0 & df$sevk==0,]
> nrow(a)
[1] 21
So when I remove those 21 rows, I will have a new data frame with 3149 - 21 = 3128 rows. I found the following solution:
> b <- df[df$plan!=0 | df$sevk!=0,]
> nrow(b)
[1] 3128
My solution above uses a modified logical expression (!= instead of == and | instead of &). Other than rewriting the original logical expression, how can I obtain the new data frame without those 21 rows? I need something like this:
> df[-a,] #does not work
EDIT (especially for the downvoters, I hope they understand why I need an alternative solution): I asked for a different solution because I'm writing a long script, and there are various variable assignments (like the a in my example) in various parts of it. So when I need to remove rows in later parts of the script, I don't want to go back and work out the inverse of the logical expressions inside those a-like assignments. That's why something like df[-a,] would be more usable for me.
Note that in pandas (Python, not R), you drop a row or column with the drop() method of the DataFrame. There, rows are labelled by an index that starts at 0 by default, and columns are labelled by name.
To remove all rows that contain NA in R, we can use the na.omit function. For example, if a data frame called df contains some NA values, the command na.omit(df) removes every row that has at least one NA.
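A minimal sketch of the na.omit behaviour described above. The data frame toy and its columns x and y are hypothetical and only serve as an illustration:

```r
# Hypothetical toy data frame; names and values are illustrative only.
toy <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

# na.omit() keeps only the rows with no NA in any column.
clean <- na.omit(toy)
nrow(clean)
```

Only the first row of toy is complete, so it is the only one that survives.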
Just negate your logical subscript:
a <- df[!(df$plan==0 & df$sevk==0),]
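One way to avoid rewriting the condition later is to store the logical vector itself and negate it where needed. This is only a sketch on a small made-up data frame; the name drop_rows is an assumption, not something from the question:

```r
# Toy stand-in for the question's df; values are made up.
df <- data.frame(plan = c(2, 0, 3, 0), sevk = c(2, 0, 3, 1))

# Keep the condition as a logical vector instead of only the subset.
drop_rows <- df$plan == 0 & df$sevk == 0

a <- df[drop_rows, ]    # the rows to remove
b <- df[!drop_rows, ]   # everything else, no inverted expression needed
```

If plan or sevk can contain NA, the stored vector will contain NA as well, so it may be worth checking for that before indexing.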
You can use the rownames to specify a "complementary" data frame. It's easier if the row names are numeric:
df[-as.numeric(rownames(a)),]
But more generally you can use:
df[setdiff(rownames(df),rownames(a)),]
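As a small illustration of the setdiff approach with non-numeric row names, here is a sketch on a made-up data frame (the row names r1 to r4 are hypothetical):

```r
# Toy data frame with character row names; values are made up.
df <- data.frame(plan = c(2, 0, 3, 0), sevk = c(2, 0, 3, 1),
                 row.names = c("r1", "r2", "r3", "r4"))

# Subset to remove, built the same way as in the question.
a <- df[df$plan == 0 & df$sevk == 0, ]

# Keep every row whose name does not appear in a.
b <- df[setdiff(rownames(df), rownames(a)), ]
rownames(b)
```

This relies on the row names being unique and on a still carrying the row names it inherited from df.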