Possible Duplicate:
removing specific rows from a dataframe
Let's say I have a data frame consisting of a number of rows, like this:
X <- data.frame(Variable1=c(11,14,12,15),Variable2=c(2,3,1,4))
Variable1 Variable2
       11         2
       14         3
       12         1
       15         4
Now, let's say that I want to create a new data frame that is a copy of this one, except that every row in which Variable1 equals one of a given set of numerical values is removed. Let's say these values are stored in a vector, v.
That is, if v contains the numbers 11 and 12, the new data frame should look like this:
Variable1 Variable2
       14         3
       15         4
I've been searching the net for quite some time now trying to figure out how to do something like this. Mainly, I just need some kind of command along the lines of removeRow(dataframe, row), or something like that.
> X <- data.frame(Variable1=c(11,14,12,15),Variable2=c(2,3,1,4))
> X
  Variable1 Variable2
1        11         2
2        14         3
3        12         1
4        15         4
> X[X$Variable1!=11 & X$Variable1!=12, ]
  Variable1 Variable2
2        14         3
4        15         4
> X[ ! X$Variable1 %in% c(11,12), ]
  Variable1 Variable2
2        14         3
4        15         4
You can wrap either expression in a function however you like.
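For instance, here is a minimal sketch of such a wrapper built on the %in% form, taking the values to drop from a vector v as in the question. The name remove_rows and its arguments are made up for illustration; there is no such function in base R.

```r
# Hypothetical helper (name and arguments are illustrative, not part of base R):
# return a copy of `df` without the rows whose `column` value appears in `values`.
remove_rows <- function(df, column, values) {
  # TRUE for every row we want to keep
  keep <- !(df[[column]] %in% values)
  # drop = FALSE keeps the result a data frame even if df has a single column
  df[keep, , drop = FALSE]
}

X <- data.frame(Variable1 = c(11, 14, 12, 15), Variable2 = c(2, 3, 1, 4))
v <- c(11, 12)

Y <- remove_rows(X, "Variable1", v)
print(Y)
```

X itself is left untouched; Y is the new data frame. One reason to prefer %in% over chained != comparisons here: if the column contains NA, the != form yields NA in the row index and you get all-NA rows in the result, whereas %in% returns FALSE for NA, so those rows are simply kept.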