Remove rows in dataframe based on three columns

I have a matrix like z:

z <- matrix(c(1,0,0,1,1,0,0, 
      1,0,0,0,1,0,0, 
      0,0,0,0,0,0,0, 
      0,0,1,0,0,0,0), 
    nrow=7, 
    dimnames=list(LETTERS[1:7],NULL)) 

   [,1] [,2] [,3] [,4]
A    1    1    0    0
B    0    0    0    0
C    0    0    0    1
D    1    0    0    0
E    1    1    0    0
F    0    0    0    0
G    0    0    0    0

Now I want to remove the duplicated rows, i.e. rows whose values in columns 1, 2, and 3 are the same as in an earlier row (column 4 is ignored).

  • Remove row E because it is identical to A in columns 1 to 3.
  • Remove rows C, F, and G because they are identical to B in columns 1 to 3.

The result should be like this:

   [,1] [,2] [,3] [,4]
A    1    1    0    0
B    0    0    0    0
D    1    0    0    0

Could anyone help me with this? Many thanks!

Lisann asked Nov 01 '11 13:11




1 Answer

> z[rownames(unique(z[,-4])),]
  [,1] [,2] [,3] [,4]
A    1    1    0    0
B    0    0    0    0
D    1    0    0    0
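
The one-liner above selects rows by name, so it relies on z having unique row names. As a possible alternative, here is a minimal sketch using duplicated() with a logical index, which should also work when the matrix has no row names. The names key_cols, keep, and result are only illustrative and are not from the original answer.

```r
# Rebuild the example matrix from the question
z <- matrix(c(1,0,0,1,1,0,0,
              1,0,0,0,1,0,0,
              0,0,0,0,0,0,0,
              0,0,1,0,0,0,0),
            nrow = 7,
            dimnames = list(LETTERS[1:7], NULL))

# Columns that decide whether two rows count as duplicates (illustrative name)
key_cols <- 1:3

# duplicated() on a matrix compares whole rows; TRUE marks a row that
# repeats an earlier one, so negating it keeps the first occurrence
keep <- !duplicated(z[, key_cols, drop = FALSE])

# drop = FALSE keeps the result a matrix even if only one row survives
result <- z[keep, , drop = FALSE]
print(result)
```

Changing key_cols is all that is needed to compare on a different set of columns.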
Max answered Oct 06 '22 00:10