
How to delete rows for repeated data (R)

I've done a quick search on this topic but haven't found any previous posts that address my question. It seems very straightforward, but I still haven't figured out how to do this efficiently.

In the data frame below, I'd like to delete all rows whose x_1 value occurs only once (in this case B500 and D40).

x_1 <- c("A1", "A1", "A1", "B10", "B10", "B10", "B10",
         "B500", "C100", "C100", "C100", "D40", "G100", "G100")
z_1 <- rnorm(14, 70)
z_2 <- rnorm(14, 1.7)
A <- data.frame(x_1, z_1, z_2)

    x_1      z_1       z_2
1    A1 69.65033 1.5308858
2    A1 68.72687 2.2859416
3    A1 68.32700 0.7994794
4   B10 68.68382 0.5212132
5   B10 70.23359 1.3266729
6   B10 70.68604 4.3823605
7   B10 70.52774 2.2430322
8  B500 69.62868 3.0121398
9  C100 69.41412 2.1895905
10 C100 69.10745 1.7599065
11 C100 69.70876 1.6001099
12  D40 68.96542 0.7485665
13 G100 70.21754 1.9635395
14 G100 72.70583 3.0645247

I can do this manually by using:

A[!A$x_1 %in% c("B500", "D40"), ]

Another starting point is the table function, which counts how often each value occurs:

table(A$x_1)

  A1  B10 B500 C100  D40 G100 
   3    4    1    3    1    2 

Now, my problem is: how do I select the entries that have a count of 1? If I can do this, I should be able to get their names and then delete the matching rows from the data frame.

Any useful ideas or code would be highly appreciated.

John_dydx asked Feb 03 '14 15:02



2 Answers

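You can use duplicated twice, once scanning forward and once scanning backward (fromLast = TRUE). The forward scan flags every occurrence after the first, the backward scan flags every occurrence before the last, so a row is kept whenever its x_1 value appears more than once: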

A[duplicated(A$x_1) | duplicated(A$x_1, fromLast = TRUE), ]

    x_1      z_1       z_2
1    A1 70.32176 2.5074802
2    A1 70.28238 1.8819723
3    A1 67.93057 2.1899037
4   B10 69.75905 1.8493991
5   B10 70.25713 2.6948229
6   B10 69.33121 0.2793853
7   B10 70.82879 2.2831781
9  C100 70.14587 1.0332913
10 C100 69.51571 0.2590098
11 C100 70.48928 1.8471024
13 G100 72.11057 0.6914086
14 G100 69.93814 2.4245214

For more information on how this works, see this answer.
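
As a rough illustration of why the two scans combine this way, here is a small sketch on a toy vector. The vector x and the names fwd, bwd, and keep are made up for this example and are not part of the original answer.

```r
# Toy vector (hypothetical): "b" and "d" occur once, "a" and "c" repeat.
x <- c("a", "a", "b", "c", "c", "c", "d")

# Forward scan: TRUE for every occurrence after the first one.
fwd <- duplicated(x)

# Backward scan: TRUE for every occurrence before the last one.
bwd <- duplicated(x, fromLast = TRUE)

# A value that occurs more than once is flagged by at least one scan;
# a value that occurs exactly once is flagged by neither.
keep <- fwd | bwd

x[keep]
# [1] "a" "a" "c" "c" "c"
```

Neither scan alone is enough: the forward scan misses the first copy of each repeated value, and the backward scan misses the last copy.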

Sven Hohenstein answered Sep 23 '22 18:09



Continuing along your table path: assign the table to an object, then extract the names of the table entries you want and use them to subset the data frame.

tt <- table(A$x_1)
A[!A$x_1 %in% names(tt[tt == 1]), ]

# or
A[A$x_1 %in% names(tt[tt > 1]), ]

#     x_1      z_1       z_2
# 1    A1 69.18667 0.8578626
# 2    A1 71.36819 2.8482506
# 3    A1 69.71246 1.9528315
# 4   B10 69.47145 1.7852872
# 5   B10 69.12699 0.7663739
# 6   B10 70.93589 1.1431804
# 7   B10 68.72273 0.6836297
# 9  C100 70.31252 2.4651336
# 10 C100 69.89168 1.9991948
# 11 C100 70.25079 1.0823843
# 13 G100 69.56992 2.0879085
# 14 G100 68.29589 2.5432109
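
A possible variation, not from the original answer: base R's ave() can attach the group size to every row, which avoids the detour through names() and makes the threshold easy to change. The set.seed() call and the names n_per_group and B are assumptions added only to make this sketch reproducible and readable.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible

x_1 <- c("A1", "A1", "A1", "B10", "B10", "B10", "B10",
         "B500", "C100", "C100", "C100", "D40", "G100", "G100")
A <- data.frame(x_1, z_1 = rnorm(14, 70), z_2 = rnorm(14, 1.7))

# For each row, count how many rows share its x_1 value:
# ave() applies length() within each group and recycles the result
# back to the positions of that group.
n_per_group <- ave(seq_along(A$x_1), A$x_1, FUN = length)

# Keep only rows whose x_1 value occurs more than once.
B <- A[n_per_group > 1, ]

# Same rows as the table-based subset.
tt <- table(A$x_1)
identical(B, A[A$x_1 %in% names(tt[tt > 1]), ])
```

Changing the condition to, say, n_per_group >= 3 would keep only values that occur at least three times.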
Henrik answered Sep 22 '22 18:09
