Can't drop column - select() with dplyr

Tags: r, dplyr

I'm using dplyr and I have a grouped data.frame. I tried to drop a column from this grouped_df with the select function, but got the following error message:

> tbl %>% select(-names)
Error: corrupt 'grouped_df', contains 42 rows, and 965 rows in groups

My data is below.

> print(tbl_df(tbl), n = 1000)
Source: local data frame [42 x 15]

                     household                                       names x2003 x2004 x2005 x2006 x2007  x2008  x2009  x2012 last.avail last.avail.year absChange.last annChange.last           translation
                         (chr)                                      (fctr) (int) (int) (int) (int) (int)  (int)  (int)  (int)      (int)           (dbl)          (int)          (dbl)                (fctr)
1               all households                                      bostad 59280 61850 62760 63210 66950  73340  72350  77750      77750            2012          18470    0.030594980          Accomodation
2               all households                           fritid och kultur 45140 46140 49260 48640 49720  55120  53970  61170      61170            2012          16030    0.034341864   Leisure and culture
3               all households                                   transport 41930 40430 45870 48850 47280  50250  42650  49940      49940            2012           8010    0.019614408        Transportation
4               all households                             köpta livsmedel 28420 30000 29130 30420 30750  34130  34780  34570      34570            2012           6150    0.022004509      Bought Groceries
5               all households hyra/avgift för hyres-/borätt (inkl garage) 27310 27720 28860 30000 28990  29660  30740     NA      30740            2009           3430    0.019914330 Rent for accomodation
6               all households                            hushållstjänster 11360 12030 13200 12390  8520  10250  13530  22900      22900            2012          11540    0.081007165    Household services
7           cohabit with child                                      bostad 78240 83040 81390 79180 90490  95630 100060 100980     100980            2012          22740    0.028754709          Accomodation
8           cohabit with child                           fritid och kultur 67110 67640 67290 64600 74290  71890  77200  81180      81180            2012          14070    0.021373640   Leisure and culture
9           cohabit with child                                   transport 58350 62440 70010 69560 68730  75290  65510  71340      71340            2012          12990    0.022584342        Transportation
10          cohabit with child                             köpta livsmedel 45190 45660 45720 44980 48250  52880  52770  52710      52710            2012           7520    0.017250361      Bought Groceries
11          cohabit with child                            hushållstjänster 19840 21380 25690 21430 17190  19060  24730  37440      37440            2012          17600    0.073108900    Household services
12          cohabit with child                             räntor (brutto) 27090 25230 24390 24500 28510  36030  33080     NA      33080            2009           5990    0.033854485           Rents (net)
13       cohabit without child                                      bostad 60340 63230 63560 61760 67100  74160  70440  78510      78510            2012          18170    0.029679783          Accomodation
14       cohabit without child                           fritid och kultur 51120 48780 57700 57320 57620  67220  62460  68400      68400            2012          17280    0.032884345   Leisure and culture
15       cohabit without child                                   transport 49740 46310 55580 57730 56770  54910  52720  59360      59360            2012           9620    0.019839931        Transportation
16       cohabit without child                             köpta livsmedel 31130 33700 31900 33000 33990  37330  37980  37090      37090            2012           5960    0.019654591      Bought Groceries
17       cohabit without child                                drift av bil 24370 21790 25170 27530 25140  28180  26650     NA      26650            2009           2280    0.015017696          Car expenses
18       cohabit without child                            hushållstjänster 11650 12400 12260 12310  8580  11920  13950  26370      26370            2012          14720    0.095016005    Household services
19    other cohabit with child                           fritid och kultur 67680 75550 78020 75800 88870  80070  84490 116020     116020            2012          48340    0.061715253   Leisure and culture
20    other cohabit with child                                      bostad 73850 68740 84800 86510 89290 106540  89650 100580     100580            2012          26730    0.034920030          Accomodation
21    other cohabit with child                                   transport 66950 79620 75730 77800 81010  93790  77960  98660      98660            2012          31710    0.044022982        Transportation
22    other cohabit with child                             köpta livsmedel 54070 53790 50680 51440 53720  64170  62050  63690      63690            2012           9620    0.018360752      Bought Groceries
23    other cohabit with child                                drift av bil 32690 34180 37530 36200 38280  38990  36390     NA      36390            2009           3700    0.018031437          Car expenses
24    other cohabit with child                            hushållstjänster 15690 21000 20810 20370  9990  11880  19710  32460      32460            2012          16770    0.084128145    Household services
25            other households                                      bostad 62860 68680 69950 72840 70700  91510  84480  86020      86020            2012          23160    0.035466655          Accomodation
26            other households                           fritid och kultur 49940 48530 55280 57970 54470  61130  65280  67920      67920            2012          17980    0.034758001   Leisure and culture
27            other households                                   transport 50590 41980 57370 64960 52780  61460  59770  59630      59630            2012           9040    0.018435074        Transportation
28            other households                             köpta livsmedel 35370 35210 35360 41560 35040  43770  45940  43270      43270            2012           7900    0.022652258      Bought Groceries
29            other households                                drift av bil 21440 21580 25640 30070 28260  30070  32010     NA      32010            2009          10570    0.069079862          Car expenses
30            other households hyra/avgift för hyres-/borätt (inkl garage) 29550 32320 25170 24600 29480  35290  25920     NA      25920            2009          -3630   -0.021607942 Rent for accomodation
31               single parent                                      bostad 67890 67250 71200 75210 71000  73490  74710  81820      81820            2012          13930    0.020953501          Accomodation
32               single parent                           fritid och kultur 34900 35860 43600 46770 43540  46160  45840  51000      51000            2012          16100    0.043049627   Leisure and culture
33               single parent hyra/avgift för hyres-/borätt (inkl garage) 43360 44020 45160 49430 45370  44090  48740     NA      48740            2009           5380    0.019685026 Rent for accomodation
34               single parent                                   transport 27230 30810 28810 28410 30500  30390  29360  34890      34890            2012           7660    0.027925124        Transportation
35               single parent                             köpta livsmedel 26420 27910 28160 29100 28310  33020  35910  33740      33740            2012           7320    0.027546212      Bought Groceries
36               single parent                            hushållstjänster  9490 11690 13770  8650  7250  10390  11490  17140      17140            2012           7650    0.067891620    Household services
37 single parent without child                                      bostad 45660 47110 48750 50850 51610  55720  56020  61090      61090            2012          15430    0.032876143          Accomodation
38 single parent without child                           fritid och kultur 28270 31890 31140 30210 28480  35650  32840  41770      41770            2012          13500    0.044329701   Leisure and culture
39 single parent without child hyra/avgift för hyres-/borätt (inkl garage) 31900 32160 33010 36300 34300  35330  37800     NA      37800            2009           5900    0.028687635 Rent for accomodation
40 single parent without child                                   transport 26730 22980 24530 29310 28440  31680  20150  28800      28800            2012           2070    0.008322088        Transportation
41 single parent without child                             köpta livsmedel 15330 16930 16150 17630 17280  18390  19370  19580      19580            2012           4250    0.027561531      Bought Groceries
42 single parent without child                            hushållstjänster  6570  6590  6840  7080  3780   4300   7000  12310      12310            2012           5740    0.072257733    Household services

What is the issue and how can this be resolved?

asked Nov 01 '15 13:11 by uncool



1 Answer

If the variable to drop is used as a grouping variable, we need to ungroup before dropping it with select. This is the case in the current dplyr version (dplyr_0.4.3), but it may change in future versions.

tbl %>% 
    ungroup() %>%
    select(-names)
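
The same fix can be tried on a small made-up table. This is only a sketch: the data frame `toy` and its values are assumptions for illustration, not the OP's data, and it assumes the table is grouped by the column being dropped.

```r
library(dplyr)

# Hypothetical stand-in for the OP's data, grouped by the column we want to drop
toy <- data.frame(
  household = c("a", "a", "b", "b"),
  names     = c("bostad", "transport", "bostad", "transport"),
  x2003     = c(1L, 2L, 3L, 4L)
) %>%
  group_by(names)

# Remove the grouping first, then drop the column
result <- toy %>%
  ungroup() %>%
  select(-names)

colnames(result)
```

In more recent dplyr releases, select() on a grouped data frame usually keeps the grouping column and prints a message rather than failing, so ungrouping first is still what actually removes it.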

As an example of corrupted grouped data, suppose we try to remove column 'y' from 'dat3':

dat3 %>% 
  select(-y)
#Error: corrupt 'grouped_df', contains 1100 rows, and 1000 rows in groups

By checking str(dat3)

str(dat3)
#Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1100 obs. of  2 variables:
# $ group: Factor w/ 3 levels "A","B","C": 2 3 2 2 2 2 1 2 2 1 ...
# $ y    : num  1.396 -0.892 1.065 0.801 -0.368 ...
# - attr(*, "vars")=List of 1
#  ..$ : symbol group
# - attr(*, "drop")= logi TRUE
# - attr(*, "indices")=List of 3
#  ..$ : int  6 9 12 13 14 16 18 21 25 27 ...
#  ..$ : int  0 2 3 4 5 7 8 10 11 15 ...
#  ..$ : int  1 17 24 28 35 37 39 43 47 49 ...
# - attr(*, "group_sizes")= int  323 365 312
# - attr(*, "biggest_group_size")= int 365
# - attr(*, "labels")='data.frame':      3 obs. of  1 variable:
#  ..$ group: Factor w/ 3 levels "A","B","C": 1 2 3
#  ..- attr(*, "vars")=List of 1
#  .. ..$ : symbol group
#  ..- attr(*, "drop")= logi TRUE

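we find that the grouping attributes were added by rbinding and cover only 1000 of the 1100 rows (the group_sizes are 323 + 365 + 312 = 1000). If we use bind_rows instead, no such attributes are attached: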

dat4 <- bind_rows(dat1, dat2)
str(dat4)
#Classes ‘tbl_df’, ‘tbl’ and 'data.frame':       1100 obs. of  2 variables:
# $ group: chr  "B" "C" "B" "B" ...
# $ y    : num  1.396 -0.892 1.065 0.801 -0.368 ...

We can remove the 'y' column from 'dat4'

dat4 %>%
  select(-y)
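
The answer does not show how 'dat1' and 'dat2' were built, so the sketch below makes its own assumptions: 1000 grouped rows and 100 ungrouped rows, chosen to match the row counts in the str() output above, with character rather than factor groups and an arbitrary seed. It only illustrates that a table combined with bind_rows can have 'y' dropped.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, not taken from the original answer

# Assumed inputs: 1000 grouped rows plus 100 ungrouped rows
dat1 <- data.frame(
  group = sample(c("A", "B", "C"), 1000, replace = TRUE),
  y     = rnorm(1000),
  stringsAsFactors = FALSE
) %>%
  group_by(group)

dat2 <- data.frame(
  group = sample(c("A", "B", "C"), 100, replace = TRUE),
  y     = rnorm(100),
  stringsAsFactors = FALSE
)

# Combine with bind_rows() instead of rbind()
dat4 <- bind_rows(dat1, dat2)

nrow(dat4)
dat4 %>% select(-y)
```

Whether 'dat4' comes back grouped or ungrouped may depend on the dplyr version; either way only the 'group' column should remain after the select.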

As the OP didn't show how 'tbl' was created, we can only assume that it was built in a way that corrupted the dataset by attaching grouping attributes that no longer match the data.
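
One rough way to check a grouped data frame is to compare the sum of its group sizes with its row count, which is the mismatch the error message reports. The helper `groups_consistent` below is hypothetical, and the example data is made up. On an already corrupted object `group_size()` may itself raise the error, depending on the dplyr version, so treat this as a sketch.

```r
library(dplyr)

# Hypothetical helper: TRUE when the group sizes add up to the row count
groups_consistent <- function(df) {
  sum(group_size(df)) == nrow(df)
}

# Made-up example data, grouped normally
toy <- data.frame(g = c("A", "B", "A"), y = c(1, 2, 3)) %>%
  group_by(g)

groups_consistent(toy)
```

A data frame grouped in the normal way should pass this check.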

answered Sep 30 '22 21:09 by akrun