
Is there a pandas equivalent to the tidyr nest function?

The tidyr::unnest method from the R language has an equivalent in pandas; it is called explode, as explained in this very detailed answer. I would like to know if there is an equivalent to the `tidyr::nest` method.

Example R code:

library(tidyr)
iris_nested <- as_tibble(iris) %>% nest(data=-Species)

The data column is a list-column, which contains data frames (this is useful for modelling for example, when running many models).

iris_nested
# A tibble: 3 x 2
  Species              data
  <fct>      <list<df[,4]>>
1 setosa           [50 × 4]
2 versicolor       [50 × 4]
3 virginica        [50 × 4]

To access one element inside the data column:

iris_nested[1,'data'][[1]]
[...]
# A tibble: 50 x 4
   Sepal.Length Sepal.Width Petal.Length Petal.Width
          <dbl>       <dbl>        <dbl>       <dbl>
 1          5.1         3.5          1.4         0.2
 2          4.9         3            1.4         0.2
 3          4.7         3.2          1.3         0.2
 4          4.6         3.1          1.5         0.2
 5          5           3.6          1.4         0.2
 6          5.4         3.9          1.7         0.4
 7          4.6         3.4          1.4         0.3
 8          5           3.4          1.5         0.2
 9          4.4         2.9          1.4         0.2
10          4.9         3.1          1.5         0.1
# … with 40 more rows

Example Python code:

import seaborn
iris = seaborn.load_dataset("iris")

How can I nest this data frame in pandas:

  1. first, in a less complex way (on par with the pandas explode functionality), where the data column contains a simple list;
  2. second, where the data column contains data frames, as illustrated in the example above?
Paul Rougieux asked Nov 27 '19 10:11


People also ask

What is nest() in R?

Nesting creates a list-column of data frames; unnesting flattens it back out into regular columns. Nesting is implicitly a summarising operation: you get one row for each group defined by the non-nested columns.

How do I nest a variable in R?

df %>% nest(x, y) specifies the columns to be nested, i.e. the columns that will appear in the inner data frame. Alternatively, you can nest() a grouped data frame created by dplyr::group_by(). The grouping variables remain in the outer data frame and the others are nested.


2 Answers

I think this is the closest:

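# assumes an iris data frame with R-style column names ("Species", "Sepal.Length", ...); seaborn's copy uses "species"
df = iris.groupby("Species").apply(lambda x: dict(x))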

Output:

Species
setosa        {'Sepal.Length': [5.1, 4.9, 4.7, 4.6, 5.0, 5.4...
versicolor    {'Sepal.Length': [7.0, 6.4, 6.9, 5.5, 6.5, 5.7...
virginica     {'Sepal.Length': [6.3, 5.8, 7.1, 6.3, 6.5, 7.6...

To access one of the Species:

pd.DataFrame(df['setosa'])


     Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
100           5.1          3.5           1.4          0.2  setosa
101           4.9          3.0           1.4          0.2  setosa
102           4.7          3.2           1.3          0.2  setosa
103           4.6          3.1           1.5          0.2  setosa
104           5.0          3.6           1.4          0.2  setosa
105           5.4          3.9           1.7          0.4  setosa
106           4.6          3.4           1.4          0.3  setosa
107           5.0          3.4           1.5          0.2  setosa
108           4.4          2.9           1.4          0.2  setosa
109           4.9          3.1           1.5          0.1  setosa
110           5.4          3.7           1.5          0.2  setosa
111           4.8          3.4           1.6          0.2  setosa
112           4.8          3.0           1.4          0.1  setosa
113           4.3          3.0           1.1          0.1  setosa
114           5.8          4.0           1.2          0.2  setosa
115           5.7          4.4           1.5          0.4  setosa
116           5.4          3.9           1.3          0.4  setosa
117           5.1          3.5           1.4          0.3  setosa
118           5.7          3.8           1.7          0.3  setosa
119           5.1          3.8           1.5          0.3  setosa
120           5.4          3.4           1.7          0.2  setosa
121           5.1          3.7           1.5          0.4  setosa
122           4.6          3.6           1.0          0.2  setosa
123           5.1          3.3           1.7          0.5  setosa
124           4.8          3.4           1.9          0.2  setosa
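
If the nested column should hold real DataFrames rather than dicts, a pandas-only variant might look like the sketch below. The helper name nest, its by and name parameters, and the small stand-in table (used so the snippet runs without downloading iris) are illustrative assumptions, not part of pandas or of the answer above.

```python
import pandas as pd

# Small stand-in for the iris data (made-up subset), so the sketch runs offline.
df = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica"],
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3],
})


def nest(frame, by, name="data"):
    """Hypothetical helper: one row per group, remaining columns kept as a DataFrame."""
    keys, frames = [], []
    # sort=False keeps the groups in order of first appearance.
    for key, group in frame.groupby(by, sort=False):
        keys.append(key)
        # Drop the grouping column and renumber rows inside each nested frame.
        frames.append(group.drop(columns=by).reset_index(drop=True))
    # A list of DataFrames becomes an object-dtype column.
    return pd.DataFrame({by: keys, name: frames})


nested = nest(df, "species")
print(nested["species"].tolist())
print(nested["data"].iloc[0])
```

Each cell of data is then an ordinary DataFrame, so a per-group model could be fitted by looping over nested["data"].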
Billy Bonaros answered Oct 10 '22 11:10


This is easy to do with datar:

>>> from datar.all import f, nest
>>> from datar.datasets import iris
>>> iris_nested = iris >> nest(data=~f.Species)
>>> iris_nested
      Species       data
     <object>   <object>
0      setosa  <DF 50x4>
1  versicolor  <DF 50x4>
2   virginica  <DF 50x4>
>>> iris_nested.iloc[0, 1]
    Sepal_Length  Sepal_Width  Petal_Length  Petal_Width
       <float64>    <float64>     <float64>    <float64>
0            5.1          3.5           1.4          0.2
1            4.9          3.0           1.4          0.2
2            4.7          3.2           1.3          0.2
3            4.6          3.1           1.5          0.2
4            5.0          3.6           1.4          0.2
5            5.4          3.9           1.7          0.4
6            4.6          3.4           1.4          0.3
7            5.0          3.4           1.5          0.2
8            4.4          2.9           1.4          0.2
9            4.9          3.1           1.5          0.1
10           5.4          3.7           1.5          0.2
11           4.8          3.4           1.6          0.2
12           4.8          3.0           1.4          0.1
13           4.3          3.0           1.1          0.1
14           5.8          4.0           1.2          0.2
15           5.7          4.4           1.5          0.4
16           5.4          3.9           1.3          0.4
17           5.1          3.5           1.4          0.3
18           5.7          3.8           1.7          0.3
19           5.1          3.8           1.5          0.3
20           5.4          3.4           1.7          0.2
21           5.1          3.7           1.5          0.4
22           4.6          3.6           1.0          0.2
23           5.1          3.3           1.7          0.5
24           4.8          3.4           1.9          0.2
25           5.0          3.0           1.6          0.2
26           5.0          3.4           1.6          0.4
27           5.2          3.5           1.5          0.2
28           5.2          3.4           1.4          0.2
29           4.7          3.2           1.6          0.2
30           4.8          3.1           1.6          0.2
31           5.4          3.4           1.5          0.4
32           5.2          4.1           1.5          0.1
33           5.5          4.2           1.4          0.2
34           4.9          3.1           1.5          0.2
35           5.0          3.2           1.2          0.2
36           5.5          3.5           1.3          0.2
37           4.9          3.6           1.4          0.1
38           4.4          3.0           1.3          0.2
39           5.1          3.4           1.5          0.2
40           5.0          3.5           1.3          0.3
41           4.5          2.3           1.3          0.3
42           4.4          3.2           1.3          0.2
43           5.0          3.5           1.6          0.6
44           5.1          3.8           1.9          0.4
45           4.8          3.0           1.4          0.3
46           5.1          3.8           1.6          0.2
47           4.6          3.2           1.4          0.2
48           5.3          3.7           1.5          0.2
49           5.0          3.3           1.4          0.2

It aligns with dplyr/tidyr APIs.

I am the author of the package. Feel free to submit issues if you have any questions.
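
For the simpler list-valued case from the question, pandas alone may be enough. The sketch below is only an illustration under assumptions: the small stand-in table replaces the real iris data, and groupby(...).agg(list) is used as the nesting step, with explode as its inverse.

```python
import pandas as pd

# Small stand-in for the iris data (made-up subset), so the sketch runs offline.
df = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica"],
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3],
})

# Nest: one row per species, with the values collected into a Python list.
nested = (
    df.groupby("species", sort=False)["sepal_length"]
    .agg(list)
    .reset_index()
)

# Unnest: explode() turns each list element back into its own row.
# The exploded column has object dtype, so cast it back if numbers are needed.
flat = nested.explode("sepal_length", ignore_index=True)
flat["sepal_length"] = flat["sepal_length"].astype(float)

print(nested)
print(flat)
```

The round trip returns the original rows, which is a quick way to check that nothing was lost when nesting.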

Panwen Wang answered Oct 10 '22 11:10