pandas or Python equivalent of tidyr complete

Tags: python, python-3.x, pandas

I have data that looks like this:

library("tidyverse")

df <- tibble(user = c(1, 1, 2, 3, 3, 3), x = c("a", "b", "a", "a", "c", "d"), y = 1)
df

#    user     x     y
# 1     1     a     1
# 2     1     b     1
# 3     2     a     1
# 4     3     a     1
# 5     3     c     1
# 6     3     d     1

The same data in pandas:

import pandas as pd
df = pd.DataFrame({'user':[1, 1, 2, 3, 3, 3], 'x':['a', 'b', 'a', 'a', 'c', 'd'], 'y':1})

I'd like to "complete" the data frame so that every user has a row for every possible value of x, with y set to 0 for the rows that are added.

This is straightforward in R (tidyverse/tidyr):

df %>% 
    complete(nesting(user), x = c("a", "b", "c", "d"), fill = list(y = 0))

#    user     x     y
# 1     1     a     1
# 2     1     b     1
# 3     1     c     0
# 4     1     d     0
# 5     2     a     1
# 6     2     b     0
# 7     2     c     0
# 8     2     d     0
# 9     3     a     1
# 10    3     b     0
# 11    3     c     1
# 12    3     d     1

Is there an equivalent of complete in pandas/Python that yields the same result?

asked May 31 '17 14:05

emehex

1 Answer

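You can use reindex with a MultiIndex built by MultiIndex.from_product:

# Move the key columns into the index, then reindex to every user/x pair
df = df.set_index(['user', 'x'])
mux = pd.MultiIndex.from_product([df.index.levels[0], df.index.levels[1]], names=['user', 'x'])
# Pairs missing from the original data get y = 0
df = df.reindex(mux, fill_value=0).reset_index()
print(df)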
    user  x  y
0      1  a  1
1      1  b  1
2      1  c  0
3      1  d  0
4      2  a  1
5      2  b  0
6      2  c  0
7      2  d  0
8      3  a  1
9      3  b  0
10     3  c  1
11     3  d  1
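
The R call in the question lists the x values explicitly, whereas the reindex approach above takes them from the data. Here is a minimal sketch of the explicit variant, assuming the full set of x values is known up front. The extra value 'e' and the names all_x and completed are hypothetical, chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                   'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                   'y': 1})

# Assumption: the complete set of x values is supplied by hand, as in the R call.
# 'e' is a hypothetical value that never occurs in the data.
all_x = ['a', 'b', 'c', 'd', 'e']

# Every observed user paired with every listed x value
mux = pd.MultiIndex.from_product([df['user'].unique(), all_x],
                                 names=['user', 'x'])

# Pairs that are absent from df get y = 0
completed = df.set_index(['user', 'x']).reindex(mux, fill_value=0).reset_index()
print(completed)
```

With three users and five x values this should give 15 rows, and every user gets a row for 'e' with y equal to 0.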

Or use set_index + unstack + stack:

# unstack pivots x into columns (filling gaps with 0); stack turns it back into rows
df = df.set_index(['user', 'x'])['y'].unstack(fill_value=0).stack().reset_index(name='y')
print(df)
    user  x  y
0      1  a  1
1      1  b  1
2      1  c  0
3      1  d  0
4      2  a  1
5      2  b  0
6      2  c  0
7      2  d  0
8      3  a  1
9      3  b  0
10     3  c  1
11     3  d  1
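
Both approaches above rely on the (user, x) pairs being unique, since they go through an index. As a rough alternative that does not need that, the grid of combinations can be built as a frame and left-merged onto the data. The helper name complete and its parameters are hypothetical and not part of pandas. This is only a sketch under those assumptions:

```python
import pandas as pd

def complete(df, keys, fill):
    """Hypothetical helper: add rows for missing combinations of `keys`.

    `fill` maps value columns to the value used in the added rows.
    """
    # All combinations of the observed values of each key column
    grid = pd.MultiIndex.from_product(
        [df[k].unique() for k in keys], names=keys
    ).to_frame(index=False)
    # The left merge keeps every combination; missing ones get NaN
    out = grid.merge(df, on=keys, how='left')
    # Fill the gaps and restore the original dtypes (NaN forces floats)
    return out.fillna(fill).astype({c: df[c].dtype for c in fill})

df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                   'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                   'y': 1})

out = complete(df, ['user', 'x'], {'y': 0})
print(out)
```

The rows come out in order of first appearance of each key value rather than sorted, which happens to match the sorted order for this data.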

answered Oct 10 '22 01:10

jezrael
