Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge a copy of one pandas DataFrame into every row of another DataFrame?

I have a scenario where I want to expand a dataframe by merging in another, smaller table into every row of the dataframe.

In other words, if the larger table is 10 rows, and the small table is 2 rows, then the result would be a table of length 20 where every row in the original table is duplicated and the new columns from the smaller table are merged.

To accomplish this, I have written a small function that adds a common column onto each table, merges on that column, then drops the column.

def merge_expand(big, small):
    placeholder = "__placeholderstring__"
    big.insert(0, placeholder, 1)
    small.insert(0, placeholder, 1)
    merged = big.merge(small, how='left', on=placeholder)
    merged.drop(columns=placeholder, inplace=True)
    return merged

# example
big = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
small = pd.DataFrame({'id': ['aa','bb'], 'val':['a','b']})
merge_expand(big, small)

# output:
   a  b  id val
0  1  4  aa   a
1  1  4  bb   b
2  2  5  aa   a
3  2  5  bb   b
4  3  6  aa   a
5  3  6  bb   b

This does the job, but seems to me that it is hacky and may not be the most efficient solution, as it needs to perform multiple DataFrame operations. What would be the most efficient way to handle this problem?

like image 739
teepee Avatar asked Aug 24 '19 00:08

teepee


1 Answers

It looks like you are looking for a full join / cartesian join. It can be accomplished with pd.merge if we assign the same key to all observations.

big.assign(key=1).merge(small.assign(key=1), how='outer', on='key')

Output

   a  b  key  id val
0  1  4    1  aa   a
1  1  4    1  bb   b
2  2  5    1  aa   a
3  2  5    1  bb   b
4  3  6    1  aa   a
5  3  6    1  bb   b

If you already have a columns called 'key', you can essentially call it anything:

big['thiswontmatchanything'] = 1
small['thiswontmatchanything'] = 1

big.merge(small, how='outer', on='thiswontmatchanything').drop('thiswontmatchanything', axis=1)

Output

    a   b   id  val
0   1   4   aa  a
1   1   4   bb  b
2   2   5   aa  a
3   2   5   bb  b
4   3   6   aa  a
5   3   6   bb  b
like image 164
calestini Avatar answered Sep 25 '22 08:09

calestini