I have a scenario where I want to expand a DataFrame by merging another, smaller table into every row of it.
In other words, if the larger table is 10 rows, and the small table is 2 rows, then the result would be a table of length 20 where every row in the original table is duplicated and the new columns from the smaller table are merged.
To accomplish this, I have written a small function that adds a common column onto each table, merges on that column, then drops the column.
import pandas as pd

def merge_expand(big, small):
    # Add a constant helper column to both tables, merge on it,
    # then drop it. Note: insert() mutates the input DataFrames in place.
    placeholder = '__placeholderstring__'
    big.insert(0, placeholder, 1)
    small.insert(0, placeholder, 1)
    merged = big.merge(small, how='left', on=placeholder)
    merged.drop(columns=placeholder, inplace=True)
    return merged
# example
big = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
small = pd.DataFrame({'id': ['aa','bb'], 'val':['a','b']})
merge_expand(big, small)
# output:
a b id val
0 1 4 aa a
1 1 4 bb b
2 2 5 aa a
3 2 5 bb b
4 3 6 aa a
5 3 6 bb b
This does the job, but seems to me that it is hacky and may not be the most efficient solution, as it needs to perform multiple DataFrame operations. What would be the most efficient way to handle this problem?
It looks like you are looking for a cross join (Cartesian product). It can be accomplished with pd.merge if we assign the same key to all observations.
big.assign(key=1).merge(small.assign(key=1), how='outer', on='key')
Output
a b key id val
0 1 4 1 aa a
1 1 4 1 bb b
2 2 5 1 aa a
3 2 5 1 bb b
4 3 6 1 aa a
5 3 6 1 bb b
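As an aside, recent pandas versions (1.2 and later) support a dedicated how='cross' option in merge, which produces the Cartesian product directly, with no dummy key to add and drop. If your pandas version supports it, this is the cleanest way to express the operation:

```python
import pandas as pd

big = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
small = pd.DataFrame({'id': ['aa', 'bb'], 'val': ['a', 'b']})

# Cartesian product: each row of `big` is paired with every row of `small`.
# Requires pandas >= 1.2; no helper column is created or dropped.
result = big.merge(small, how='cross')
print(result)
```

This leaves both input DataFrames untouched and yields the same 6-row result as the key trick above.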
If you already have a column called 'key', the helper column can be named anything that does not clash with your existing columns:
big['thiswontmatchanything'] = 1
small['thiswontmatchanything'] = 1
big.merge(small, how='outer', on='thiswontmatchanything').drop('thiswontmatchanything', axis=1)
Output
a b id val
0 1 4 aa a
1 1 4 bb b
2 2 5 aa a
3 2 5 bb b
4 3 6 aa a
5 3 6 bb b