I have a dataframe df
that looks like:
one three two
0 1.0 10.0 4.0
1 2.0 3.0 3.0
2 3.0 22.0 2.0
3 4.0 1.0 1.0
I have another single row dataframe df2
that looks like:
a b m u
0 1.0 2.0 1.0 4.0
I want to concatenate the two to end up with:
one three two a b m u
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 1.0 2.0 1.0 4.0
2 3.0 22.0 2.0 1.0 2.0 1.0 4.0
3 4.0 1.0 1.0 1.0 2.0 1.0 4.0
I tried:
df3 = pd.concat([df, df2], axis=1, ignore_index=True)
0 1 2 3 4 5 6
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 NaN NaN NaN NaN
2 3.0 22.0 2.0 NaN NaN NaN NaN
3 4.0 1.0 1.0 NaN NaN NaN NaN
Err Wrong answer...
How can I sort this out?
Many thanks.
Use pandas. concat() to concatenate/merge two or multiple pandas DataFrames across rows or columns. When you concat() two pandas DataFrames on rows, it creates a new Dataframe containing all rows of two DataFrames basically it does append one DataFrame with another.
The concat() function in pandas is used to append either columns or rows from one DataFrame to another. The concat() function does all the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.
You can use pd. concat([df1, df2, df3, df4], axis=1) to concat vertically.
Use merge
with assigning a dummy key.
df.assign(key=1).merge(df2.assign(key=1), on='key').drop('key',axis=1)
Output:
one three two a b m u
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 1.0 2.0 1.0 4.0
2 3.0 22.0 2.0 1.0 2.0 1.0 4.0
3 4.0 1.0 1.0 1.0 2.0 1.0 4.0
I think you can use numpy.tile
for repeat data:
df2 = pd.DataFrame(np.tile(df2.values, len(df.index)).reshape(-1,len(df2.columns)),
columns=df2.columns)
print (df2)
a b m u
0 1.0 2.0 1.0 4.0
1 1.0 2.0 1.0 4.0
2 1.0 2.0 1.0 4.0
3 1.0 2.0 1.0 4.0
df3 = df.join(df2)
print (df3)
one three two a b m u
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 1.0 2.0 1.0 4.0
2 3.0 22.0 2.0 1.0 2.0 1.0 4.0
3 4.0 1.0 1.0 1.0 2.0 1.0 4.0
Or improved John Galt solution - only replaced NaN
s of columns from df2
:
df3 = df.join(df2)
df3[df2.columns] = df3[df2.columns].ffill()
print (df3)
one three two a b m u
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 1.0 2.0 1.0 4.0
2 3.0 22.0 2.0 1.0 2.0 1.0 4.0
3 4.0 1.0 1.0 1.0 2.0 1.0 4.0
Another solution with assign
by Series
created by iloc
, but columns names has to be strings:
df3 = df.assign(**df2.iloc[0])
print (df3)
one three two a b m u
0 1.0 10.0 4.0 1.0 2.0 1.0 4.0
1 2.0 3.0 3.0 1.0 2.0 1.0 4.0
2 3.0 22.0 2.0 1.0 2.0 1.0 4.0
3 4.0 1.0 1.0 1.0 2.0 1.0 4.0
Timings:
np.random.seed(44)
N = 1000000
df = pd.DataFrame(np.random.random((N,5)), columns=list('ABCDE'))
df2 = pd.DataFrame(np.random.random((1, 50)))
df2.columns = 'a' + df2.columns.astype(str)
In [369]: %timeit df.join(pd.DataFrame(np.tile(df2.values, len(df.index)).reshape(-1,len(df2.columns)), columns=df2.columns))
1 loop, best of 3: 897 ms per loop
In [370]: %timeit df.assign(**df2.iloc[0])
1 loop, best of 3: 467 ms per loop
In [371]: %timeit df.assign(key=1).merge(df2.assign(key=1), on='key').drop('key',axis=1)
1 loop, best of 3: 1.55 s per loop
In [372]: %%timeit
...: df3 = df.join(df2)
...: df3[df2.columns] = df3[df2.columns].ffill()
...:
1 loop, best of 3: 1.9 s per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With