Concatenate two pandas DataFrames when one has a single row

I have a dataframe df that looks like:

   one  three  two
0  1.0   10.0  4.0
1  2.0    3.0  3.0
2  3.0   22.0  2.0
3  4.0    1.0  1.0

I have another single row dataframe df2 that looks like:

     a    b    m    u
0  1.0  2.0  1.0  4.0

I want to concatenate the two to end up with:

   one  three  two    a    b    m    u
0  1.0   10.0  4.0  1.0  2.0  1.0  4.0
1  2.0    3.0  3.0  1.0  2.0  1.0  4.0
2  3.0   22.0  2.0  1.0  2.0  1.0  4.0
3  4.0    1.0  1.0  1.0  2.0  1.0  4.0

I tried:

df3 = pd.concat([df, df2], axis=1, ignore_index=True)

     0     1    2    3    4    5    6
0  1.0  10.0  4.0  1.0  2.0  1.0  4.0
1  2.0   3.0  3.0  NaN  NaN  NaN  NaN
2  3.0  22.0  2.0  NaN  NaN  NaN  NaN
3  4.0   1.0  1.0  NaN  NaN  NaN  NaN

That is the wrong result: only the first row gets the values from df2, and the column names are lost.

How can I sort this out?

Many thanks.

Chuck asked Aug 16 '17 12:08


2 Answers

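Use merge after assigning the same dummy key to both frames. Every row of df then matches the single row of df2, which gives a cross join.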

df.assign(key=1).merge(df2.assign(key=1), on='key').drop('key',axis=1)

Output:

   one  three  two    a    b    m    u
0  1.0   10.0  4.0  1.0  2.0  1.0  4.0
1  2.0    3.0  3.0  1.0  2.0  1.0  4.0
2  3.0   22.0  2.0  1.0  2.0  1.0  4.0
3  4.0    1.0  1.0  1.0  2.0  1.0  4.0
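
On pandas 1.2 or later, the same cross join can probably be written without the dummy key by passing how="cross" to merge. The following is a minimal sketch, assuming that pandas version and rebuilding the frames from the question by hand:

```python
import pandas as pd

# Frames rebuilt from the question for illustration
df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0],
                   "three": [10.0, 3.0, 22.0, 1.0],
                   "two": [4.0, 3.0, 2.0, 1.0]})
df2 = pd.DataFrame({"a": [1.0], "b": [2.0], "m": [1.0], "u": [4.0]})

# how="cross" pairs every row of df with every row of df2 (pandas >= 1.2)
df3 = df.merge(df2, how="cross")
print(df3)
```

Like the dummy-key version, this returns a fresh default index, so a meaningful index on df would have to be restored afterwards.
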
Scott Boston answered Nov 15 '22 21:11


You can use numpy.tile to repeat the single row once for each row of df:

df2 = pd.DataFrame(np.tile(df2.values, len(df.index)).reshape(-1,len(df2.columns)), 
                   columns=df2.columns)
print (df2)
     a    b    m    u
0  1.0  2.0  1.0  4.0
1  1.0  2.0  1.0  4.0
2  1.0  2.0  1.0  4.0
3  1.0  2.0  1.0  4.0

df3 = df.join(df2)
print (df3)
   one  three  two    a    b    m    u
0  1.0   10.0  4.0  1.0  2.0  1.0  4.0
1  2.0    3.0  3.0  1.0  2.0  1.0  4.0
2  3.0   22.0  2.0  1.0  2.0  1.0  4.0
3  4.0    1.0  1.0  1.0  2.0  1.0  4.0
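
Note that join aligns on the index, so the repeated frame above only lines up because df has the default 0..n-1 index. If df carried some other index, one possible variant (a sketch, with a made-up index "wxyz" and numpy.repeat swapped in for numpy.tile) is to hand df.index to the repeated frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0],
                   "three": [10.0, 3.0, 22.0, 1.0],
                   "two": [4.0, 3.0, 2.0, 1.0]},
                  index=list("wxyz"))  # hypothetical non-default index
df2 = pd.DataFrame({"a": [1.0], "b": [2.0], "m": [1.0], "u": [4.0]})

# Repeat the single row len(df) times along axis 0 and reuse df's index,
# so that join finds a matching label for every row
repeated = pd.DataFrame(np.repeat(df2.to_numpy(), len(df), axis=0),
                        columns=df2.columns, index=df.index)
df3 = df.join(repeated)
print(df3)
```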

Or use an improved version of John Galt's solution that forward-fills NaNs only in the columns that come from df2:

df3 = df.join(df2)
df3[df2.columns] = df3[df2.columns].ffill()
print (df3)
   one  three  two    a    b    m    u
0  1.0   10.0  4.0  1.0  2.0  1.0  4.0
1  2.0    3.0  3.0  1.0  2.0  1.0  4.0
2  3.0   22.0  2.0  1.0  2.0  1.0  4.0
3  4.0    1.0  1.0  1.0  2.0  1.0  4.0

Another solution passes the Series returned by iloc to assign as keyword arguments, so the column names have to be strings:

df3 = df.assign(**df2.iloc[0])
print (df3)
   one  three  two    a    b    m    u
0  1.0   10.0  4.0  1.0  2.0  1.0  4.0
1  2.0    3.0  3.0  1.0  2.0  1.0  4.0
2  3.0   22.0  2.0  1.0  2.0  1.0  4.0
3  4.0    1.0  1.0  1.0  2.0  1.0  4.0
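
The string requirement comes from ** unpacking, which turns each column label into a keyword argument name. If df2 had integer column labels, one workaround might be to convert them to strings first. This is only a sketch, with a made-up two-column df2:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0],
                   "three": [10.0, 3.0, 22.0, 1.0],
                   "two": [4.0, 3.0, 2.0, 1.0]})
df2 = pd.DataFrame([[1.0, 2.0]])  # hypothetical frame with integer labels 0 and 1

# Convert the labels to "0" and "1", then take the single row as a Series
row = df2.rename(columns=str).iloc[0]

# Each label becomes a keyword, and each scalar is broadcast down the new column
df3 = df.assign(**row.to_dict())
print(df3)
```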

Timings:

np.random.seed(44)
N = 1000000

df = pd.DataFrame(np.random.random((N,5)), columns=list('ABCDE'))

df2 = pd.DataFrame(np.random.random((1, 50)))
df2.columns = 'a' + df2.columns.astype(str)


In [369]: %timeit df.join(pd.DataFrame(np.tile(df2.values, len(df.index)).reshape(-1,len(df2.columns)), columns=df2.columns))
1 loop, best of 3: 897 ms per loop

In [370]: %timeit df.assign(**df2.iloc[0])
1 loop, best of 3: 467 ms per loop

In [371]: %timeit df.assign(key=1).merge(df2.assign(key=1), on='key').drop('key',axis=1)
1 loop, best of 3: 1.55 s per loop

In [372]: %%timeit
     ...: df3 = df.join(df2)
     ...: df3[df2.columns] = df3[df2.columns].ffill()
     ...: 
1 loop, best of 3: 1.9 s per loop
jezrael answered Nov 15 '22 20:11