I have an existing dataframe and I'm trying to concatenate a dictionary to it, where the lists in the dictionary are shorter than the dataframe:
>>> df
A B C
0 0.46324 0.32425 0.42194
1 0.10596 0.35910 0.21004
2 0.69209 0.12951 0.50186
3 0.04901 0.31203 0.11035
4 0.43104 0.62413 0.20567
5 0.43412 0.13720 0.11052
6 0.14512 0.10532 0.05310
and
test = {"One": [0.23413, 0.19235, 0.51221], "Two": [0.01293, 0.12235, 0.63291]}
I'm trying to add test to df, while changing the keys to "D" and "E". I've had a look at
http://pandas.pydata.org/pandas-docs/stable/merging.html and http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html
which indicate that I should be able to concatenate the dictionary to the dataframe.
I've tried:
pd.concat([df, test], axis=1, ignore_index=True, keys=["D", "E"])
pd.concat([df, test], axis=1, ignore_index=True)
but I'm not having any luck. The result I'm trying to achieve is:
df
A B C D E
0 0.46324 0.32425 0.42194 0.23413 0.01293
1 0.10596 0.35910 0.21004 0.19235 0.12235
2 0.69209 0.12951 0.50186 0.51221 0.63291
3 0.04901 0.31203 0.11035 NaN NaN
4 0.43104 0.62413 0.20567 NaN NaN
5 0.43412 0.13720 0.11052 NaN NaN
6 0.14512 0.10532 0.05310 NaN NaN
Pass the two dataframes to the pd.concat() method as a list and specify the axis to concatenate along: axis=0 concatenates along rows, axis=1 concatenates along columns.
You can concatenate two or more text/string columns of a pandas DataFrame with the + operator. Note that applying + to numeric columns performs addition rather than concatenation.
With pandas.concat() you can combine pandas objects, for example multiple Series, along a particular axis (column-wise or row-wise) to create a DataFrame. concat() takes several parameters; for this scenario, pass a list of the Series to combine and axis=1 so that each Series becomes a column instead of being appended as rows.
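As a rough sketch of the Series approach described above (the variable names d, e and combined are just for illustration, and df is cut down to two columns and four rows to keep it short), each list from the dictionary can be wrapped in a named Series before concatenating along axis=1:

```python
import pandas as pd

df = pd.DataFrame({"A": [0.46324, 0.10596, 0.69209, 0.04901],
                   "B": [0.32425, 0.35910, 0.12951, 0.31203]})
test = {"One": [0.23413, 0.19235, 0.51221], "Two": [0.01293, 0.12235, 0.63291]}

# Wrap each list in a Series; the Series name becomes the column name
d = pd.Series(test["One"], name="D")
e = pd.Series(test["Two"], name="E")

# axis=1 aligns the objects on their index; index labels that are
# missing from the shorter Series are filled with NaN
combined = pd.concat([df, d, e], axis=1)
print(combined)
```

Because the Series only have index labels 0 to 2, the last row of combined should hold NaN in both D and E.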
One way to do that is with:
df.join(pd.DataFrame(test).rename(columns={'One':'D','Two':'E'}))
A B C D E
0 0.46324 0.32425 0.42194 0.23413 0.01293
1 0.10596 0.35910 0.21004 0.19235 0.12235
2 0.69209 0.12951 0.50186 0.51221 0.63291
3 0.04901 0.31203 0.11035 NaN NaN
4 0.43104 0.62413 0.20567 NaN NaN
5 0.43412 0.13720 0.11052 NaN NaN
6 0.14512 0.10532 0.05310 NaN NaN
As @Alexander correctly mentioned, the number of rows being concatenated should match. Otherwise, as in your case, the missing rows will be filled with NaN.
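One thing worth keeping in mind with join is that it matches rows by index label rather than by position. The sketch below assumes a df with a non-default string index (that index and the name extra are made up for illustration, not taken from the question) and relabels the shorter frame so the rows still pair up from the top:

```python
import pandas as pd

# Hypothetical df with string labels instead of the default 0..n-1 index
df = pd.DataFrame({"A": [0.46324, 0.10596, 0.69209, 0.04901]},
                  index=["w", "x", "y", "z"])
extra = pd.DataFrame({"D": [0.23413, 0.19235]})  # default index 0, 1

# Give the shorter frame the first len(extra) labels of df's index,
# so that its rows line up with the first rows of df
extra.index = df.index[:len(extra)]

# Left join on the index; labels absent from extra get NaN
joined = df.join(extra)
print(joined)
```

With the default integer index used in the question this relabelling step is not needed, since both frames already start at 0.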
Assuming you want to add them as rows:
>>> pd.concat([df, pd.DataFrame(list(test.values()), columns=df.columns)], ignore_index=True)
A B C
0 0.46324 0.32425 0.42194
1 0.10596 0.35910 0.21004
2 0.69209 0.12951 0.50186
3 0.04901 0.31203 0.11035
4 0.43104 0.62413 0.20567
5 0.43412 0.13720 0.11052
6 0.14512 0.10532 0.05310
7 0.01293 0.12235 0.63291
8 0.23413 0.19235 0.51221
If added as new columns:
df_new = pd.concat([df, pd.DataFrame(list(test.values())).T], ignore_index=True, axis=1)
df_new.columns = df.columns.tolist() + [{'One': 'D', 'Two': 'E'}[k] for k in test]
>>> df_new
A B C E D
0 0.46324 0.32425 0.42194 0.01293 0.23413
1 0.10596 0.35910 0.21004 0.12235 0.19235
2 0.69209 0.12951 0.50186 0.63291 0.51221
3 0.04901 0.31203 0.11035 NaN NaN
4 0.43104 0.62413 0.20567 NaN NaN
5 0.43412 0.13720 0.11052 NaN NaN
6 0.14512 0.10532 0.05310 NaN NaN
Dictionary order is not guaranteed before Python 3.7 (e.g. for test), so the new column names need to be mapped to the keys explicitly.
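A possible way to avoid depending on dictionary order at all is to rename the columns by key before concatenating and then select them in the order you want. This is only a sketch; new_names and extra are illustrative names, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0.46324, 0.10596, 0.69209, 0.04901, 0.43104, 0.43412, 0.14512],
    "B": [0.32425, 0.35910, 0.12951, 0.31203, 0.62413, 0.13720, 0.10532],
    "C": [0.42194, 0.21004, 0.50186, 0.11035, 0.20567, 0.11052, 0.05310],
})
test = {"One": [0.23413, 0.19235, 0.51221], "Two": [0.01293, 0.12235, 0.63291]}

# Map each dictionary key to its new column name (illustrative mapping)
new_names = {"One": "D", "Two": "E"}

# rename() works by key, and the explicit column selection fixes the order
extra = pd.DataFrame(test).rename(columns=new_names)[["D", "E"]]

# Keep the column labels (no ignore_index) so the names survive the concat
df_new = pd.concat([df, extra], axis=1)
print(df_new)
```

Since the names are attached before the concat, there is no need to reassign df_new.columns afterwards.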