I have a data frame input that looks like:
  col1 col2 col3
0    3    1  NaN
1  NaN    7    8
How can I collapse all rows into a single row while concatenating the values in each column with ', '?
The desired data frame output:
  col1  col2 col3
0    3  1, 7    8
Sample input code:
import pandas as pd
import numpy as np
d = {'col1': ["3", np.nan], 'col2': ["1", "7"], 'col3': [np.nan, "8"]}
df = pd.DataFrame(data=d)
Use the pandas.concat() function to concatenate two DataFrames by rows, that is, to append one DataFrame to the other. By default it behaves like a union: it brings all rows from both DataFrames into a single DataFrame.
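As a minimal sketch of row-wise concatenation, assuming two small made-up frames (the names top and bottom and their values are illustrative, not from the question):

```python
import pandas as pd

# Two small illustrative frames (hypothetical data, not from the question).
top = pd.DataFrame({"col1": ["3"], "col2": ["1"]})
bottom = pd.DataFrame({"col1": ["5"], "col2": ["7"]})

# Stack the rows of both frames; ignore_index=True renumbers the result 0..n-1.
stacked = pd.concat([top, bottom], ignore_index=True)
print(stacked)
```

Note that this stacks rows on top of each other; it does not collapse them into one row, which is what the question asks for.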
It is also possible to join different columns using the concat() function. Here, DataFrame is the name of the data frame.
Step #1: Load NumPy and pandas.
Step #2: Create random data and use it to create a pandas DataFrame.
Step #3: Convert multiple lists into a single DataFrame by creating a dictionary that maps a name to each list.
Step #4: Then convert the pandas DataFrame into a dict.
Example 2: Similarly, we can concatenate any number of columns in a DataFrame. Let's look at another example that concatenates three columns (day, month, and year) into a single column, Date.
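A rough sketch of that idea, assuming integer day, month, and year columns and a "-" separator (the sample values and the separator are assumptions, not from the original example):

```python
import pandas as pd

# Hypothetical sample data with separate day, month, and year columns.
dates = pd.DataFrame({"day": [1, 15], "month": [2, 11], "year": [2020, 2021]})

# Cast each column to string first, then join the pieces with "-".
dates["Date"] = (
    dates["day"].astype(str)
    + "-" + dates["month"].astype(str)
    + "-" + dates["year"].astype(str)
)
print(dates)
```

The astype(str) step matters here: adding integer columns directly would sum the numbers instead of concatenating them.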
Group the data with the DataFrame.groupby() method on the column whose values you want to concatenate within each group. Then concatenate the strings with the join function, transforming that column's values with a lambda expression. We will use a CSV file with 2 columns.
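The groupby-and-join approach could look roughly like this. The CSV file itself is not available, so this sketch assumes an in-memory frame with two hypothetical columns, team and name:

```python
import pandas as pd

# Stand-in for the two-column CSV (column names and values are hypothetical).
people = pd.DataFrame({"team": ["a", "b", "a"], "name": ["Ann", "Bob", "Cy"]})

# For each team, join all names with ", " and broadcast the joined
# string back onto every row of that team.
people["members"] = people.groupby("team")["name"].transform(
    lambda s: ", ".join(s)
)
print(people)
```

transform keeps the original number of rows; if one row per group is wanted instead, people.groupby("team")["name"].agg(", ".join) should give that.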
You have now learned the three most important techniques for combining data in pandas: merge() for combining data on common columns or indices, .join() for combining data on a key column or an index, and concat() for combining DataFrames across rows or columns.
With concatenation, your datasets are just stitched together along an axis, either the row axis or the column axis.
agg + dropna + str.join comes to mind.
df.agg(lambda x: ', '.join(x.dropna())).to_frame().T
  col1  col2 col3
0    3  1, 7    8
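For completeness, here is the same idea as a self-contained sketch built on the sample input from the question. The helper name collapse and its sep parameter are my own, and the astype(str) call is an added assumption so that non-string columns would also join cleanly:

```python
import numpy as np
import pandas as pd

# Sample input from the question.
df = pd.DataFrame(
    {"col1": ["3", np.nan], "col2": ["1", "7"], "col3": [np.nan, "8"]}
)

def collapse(frame, sep=", "):
    """Collapse all rows into one, joining non-missing values per column.

    Hypothetical helper: drops NaN, casts to str, joins with `sep`.
    """
    joined = frame.agg(lambda col: sep.join(col.dropna().astype(str)))
    # agg returns a Series indexed by column name; turn it into one row.
    return joined.to_frame().T

result = collapse(df)
print(result)
```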
There are other solutions, my peers will find them for you :)
pd.DataFrame(
    [[
        ', '.join(map(str, map(int, filter(pd.notna, c))))
        for c in zip(*df.values)
    ]], columns=df.columns
)
  col1  col2 col3
0    3  1, 7    8
One way to get what you want is to create a new dataframe with the same columns as your old dataframe and populate its first row (index 0) with your desired data. In your case, the desired data for each column is the list of that column's values, with the NaN values removed, joined by ', ':
new_df = pd.DataFrame(columns=df.columns)
for col in df.columns:
    new_df.loc[0, col] = ', '.join(df[col].dropna().tolist())
>>> new_df
  col1  col2 col3
0    3  1, 7    8