I have following DF
col1 | col2 | col3 | col4 | col5 | col6
0 - | 15.0 | - | - | - | -
1 - | - | - | - | - | US
2 - | - | - | Large | - | -
3 ABC1 | - | - | - | - | -
4 - | - | 24RA | - | - | -
5 - | - | - | - | 345 | -
I want to collapse rows into one as follows
output DF:
col1 | col2 | col3 | col4 | col5 | col6
0 ABC1 | 15.0 | 24RA | Large | 345 | US
I do not want to iterate over columns but want to use pandas to achieve this.
By using dropna() method you can drop rows with NaN (Not a Number) and None values from pandas DataFrame. Note that by default it returns the copy of the DataFrame after removing rows. If you wanted to remove from the existing DataFrame, you should use inplace=True .
To drop a row or column in a dataframe, you need to use the drop() method available in the dataframe. You can read more about the drop() method in the docs here. Rows are labelled using the index number starting with 0, by default. Columns are labelled using names.
Use the pandas. DataFrame. dropna() method to drop the rows with infinite values.
Flatten columns: use get_level_values() Flatten columns: use to_flat_index() Flatten columns: join column labels. Flatten rows: flatten all levels.
Following steps are to be followed to collapse multiple columns in Pandas: Step #1: Load numpy and Pandas. Step #2: Create random data and use them to create a pandas dataframe. Step #3: Convert multiple lists into a single data frame, by creating a dictionary for each list with a name. Step #4: Then use Pandas dataframe into dict.
In order to deal with rows, we can perform basic operations on rows like selecting, deleting, adding and renmaing. Row Selection: Pandas provide a unique method to retrieve rows from a Data frame.DataFrame.loc[] method is used to retrieve rows from Pandas DataFrame. Rows can also be selected by passing integer location to an iloc[] function.
In order to display the number of rows and columns that Pandas displays by default, we can use the .get_option () function. This function takes a value and returns the provided option for that value. 'display.max_columns', which controls the number of columns to display
1 Step #1: Load numpy and Pandas. 2 Step #2: Create random data and use them to create a pandas dataframe. 3 Step #3: Convert multiple lists into a single data frame, by creating a dictionary for each list with a name. 4 Step #4: Then use Pandas dataframe into dict. ... 5 Step #5: Specify which columns are to be collapsed. ...
Option 0
Super Simple
pd.concat([pd.Series(df[c].dropna().values, name=c) for c in df], axis=1)
col1 col2 col3 col4 col5 col6
0 ABC1 15.0 24RA Large 345.0 US
Can we handle more than one value per column?
Sure we can!
df.loc[2, 'col3'] = 'Test'
col1 col2 col3 col4 col5 col6
0 ABC1 15.0 Test Large 345.0 US
1 NaN NaN 24RA NaN NaN NaN
Option 1
Generalized solution using np.where
like a surgeon
v = df.values
i, j = np.where(np.isnan(v))
s = pd.Series(v[i, j], df.columns[j])
c = s.groupby(level=0).cumcount()
s.index = [c, s.index]
s.unstack(fill_value='-') # <-- don't fill to get NaN
col1 col2 col3 col4 col5 col6
0 ABC1 15.0 24RA Large 345 US
df.loc[2, 'col3'] = 'Test'
v = df.values
i, j = np.where(np.isnan(v))
s = pd.Series(v[i, j], df.columns[j])
c = s.groupby(level=0).cumcount()
s.index = [c, s.index]
s.unstack(fill_value='-') # <-- don't fill to get NaN
col1 col2 col3 col4 col5 col6
0 ABC1 15.0 Test Large 345 US
1 - - 24RA - - -
Option 2mask
to make nulls then stack
to get rid of them
Or we could have
# This should work even if `'-'` are NaN
# but you can skip the `.mask(df == '-')`
s = df.mask(df == '-').stack().reset_index(0, drop=True)
c = s.groupby(level=0).cumcount()
s.index = [c, s.index]
s.unstack(fill_value='-')
col1 col2 col3 col4 col5 col6
0 ABC1 15.0 Test Large 345 US
1 - - 24RA - - -
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With