Collapsing rows in a Pandas dataframe if all rows have only one value in their columns

I have the following DataFrame:

         col1  |  col2   | col3   | col4   | col5  | col6
    0    -     |   15.0  |  -     |  -     |   -   |  -
    1    -     |   -     |  -     |  -     |   -   |  US
    2    -     |   -     |  -     |  Large |   -   |  -
    3    ABC1  |   -     |  -     |  -     |   -   |  -
    4    -     |   -     |  24RA  |  -     |   -   |  -
    5    -     |   -     |  -     |  -     |   345 |  -

I want to collapse the rows into a single row, as follows:

    output DF:
         col1  |  col2    | col3   | col4   | col5  | col6
    0    ABC1  |   15.0   |  24RA  |  Large |   345 |  US

I do not want to iterate over the columns; I would like to achieve this with pandas operations.

Test Test asked Jun 02 '17 01:06


1 Answer

Option 0
Super Simple

pd.concat([pd.Series(df[c].dropna().values, name=c) for c in df], axis=1)

   col1  col2  col3   col4   col5 col6
0  ABC1  15.0  24RA  Large  345.0   US
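For anyone who wants to run Option 0 end to end, here is a minimal sketch. It assumes the `-` cells in the question are really NaN (the frame below is a hypothetical reconstruction, not the asker's actual data) and that each column holds exactly one non-null value.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's frame; '-' is assumed to be NaN.
df = pd.DataFrame({
    'col1': [np.nan, np.nan, np.nan, 'ABC1', np.nan, np.nan],
    'col2': [15.0, np.nan, np.nan, np.nan, np.nan, np.nan],
    'col3': [np.nan, np.nan, np.nan, np.nan, '24RA', np.nan],
    'col4': [np.nan, np.nan, 'Large', np.nan, np.nan, np.nan],
    'col5': [np.nan, np.nan, np.nan, np.nan, np.nan, 345],
    'col6': [np.nan, 'US', np.nan, np.nan, np.nan, np.nan],
})

# Drop the nulls in each column, rebuild each column with a fresh 0..n-1 index,
# then glue the columns back together side by side.
collapsed = pd.concat(
    [pd.Series(df[c].dropna().values, name=c) for c in df], axis=1
)
print(collapsed)
```

Taking `.values` before wrapping in a new `pd.Series` is what discards the original row labels, so the surviving values all line up on row 0.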

Can we handle more than one value per column?
Sure we can!

df.loc[2, 'col3'] = 'Test'
pd.concat([pd.Series(df[c].dropna().values, name=c) for c in df], axis=1)

   col1  col2  col3   col4   col5 col6
0  ABC1  15.0  Test  Large  345.0   US
1   NaN   NaN  24RA    NaN    NaN  NaN

Option 1
Generalized solution using np.where like a surgeon

v = df.values
i, j = np.where(pd.notnull(v))  # positions of the non-null cells

s = pd.Series(v[i, j], df.columns[j])

c = s.groupby(level=0).cumcount()
s.index = [c, s.index]
s.unstack(fill_value='-')  # <-- don't fill to get NaN

   col1  col2  col3   col4 col5 col6
0  ABC1  15.0  24RA  Large  345   US

df.loc[2, 'col3'] = 'Test'

v = df.values
i, j = np.where(pd.notnull(v))  # positions of the non-null cells

s = pd.Series(v[i, j], df.columns[j])

c = s.groupby(level=0).cumcount()
s.index = [c, s.index]
s.unstack(fill_value='-')  # <-- don't fill to get NaN

   col1  col2  col3   col4 col5 col6
0  ABC1  15.0  Test  Large  345   US
1     -     -  24RA      -    -    -
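The steps of Option 1 can be checked on a smaller frame. This is a sketch under stated assumptions: the three-column frame is made up for illustration, missing cells are NaN, and the positions are taken with `pd.notnull`, since `np.isnan` raises a `TypeError` on an object array that contains strings.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame; col3 has two non-null values.
df = pd.DataFrame({
    'col1': [np.nan, 'ABC1', np.nan],
    'col2': [15.0, np.nan, np.nan],
    'col3': ['Test', np.nan, '24RA'],
})

v = df.values                    # object array, because the columns mix types
i, j = np.where(pd.notnull(v))   # row/column positions of non-null cells

# One entry per non-null cell, labelled by the column it came from.
s = pd.Series(v[i, j], index=df.columns[j])

# Number the values within each column: 0 for the first, 1 for the second, ...
c = s.groupby(level=0).cumcount()

# (counter, column) index, then pivot the column level out to the columns.
s.index = pd.MultiIndex.from_arrays([c.values, s.index])
result = s.unstack()             # leave fill_value unset to keep NaN
print(result)
```

The `cumcount` step is what makes this general: a column with k values ends up occupying rows 0 to k-1, and shorter columns are padded with NaN (or with whatever `fill_value` is passed to `unstack`).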

Option 2
Use mask to turn the placeholders into nulls, then stack to get rid of them

# Works whether the placeholders are the string `'-'` or NaN;
# if they are already NaN, you can skip the `.mask(df == '-')`
s = df.mask(df == '-').stack().reset_index(0, drop=True)
c = s.groupby(level=0).cumcount()
s.index = [c, s.index]
s.unstack(fill_value='-')

   col1  col2  col3   col4 col5 col6
0  ABC1  15.0  Test  Large  345   US
1     -     -  24RA      -    -    -
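A self-contained sketch of Option 2, this time assuming the placeholders are the literal string `'-'` (the frame is again hypothetical). The explicit `.dropna()` is an addition not in the answer above: older pandas drops nulls inside `stack()` by default, but that default is not guaranteed across versions, so dropping them explicitly is the safer assumption.

```python
import pandas as pd

# Hypothetical small frame with literal '-' placeholders.
df = pd.DataFrame({
    'col1': ['-', 'ABC1', '-'],
    'col2': [15.0, '-', '-'],
    'col3': ['Test', '-', '24RA'],
})

# Placeholders -> NaN, reshape to long form, and drop the nulls explicitly.
s = df.mask(df == '-').stack().dropna()

# Discard the original row labels so only the column names remain as the index.
s = s.reset_index(0, drop=True)

# Number the values within each column and pivot back to wide form.
c = s.groupby(level=0).cumcount()
s.index = pd.MultiIndex.from_arrays([c.values, s.index])
result = s.unstack(fill_value='-')
print(result)
```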
piRSquared answered Sep 16 '22 12:09