 

How to collapse a DataFrame with many NaN values by joining the remaining rows to the rows that do not start with NaN

I have the following df:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, np.nan, np.nan, np.nan, 5, np.nan, np.nan, np.nan],
    'col2': [np.nan, 2, np.nan, np.nan, np.nan, 6, np.nan, np.nan],
    'col3': [np.nan, np.nan, 3, np.nan, np.nan, np.nan, 7, np.nan],
    'col4': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 8],
})

It has the following display:

   col1  col2  col3  col4
0   1.0   NaN   NaN   NaN
1   NaN   2.0   NaN   NaN
2   NaN   NaN   3.0   NaN
3   NaN   NaN   NaN   4.0
4   5.0   NaN   NaN   NaN
5   NaN   6.0   NaN   NaN
6   NaN   NaN   7.0   NaN
7   NaN   NaN   NaN   8.0

My goal is to keep every row that begins with a float (not a NaN value) and join the remaining rows to it.

The new_df I want to get is:

   col1  col2  col3  col4
0     1     2     3     4
4     5     6     7     8

Any help from your side will be highly appreciated (I upvote all answers).

Thank you!

Khaled DELLAL asked Jan 31 '26 00:01



2 Answers

If you need to join the first non-missing values per group, where each group starts at a non-missing value in df['col1'], use:

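df = (df.reset_index()                          # keep the original labels in an 'index' column
        .groupby(df['col1'].notna().cumsum())   # a new group starts at each non-NaN in col1
        .first()                                # first non-NaN value of each column per group
        .set_index('index'))                    # restore the starting-row labels (0 and 4)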
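To see why this works, here is a minimal sketch on the question's data (using the 5 to 8 values from the display). df['col1'].notna() is True only on rows that start with a value, so its cumulative sum gives that row and every following row the same group number. The rename_axis(None) and astype(int) steps are my additions, not part of the answer, to match the integer new_df in the question; key is just an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, np.nan, np.nan, np.nan, 5, np.nan, np.nan, np.nan],
    'col2': [np.nan, 2, np.nan, np.nan, np.nan, 6, np.nan, np.nan],
    'col3': [np.nan, np.nan, 3, np.nan, np.nan, np.nan, 7, np.nan],
    'col4': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 8],
})

# True where a row starts with a value; cumsum turns each True into a new group label
key = df['col1'].notna().cumsum()
print(key.tolist())  # [1, 1, 1, 1, 2, 2, 2, 2]

new_df = (df.reset_index()
            .groupby(key)
            .first()             # first non-NaN value of each column per group
            .set_index('index')  # labels of the starting rows become the index
            .rename_axis(None)   # drop the 'index' name left over from reset_index
            .astype(int))        # assumption: no NaN remains after first()
print(new_df)
```

astype(int) raises if a NaN survives, so leave it out if some block can be missing a value in one of the columns.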
jezrael answered Feb 01 '26 14:02



Try this:

df.apply(lambda x: x.dropna().to_numpy())

Output:

   col1  col2  col3  col4
0   1.0   2.0   3.0   4.0
1   5.0   6.0   7.0   8.0
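This relies on every column holding the same number of non-NaN values (two here), because apply has to stack the per-column arrays back into a rectangular frame. It also pairs values purely by position, so the nth value in each column must belong to the nth block. A minimal check before applying, sketched on the question's data (counts is an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, np.nan, np.nan, np.nan, 5, np.nan, np.nan, np.nan],
    'col2': [np.nan, 2, np.nan, np.nan, np.nan, 6, np.nan, np.nan],
    'col3': [np.nan, np.nan, 3, np.nan, np.nan, np.nan, 7, np.nan],
    'col4': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 8],
})

# Number of non-NaN values in each column
counts = df.notna().sum()
print(counts.tolist())  # [2, 2, 2, 2]

# The per-column arrays only line up as rows if every column has the same count
if counts.nunique() != 1:
    raise ValueError("columns have different numbers of non-NaN values")

new_df = df.apply(lambda x: x.dropna().to_numpy())
print(new_df.shape)  # (2, 4)
```

If the counts differ, use the groupby approach instead, since it aligns values by block rather than by position.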

You can also cast the values to integers:

df.apply(lambda x: x.dropna().to_numpy(dtype='int'))

Output:

   col1  col2  col3  col4
0     1     2     3     4
1     5     6     7     8
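The apply result gets a fresh 0, 1 index, while the new_df in the question keeps the labels 0 and 4. A small sketch, assuming you want those labels back, reuses the labels of the rows whose col1 is not NaN (starts is an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, np.nan, np.nan, np.nan, 5, np.nan, np.nan, np.nan],
    'col2': [np.nan, 2, np.nan, np.nan, np.nan, 6, np.nan, np.nan],
    'col3': [np.nan, np.nan, 3, np.nan, np.nan, np.nan, 7, np.nan],
    'col4': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 8],
})

new_df = df.apply(lambda x: x.dropna().to_numpy(dtype='int'))

# Boolean mask of the rows that start a block (col1 is not NaN)
starts = df['col1'].notna().to_numpy()
# Reuse those row labels (0 and 4) as the index of the collapsed frame
new_df.index = df.index[starts]
print(new_df)
```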
Scott Boston answered Feb 01 '26 14:02



