
How to pop rows from a dataframe?

Tags: python, pandas

I found the documentation for pandas.DataFrame.pop, but after trying it and examining the source code, it does not seem to do what I want.

If I make a dataframe like this:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10,6))
# Make a few areas have NaN values
df.iloc[1:3,1] = np.nan
df.iloc[5,3] = np.nan
df.iloc[7:9,5] = np.nan


>>> df
          0         1         2         3         4         5
0  0.772762 -0.442657  1.245988  1.102018 -0.740836  1.685598
1 -0.387922       NaN -1.215723 -0.106875  0.499110  0.338759
2  0.567631       NaN -0.353032 -0.099011 -0.698925 -1.348966
3  1.320849  1.084405 -1.296177  0.681111 -1.941855 -0.950346
4 -0.026818 -1.933629 -0.693964  1.116673  0.392217  1.280808
5 -1.249192 -0.035932 -1.330916       NaN -0.135720 -0.506016
6  0.406344  1.416579  0.122019  0.648851 -0.305359 -1.253580
7 -0.092440 -0.243593  0.468463 -1.689485  0.667804       NaN
8 -0.110819 -0.627777 -0.302116  0.630068  2.567923       NaN
9  1.884069 -0.393420 -0.950275  0.151182 -1.122764  0.502117

If I want to remove selected rows and assign them to a separate object in one step, I would want a pop behavior, like this:

# rows in column 5 which have NaN values
>>> df[df[5].isnull()].index
Int64Index([7, 8], dtype='int64')

# remove them from the dataframe, assign them to a separate object
>>> nan_rows = df.pop(df[df[5].isnull()].index)

However, this does not appear to be supported. Instead, it seems like I am forced to do this in two separate steps, which seems a bit inelegant.

# get the NaN rows
>>> nan_rows = df[df[5].isnull()]

>>> nan_rows
          0         1         2         3         4   5
7 -0.092440 -0.243593  0.468463 -1.689485  0.667804 NaN
8 -0.110819 -0.627777 -0.302116  0.630068  2.567923 NaN

# remove from original df
>>> df = df.drop(nan_rows.index)

>>> df
          0         1         2         3         4         5
0  0.772762 -0.442657  1.245988  1.102018 -0.740836  1.685598
1 -0.387922       NaN -1.215723 -0.106875  0.499110  0.338759
2  0.567631       NaN -0.353032 -0.099011 -0.698925 -1.348966
3  1.320849  1.084405 -1.296177  0.681111 -1.941855 -0.950346
4 -0.026818 -1.933629 -0.693964  1.116673  0.392217  1.280808
5 -1.249192 -0.035932 -1.330916       NaN -0.135720 -0.506016
6  0.406344  1.416579  0.122019  0.648851 -0.305359 -1.253580
9  1.884069 -0.393420 -0.950275  0.151182 -1.122764  0.502117

Is there a one-step method built-in? Or is this the way you're 'supposed' to do it?

asked Feb 16 '17 at 22:02 by user5359531



2 Answers

pop source code:

    def pop(self, item):
        """
        Return item and drop from frame. Raise KeyError if not found.
        """
        result = self[item]
        del self[item]
        try:
            result._reset_cacher()
        except AttributeError:
            pass

        return result
File:      c:\python\lib\site-packages\pandas\core\generic.py

`del self[item]` deletes a column, so `pop` only works when `item` is a single column label; it cannot remove rows. Either pass a column label, or select the rows and drop them in two steps.
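
If you want the two steps behind a single call, a small wrapper is one option. This is only a sketch: `pop_rows` is a hypothetical helper name, not part of pandas, and it returns a new frame instead of modifying `df` in place, so the result has to be reassigned.

```python
import numpy as np
import pandas as pd


def pop_rows(df, mask):
    """Split df by a boolean row mask and return (popped, remaining).

    Hypothetical helper, not a pandas API. The input frame is left
    untouched; the caller reassigns the second return value.
    """
    popped = df.loc[mask]
    remaining = df.loc[~mask]
    return popped, remaining


# Small frame with NaN in column 2 for rows 1 and 2
df = pd.DataFrame(np.arange(12.0).reshape(4, 3))
df.iloc[1:3, 2] = np.nan

nan_rows, df = pop_rows(df, df[2].isnull())
print(nan_rows.index.tolist())  # [1, 2]
print(df.index.tolist())        # [0, 3]
```

Selecting with the inverted mask instead of calling `drop` on the popped index also behaves sensibly when the index contains duplicate labels.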

answered Sep 19 '22 at 15:09 by Zeugma


Since `pop` works on columns, you can transpose the dataframe and pop a column of the transpose, which is a row of the original df. Here is the original df:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, size=(3, 3)), columns=['a', 'b', 'c'])

print(df)
   a  b  c
0  4  9  4
1  5  5  8
2  5  7  4

Then transpose it and pop column 0, which is row 0 of the original df:

df_t = df.T
popped_row = df_t.pop(0)

Now you have the popped row:

print(popped_row)
a    4
b    9
c    4
Name: 0, dtype: int32

Transposing back gives the original dataframe without the first row:

df = df_t.T

print(df)
   a  b  c
1  5  5  8
2  5  7  4
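
One caveat with the transpose approach: it works cleanly above because every column is an integer. With mixed column types, transposing forces everything to a common dtype, and transposing back does not restore the original dtypes. The sketch below uses a made-up two-column frame to compare the round trip with plain row selection plus `drop`.

```python
import pandas as pd

# Made-up frame with one integer and one string column
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Transpose round trip: the mixed columns collapse to object dtype
df_t = df.T
popped_t = df_t.pop(0)
via_transpose = df_t.T
print(via_transpose.dtypes)

# Row selection plus drop leaves each column's dtype as it was
popped = df.loc[0]
via_drop = df.drop(index=0)
print(via_drop.dtypes)
```

Both routes pop the same row, but only the second keeps column `a` as an integer column.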

answered Sep 20 '22 at 15:09 by MattiH