I am working on several taxi datasets. I have used pandas to concatenate all of them into a single dataframe.
My dataframe looks something like this:
                          675                 1039             # ...and the remaining 125 taxis
                    longitude  latitude  longitude  latitude
date
2008-02-02 13:31:21 116.56359  40.06489        NaN       NaN
2008-02-02 13:31:51 116.56486  40.06415        NaN       NaN
2008-02-02 13:32:21 116.56855  40.06352  116.58243   39.6313
2008-02-02 13:32:51 116.57127  40.06324        NaN       NaN
2008-02-02 13:33:21 116.57120  40.06328  116.55134   39.6313
2008-02-02 13:33:51 116.57121  40.06329  116.55126   39.6123
2008-02-02 13:34:21       NaN       NaN  116.55134   39.5123
where 675 and 1039 are taxi ids. There are 127 taxis in total, each contributing its own latitude and longitude columns.
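For reference, a frame with this shape can be built by concatenating the per-taxi frames along the columns with the taxi ids as keys. A minimal sketch, assuming each taxi's data is already in its own longitude/latitude frame (the two small frames below are hypothetical stand-ins):

import pandas as pd
import numpy as np

idx = pd.to_datetime(['2008-02-02 13:31:21', '2008-02-02 13:31:51'])
df_675 = pd.DataFrame({'longitude': [116.56359, 116.56486],
                       'latitude':  [40.06489, 40.06415]}, index=idx)
df_1039 = pd.DataFrame({'longitude': [np.nan, np.nan],
                        'latitude':  [np.nan, np.nan]}, index=idx)

# concat along the columns; the keys become the top level (taxi id) of the column MultiIndex
combined = pd.concat([df_675, df_1039], axis=1, keys=[675, 1039])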
I have several ways to extract the non-null values for a single row:
df.ix[k,df.columns[np.isnan(df.irow(0))!=1]]
(or)
df.irow(0)[np.isnan(df.irow(0))!=1]
(or)
df.irow(0)[np.where(df.irow(0)[df.columns].notnull())[0]]
Any of the above commands will return:
675 longitude 116.56359
latitude 40.064890
4549 longitude 116.34642
latitude 39.96662
Name: 2008-02-02 13:31:21
Now I want to extract all the non-null values from the first few rows (say, from row 1 to row 6).
How do I do that?
I could probably loop over them, but I want a non-looped way of doing it.
Any help or suggestions are welcome. Thanks in advance! :)
notnull: Detect non-missing values for an array-like object. This function takes a scalar or array-like object and indicates whether values are valid (not missing, which is NaN in numeric arrays, None or NaN in object arrays, NaT in datetime-like arrays).
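A minimal sketch of notnull on a small hypothetical Series:

import pandas as pd
import numpy as np

s = pd.Series([116.56359, np.nan, 116.56855])
s[s.notnull()]   # keeps only the valid (non-NaN) entries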
df.ix[1:6].dropna(axis=1)
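Note that .ix has since been deprecated and removed from pandas, so on newer versions the same idea, select the rows positionally and drop any column containing a NaN in that slice, would look roughly like this (a sketch, assuming df is the combined taxi frame from the question):

df.iloc[0:6].dropna(axis=1)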
As a heads up, irow will be deprecated in the next release of pandas. New methods, with clearer usage, replace it.
http://pandas.pydata.org/pandas-docs/dev/indexing.html#deprecations
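For example, the positional row access from the question maps onto the newer API roughly like this (a sketch; df is the combined taxi frame from the question):

row = df.iloc[0]        # replaces df.irow(0)
row[row.notnull()]      # keep only the non-null entries of that row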
In 0.11 (0.11rc1 is out now), this is very easy: use .iloc to select the first 6 rows, then dropna drops any row with a NaN (you can also pass options to dropna to control exactly which columns are considered).
I realized you want 1:6; I did 0:6 in my answer....
In [7]: import numpy as np; from numpy.random import randn; from pandas import DataFrame, date_range
In [8]: df = DataFrame(randn(10,3),columns=list('ABC'),index=date_range('20130101',periods=10))
In [9]: df.ix[6,'A'] = np.nan
In [10]: df.ix[6,'B'] = np.nan
In [11]: df.ix[2,'A'] = np.nan
In [12]: df.ix[4,'B'] = np.nan
In [13]: df.iloc[0:6]
Out[13]:
A B C
2013-01-01 0.442692 -0.109415 -0.038182
2013-01-02 1.217950 0.006681 -0.067752
2013-01-03 NaN -0.336814 -1.771431
2013-01-04 -0.655948 0.484234 1.313306
2013-01-05 0.096433 NaN 1.658917
2013-01-06 1.274731 1.909123 -0.289111
In [14]: df.iloc[0:6].dropna()
Out[14]:
A B C
2013-01-01 0.442692 -0.109415 -0.038182
2013-01-02 1.217950 0.006681 -0.067752
2013-01-04 -0.655948 0.484234 1.313306
2013-01-06 1.274731 1.909123 -0.289111
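To mirror the question's layout, where the NaNs are scattered by column rather than by row, the dropna options mentioned above can be pointed at the columns instead; a sketch using the same example frame:

df.iloc[0:6].dropna(axis=1)              # drop any column that has a NaN within those rows
df.iloc[0:6].dropna(axis=1, how='all')   # drop only columns that are entirely NaN there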