I am working on several taxi datasets. I have used pandas to concatenate all of them into a single dataframe.
My dataframe looks something like this:
                          675                 1039             # ...and the remaining 125 taxis
                    longitude  latitude  longitude  latitude
date
2008-02-02 13:31:21 116.56359  40.06489        NaN       NaN
2008-02-02 13:31:51 116.56486  40.06415        NaN       NaN
2008-02-02 13:32:21 116.56855  40.06352  116.58243   39.6313
2008-02-02 13:32:51 116.57127  40.06324        NaN       NaN
2008-02-02 13:33:21 116.57120  40.06328  116.55134   39.6313
2008-02-02 13:33:51 116.57121  40.06329  116.55126   39.6123
2008-02-02 13:34:21       NaN       NaN  116.55134   39.5123
where 675 and 1039 are taxi ids. There are 127 taxis in total, each contributing its own latitude and longitude columns.
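For reference, a frame with this shape can be built by concatenating the per-taxi frames along the columns with the taxi ids as keys. A minimal sketch, assuming each taxi's data is already in its own longitude/latitude frame (the two small frames below are hypothetical stand-ins):

import pandas as pd
import numpy as np

idx = pd.to_datetime(['2008-02-02 13:31:21', '2008-02-02 13:31:51'])
df_675 = pd.DataFrame({'longitude': [116.56359, 116.56486],
                       'latitude':  [40.06489, 40.06415]}, index=idx)
df_1039 = pd.DataFrame({'longitude': [np.nan, np.nan],
                        'latitude':  [np.nan, np.nan]}, index=idx)

# concat along the columns; the keys become the top level (taxi id) of the column MultiIndex
combined = pd.concat([df_675, df_1039], axis=1, keys=[675, 1039])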
I have several ways to extract the non-null values for a single row:
df.ix[k,df.columns[np.isnan(df.irow(0))!=1]]
(or)
df.irow(0)[np.isnan(df.irow(0))!=1]
(or)
df.irow(0)[np.where(df.irow(0)[df.columns].notnull())[0]]
Any of the above commands will return:
675 longitude 116.56359
latitude 40.064890
4549 longitude 116.34642
latitude 39.96662
Name: 2008-02-02 13:31:21
Now I want to extract all the non-null values from the first few rows (say, from row 1 to row 6).
How do I do that?
I could probably loop over them, but I want a non-looped way of doing it.
Any help or suggestions are welcome. Thanks in advance! :)
notnull: Detect non-missing values for an array-like object. This function takes a scalar or array-like object and indicates whether values are valid (not missing, which is NaN in numeric arrays, None or NaN in object arrays, NaT in datetime-like arrays).
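A minimal sketch of notnull on a small hypothetical Series:

import pandas as pd
import numpy as np

s = pd.Series([116.56359, np.nan, 116.56855])
s[s.notnull()]   # keeps only the valid (non-NaN) entries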
df.ix[1:6].dropna(axis=1)
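Note that .ix has since been deprecated and removed from pandas, so on newer versions the same idea, select the rows positionally and drop any column containing a NaN in that slice, would look roughly like this (a sketch, assuming df is the combined taxi frame from the question):

df.iloc[0:6].dropna(axis=1)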
As a heads up, irow will be deprecated in the next release of pandas. New methods, with clearer usage, replace it.
http://pandas.pydata.org/pandas-docs/dev/indexing.html#deprecations
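For example, the positional row access from the question maps onto the newer API roughly like this (a sketch; df is the combined taxi frame from the question):

row = df.iloc[0]        # replaces df.irow(0)
row[row.notnull()]      # keep only the non-null entries of that row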
In 0.11 (0.11rc1 is out now), this is very easy: use .iloc to select the first 6 rows, then dropna drops any row with a NaN (you can also pass options to dropna to control exactly which columns are considered).
I realized you want 1:6; I did 0:6 in my answer....
In [7]: import numpy as np; from numpy.random import randn; from pandas import DataFrame, date_range
In [8]: df = DataFrame(randn(10,3),columns=list('ABC'),index=date_range('20130101',periods=10))
In [9]: df.ix[6,'A'] = np.nan
In [10]: df.ix[6,'B'] = np.nan
In [11]: df.ix[2,'A'] = np.nan
In [12]: df.ix[4,'B'] = np.nan
In [13]: df.iloc[0:6]
Out[13]:
A B C
2013-01-01 0.442692 -0.109415 -0.038182
2013-01-02 1.217950 0.006681 -0.067752
2013-01-03 NaN -0.336814 -1.771431
2013-01-04 -0.655948 0.484234 1.313306
2013-01-05 0.096433 NaN 1.658917
2013-01-06 1.274731 1.909123 -0.289111
In [14]: df.iloc[0:6].dropna()
Out[14]:
A B C
2013-01-01 0.442692 -0.109415 -0.038182
2013-01-02 1.217950 0.006681 -0.067752
2013-01-04 -0.655948 0.484234 1.313306
2013-01-06 1.274731 1.909123 -0.289111
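To mirror the question's layout, where the NaNs are scattered by column rather than by row, the dropna options mentioned above can be pointed at the columns instead; a sketch using the same example frame:

df.iloc[0:6].dropna(axis=1)              # drop any column that has a NaN within those rows
df.iloc[0:6].dropna(axis=1, how='all')   # drop only columns that are entirely NaN there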