removing particular rows from DataFrame in python pandas

Tags: python, pandas

I have a large .txt file with badly formatted data. I would like to remove some rows and convert the rest of the data to floats. Rows containing 'X' or 'XX' should be removed; everything else should be converted to float, so a number like 4;00.1 should become 4.001. The file looks like this sample:

0,1,10/09/2012,3:01,4;09.1,5,6,7,8,9,10,11
1,-0.581586,11/09/2012,-1:93,0;20.3,739705,,0.892921,5,,6,7
2,XX,10/09/2012,3:04,4;76.0,0.183095,-0.057214,-0.504856,NaN,0.183095,12
3,-0.256051,10/09/2012,9:65,1;54.9,483293,0.504967,0.074442,-1.716287,7,0.504967,0.504967
4,-0.728092,11/09/2012,0:78,1;53.4,232247,4.556,0.328062,1.382914,NaN,4.556,4
5,4,11/09/2012,NaN,NaN,6.0008,NaN,NaN,NaN,6.000800,6.000000,6.000800
6,X,11/09/2012,X,X,5,X,8,2,1,17.000000,33.000000
7,,11/09/2012,,,,,,6.000000,5.000000,2.000000,2.000000
8,4,11/09/2012,7:98,3;04.5,5,6,3,7.000000,3.000000,3.000000,2
9,6,11/09/2012,2:21,4;67.2,5,2,2,7,3,8.000000,4.000000

I read it into a DataFrame and select rows:

import pandas as pd

fileName = '~/data.txt'
colName = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']
df = pd.read_csv(fileName, names=colName)  # read_csv already returns a DataFrame
print(df[df['b'].isin(['X', 'XX', None, 'NaN'])].to_string())

The output from the last line gives me only:

>>> print(df[df['b'].isin(['X', 'XX', None, 'NaN'])].to_string())
    b           c     d       e         f          g         h   i         j   k   l
a                                                                                   
2  XX  10/09/2012  3:04  4;76.0  0.183095  -0.057214 -0.504856 NaN  0.183095  12 NaN
6   X  11/09/2012     X       X  5.000000          X  8.000000   2  1.000000  17  33

This does not pick up row 7, and I would like to check the whole df, not only one column (the original file is very large).

At the moment I use the conversion below, but I need to remove the unwanted rows first so that I can apply it to the whole df.

convert1 = lambda x : x.replace('.', '')
convert2 = lambda x : float(x.replace(';', '.'))
newNumber = convert2(convert1(df['e'][0]))
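For a whole column, the same two-step idea can be sketched with pandas string methods instead of per-value lambdas. This is only a sketch: the sample values below are illustrative strings in the 'e' format, and `e_float` is a name made up for the example. Missing values pass through as NaN, whereas the lambdas above would fail on them.

```python
import pandas as pd

# Illustrative values in the question's 'e' format, plus one missing entry.
e = pd.Series(["4;09.1", "1;54.9", "3;04.5", None])

# Step 1: drop the existing '.', step 2: turn ';' into the decimal point.
# regex=False treats '.' as a literal character, not a regex wildcard.
cleaned = (
    e.str.replace(".", "", regex=False)
     .str.replace(";", ".", regex=False)
)

# Convert the cleaned strings to floats; missing entries stay NaN.
e_float = pd.to_numeric(cleaned)
print(e_float)
```

On a real frame this would be applied as `df['e'] = ...` once the rows containing 'X' or 'XX' are gone.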

After selecting the rows I would like to remove them from df. I tried df.pop(), but it works only for columns, not rows. I also tried naming the rows, but had no luck. For this particular .txt I should end up with a new df made of rows [0,3,8,9], with column 'c' as a date, 'd' as a time and the rest as floats. I have been trying to figure this out for quite a while and do not know where to go next. Is it possible in pandas (it probably should be), or do I need to switch to ndarray or something else? Thanks for your advice.

asked Sep 22 '12 22:09

tomasz74

1 Answer

The problem with your original filter is that it checks for the string 'NaN' rather than numpy.nan, which is what empty fields are parsed as by default. If you want to filter across all the columns so that you only keep rows where no element is 'X' or 'XX', do something like this:

In [45]: names = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']

In [46]: df = pd.read_csv(StringIO(data), header=None, names=names)

In [47]: mask = df.applymap(lambda x: x in ['X', 'XX', None, np.nan])

In [48]: df[~mask.any(axis=1)]
Out[48]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 9
Data columns:
a    5  non-null values
b    5  non-null values
c    5  non-null values
d    5  non-null values
e    5  non-null values
f    5  non-null values
g    5  non-null values
h    5  non-null values
i    5  non-null values
j    4  non-null values
k    5  non-null values
l    5  non-null values
dtypes: float64(6), int64(1), object(5)
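
The summary above still lists 5 rows with only 4 non-null values in column j, which suggests that one NaN got past the `x in [...]` test. NaN does not compare equal to itself, so a membership test against np.nan is unreliable. A minimal sketch of an alternative, assuming the sample from the question and combining `DataFrame.isin` with `DataFrame.isna` (the names `bad` and `clean` are made up for the example):

```python
from io import StringIO

import pandas as pd

# The sample rows from the question.
data = """0,1,10/09/2012,3:01,4;09.1,5,6,7,8,9,10,11
1,-0.581586,11/09/2012,-1:93,0;20.3,739705,,0.892921,5,,6,7
2,XX,10/09/2012,3:04,4;76.0,0.183095,-0.057214,-0.504856,NaN,0.183095,12
3,-0.256051,10/09/2012,9:65,1;54.9,483293,0.504967,0.074442,-1.716287,7,0.504967,0.504967
4,-0.728092,11/09/2012,0:78,1;53.4,232247,4.556,0.328062,1.382914,NaN,4.556,4
5,4,11/09/2012,NaN,NaN,6.0008,NaN,NaN,NaN,6.000800,6.000000,6.000800
6,X,11/09/2012,X,X,5,X,8,2,1,17.000000,33.000000
7,,11/09/2012,,,,,,6.000000,5.000000,2.000000,2.000000
8,4,11/09/2012,7:98,3;04.5,5,6,3,7.000000,3.000000,3.000000,2
9,6,11/09/2012,2:21,4;67.2,5,2,2,7,3,8.000000,4.000000
"""

names = list("abcdefghijkl")
df = pd.read_csv(StringIO(data), header=None, names=names)

# True wherever a cell holds a marker string or is missing
# (empty fields and the text 'NaN' are both parsed as missing).
bad = df.isin(["X", "XX"]) | df.isna()

# Keep only the rows in which no cell was flagged.
clean = df[~bad.any(axis=1)]
print(clean["a"].tolist())
```

Columns that originally contained 'X' are still stored as strings after filtering, so a numeric conversion such as `pd.to_numeric` is still needed before treating them as floats.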

answered Nov 15 '22 08:11

Chang She
