I am working with the UCI data sets, and some of them contain "?" in certain lines. For example:
56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0
58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0
57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0,2.0,1.0,3.0,1
38.0,1.0,3.0,138.0,175.0,0.0,0.0,173.0,0.0,0.0,1.0,?,3.0,0
I first used numpy.loadtxt() to load the file and tried to delete the lines with "?" using line.contains('?'), but got a type error.
Then I used pandas.read_csv; however, I still found no easy way to delete all lines that contain the specific character "?".
Is there any easy way to clean the data? I need a float type data file without any "?" in it. Thanks~
You can do this with Pandas.
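import numpy as np
import pandas as pd
# header=None: the sample file has no header row, so the first line is data
df = pd.read_csv('file.csv', header=None)
# Mark "?" as missing, drop rows with missing values, then convert to float
df = df.replace('?', np.nan)
df = df.dropna().astype(float)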
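As an alternative, read_csv can treat "?" as a missing value while parsing, so no separate replace step is needed. The following is a minimal sketch, assuming the file has no header row and every column should be numeric; the helper name load_clean and the in-memory sample are only for illustration.

```python
import io

import pandas as pd


def load_clean(source):
    """Read a headerless CSV, drop rows containing "?", return floats.

    `load_clean` is a hypothetical helper name, not part of pandas.
    """
    # na_values="?" makes pandas parse "?" as NaN during loading
    df = pd.read_csv(source, header=None, na_values="?")
    # Drop every row with a missing value, then force float dtype
    return df.dropna().astype(float).reset_index(drop=True)


# The four sample rows from the question, used here instead of a real file
sample = io.StringIO(
    "56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0\n"
    "58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0\n"
    "57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0,2.0,1.0,3.0,1\n"
    "38.0,1.0,3.0,138.0,175.0,0.0,0.0,173.0,0.0,0.0,1.0,?,3.0,0\n"
)
clean = load_clean(sample)
print(clean.shape)
```

To get a cleaned file on disk, the result can be written back with clean.to_csv('clean.csv', header=False, index=False), where 'clean.csv' is a placeholder name.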