 

Delete lines containing "?" in Python

I am working with the UCI data sets, and some of them contain "?" in some lines. For example:

56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0
58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0
57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0,2.0,1.0,3.0,1
38.0,1.0,3.0,138.0,175.0,0.0,0.0,173.0,0.0,0.0,1.0,?,3.0,0

I first used numpy.loadtxt() to load the file and tried to delete the lines containing "?" with line.contains('?'), but got a type error.
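Python strings have no `contains()` method; the idiomatic substring test is the `in` operator. A minimal sketch, assuming comma-separated numeric lines like the sample above: filter the raw text lines first, then hand the surviving lines to `numpy.loadtxt`, which accepts a list of strings as well as a file name. The helper name `load_without_question_marks` is hypothetical.

```python
import io

import numpy as np


def load_without_question_marks(lines):
    """Parse comma-separated numeric lines, skipping any line that contains '?'."""
    # str has no .contains() method; the `in` operator does the substring test
    clean = [line for line in lines if '?' not in line]
    # np.loadtxt accepts a list of strings and returns a float64 array by default;
    # ndmin=2 keeps the result 2-D even if only one line survives
    return np.loadtxt(clean, delimiter=',', ndmin=2)


# The four sample rows from the question, standing in for a real file
sample = io.StringIO(
    "56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0\n"
    "58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0\n"
    "57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0,2.0,1.0,3.0,1\n"
    "38.0,1.0,3.0,138.0,175.0,0.0,0.0,173.0,0.0,0.0,1.0,?,3.0,0\n"
)
data = load_without_question_marks(sample)
print(data.shape)  # (2, 14)
```

For a real file, pass the open file object instead, e.g. `with open('file.csv') as f: data = load_without_question_marks(f)`.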

Then I tried pandas.read_csv, but I still have no easy way to delete every line that contains the character "?".
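With pandas, one way to drop every line that contains "?" is to read all cells as strings, build a boolean mask with `Series.str.contains`, and keep only the rows where no cell matched. Passing `regex=False` matters here because `?` is a regex metacharacter. A sketch, assuming the file has no header row; the helper `drop_rows_with` is hypothetical.

```python
import io

import pandas as pd


def drop_rows_with(df, token):
    """Return only the rows of df in which no cell contains `token` as a literal substring."""
    # regex=False: '?' would otherwise be read as a regex quantifier and raise an error
    has_token = df.apply(lambda col: col.str.contains(token, regex=False))
    return df[~has_token.any(axis=1)]


sample = io.StringIO(
    "56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0\n"
    "58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0\n"
    "57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0,2.0,1.0,3.0,1\n"
    "38.0,1.0,3.0,138.0,175.0,0.0,0.0,173.0,0.0,0.0,1.0,?,3.0,0\n"
)
# dtype=str keeps every cell as text so .str.contains works on each column
raw = pd.read_csv(sample, header=None, dtype=str)
# Once the "?" rows are gone, every remaining cell parses as a float
clean = drop_rows_with(raw, '?').astype(float)
print(clean.shape)  # (2, 14)
```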

Is there an easy way to clean the data? I need float data without any "?" in it. Thanks!

asked by flyingmouse on Feb 08 '23 10:02


1 Answer

You can do this with pandas: turn every "?" into NaN, then drop the rows that contain NaN.

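import numpy as np
import pandas as pd

# The UCI file has no header row, so header=None keeps the first line as data
df = pd.read_csv('file.csv', header=None)
# Turn every "?" cell into NaN (pd.np was removed in pandas 2.0; use numpy directly)
df = df.replace('?', np.nan)
# Drop every row that contained a "?", then make all columns float
df = df.dropna().astype(float)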
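A variant of the same idea lets `read_csv` turn `?` into NaN while parsing, via its `na_values` parameter, so no separate `replace` step is needed. A sketch using the sample rows from the question in place of `file.csv`:

```python
import io

import pandas as pd

sample = io.StringIO(
    "56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0\n"
    "58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0\n"
    "57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0,2.0,1.0,3.0,1\n"
    "38.0,1.0,3.0,138.0,175.0,0.0,0.0,173.0,0.0,0.0,1.0,?,3.0,0\n"
)
# header=None: the file has no header row
# na_values='?': treat every "?" cell as missing while parsing
df = pd.read_csv(sample, header=None, na_values='?')
# Drop rows that had a "?"; astype(float) also converts the integer label column
df = df.dropna().astype(float)
print(df.shape)  # (2, 14)
```

The final `astype(float)` is there because pandas parses an all-integer column such as the last one as `int64`, while the question asks for float data throughout.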
answered by ComputerFellow on Feb 12 '23 11:02