
Drop rows with a 'question mark' value in any column in a pandas dataframe

I want to remove all rows that contain a question mark symbol in any column (equivalently, keep only the rows without one). I also want to convert the remaining elements to float type.

Input:

X Y Z
0 1 ?
1 2 3
? ? 4
4 4 4
? 2 5

Output:

X Y Z
1 2 3
4 4 4

Preferably using pandas dataframe operations.

asked Feb 28 '16 12:02 by Anonymous



2 Answers

First build a boolean mask that marks the rows containing the string '?' in any column, then filter the rows with boolean indexing. To convert the columns to float, use astype:

print(~((df['X'] == '?') | (df['Y'] == '?') | (df['Z'] == '?')))
0    False
1     True
2    False
3     True
4    False
dtype: bool


df1 = df[~((df['X'] == '?') | (df['Y'] == '?') | (df['Z'] == '?'))].astype(float)
print(df1)
     X    Y    Z
1  1.0  2.0  3.0
3  4.0  4.0  4.0

print(df1.dtypes)
X    float64
Y    float64
Z    float64
dtype: object

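Alternatively, convert each column with pd.to_numeric. With errors='coerce', every value that cannot be parsed as a number (here '?') becomes NaN, and the rows containing NaN can then be filtered out: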

df['X'] = pd.to_numeric(df['X'], errors='coerce')
df['Y'] = pd.to_numeric(df['Y'], errors='coerce')
df['Z'] = pd.to_numeric(df['Z'], errors='coerce')
print(df)
     X    Y    Z
0  0.0  1.0  NaN
1  1.0  2.0  3.0
2  NaN  NaN  4.0
3  4.0  4.0  4.0
4  NaN  2.0  5.0
print(df['X'].notnull() & df['Y'].notnull() & df['Z'].notnull())
0    False
1     True
2    False
3     True
4    False
dtype: bool

print(df[df['X'].notnull() & df['Y'].notnull() & df['Z'].notnull()].astype(float))
     X    Y    Z
1  1.0  2.0  3.0
3  4.0  4.0  4.0
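The per-column calls above can probably be shortened. The following is a minimal sketch, assuming the question's sample data is held as strings in a DataFrame named df (the construction below is an assumption, not part of the original answer); it applies pd.to_numeric to every column and uses dropna() in place of the chained notnull() mask:

```python
import pandas as pd

# Assumed reconstruction of the question's input; values are strings
# because the '?' entries prevent the columns from being numeric.
df = pd.DataFrame({
    "X": ["0", "1", "?", "4", "?"],
    "Y": ["1", "2", "?", "4", "2"],
    "Z": ["?", "3", "4", "4", "5"],
})

# Coerce every column at once: non-numeric values ('?') become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# dropna() removes each row that has at least one NaN;
# astype(float) guarantees float columns even if a column had no '?'.
result = numeric.dropna().astype(float)
print(result)
```

With its default arguments, dropna() drops a row as soon as one value is missing, which matches the condition built from the three notnull() calls.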

A better approach is to compare the whole DataFrame at once:

df = df[(df != '?').all(axis=1)]

Or:

df = df[~(df == '?').any(axis=1)]
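To see the two one-liners end to end, here is a small sketch under the assumption that the sample data is stored as strings (the DataFrame construction and the names keep, keep_alt and clean are illustrative only). It checks that both masks select the same rows and then converts the result to float as the question asks:

```python
import pandas as pd

# Assumed reconstruction of the question's input as string columns.
df = pd.DataFrame({
    "X": ["0", "1", "?", "4", "?"],
    "Y": ["1", "2", "?", "4", "2"],
    "Z": ["?", "3", "4", "4", "5"],
})

# True for rows in which every cell differs from '?'.
keep = (df != "?").all(axis=1)

# Same mask written the other way: not (any cell equals '?').
keep_alt = ~(df == "?").any(axis=1)

# Filter with boolean indexing, then convert the survivors to float.
clean = df[keep].astype(float)
print(clean)
print(clean.dtypes)
```

Both forms avoid naming the columns, so they keep working if the DataFrame gains or loses columns.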
answered Sep 25 '22 08:09 by jezrael


You can try replacing ? with null values (NaN):

import numpy as np

data = df.replace("?", np.nan)

If you want to replace values in a particular column only, try this:

data = df["column name"].replace("?", np.nan)
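Replacing the question marks only marks the cells as missing; the rows still have to be dropped. A possible continuation, sketched here with an assumed string-typed reconstruction of the question's data, chains dropna() and astype(float) after the replacement:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's input as string columns.
df = pd.DataFrame({
    "X": ["0", "1", "?", "4", "?"],
    "Y": ["1", "2", "?", "4", "2"],
    "Z": ["?", "3", "4", "4", "5"],
})

# Turn every '?' into a real missing value (np.nan, not the string "np.Nan").
data = df.replace("?", np.nan)

# Drop rows that contain at least one missing value, then convert to float.
data = data.dropna().astype(float)
print(data)
```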
answered Sep 24 '22 08:09 by Naidu Jithendra