I want to remove all rows that contain a question mark symbol in any column (in other words, keep only the rows without one). I also want to convert the elements to float type.
Input:
X Y Z
0 1 ?
1 2 3
? ? 4
4 4 4
? 2 5
Output:
X Y Z
1 2 3
4 4 4
Preferably using pandas dataframe operations.
Use the drop() method to delete rows based on a column value in a pandas DataFrame. As part of data cleansing, you often need to drop rows from the DataFrame when a column value matches a static value or the value of another column.
To drop rows based on certain conditions, select the index of the rows that satisfy the condition and pass that index to the drop() method. For example, with the condition (df['Unit_Price'] > 400) & (df['Unit_Price'] < 600), the rows whose Unit_Price lies strictly between 400 and 600 are dropped.
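The index-based drop described above can be sketched as follows. The sample data and the Product column are made up for illustration; only the Unit_Price column name and the condition come from the text.

```python
import pandas as pd

# Hypothetical sample data; only the Unit_Price column name comes from the text.
df = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Unit_Price": [350, 450, 550, 650],
})

# Boolean mask that is True for the rows to drop.
condition = (df["Unit_Price"] > 400) & (df["Unit_Price"] < 600)

# Select the index labels of the matching rows and pass them to drop().
result = df.drop(df[condition].index)
print(result)
```

drop() returns a new DataFrame here, so df itself is left unchanged.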
How do I remove a specific value from a DataFrame in Python? Specific rows and columns can be removed from a DataFrame object using the drop() instance method. The axis argument selects what is dropped: 0 for rows and 1 for columns.
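A small sketch of the axis argument of drop(), using a made-up three-column frame. In pandas, axis=0 refers to the row labels and axis=1 to the column labels.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"X": [1, 2, 3], "Y": [4, 5, 6], "Z": [7, 8, 9]})

# axis=0 drops by row label: remove the row labelled 1.
rows_dropped = df.drop(1, axis=0)

# axis=1 drops by column label: remove column Z.
cols_dropped = df.drop("Z", axis=1)

print(rows_dropped)
print(cols_dropped)
```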
You can first find the string ? in the columns, create a boolean mask, and finally filter the rows using boolean indexing. If you need to convert the columns to float, use astype:
print(~((df['X'] == '?') | (df['Y'] == '?') | (df['Z'] == '?')))
0 False
1 True
2 False
3 True
4 False
dtype: bool
df1 = df[~((df['X'] == '?' ) | (df['Y'] == '?' ) | (df['Z'] == '?' ))].astype(float)
print(df1)
X Y Z
1 1 2 3
3 4 4 4
print(df1.dtypes)
X float64
Y float64
Z float64
dtype: object
Or you can try:
df['X'] = pd.to_numeric(df['X'], errors='coerce')
df['Y'] = pd.to_numeric(df['Y'], errors='coerce')
df['Z'] = pd.to_numeric(df['Z'], errors='coerce')
print(df)
X Y Z
0 0 1 NaN
1 1 2 3
2 NaN NaN 4
3 4 4 4
4 NaN 2 5
print((df['X'].notnull()) & (df['Y'].notnull()) & (df['Z'].notnull()))
0 False
1 True
2 False
3 True
4 False
dtype: bool
print(df[(df['X'].notnull()) & (df['Y'].notnull()) & (df['Z'].notnull())].astype(float))
X Y Z
1 1 2 3
3 4 4 4
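The per-column to_numeric calls can also be written once for the whole frame. The following is a sketch, assuming every column should be numeric and that the input holds strings as in the question; it combines DataFrame.apply with pd.to_numeric and dropna.

```python
import pandas as pd

# The input from the question, stored as strings.
df = pd.DataFrame({
    "X": ["0", "1", "?", "4", "?"],
    "Y": ["1", "2", "?", "4", "2"],
    "Z": ["?", "3", "4", "4", "5"],
})

# Convert every column; values that cannot be parsed (such as '?') become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Drop rows that contain at least one NaN, then make sure the dtype is float.
cleaned = numeric.dropna().astype(float)
print(cleaned)
```

Note that errors='coerce' turns any unparseable value into NaN, not only ?, so rows with other non-numeric entries would be dropped as well.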
A better approach is to compare the whole DataFrame at once:
df = df[(df != '?').all(axis=1)]
Or:
df = df[~(df == '?').any(axis=1)]
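Putting the whole-frame comparison together with the float conversion, a minimal end-to-end sketch for the input in the question might look like this (the frame is rebuilt here as strings, which is an assumption about how the data was loaded):

```python
import pandas as pd

# The input from the question, stored as strings.
df = pd.DataFrame({
    "X": ["0", "1", "?", "4", "?"],
    "Y": ["1", "2", "?", "4", "2"],
    "Z": ["?", "3", "4", "4", "5"],
})

# True for rows in which no column equals '?'.
mask = (df != "?").all(axis=1)

# Keep only those rows and convert the remaining strings to float.
result = df[mask].astype(float)
print(result)
```

This avoids listing the columns by name, so it keeps working if columns are added.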
You can try replacing ? with null values:
import numpy as np
data = df.replace("?", np.nan)
If you want to replace values in a particular column only, try this:
data = df["column name"].replace("?", np.nan)
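Replacing ? alone does not remove the rows. One way to finish the job, sketched here under the assumption that the data holds strings as in the question, is to follow the replacement with dropna() and astype(float):

```python
import numpy as np
import pandas as pd

# The input from the question, stored as strings.
df = pd.DataFrame({
    "X": ["0", "1", "?", "4", "?"],
    "Y": ["1", "2", "?", "4", "2"],
    "Z": ["?", "3", "4", "4", "5"],
})

# Replace '?' with a real missing value (np.nan, not the string "np.Nan").
data = df.replace("?", np.nan)

# Drop rows with any missing value and convert what is left to float.
data = data.dropna().astype(float)
print(data)
```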