How to check if a float pandas column contains only integer numbers?

I have a DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.arange(10), columns=['v']).astype(float)

How can I make sure that the numbers in v are whole numbers? I am very concerned about rounding, truncation, and floating-point representation errors.
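
A minimal sketch of why the floating-point concern matters, assuming the column holds results of arithmetic rather than exact literals: an exact check rejects a value like 0.1 * 3 * 10, which carries a tiny representation error, while a tolerance-based check with np.isclose accepts it. The data and the tolerance of 1e-9 are assumptions; choose a tolerance that suits your data.

```python
import numpy as np
import pandas as pd

# Made-up values: the second one "should" be 3 but carries a rounding error.
s = pd.Series([1.0, 0.1 * 3 * 10, 4.5])

# Exact check: any representation error makes a value count as non-integer.
exact = s % 1 == 0

# Tolerance-based check (assumed tolerance 1e-9): compare each value
# with its nearest integer instead of requiring an exact match.
approx = pd.Series(np.isclose(s, np.round(s), rtol=0.0, atol=1e-9), index=s.index)

print(exact.tolist())   # [True, False, False]
print(approx.tolist())  # [True, True, False]
```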

asked Mar 13 '18 06:03 by 00__00__00


5 Answers

Comparison with astype(int)

Tentatively convert your column to int (which truncates any fractional part) and compare the result with the original using np.array_equal:

np.array_equal(df.v, df.v.astype(int))  # True
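
The astype(int) comparison raises an error when the column contains NaN or inf, because those values cannot be cast to an integer. A minimal sketch (the helper name is_integer_valued is hypothetical) that rejects non-finite values first; it assumes the values fit in int64, since larger floats would not survive the cast.

```python
import numpy as np
import pandas as pd


def is_integer_valued(s: pd.Series) -> bool:
    """Sketch: True if every value in the float Series s is a whole number.

    NaN and +/-inf are rejected up front, because astype(int) cannot cast them.
    Assumes values fit in int64 (|x| below about 9.2e18); larger floats would
    not survive the cast and the comparison would be unreliable.
    """
    values = s.to_numpy(dtype=float)
    if not np.isfinite(values).all():
        return False
    # astype truncates toward zero, so any fractional part breaks equality.
    return np.array_equal(values, values.astype(np.int64))
```

For example, is_integer_valued(df.v) returns True for the DataFrame in the question.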

float.is_integer

You can use this built-in Python method in conjunction with Series.apply:

df.v.apply(float.is_integer).all()  # True

Or, for memory efficiency, use Python's built-in all with a generator expression:

all(x.is_integer() for x in df.v)  # True
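
float.is_integer returns False for NaN and for +/-inf, so a single missing value makes both one-liners above return False. A small sketch with a hypothetical helper all_integers that optionally drops missing values before checking:

```python
import numpy as np
import pandas as pd


def all_integers(s: pd.Series, skipna: bool = False) -> bool:
    """Sketch: True if every value in s is a whole number.

    float.is_integer returns False for NaN and +/-inf, so by default one
    missing value makes the whole check fail; skipna=True drops NaN first.
    """
    if skipna:
        s = s.dropna()
    return all(float(x).is_integer() for x in s)


s = pd.Series([1.0, 2.0, np.nan])
print(all_integers(s))               # False
print(all_integers(s, skipna=True))  # True
```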
answered Sep 16 '22 21:09 by cs95


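Here's a simpler, and probably faster, approach, where col is the column name:

(df[col] % 1 == 0).all()

To ignore nulls (NaN % 1 is NaN, which fails the comparison), fill them with a whole-number placeholder first:

(df[col].fillna(-9999) % 1 == 0).all()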
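
A quick sketch, on made-up data, of how the modulo check treats missing and negative values. In pandas (as in Python), the result of % takes the sign of the divisor, so negative numbers need no special handling; infinite values fail because inf % 1 is NaN. Using dropna() is an alternative to the -9999 placeholder that avoids choosing a sentinel.

```python
import numpy as np
import pandas as pd

s = pd.Series([-3.0, 0.0, 2.0, np.nan])  # made-up data with one missing value

# NaN % 1 is NaN, and NaN == 0 is False, so the plain check fails here.
plain = (s % 1 == 0).all()

# Dropping missing values instead of filling them avoids picking a sentinel.
ignoring_nulls = (s.dropna() % 1 == 0).all()

# % takes the sign of the divisor, so the remainder of -2.5 is 0.5, not -0.5.
remainders = (pd.Series([-2.5, 1.25]) % 1).tolist()

print(plain, ignoring_nulls, remainders)
```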
answered Sep 17 '22 21:09 by scott


If you want to check multiple float columns in your dataframe, you can do the following:

col_should_be_int = df.select_dtypes(include=['float']).applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
df[float_to_int_cols] = df[float_to_int_cols].astype(int)  # [] assignment replaces the columns, so the int dtype is kept
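
A sketch of the same multi-column conversion that may suit newer pandas versions: DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, and since pandas 2.0 an assignment such as df.loc[:, cols] = ... first tries to write the values in place, so the columns can keep their float dtype. This version uses a vectorized % 1 check and DataFrame.astype with a column-to-dtype dict; the example DataFrame is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],   # whole numbers stored as float
    "b": [1.5, 2.0, 3.0],   # genuinely fractional
    "c": ["x", "y", "z"],   # non-float column, left alone
})

float_cols = df.select_dtypes(include="float")
# Vectorized per-column check: True where every value is a whole number.
whole = (float_cols % 1 == 0).all()
int_cols = whole[whole].index.tolist()

# astype with a dict changes the dtype of the selected columns only.
df = df.astype({col: "int64" for col in int_cols})
print(df.dtypes)
```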

Keep in mind that a float column containing only whole numbers will not be selected if it also has np.nan values, because float.is_integer(nan) is False. To cast float columns with missing values to integer, you first need to fill or remove the missing values, for example with median imputation:

float_cols = df.select_dtypes(include=['float'])
float_cols = float_cols.fillna(float_cols.median().round()) # median imputation
col_should_be_int = float_cols.applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
df[float_to_int_cols] = float_cols[float_to_int_cols].astype(int)
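
If imputing values is not acceptable, a possible alternative (not part of the original answer) is pandas' nullable Int64 dtype, which keeps missing values as <NA>. A sketch with made-up columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"v": [1.0, np.nan, 3.0], "w": [0.5, np.nan, 2.0]})

float_cols = df.select_dtypes(include="float")
# Ignore missing values when checking; NaN would otherwise fail the test.
whole = float_cols.apply(lambda s: (s.dropna() % 1 == 0).all())
int_cols = whole[whole].index.tolist()

# The nullable "Int64" dtype stores missing values as <NA>, so no imputation is needed.
df = df.astype({col: "Int64" for col in int_cols})
print(df.dtypes)
```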
answered Sep 17 '22 21:09 by mgoldwasser


For completeness, pandas v1.0+ offers the convert_dtypes() method, which (among several other conversions) converts every DataFrame column (or Series) that contains only whole numbers to the nullable Int64 dtype; missing values become <NA>.

If you wanted to limit the conversion to a single column only, you could do the following:

>>> df.dtypes          # inspect previous dtypes
v                      float64

>>> df["v"] = df["v"].convert_dtypes()
>>> df.dtypes          # inspect converted dtypes
v                      Int64
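
A short sketch of convert_dtypes() on a whole DataFrame with made-up columns: whole-number float columns become Int64 even when they contain missing values, while a column with a fractional value becomes the nullable Float64 dtype (pandas 1.2+, where convert_floating defaults to True).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "v": np.arange(10, dtype=float),                    # whole numbers stored as float64
    "w": [0.5] + [1.0] * 9,                             # contains a fractional value
    "n": [np.nan] + [float(i) for i in range(1, 10)],   # whole numbers plus a NaN
})

converted = df.convert_dtypes()
print(converted.dtypes)
```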
answered Sep 18 '22 21:09 by ankostis


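On 27,331,625 rows this vectorized comparison works well (1.3 s); field_fact_qty holds the column name, and the new column is True for rows with a fractional part:

df['is_float'] = df[field_fact_qty] != df[field_fact_qty].astype(int)

The apply-based version took 4.9 s (note that it returns the opposite mask, True for whole numbers):

df[field_fact_qty].apply(lambda x: x.is_integer())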
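
The timings above are the poster's own. A sketch for reproducing the comparison on synthetic data (the Series qty and its size are hypothetical; absolute times depend on your machine). It also checks that both versions flag the same rows; note that the astype(int) version raises if the column contains NaN or inf, so drop or fill those first.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 quantities, roughly half of them whole numbers.
rng = np.random.default_rng(0)
n = 100_000
qty = pd.Series(np.where(rng.random(n) < 0.5,
                         rng.integers(0, 100, n),
                         rng.random(n) * 100))


def vectorized(s: pd.Series) -> pd.Series:
    # True where the value has a fractional part (raises on NaN/inf).
    return s != s.astype(int)


def with_apply(s: pd.Series) -> pd.Series:
    # Same mask via a per-element Python call to float.is_integer.
    return ~s.apply(float.is_integer)


assert vectorized(qty).equals(with_apply(qty))
for fn in (vectorized, with_apply):
    seconds = timeit.timeit(lambda: fn(qty), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```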
answered Sep 18 '22 21:09 by Nicoolasens