I have a dataframe:
df = pd.DataFrame(data=np.arange(10), columns=['v']).astype(float)
How can I make sure that the numbers in v are whole numbers?
I am very concerned about rounding/truncation/floating-point representation errors.
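To make the concern concrete: a value produced by float arithmetic can land a tiny distance away from the whole number you expect, and exact tests will then report it as non-integer. A minimal illustration; the Series contents are made up for the example:

```python
import pandas as pd

# 0.1 has no exact binary representation, so 0.1 * 3 * 10 ends up at
# 3 + 2**-51 instead of exactly 3.0.
s = pd.Series([0.1 * 3 * 10, 2.0, 5.0])

print(s.iloc[0] == 3.0)               # False
print(float(s.iloc[0]).is_integer())  # False
print((s % 1 == 0).all())             # False
```

Whether such a value should count as "whole" is a policy decision; an exact check will always say no.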
Pandas str.isdigit() method is used to check whether all characters in each string in a Series are digits. Whitespace or any other character in the string makes it return False. A decimal number also returns False, since this is a string method and '.' is not a digit.
Pandas DataFrame all() method. all() checks whether all elements are True (truthy, i.e. non-zero and non-empty) along an axis. By default it reduces over the rows and returns a Series with one True/False value per column; with axis=None it returns a single boolean for the whole frame. Older pandas versions also accepted a level argument, which returned a DataFrame; it was removed in pandas 2.0.
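A short sketch of the two return shapes of .all() on a made-up frame: one value per column by default, a single value with axis=None.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [0.0, 3.0]})

per_column = df.all()            # Series: a -> True, b -> False (0.0 is falsy)
whole_frame = df.all(axis=None)  # one boolean for every cell: False

print(per_column)
print(whole_frame)
```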
Using str.contains to find a substring in a pandas DataFrame. The Series.str.contains method searches a column for a substring (or regular-expression pattern). It returns a boolean Series: True where the original value contains the substring, False where it does not.
astype(int)
Tentatively convert your column to int and test with np.array_equal:
>>> np.array_equal(df.v, df.v.astype(int))
True
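A sketch of the same check wrapped in a helper (the name all_whole_astype is hypothetical). It assumes the column holds no NaN or infinite values, since casting those to an integer dtype raises a ValueError in pandas, and that the values fit in the int64 range.

```python
import numpy as np
import pandas as pd

def all_whole_astype(s: pd.Series) -> bool:
    """Return True if truncating to int64 leaves every value unchanged.

    Assumes no NaN/inf (the cast raises ValueError) and values inside
    the int64 range (values outside it cannot be cast faithfully).
    """
    return bool(np.array_equal(s.to_numpy(), s.astype("int64").to_numpy()))

df = pd.DataFrame(data=np.arange(10), columns=['v']).astype(float)
print(all_whole_astype(df.v))        # True
print(all_whole_astype(df.v + 0.5))  # False: 0.5 truncates to 0, 1.5 to 1, ...
```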
float.is_integer
You can use this Python method in conjunction with apply:
>>> df.v.apply(float.is_integer).all()
True
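When the check fails, the same boolean mask can point at the offending rows. A small sketch on made-up data:

```python
import pandas as pd

s = pd.Series([1.0, 2.5, 3.0, 4.25], name="v")

mask = s.apply(float.is_integer)  # True where the value is whole
print(mask.all())                 # False
print(s[~mask])                   # the non-integer rows: 2.5 and 4.25
```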
Or, using Python's built-in all with a generator expression, for space efficiency:
>>> all(x.is_integer() for x in df.v)
True
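Besides not building an intermediate boolean Series, the generator form stops at the first non-integer value, because all() short-circuits. A sketch that records which values were actually inspected (the is_whole wrapper is only for illustration):

```python
import pandas as pd

s = pd.Series([1.0, 2.5, 3.0, 4.0, 5.0])
inspected = []

def is_whole(x: float) -> bool:
    inspected.append(x)  # record every value all() asks about
    return x.is_integer()

result = all(is_whole(x) for x in s)
print(result)          # False
print(len(inspected))  # 2: iteration stopped at 2.5
```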
Here's a simpler, and probably faster, approach:
(df[col] % 1 == 0).all()
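This check, like the ones above, is exact, so 3.0000000000000004 counts as non-integer. If the values come out of arithmetic and you would rather accept near-misses as whole, one option is a tolerance check with np.isclose against the rounded values. The helper name all_nearly_whole and the atol=1e-9 default are assumptions to tune to your data.

```python
import numpy as np
import pandas as pd

def all_nearly_whole(s: pd.Series, atol: float = 1e-9) -> bool:
    """True if every value lies within atol of its nearest integer.

    rtol=0 makes the tolerance purely absolute; NaN never counts as close.
    """
    vals = s.to_numpy(dtype=float)
    return bool(np.isclose(vals, np.round(vals), rtol=0.0, atol=atol).all())

s = pd.Series([0.1 * 3 * 10, 2.0, 5.0])
print((s % 1 == 0).all())          # False: the exact test sees a 4.4e-16 residue
print(all_nearly_whole(s))         # True
print(all_nearly_whole(s + 0.25))  # False
```

A fixed absolute tolerance suits values of moderate magnitude; for very large floats the spacing between representable numbers exceeds 1e-9 anyway, so every value there is trivially "whole".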
To ignore nulls:
(df[col].fillna(-9999) % 1 == 0).all()
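Dropping the nulls instead of filling them gives the same answer without choosing a sentinel value. A quick comparison on example data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

print((s % 1 == 0).all())                # False: NaN % 1 is NaN, never == 0
print((s.fillna(-9999) % 1 == 0).all())  # True
print((s.dropna() % 1 == 0).all())       # True
```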
If you want to check multiple float columns in your dataframe, you can do the following:
col_should_be_int = df.select_dtypes(include=['float']).applymap(float.is_integer).all()  # applymap is DataFrame.map in pandas >= 2.1
float_to_int_cols = col_should_be_int[col_should_be_int].index
# Assign whole columns: in pandas >= 2.0, df.loc[:, cols] = ... writes in place and keeps float64
df[float_to_int_cols] = df[float_to_int_cols].astype(int)
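A vectorized sketch of the same multi-column idea on a small made-up frame, using % 1 instead of applymap and plain column assignment so that the new int64 dtype actually replaces float64:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],  # whole numbers stored as float
    "b": [1.5, 2.0, 3.0],  # genuinely fractional
    "c": ["x", "y", "z"],  # non-float column, left alone
})

float_df = df.select_dtypes(include=["float"])
is_whole = (float_df % 1 == 0).all()     # one bool per float column
to_int = is_whole[is_whole].index        # columns safe to cast
df[to_int] = df[to_int].astype("int64")  # replaces those columns outright

print(df.dtypes)  # a -> int64, b stays float64
```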
Keep in mind that a float column containing only integers will not get selected if it has np.NaN values, because float.is_integer(nan) is False. To cast float columns with missing values to integer, you need to fill or remove the missing values first, for example with median imputation:
float_cols = df.select_dtypes(include=['float'])
float_cols = float_cols.fillna(float_cols.median().round())  # median imputation
col_should_be_int = float_cols.applymap(float.is_integer).all()  # DataFrame.map in pandas >= 2.1
float_to_int_cols = col_should_be_int[col_should_be_int].index
df[float_to_int_cols] = float_cols[float_to_int_cols].astype(int)  # plain assignment so the int dtype sticks
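If imputing values is not acceptable, another option is pandas' nullable Int64 dtype, which stores missing values as <NA>. A sketch on made-up data that treats each cell as acceptable when it is either whole or missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],  # whole numbers plus a gap
    "b": [1.5, np.nan, 2.0],  # has a fractional value
})

float_df = df.select_dtypes(include=["float"])
# NaN % 1 is NaN (not == 0), so accept missing cells explicitly.
whole_or_missing = ((float_df % 1 == 0) | float_df.isna()).all()
cols = whole_or_missing[whole_or_missing].index
df[cols] = df[cols].astype("Int64")  # missing values become <NA>

print(df.dtypes)  # a -> Int64, b stays float64
```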
For completeness, pandas v1.0+ offers the convert_dtypes() utility, which (among other conversions) converts every DataFrame column (or Series) containing only whole numbers to the nullable Int64 dtype.
If you wanted to limit the conversion to a single column only, you could do the following:
>>> df.dtypes # inspect previous dtypes
v float64
>>> df["v"] = df["v"].convert_dtypes()
>>> df.dtypes # inspect converted dtypes
v Int64
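Applied to a whole frame, convert_dtypes() picks Int64 only for columns whose values are all whole. This sketch assumes pandas 1.2 or later, where fractional float columns become the nullable Float64 dtype; missing values are kept as <NA>.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "v": np.arange(4, dtype=float),  # 0.0, 1.0, 2.0, 3.0
    "w": [0.5, 1.0, 1.5, 2.0],       # has fractional parts
    "x": [1.0, np.nan, 3.0, 4.0],    # whole numbers plus a missing value
})

converted = df.convert_dtypes()
print(converted.dtypes)  # v -> Int64, w -> Float64, x -> Int64
```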
On 27,331,625 rows this works well (time: 1.3 s):
df['is_float'] = df[field_fact_qty] != df[field_fact_qty].astype(int)
This way took 4.9 s:
df[field_fact_qty].apply(lambda x: x.is_integer())
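A rough way to reproduce this comparison on synthetic data; the Series below is made up, and absolute timings depend on hardware and pandas version. The two expressions have opposite polarity (the first marks non-integers as True, the second marks integers as True), so the apply result is negated before comparing.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(np.arange(200_000, dtype=float) / 4)  # every 4th value is whole

vectorized = s != s.astype("int64")   # True where the value is NOT whole
applied = ~s.apply(float.is_integer)  # same meaning after negation
assert vectorized.equals(applied)

t_vec = timeit.timeit(lambda: s != s.astype("int64"), number=5)
t_app = timeit.timeit(lambda: s.apply(float.is_integer), number=5)
print(f"vectorized: {t_vec:.3f}s  apply: {t_app:.3f}s")
```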