Pandas, astype(int) applied to float column returns negative numbers

Tags:

type-conversion

pandas

My task is to read data from an Excel file into a DataFrame. The data is a bit messy, and to clean it up I did the following:

import re
import pandas as pd

df_1 = pd.read_excel(offers[0])
df_1 = df_1.rename(columns={'Наименование [Дата Файла: 29.05.2019 время: 10:29:42 ]':'good_name', 
                     'Штрихкод':'barcode', 
                     'Цена шт. руб.':'price',
                     'Остаток': 'balance'
                    })
df_1 = df_1[new_columns]
# I don't know why, but the code fails unless NaN is first replaced with some character
df_1.barcode = df_1.barcode.fillna('_')
# remove all non-numeric characters
df_1.barcode = df_1.barcode.apply(lambda row: re.sub('[^0-9]', '', row))
# convert str to numeric
df_1.barcode = pd.to_numeric(df_1.barcode, downcast='integer').fillna(0)
df_1.head()

It returns the barcode column with dtype float64 (why?):

0    0.000000e+00
1    7.613037e+12
2    7.613037e+12
3    7.613034e+12
4    7.613035e+12
Name: barcode, dtype: float64

Then I try to convert that column to integer.

df_1.barcode = df_1.barcode.astype(int)

But I keep getting silly negative numbers.

df_1.barcode[0:5]
0             0
1   -2147483648
2   -2147483648
3   -2147483648
4   -2147483648
Name: barcode, dtype: int32

Thanks to @Will and @micric, I eventually got a solution.

df_1 = pd.read_excel(offers[0])
df_1 = df_1[new_columns]
# replace NaN with '0' so that the column can be converted to an integer dtype
df_1.barcode = df_1.barcode.fillna('0')
# remove all non-numeric characters
df_1.barcode = df_1.barcode.apply(lambda row: re.sub('[^0-9]', '', row))
# convert str to integer
df_1.barcode = pd.to_numeric(df_1.barcode, downcast='integer')

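Summary:

A column that contains NaN cannot have a plain integer dtype, because NaN is a floating-point value. So for a column with both NaN and non-NaN values, pd.to_numeric returns float64, even with downcast='integer'.
Check the size of the numbers you're dealing with. int32 holds values from -2**31 = -2147483648 to 2**31 - 1 = 2147483647, so 13-digit barcodes do not fit.

Thanks a lot for your help, guys!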
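The whole cleanup can be sketched end to end on a small in-memory frame. The sample barcodes and the helper name digits_only are made up for illustration, and the sketch assumes every cell is either a string or missing; it is one possible arrangement, not the only one.

```python
import re
import pandas as pd

# Made-up stand-in for the Excel column: strings with stray characters plus a missing value.
raw = pd.DataFrame({"barcode": ["7613037-000001", None, " 7613034000002 "]})

def digits_only(value):
    """Keep only the digits of a cell; missing or digit-free cells become '0'."""
    if pd.isna(value):
        return "0"
    digits = re.sub(r"[^0-9]", "", value)
    return digits or "0"

# No cell is empty after cleaning, so the parse yields integers;
# the explicit "int64" makes the width independent of the platform.
raw["barcode"] = pd.to_numeric(raw["barcode"].apply(digits_only)).astype("int64")

print(raw["barcode"].tolist())  # [7613037000001, 0, 7613034000002]
print(raw["barcode"].dtype)     # int64
```

Using 0 as the placeholder for a missing barcode follows the solution above; if 0 could be confused with real data, a different sentinel may be a better choice.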

asked May 31 '19 08:05

Sergey Solod

3 Answers

That number is the lower limit of a 32-bit integer. Your values are outside the int32 range that astype(int) resolves to here, so the conversion overflows and returns the limit (note that 2**32 = 4294967296, and half of that, 2**31 = 2147483648, is your number without the sign).

You should use astype('int64') instead.
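A minimal sketch of the point above, using made-up 13-digit barcodes: it prints the int32 limits and converts with an explicitly named 64-bit dtype. As far as I know, plain `int` resolving to int32 is typical of Windows with NumPy versions before 2.0, but treat that as an assumption about the asker's setup.

```python
import numpy as np
import pandas as pd

# The value from the question is exactly the smallest number int32 can hold.
int32_info = np.iinfo(np.int32)
print(int32_info.min, int32_info.max)  # -2147483648 2147483647

# Hypothetical 13-digit barcodes stored as floats, as in the question.
barcodes = pd.Series([0.0, 7613037000001.0, 7613034000002.0])

# Naming the width explicitly avoids depending on what plain `int` maps to.
as_int64 = barcodes.astype("int64")
print(as_int64.tolist())  # [0, 7613037000001, 7613034000002]
```

The barcodes are far larger than int32_info.max, which is why a 32-bit target cannot represent them.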

answered Oct 11 '22 00:10

micric

I ran into the same problem as the OP. Using

astype(np.int64)

solved it for me; see the link here.

I like this solution because it's consistent with how I usually change the dtype of a pandas column. Maybe someone could check the performance of these solutions.

answered Oct 11 '22 02:10

Jason Goal

Many questions in one.

So your expected dtype...

pd.to_numeric(df_1.barcode, downcast='integer').fillna(0)

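pd.to_numeric with downcast='integer' would normally give you an integer dtype. However, you have NaNs in your data: the '_' placeholder is stripped by the regex, and the resulting empty string is parsed as NaN. pandas needs a float64 dtype to represent NaN, so the column stays float64, and calling fillna(0) afterwards does not change the dtype back.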
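This behaviour can be reproduced on a tiny Series. The values below are made up, and the empty string stands in for a cell whose placeholder was removed by the regex; dtype=object is set only to keep the example independent of the pandas string-dtype defaults.

```python
import pandas as pd

# Hypothetical cleaned barcodes; "" is what remains of a missing cell.
with_gap = pd.Series(["7613037000001", "", "7613034000002"], dtype=object)

# Same data, but the gap is filled with "0" before parsing.
filled = with_gap.replace("", "0")

as_float = pd.to_numeric(with_gap, downcast="integer")
as_int = pd.to_numeric(filled, downcast="integer")

print(as_float.dtype)  # float64, because the empty string became NaN
print(as_int.dtype)    # int64, the smallest integer dtype that fits these values
```

Filling the gap before the conversion, rather than after it, is what lets the downcast succeed.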

answered Oct 11 '22 02:10

Will
