How to round/remove trailing ".0" zeros in a pandas column?

I'm trying to see if I can remove the trailing zeros from this phone number column.

Example:

0
1    8.00735e+09
2    4.35789e+09
3    6.10644e+09

The column's dtype is object, and when I tried to round it I got an error. I checked a few of the values and know they are in the format "8007354384.0"; I want to get rid of the trailing zero along with the decimal point.

Sometimes I receive the data in this format and sometimes the values are plain integers. I would like to check whether a value in the phone column has a trailing ".0" and, if so, remove it.

I have this code but I'm stuck on how to check for trailing zeros for each row.

data.ix[data.phone.str.contains('.0'), 'phone'] 

I get an error: ValueError: cannot index with vector containing NA / NaN values. I believe this happens because some rows are empty, which I do sometimes receive. The code above should be able to skip an empty row.

Does anybody have any suggestions? I'm new to pandas, but so far it's a useful library. Your help will be appreciated.

Note: in the example above, the first row is empty, which does happen sometimes. I want to make sure it is not represented as 0 for the phone number.

Also, empty data is treated as a string, so the column is a mix of floats and strings when some rows are empty.

medev21 asked Feb 22 '17 22:02



2 Answers

Use astype(np.int64):

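import numpy as np
import pandas as pd

s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
# Boolean mask of the rows that hold a number
mask = pd.to_numeric(s).notnull()
# Convert only those rows; the empty row is left untouched
s.loc[mask] = s.loc[mask].astype(np.int64)
s

0
1    8007350000
2    4357890000
3    6106440000
dtype: object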
piRSquared answered Sep 18 '22 02:09


In pandas/NumPy, NumPy-backed integer columns cannot hold NaN values, and arrays/Series (including DataFrame columns) are homogeneous in their datatype, so a plain int64 column where some entries are None/np.nan is not possible. (pandas 0.24 and later add a nullable 'Int64' extension dtype that lifts this restriction.)
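A minimal sketch of what a nullable integer column looks like, assuming pandas 0.24 or later, where the 'Int64' extension dtype (capital I) is available. The sample values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample: one empty row followed by float phone numbers
s = pd.Series(["", 8007354384.0, 4357891234.0, 6106441234.0])

# errors="coerce" turns anything non-numeric (here the empty string) into NaN
nums = pd.to_numeric(s, errors="coerce")

# "Int64" stores whole numbers and marks the missing row as <NA>
phones = nums.astype("Int64")
print(phones)
```

The missing row stays missing instead of becoming 0, and the remaining rows print without a trailing ".0".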

EDIT: data.phone.astype('object') should do the trick; in this case, pandas treats your column as a series of generic Python objects rather than a specific datatype (e.g. str/float/int), at the cost of performance if you intend to run any heavy computations with this data (probably not in your case).

Assuming you want to keep those NaN entries, your approach of converting to strings is a valid possibility:

data.phone.astype(str).str.split('.', expand = True)[0]

should give you what you're looking for (there are alternative string methods you can use, such as .replace or .extract, but .split seems the most straightforward in this case).
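A minimal sketch of the string route on made-up sample values (the frame `data` and its contents are assumptions for illustration). Depending on the pandas version, astype(str) may turn missing entries into the literal text 'nan', so the sketch masks them back to missing afterwards. It also shows the .replace variant, anchored with a regex so that only a trailing ".0" is dropped:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a missing row, a float, and two strings
data = pd.DataFrame(
    {"phone": [np.nan, 8007354384.0, "4357891234", "6106441234.0"]},
    dtype=object,
)

# Remember which rows were missing before converting to text
missing = data["phone"].isna()

# Variant 1: keep everything before the first "."
by_split = data["phone"].astype(str).str.split(".", expand=True)[0]
by_split = by_split.mask(missing)  # put the missing rows back

# Variant 2: strip only a trailing ".0"
by_replace = data["phone"].astype(str).str.replace(r"\.0$", "", regex=True)
by_replace = by_replace.mask(missing)

print(by_split.tolist())
print(by_replace.tolist())
```

Both variants leave the phone numbers as strings, which is usually what you want for identifiers that are never used in arithmetic.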

Alternatively, if you are only interested in the display of floats (unlikely I'd suppose), you can do pd.set_option('display.float_format','{:.0f}'.format), which doesn't actually affect your data.
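To illustrate that this only changes how floats are rendered, here is a small sketch that uses pd.option_context so the setting is applied temporarily rather than globally. The sample values are made up:

```python
import pandas as pd

# Hypothetical float column of phone numbers
s = pd.Series([8007354384.0, 4357891234.0])

# Apply the display format only inside this block
with pd.option_context("display.float_format", "{:.0f}".format):
    shown = s.to_string()

print(shown)    # rendered without decimals
print(s.dtype)  # the underlying data is still float64
```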

Ken Wei answered Sep 17 '22 02:09