 

Converting pandas column of comma-separated strings into integers

I have a data frame that contains a column with comma-separated values. I would like to convert the string values in that column to integers.

I am fairly new to coding in general, so a brief explanation of what is happening would be greatly appreciated, if you have time.

I have tried the following code.

df['col3'].str.strip(',').astype(int)

Current data frame:
col1 col2 col3
1    x    12,123
2    x    1,123
3    y    45,998

Desired output:
col1 col2 col3
1    x    12123
2    x    1123
3    y    45998
asked Dec 21 '18 14:12 by DataPlankton



People also ask

How do I convert an entire column to int in pandas?

Use the pandas DataFrame.astype() method to convert a column to int (integer); you can apply it to a specific column or to an entire DataFrame. To cast the data type to a 64-bit signed integer, you can use numpy.int64, numpy…
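A minimal sketch of both uses of astype(), assuming a small hypothetical frame whose columns "a" and "b" already hold clean numeric strings (no commas):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: both columns hold digit-only strings
df = pd.DataFrame({"a": ["1", "2"], "b": ["3", "4"]})

# Cast a single column to a 64-bit signed integer
df["a"] = df["a"].astype(np.int64)

# Or cast every column of the DataFrame in one call
df = df.astype(np.int64)
print(df.dtypes)
```

Note that astype() only parses clean digit strings; values such as "12,123" still need their commas removed first.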

How do you convert a number string containing commas to an integer in Python?

To convert a comma-separated number to an integer: use the str.replace() method to remove the commas from the string, then use the int() class to convert the string to an integer.
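For a single plain Python string (no pandas involved), the two steps look roughly like this; the value "12,123" is taken from the question's data:

```python
# A single comma-separated number as a plain Python string
s = "12,123"

# Remove the thousands separators, then parse the remaining digits
n = int(s.replace(",", ""))
print(n)  # 12123
```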

What is tolist() in pandas?

Series.tolist() returns a list of the values. Each value is a scalar type: a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period).
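As a small illustration tied to the question, this sketch cleans the col3 values and then calls tolist(); the resulting list holds ordinary Python ints rather than NumPy scalars:

```python
import pandas as pd

# The question's col3 values as a standalone Series
col3 = pd.Series(["12,123", "1,123", "45,998"])

# Remove commas literally, cast to int, then convert to a plain Python list
values = col3.str.replace(",", "", regex=False).astype(int).tolist()
print(values)  # [12123, 1123, 45998]
```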

How do you split a comma-separated string in a pandas column?

First, we'll split our string of comma-separated values into a Python list using the str.split() function. We'll append the str.split() function to the name of the original column and define the separator as a comma, then assign the result to a new column called models_list.
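A minimal sketch of that step, assuming a hypothetical source column named "models". The optional explode() call afterwards turns each list element into its own row; that is a separate step from the split itself:

```python
import pandas as pd

# Hypothetical data: "models" holds comma-separated names
df = pd.DataFrame({"models": ["a,b", "c"]})

# Split each string into a Python list and store it in a new column
df["models_list"] = df["models"].str.split(",")

# Optionally, give each list element its own row (index values repeat)
exploded = df.explode("models_list")
print(exploded)
```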


3 Answers

I think your solution should actually be:

df['col3'] = df.col3.str.split(',').str.join('').astype(int)

    col1 col2   col3
0     1    x  12123
1     2    x   1123
2     3    y  45998

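This is needed because str.strip only removes characters from the left and right ends of each string, so the comma in the middle of a value such as '12,123' is left in place.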
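To see why the original attempt fails, here is a sketch that applies str.strip(',') and then tries the cast. The second value, ",1,123,", is hypothetical and added only to show that strip does remove commas at the ends:

```python
import pandas as pd

s = pd.Series(["12,123", ",1,123,"])

# strip only trims the ends; the inner commas survive
stripped = s.str.strip(",")
print(stripped.tolist())  # ['12,123', '1,123']

# Casting strings that still contain commas raises ValueError
try:
    stripped.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)  # True
```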

Explanation

  • str: gives access to vectorized string methods on a Series
  • split: splits each string in the Series on the given pattern, ',' in this case
  • join: joins the elements of each list in the resulting Series of lists with the given delimiter, '' here, because you want the digits back together so they can become ints.

Finally, .astype(int) turns each string into an integer.
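The chain above can be unrolled into its intermediate values. This sketch rebuilds the question's frame and keeps each step in a variable (the names parts and joined are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": ["x", "x", "y"],
    "col3": ["12,123", "1,123", "45,998"],
})

# Step 1: each string becomes a list of pieces, e.g. ['12', '123']
parts = df["col3"].str.split(",")

# Step 2: glue the pieces back together with no separator, e.g. '12123'
joined = parts.str.join("")

# Step 3: parse the digit-only strings as integers
df["col3"] = joined.astype(int)
print(df)
```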

answered Oct 20 '22 05:10 by yatu



There are already answers to this question, but I would like to add another solution:

DataFrame:

>>> df
   col1 col2    col3
0     1    x  12,123
1     2    x   1,123
2     3    y  45,998

The simplest approach is the str.replace method:

>>> df['col3'] = df['col3'].str.replace(",", "")
# df['col3'] = df['col3'].str.replace(",", "").astype(int) <- cast to int
>>> df
   col1 col2   col3
0     1    x  12123
1     2    x   1123
2     3    y  45998
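The output above still holds strings, so a cast is needed to get real integers. A sketch that does both in one line, passing regex=False explicitly so the comma is treated as a literal substring regardless of which default your pandas version uses:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": ["x", "x", "y"],
    "col3": ["12,123", "1,123", "45,998"],
})

# Remove the literal commas, then cast the cleaned strings to int
df["col3"] = df["col3"].str.replace(",", "", regex=False).astype(int)
print(df["col3"].tolist())  # [12123, 1123, 45998]
```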

OR

Another option is the replace method with regex=True. In that mode the substitution is performed under the hood with re.sub, so the same substitution rules apply.

>>> df['col3'] = df['col3'].replace(',', '', regex=True)
>>> df
   col1 col2   col3
0     1    x  12123
1     2    x   1123
2     3    y  45998
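Because regex=True accepts any pattern, it can clean more than commas. This sketch assumes hypothetical values with stray spaces and uses \D (any non-digit) to keep only the digits; it assumes non-negative whole numbers, since a minus sign or decimal point would be removed as well:

```python
import pandas as pd

# Hypothetical messier values: stray spaces around the numbers
col3 = pd.Series(["12,123", " 1,123", "45,998 "])

# \D matches any non-digit character, so only the digits remain
digits = col3.replace(r"\D", "", regex=True)

# The result is still strings, so cast explicitly
as_int = digits.astype(int)
print(as_int.tolist())  # [12123, 1123, 45998]
```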
answered Oct 20 '22 05:10 by Karn Kumar



Brief explanation:

df['col3'].str.split(',').str.join('').astype(int)
  • df['col3'] selects col3 as a pandas Series
  • ___.str gives access to vectorized string methods, which are applied to each element of the Series (it is an accessor, not a cast to string)
  • ___.str.split(',') uses the split method: it breaks each string into substrings, using the separator passed as the parameter to decide where one substring ends and the next one begins
  • ___.str.split(',').str.join('') takes the substrings generated by the split and concatenates them (effectively just removing the separator)
  • ___.astype(int) casts each resulting string to an int
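To illustrate that .str is an accessor rather than a cast, this sketch uses a hypothetical integer Series: accessing .str on it raises AttributeError, and you have to convert with astype(str) before string methods work:

```python
import pandas as pd

nums = pd.Series([1, 2, 3])

# .str does not convert anything; it only works on string values
try:
    nums.str.len()
    accessor_ok = True
except AttributeError:
    accessor_ok = False
print(accessor_ok)  # False

# An explicit conversion is needed before string methods apply
as_text = nums.astype(str)
lengths = as_text.str.len()
print(lengths.tolist())  # [1, 1, 1]
```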

Credit to nixon for including the join, which produces the actual desired output. Hope this helps, happy coding!

answered Oct 20 '22 06:10 by Yuca
