I have a data frame that contains a column with comma separated values. I would like to convert the string values in that column to integers.
I am newish to coding in general so a brief explanation of what is happening would be massively appreciated. If you have time.
I have tried the following code.
df['col3'].str.strip(',').astype(int)
Current df:
col1 col2 col3
1 x 12,123
2 x 1,123
3 y 45,998
Desired df:
col1 col2 col3
1 x 12123
2 x 1123
3 y 45998
Use the pandas DataFrame.astype() function to convert a column to int (integer). You can apply it to a specific column or to an entire DataFrame. To cast the data type to a 64-bit signed integer, you can use, for example, numpy.int64.
To convert a comma-separated number to an integer: use the str.replace() method to remove the commas from the string, then use the int() class to convert the string to an integer.
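For a single value, the two steps just described can be sketched in plain Python. The variable names are only illustrative, and the sample value is taken from col3 in the question:

```python
# Plain-Python sketch: remove the thousands separators, then convert.
raw = "12,123"                  # sample value from col3 in the question
cleaned = raw.replace(",", "")  # str.replace removes every comma
number = int(cleaned)           # int() parses the cleaned string

print(number)  # 12123
```

The pandas answers below apply the same idea to a whole column at once.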
I think your solution should actually be:
df['col3'] = df.col3.str.split(',').str.join('').astype(int)
col1 col2 col3
0 1 x 12123
1 2 x 1123
2 3 y 45998
This is needed because str.strip only strips characters from the left and right ends of each string, so commas in the middle are left untouched.
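To see why the original attempt fails, here is a small sketch using the values from the question. The Series name s and the failed flag are just illustrative:

```python
import pandas as pd

s = pd.Series(["12,123", "1,123", "45,998"])

# str.strip only trims the given characters from both ends of each string,
# so the commas in the middle are left in place.
stripped = s.str.strip(",")
print(stripped.tolist())  # ['12,123', '1,123', '45,998']

# Because the commas are still there, the integer cast cannot succeed.
failed = False
try:
    stripped.astype(int)
except ValueError as exc:
    failed = True
    print("astype(int) failed:", exc)
```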
Explanation
str: allows vectorized string functions to be used on a Series
split: splits each element of the Series according to some pattern, ',' in this case
join: joins the elements of the resulting Series of lists with a passed delimiter, '' here, since you want to create ints
.astype(int): finally, turns each string into an integer
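It may help to run the chain one step at a time and look at the intermediate results. This is only a sketch: the DataFrame is rebuilt from the question's sample data, and the names parts and joined are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": ["x", "x", "y"],
    "col3": ["12,123", "1,123", "45,998"],
})

# Step 1: split each string on the comma; each element becomes a list of pieces.
parts = df["col3"].str.split(",")
print(parts)

# Step 2: join the pieces back together with an empty delimiter.
joined = parts.str.join("")
print(joined)

# Step 3: cast the comma-free strings to integers and write them back.
df["col3"] = joined.astype(int)
print(df["col3"].tolist())  # [12123, 1123, 45998]
```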
There are already answers to this question, but I would like to add another solution:
DataFrame:
>>> df
col1 col2 col3
0 1 x 12,123
1 2 x 1,123
2 3 y 45,998
The simplest approach is to use the str.replace method:
>>> df['col3'] = df['col3'].str.replace(",", "")
# df['col3'] = df['col3'].str.replace(",", "").astype(int) <- cast to int
>>> df
col1 col2 col3
0 1 x 12123
1 2 x 1123
2 3 y 45998
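Note that str.replace on its own leaves the column as strings, even though the printed output looks numeric. A minimal sketch of the full conversion, assuming the question's sample data (the name no_commas is illustrative, and regex=False is passed only to make the literal match explicit):

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": ["x", "x", "y"],
    "col3": ["12,123", "1,123", "45,998"],
})

# Remove the commas; the values are still strings at this point.
no_commas = df["col3"].str.replace(",", "", regex=False)
print(no_commas.tolist())  # ['12123', '1123', '45998']

# Cast to integers so that arithmetic works as expected.
df["col3"] = no_commas.astype(int)
print(df["col3"].sum())  # 59244
```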
Or, as another option, use df.replace with regex=True. Regex substitution is performed under the hood with re.sub, so the substitution rules are the same as for re.sub.
>>> df['col3'] = df['col3'].replace(',', '', regex=True)
>>> df
col1 col2 col3
0 1 x 12123
1 2 x 1123
2 3 y 45998
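The regex=True argument matters here. A short sketch of the difference, using the question's values (the names whole_match and substring are illustrative):

```python
import pandas as pd

s = pd.Series(["12,123", "1,123", "45,998"])

# Without regex=True, replace looks for values that are exactly ",",
# so none of these strings are changed.
whole_match = s.replace(",", "")
print(whole_match.tolist())  # ['12,123', '1,123', '45,998']

# With regex=True, the pattern is substituted inside each string.
substring = s.replace(",", "", regex=True)
print(substring.tolist())  # ['12123', '1123', '45998']
```

As with str.replace, the result is still strings, so .astype(int) is needed afterwards to get integers.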
Brief explanation:
df['col3'].str.split(',').str.join('').astype(int)
df['col3'] generates a pandas.Series from the values of col3
.str gives access to string methods on the contents of your Series (it is an accessor rather than a cast); it usually means you would like to apply a string method to each value
.str.split(',') uses the split method: it breaks a string into substrings, using the separator provided as the parameter to distinguish where one substring ends and the next one begins
.str.split(',').str.join('') takes the substrings generated by the split and concatenates them together (effectively you're just removing the separator)
.astype(int) casts your result to an int
Credit to nixon on including the join to generate the actual desired output. Hope this helps, happy coding!