Converting pandas column of comma-separated strings into integers

I have a data frame that contains a column with comma separated values. I would like to convert the string values in that column to integers.

I am newish to coding in general, so a brief explanation of what is happening would be massively appreciated, if you have time.

I have tried the following code.

df['col3'].str.strip(',').astype(int)

df (current):
col1 col2 col3
1    x    12,123
2    x    1,123
3    y    45,998

df (desired):
col1 col2 col3
1    x    12123
2    x    1123
3    y    45998

339

asked Dec 21 '18 14:12

DataPlankton

3 Answers

I think your solution should actually be:

df['col3'] = df.col3.str.split(',').str.join('').astype(int)

    col1 col2   col3
0     1    x  12123
1     2    x   1123
2     3    y  45998

This is needed because str.strip only removes characters from the start and end of each string, so a comma in the middle of a value is left in place.
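To see the problem with the original attempt, here is a small sketch. It assumes the question's data frame can be rebuilt as below; the names stripped and cast_failed are only there for illustration.

```python
import pandas as pd

# Rebuild the frame from the question (assumed layout)
df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": ["x", "x", "y"],
    "col3": ["12,123", "1,123", "45,998"],
})

# strip only trims the ends of each string, so the inner commas survive
stripped = df["col3"].str.strip(",")
print(stripped.tolist())

# the cast then fails, because '12,123' is not a valid integer literal
try:
    stripped.astype(int)
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print("ValueError:", exc)
```

The values come back unchanged from strip, which is why the cast raises instead of producing integers.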

Explanation

str: Gives access to vectorized string functions for a Series
split: Splits each string in the Series on a separator (',' in this case), giving a Series of lists
join: Joins the elements of each list in that Series with the passed delimiter, '' here, since you want a single string of digits

And finally .astype(int) turns each string into an integer
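The three steps can also be run one at a time to look at the intermediate results. This is just a sketch on the question's sample data; parts and joined are illustrative names.

```python
import pandas as pd

# Rebuild the frame from the question (assumed layout)
df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": ["x", "x", "y"],
    "col3": ["12,123", "1,123", "45,998"],
})

parts = df["col3"].str.split(",")   # Series of lists, e.g. ['12', '123']
joined = parts.str.join("")         # Series of strings, e.g. '12123'
df["col3"] = joined.astype(int)     # integer column

print(parts.tolist())
print(joined.tolist())
print(df["col3"].tolist())
```

Printing each stage makes it easy to see where the commas disappear: they are gone as soon as the split pieces are joined back with an empty string.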

110

answered Oct 20 '22 05:10

yatu

There are already answers to this question, but I would like to add another solution:

DataFrame:

>>> df
   col1 col2    col3
0     1    x  12,123
1     2    x   1,123
2     3    y  45,998

The simplest way is to use the str.replace method:

>>> df['col3'] = df['col3'].str.replace(",", "")
# df['col3'] = df['col3'].str.replace(",", "").astype(int) <- cast to int
>>> df
   col1 col2   col3
0     1    x  12123
1     2    x   1123
2     3    y  45998

Another option is replace with regex=True. The regex substitution is performed under the hood with re.sub, so the same substitution rules apply.

>>> df['col3'] = df['col3'].replace(',', '', regex=True)
>>> df
   col1 col2   col3
0     1    x  12123
1     2    x   1123
2     3    y  45998
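Note that after either replace the column still holds strings, so the cast is a separate step. A minimal sketch that runs both variants with the cast and checks they agree (via_str and via_replace are illustrative names; regex=False is added here only to make the literal replacement explicit):

```python
import pandas as pd

# Rebuild the frame from the question (assumed layout)
df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": ["x", "x", "y"],
    "col3": ["12,123", "1,123", "45,998"],
})

# Literal replacement through the .str accessor, then cast
via_str = df["col3"].str.replace(",", "", regex=False).astype(int)

# Series.replace with regex=True substitutes inside each string, then cast
via_replace = df["col3"].replace(",", "", regex=True).astype(int)

print(via_str.tolist())
print(via_str.equals(via_replace))
```

Without regex=True, replace only matches whole cell values, so a plain df['col3'].replace(',', '') would leave these strings untouched.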

answered Oct 20 '22 05:10

Karn Kumar

Brief explanation:

df['col3'].str.split(',').str.join('').astype(int)

df['col3'] generates a pandas.Series from the values of col3
_______.str is the string accessor: it does not change the data, it lets you apply a string method to every element of your series
_____.str.split(',') uses the split method: it breaks each string into substrings, using the separator provided as the parameter to decide where one substring ends and the next one begins
_____.str.split(',').str.join('') takes the substrings generated by the split and concatenates them together (effectively you're just removing the separator)
____.astype(int) casts the resulting strings to int

Credit to nixon on including the join to generate the actual desired output. Hope this helps, happy coding!

answered Oct 20 '22 06:10

Yuca

Tags:

python

python-3.x

pandas