Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert String With Comma To Number Using Python Pandas

I am generating a pivot table report using the pandas Python module. The source data includes a lot of readings measured in milliseconds. If the number of milliseconds exceeds 999 then the value in that CSV file will include commas (e.g. 1,234 = 1.234 seconds).

Here's how I'm trying to run the report:

import pandas as pd
import numpy as np

pool_usage = pd.read_csv("c:/foo/ds-dump.csv")

# Add a column to the end that shows you where the data came from
pool_usage["Source File"] = "ds-dump.csv"

report = pool_usage.pivot_table(values=['Average Pool Size', 'Average Usage Time (ms)'], index=['Source File'], aggfunc=np.max)

print(report)

The problem is that the dtype for the Average Usage Time (ms) is an object so the np.max function just treats it like it's NaN. I therefore never see any values greater than 999.

I tried fixing the issue like this:

import pandas as pd
import numpy as np

pool_usage = pd.read_csv("c:/foo/ds-dump.csv")

# Add a column to the end that shows you where the data came from
pool_usage["Source File"] = "ds-dump.csv"

# Convert strings to numbers if possible
pool_usage = pool_usage.convert_objects(convert_numeric=True)

report = pool_usage.pivot_table(values=['Average Pool Size', 'Average Usage Time (ms)'], index=['Source File'], aggfunc=np.max)

print(report)

This did actually change the dtype of the Average Usage Time column to a float but all of the values that are greater than 999 are still treated like NaN's.

How can I convert the Average Usage Time column to a float even though it's possible that some of the values may include commas?

like image 331
Tom Purl Avatar asked Jan 09 '23 04:01

Tom Purl


1 Answers

The read_csv function takes an optional thousands argument. Its default is None so you can change it to "," to have it recognise 1,234 as 1234 when it reads the file:

pd.read_csv("c:/foo/ds-dump.csv", thousands=",")

The column holding the millisecond values should then have the int64 datatype once the file has been read into memory.

like image 144
Alex Riley Avatar answered Jan 14 '23 00:01

Alex Riley