Changing the dtype for specific columns in a pandas dataframe

Question

I have a pandas dataframe which I have created from data stored in an xml file:

Initially the XML file is opened and parsed:

xmlData = etree.parse(filename)
trendData = xmlData.findall("//TrendData")

I created a dictionary which lists all the data names (which are used as column names) as keys and gives the position of the data in the XML file as values:

Parameters = {"TreatmentUnit":("Worklist/AdminData/AdminValues/TreatmentUnit"),
          "Modality":("Worklist/AdminData/AdminValues/Modality"),
          "Energy":("Worklist/AdminData/AdminValues/Energy"),
          "FieldSize":("Worklist/AdminData/AdminValues/Fieldsize"),
          "SDD":("Worklist/AdminData/AdminValues/SDD"),
          "Gantry":("Worklist/AdminData/AdminValues/Gantry"),
          "Wedge":("Worklist/AdminData/AdminValues/Wedge"),
          "MU":("Worklist/AdminData/AdminValues/MU"),
          "My":("Worklist/AdminData/AdminValues/My"),
          "AnalyzeParametersCAXMin":("Worklist/AdminData/AnalyzeParams/CAX/Min"),
          "AnalyzeParametersCAXMax":("Worklist/AdminData/AnalyzeParams/CAX/Max"),
          "AnalyzeParametersCAXTarget":("Worklist/AdminData/AnalyzeParams/CAX/Target"),
          "AnalyzeParametersCAXNorm":("Worklist/AdminData/AnalyzeParams/CAX/Norm"),
....}

This is just a small part of the dictionary; the actual one lists over 80 parameters. The dictionary keys are then sorted:

sortedKeys = list(sorted(Parameters.keys()))

A header is created for the pandas dataframe:

dateList=[]
dateList.append('date')
headers = dateList+sortedKeys

I then create an empty pandas dataframe with the same number of rows as the number of records in trendData and with the column headers set to 'headers' and then loop through the file filling the dataframe:

df = pd.DataFrame(index=np.arange(0,len(trendData)), columns=headers)
for a,b in enumerate(trendData):
    result={}
    result["date"] = dateutil.parser.parse(b.attrib['date'])
    for i,j in enumerate(Parameters):
        result[j] = b.findtext(Parameters[j])
        df.loc[a]=(result)
df = df.set_index('date')

This seems to work fine, but the problem is that the dtype for each column is set to 'object', whereas most should be integers. It's possible to use:

df.convert_objects(convert_numeric=True)

and it works fine, but it is now deprecated. I can also use, for example:

df.AnalyzeParametersBQFMax = pd.to_numeric(df.AnalyzeParametersBQFMax)

to convert individual columns. But is there a way of using pd.to_numeric with a list of column names? I can create a list of the columns which should be integers using the following:

int64list=[]
for q in sortedKeys:
    if q.startswith("AnalyzeParameters"):
        int64list.append(q)

but I can't find a way of passing this list to the function.

Neill Herbst · Accepted Answer

You can explicitly replace a column in a DataFrame with the same column cast to another dtype. Try this:

import pandas as pd
data = pd.DataFrame({'date':[2000, 2001, 2002, 2003], 'type':['A', 'B', 'A', 'C']})
data['date'] = data['date'].astype('int64')

Calling data.dtypes should now return the following:

date     int64
type    object
dtype: object

For multiple columns, use a for loop to run through the int64list you mentioned in your question.
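The loop suggested above could look roughly like the sketch below. The small DataFrame is a hypothetical stand-in for the one built from the XML file (only two of the question's "AnalyzeParameters" columns are reproduced, with made-up values), and it assumes every value in those columns is a clean integer string.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame: all columns start as object.
df = pd.DataFrame(
    {
        "AnalyzeParametersCAXMin": ["1", "2", "3"],
        "AnalyzeParametersCAXMax": ["10", "20", "30"],
        "Modality": ["X", "X", "E"],
    },
    dtype=object,
)

# Same selection rule as the question's int64list.
int64list = [c for c in df.columns if c.startswith("AnalyzeParameters")]

for col in int64list:
    # astype raises an error if any value cannot be converted to an integer.
    df[col] = df[col].astype("int64")

print(df.dtypes)
```

Columns that are not in int64list, such as Modality here, keep their object dtype.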

MaxU - stop WAR against UA · Answer

For multiple columns you can do it this way:

import numpy as np
cols = df.filter(like='AnalyzeParameters').columns.tolist()
df[cols] = df[cols].astype(np.int64)
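Since the question asked about pd.to_numeric specifically, the same column selection can probably be combined with it through DataFrame.apply. This is only a sketch on a hypothetical stand-in DataFrame with made-up values. It assumes some cells may be missing (findtext returns None when a path is not found), which is a case where astype to an integer dtype would raise; errors="coerce" turns such values into NaN instead.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame, with one missing value.
df = pd.DataFrame(
    {
        "AnalyzeParametersCAXMin": ["1", "2", None],
        "AnalyzeParametersCAXMax": ["10", "20", "30"],
        "Modality": ["X", "X", "E"],
    },
    dtype=object,
)

cols = df.filter(like="AnalyzeParameters").columns.tolist()

# Apply pd.to_numeric to each selected column; unparseable or missing
# values become NaN instead of raising.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```

Note that a column containing NaN ends up as float64 rather than int64, so this trades strictness for robustness.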

Tags:

python

pandas

Trigfa
