I know that inside the Python pandas package, a DataFrame is partly built on NumPy ndarrays, and NumPy lets you choose the memory order of an array, 'C' or 'F'.
Since I often run many column operations on huge DataFrames (around 100 million rows), I expected that converting a DataFrame from C order to F order could improve performance a lot. Is that right?
If so, how can I do that? An answer using plain NumPy is also fine, since a pandas DataFrame is not a must; a quick answer would be appreciated.
Thanks
Interestingly, pandas internally stores each column as a C-ordered NumPy array. Whenever you access multiple columns, or the whole DataFrame, it joins those arrays and returns a Fortran-ordered NumPy array.
print(df[df.columns[0]].values.flags)    # single column (1-D)
# C_CONTIGUOUS : True
# F_CONTIGUOUS : True

print(df[df.columns[0:2]].values.flags)  # multiple columns
# C_CONTIGUOUS : False
# F_CONTIGUOUS : True

print(df.values.flags)                   # entire dataframe
# C_CONTIGUOUS : False
# F_CONTIGUOUS : True
So column operations (add/edit/delete, etc.) are very fast; that is also why iterating over the rows of a DataFrame is slow. If your program does more row operations, convert it to C order as below.
df = pd.DataFrame(np.ascontiguousarray(df.values), columns=df.columns)
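As a quick sanity check, here is a minimal sketch of that round trip on a tiny illustrative frame (the column names and values are just placeholders; the behavior assumes all columns share one numeric dtype, so pandas keeps them in a single block):

```python
import numpy as np
import pandas as pd

# Small illustrative frame built column by column, as pandas usually holds data.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0],
                   "b": [4.0, 5.0, 6.0],
                   "c": [7.0, 8.0, 9.0]})

print(df.values.flags["F_CONTIGUOUS"])   # True: pandas hands back F order
print(df.values.flags["C_CONTIGUOUS"])   # False

arr_c = np.ascontiguousarray(df.values)  # C-ordered copy of the same data
print(arr_c.flags["C_CONTIGUOUS"])       # True

# Rebuild the DataFrame on top of the C-ordered copy.
df_c = pd.DataFrame(arr_c, columns=df.columns, index=df.index)
print(df_c.equals(df))                   # True: values unchanged, layout flipped
```

Note that `np.ascontiguousarray` copies the data, so on a 100-million-row frame this conversion costs one full pass over memory; it pays off only if you then do many row-wise operations.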
Whenever I am done with column-wise processing, I convert the frame to a C-contiguous array, because scaling and batch training of a DNN are much faster on C-ordered arrays.
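To see why layout matters for row-wise work, here is a hedged micro-benchmark sketch in plain NumPy (the array size and repetition count are arbitrary choices, and absolute timings will vary by machine; the point is only that a row-wise reduction reads contiguous memory in C order but strided memory in F order):

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)
a_c = rng.random((2000, 2000))   # C (row-major) order, NumPy's default
a_f = np.asfortranarray(a_c)     # same values, Fortran (column-major) order

# Row-wise sums read each row as one contiguous stretch only in C order.
t_c = timeit(lambda: a_c.sum(axis=1), number=50)
t_f = timeit(lambda: a_f.sum(axis=1), number=50)
print(f"C order: {t_c:.3f}s, F order: {t_f:.3f}s")

# The results are identical either way; only the access pattern differs.
assert np.allclose(a_c.sum(axis=1), a_f.sum(axis=1))
```

Running the same comparison with `axis=0` flips the situation: then the F-ordered array is the one reading contiguous memory.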