pandas efficient dataframe set row

First I have the following empty DataFrame preallocated:

df=DataFrame(columns=range(10000),index=range(1000))

Then I want to update the df row by row (efficiently) with a length-10000 numpy array as data. My problem is: I don't even have an idea what method of DataFrame I should use to accomplish this task.

Thank you!

Is pandas apply faster than iterrows?

By calling apply with axis=1, we can run a function on every row of a DataFrame. This still loops over the rows to get the job done, but apply is better optimized than iterrows, which generally results in faster runtimes.
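A minimal sketch of row-wise `apply` next to an `iterrows` loop may help here. The small frame and the `row_sum` helper are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical 3x2 frame used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def row_sum(row):
    # `row` is a Series holding one row, indexed by column name.
    return row["a"] + row["b"]

# axis=1 applies the function to each row; the result is a Series.
via_apply = df.apply(row_sum, axis=1)

# The same result with an explicit iterrows loop.
via_iterrows = pd.Series(
    [row_sum(row) for _, row in df.iterrows()], index=df.index
)

# For simple arithmetic, a vectorized expression avoids the per-row loop.
vectorized = df["a"] + df["b"]
```

All three give the same values. When the operation can be expressed on whole columns, the vectorized form is usually the one to prefer.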

How do I iterate over a row in pandas?

The DataFrame.iterrows() method iterates over DataFrame rows as (index, Series) pairs. Note that it does not preserve dtypes across rows, because each row is converted into a single Series.
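The dtype caveat is easiest to see on a frame with mixed column types. The sketch below uses a made-up two-column frame (the names `n` and `ratio` are illustrative only) and shows `itertuples` as one alternative that does not go through a Series.

```python
import pandas as pd

# Hypothetical frame with one integer and one float column.
df = pd.DataFrame({"n": [1, 2], "ratio": [0.5, 1.5]})

# Column dtypes in the frame itself.
frame_dtypes = df.dtypes.astype(str).to_dict()

# Each row comes back as a Series with a single dtype, so the
# integer value is upcast to float to match the other column.
index, first_row = next(df.iterrows())
row_dtype = str(first_row.dtype)

# itertuples yields namedtuples, so each value keeps its own type.
first_tuple = next(df.itertuples(index=False))
```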

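Here are three methods, timed on a smaller frame with only 100 columns and 1000 rows.

In [3]: import numpy as np

In [4]: from pandas import DataFrame

In [5]: row = np.random.randn(100)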

Row wise assignment

In [6]: def method1():
   ...:     df = DataFrame(columns=range(100),index=range(1000))
   ...:     for i in range(len(df)):
   ...:         df.iloc[i] = row
   ...:     return df
   ...:

Build up the arrays in a list, create the frame all at once

In [9]: def method2():
   ...:     return DataFrame([ row for i in range(1000) ])
   ...:

Columnwise assignment (with transposes at both ends)

In [13]: def method3():
   ....:     df = DataFrame(columns=range(100),index=range(1000)).T
   ....:     for i in range(1000):
   ....:         df[i] = row
   ....:     return df.T
   ....:

All three produce the same output frame

In [22]: (method2() == method1()).all().all()
Out[22]: True

In [23]: (method2() == method3()).all().all()
Out[23]: True


In [8]: %timeit method1()
1 loops, best of 3: 1.76 s per loop

In [10]: %timeit method2()
1000 loops, best of 3: 7.79 ms per loop

In [14]: %timeit method3()
1 loops, best of 3: 1.33 s per loop

It is clear that building up a list, then creating the frame all at once (7.79 ms), is orders of magnitude faster than either form of assignment (1.76 s row-wise, 1.33 s column-wise). Every assignment involves copying, whereas building the frame all at once copies only once.
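For the original use case, where rows arrive one at a time, the same idea can be sketched as follows. `make_row` is a hypothetical stand-in for whatever produces each row. Option B (filling a preallocated NumPy array) is not from the timings above; it is an assumption about a closely related approach that likewise constructs the frame only once.

```python
import numpy as np
import pandas as pd

n_rows, n_cols = 1000, 100

def make_row(i):
    # Hypothetical stand-in for whatever produces each row.
    return np.full(n_cols, float(i))

# Option A: collect rows in a list, then build the frame once.
rows = [make_row(i) for i in range(n_rows)]
df_from_list = pd.DataFrame(rows)

# Option B: fill a preallocated NumPy array, then wrap it once.
buffer = np.empty((n_rows, n_cols))
for i in range(n_rows):
    buffer[i] = make_row(i)
df_from_buffer = pd.DataFrame(buffer)
```

Either way, the per-row work happens outside pandas and the DataFrame constructor is called a single time.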

import numpy as np
from pandas import DataFrame

df = DataFrame(columns=range(10), index=range(10))
a = np.array([9, 9, 9, 9, 9, 9, 9, 9, 9, 9])

Update row:

df.loc[2] = a
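A self-contained version of this row update, with a quick check of which rows end up filled, might look like the sketch below. The `filled_rows` check is added for illustration only.

```python
import numpy as np
import pandas as pd

# Empty 10x10 frame; with no data given, every cell starts as NaN.
df = pd.DataFrame(columns=range(10), index=range(10))
a = np.array([9, 9, 9, 9, 9, 9, 9, 9, 9, 9])

# .loc selects by index label, so this overwrites the row labelled 2.
df.loc[2] = a

# Only that row is filled; the others are still missing.
filled_rows = df.notna().all(axis=1)
```

Because `.loc` works on labels rather than positions, the same assignment also works when the index is not a plain 0..n range, as long as the label exists.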

Using Jeff's idea of creating the frame all at once from the data:

df2 = DataFrame(data=np.random.randn(10, 10), index=np.arange(10))
df2.head().T

I have written up a notebook answering the question: https://www.wakari.io/sharing/bundle/hrojas/pandas%20efficient%20dataframe%20set%20row


2 Answers

Jeff

DataByDavid
