I'm learning to use generators but don't quite understand how they work.
What I want to do is iterate over rows and multiply a cell by another cell in each row, then create a new column with the results.
rate = (df['Fee'][i] for df['Fee'] in df / df['Costs'][i] for df['Costs'] in df * 100)
df['rate']=df.iterrows(rate)
So above, I've tried to write a generator which calculates what percentage the fee is of the costs.
I realise this would be much easier with a for loop but I wanted to learn how a generator would be used in this instance.
Example dataframe below.
          Industry  Expr1       Fee      Costs
      Food & Drink   June  9970.320  116171.15
    Music Industry   June  7255.534  131492.59
     Manufacturing   June  5278.960  171315.01
    Music Industry   June  6120.596  143688.78
Telecommunications  April  4123.986   78733.09
The succinct answer is "You don't". Or as the Pandas documentation puts it:
When doing data analysis, as with raw NumPy arrays looping through Series value-by-value is usually not necessary. Series can also be passed into most NumPy methods expecting an ndarray.
This also applies to DataFrames and many other structures that leverage ndarray. For more insight I would really recommend learning more about how pandas/NumPy/SciPy work internally.
Regarding this particular topic, I would point you to Pandas - Intro to Data Structures - Data Alignment and Arithmetic, and NumPy - Broadcasting.
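In practice that means the whole calculation can be written as a single vectorised expression over the columns; pandas aligns the rows by index and applies the division and multiplication element-wise:

# Divide the whole Fee column by the whole Costs column, then scale to a percentage.
# No explicit loop or generator is needed; pandas/NumPy handle the element-wise work.
df['rate'] = df['Fee'] / df['Costs'] * 100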
Behind the scenes these packages use a lot of C code to optimize operations. While generators/iterators are great, they will never match such optimized code. For example, here is a simple test using your example data.
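(To follow along, the df used below is assumed to be rebuilt from the sample rows in the question, e.g.:)

import pandas as pd
import numpy as np  # used in the checks below

# Reconstruct the example DataFrame from the sample rows shown above
df = pd.DataFrame({
    'Industry': ['Food & Drink', 'Music Industry', 'Manufacturing',
                 'Music Industry', 'Telecommunications'],
    'Expr1': ['June', 'June', 'June', 'June', 'April'],
    'Fee': [9970.320, 7255.534, 5278.960, 6120.596, 4123.986],
    'Costs': [116171.15, 131492.59, 171315.01, 143688.78, 78733.09],
})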
np.all((df.Fee / df.Costs).values == np.array([x / y for x, y in df[['Fee', 'Costs']].values]))
True
%timeit (df.Fee / df.Costs).values
78.5 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.array([x / y for x, y in df[['Fee', 'Costs']].values])
331 µs ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
As you can see, the built-in division used internally by pandas is roughly 4x faster here, and that is on a very small sample size.
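That said, since the question was specifically about generators: purely for learning, a working generator version of the same calculation could look something like the snippet below. It still loops in Python, so it carries the same overhead as the list comprehension above; the rate_gen column name is just a placeholder to keep it separate from the vectorised rate column.

# A generator expression yielding the fee as a percentage of costs, one row at a time.
# It is lazy, so it must be materialised (here via list()) before it can become a column.
rate = (fee / cost * 100 for fee, cost in zip(df['Fee'], df['Costs']))
df['rate_gen'] = list(rate)  # 'rate_gen' is a placeholder column name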