I do most of my data work in SAS but need to use python for a particular project (I'm not very competent in python). I have a dataframe like this:
values = ['a_us', 'b_us', 'c_us', 'a_ww','b_ww','c_ww']
df = pd.DataFrame(np.random.rand(1, 6), columns=values[:6])
One thing I need to do is calculate the ratio of US to WW for each of companies a, b and c. I know how to do it the long way in python-- I'd just do this for each company:
df['*company*_ratio'] = df['*company*_us']/df['*company*_ww']
But how can I do this without having to write out each equation? I am thinking I could do something like
for x in [a,b,c]:
or I could define a function. However, I don't know enough to implement either of those options or even what to search to find an answer (as I'm sure it's been asked before). In SAS I would just write a macro that fills in company.
Thanks.
You can first find the unique values of the first character of the column names by indexing with str:
print(df.columns.str[0].unique())
['a' 'b' 'c']
Or by the first substring, if the column names are split by _ (better for real data):
print(df.columns.str.split('_').str[0].unique())
['a' 'b' 'c']
for x in df.columns.str[0].unique():
    df[x + '_ratio'] = df[x + '_us']/df[x + '_ww']
Comparing:
import pandas as pd
import numpy as np
np.random.seed(0)
values = ['a_us', 'b_us', 'c_us', 'a_ww','b_ww','c_ww']
df = pd.DataFrame(np.random.rand(1, 6), columns=values[:6])
df['a_ratio'] = df['a_us']/df['a_ww']
df['b_ratio'] = df['b_us']/df['b_ww']
df['c_ratio'] = df['c_us']/df['c_ww']
print(df)
a_us b_us c_us a_ww b_ww c_ww a_ratio \
0 0.548814 0.715189 0.602763 0.544883 0.423655 0.645894 1.007213
b_ratio c_ratio
0 1.688142 0.933223
is the same as:
import pandas as pd
import numpy as np
np.random.seed(0)
values = ['a_us', 'b_us', 'c_us', 'a_ww','b_ww','c_ww']
df = pd.DataFrame(np.random.rand(1, 6), columns=values[:6])
for x in df.columns.str[0].unique():
    df[x + '_ratio'] = df[x + '_us']/df[x + '_ww']
print(df)
a_us b_us c_us a_ww b_ww c_ww a_ratio \
0 0.548814 0.715189 0.602763 0.544883 0.423655 0.645894 1.007213
b_ratio c_ratio
0 1.688142 0.933223
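The question also mentions defining a function. A minimal sketch of that option follows, assuming every company has columns named <company>_us and <company>_ww. The function name add_ratios and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def add_ratios(df, num_suffix="_us", den_suffix="_ww", out_suffix="_ratio"):
    """Return a copy of df with one <company>_ratio column per company.

    add_ratios and its parameter names are hypothetical, not a pandas API.
    """
    out = df.copy()
    # Company prefixes are taken from the columns that end in num_suffix.
    prefixes = [c[: -len(num_suffix)] for c in df.columns if c.endswith(num_suffix)]
    for p in prefixes:
        # Skip any company that lacks a matching denominator column.
        if p + den_suffix in df.columns:
            out[p + out_suffix] = df[p + num_suffix] / df[p + den_suffix]
    return out


np.random.seed(0)
values = ['a_us', 'b_us', 'c_us', 'a_ww', 'b_ww', 'c_ww']
df = pd.DataFrame(np.random.rand(1, 6), columns=values)
result = add_ratios(df)
print(result.columns.tolist())
```

Taking the prefix from the suffix rather than from the first character keeps the function usable when company names are longer than one letter.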
You could use a MultiIndex for the columns: http://pandas.pydata.org/pandas-docs/stable/advanced.html
That section is worth reading in full, but for your specific case it can look like this:
df = pd.DataFrame(np.random.rand(10, 6), columns=pd.MultiIndex.from_product([['us', 'ww'], ['a', 'b', 'c']]))
ratio = df['us'] / df['ww']
The result is a DataFrame with three columns, a, b and c, holding the three requested ratios.
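If the data already arrives with flat names such as a_us and a_ww, the columns can be converted to a MultiIndex before dividing. This is only a sketch, assuming each name ends in _<region>; the variable names flat and wide and the level names region and company are made up for the example.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
values = ['a_us', 'b_us', 'c_us', 'a_ww', 'b_ww', 'c_ww']
flat = pd.DataFrame(np.random.rand(1, 6), columns=values)

# Turn 'a_us' into ('us', 'a') so that the region becomes the top column level.
# rsplit on the last underscore keeps company names that contain underscores.
wide = flat.copy()
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(reversed(c.rsplit('_', 1))) for c in flat.columns],
    names=['region', 'company'],
)

# Selecting a top-level key returns a frame with one column per company;
# the division then aligns on those company labels.
ratio = wide['us'] / wide['ww']
print(ratio)
```

Because the division aligns on column labels, the order of the companies within each region does not matter.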