df
ts_code type close
0 861001.TI 1 648.399
1 861001.TI 20 588.574
2 861001.TI 30 621.926
3 861001.TI 60 760.623
4 861001.TI 90 682.313
... ... ... ...
8328 885933.TI 5 1083.141
8329 885934.TI 1 951.493
8330 885934.TI 5 1011.346
8331 885935.TI 1 1086.558
8332 885935.TI 5 1028.449
Goal
ts_code     l5d_close  l20d_close  ……  l90d_close
861001.TI   NaN        1.10        ……  0.95
……          ……         ……          ……  ……
I want to groupby ts_code and calculate close of type 1 / close of type N (N: 5, 20, 30, ……). Take 861001.TI for example: l5d_close is NaN because there is no row with type 5; l20d_close equals 648.399/588.574 = 1.10; l90d_close equals 648.399/682.313 = 0.95. And the result is rounded.
Try
df.groupby('ts_code')\
.pipe(lambda x: x[x.type==1].close/x[x.type==10].close)
Got: KeyError: 'Column not found: False'
The type values are: 1, 5, 20, 30, 60, 90, 180, 200.
Notice: there is at most one row per type value for each ts_code.
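For reference, a minimal reproducible sample can be built from the rows shown above (only a small subset of the full frame):

```python
import pandas as pd

# Subset of the rows shown in the question
df = pd.DataFrame({
    "ts_code": ["861001.TI"] * 5 + ["885934.TI"] * 2 + ["885935.TI"] * 2,
    "type":    [1, 20, 30, 60, 90, 1, 5, 1, 5],
    "close":   [648.399, 588.574, 621.926, 760.623, 682.313,
                951.493, 1011.346, 1086.558, 1028.449],
})
print(df)
```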
Use sort_values to make sure type == 1 is the first row per group, then extract it with groupby.transform('first'):
df = df.sort_values(['ts_code', 'type'])
close1 = df.groupby('ts_code')['close'].transform('first')
df['close'] = close1 / df['close']
# ts_code type close
# 0 861001.TI 1 1.000000
# 1 861001.TI 20 1.101644
# 2 861001.TI 30 1.042566
# 3 861001.TI 60 0.852458
# ... ... ... ...
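As a side note, groupby.transform('first') broadcasts each group's first value back onto every row of that group, which is what makes the element-wise division work; a tiny illustration with made-up data:

```python
import pandas as pd

d = pd.DataFrame({"g": ["a", "a", "a", "b"], "v": [10.0, 20.0, 40.0, 5.0]})

# transform('first') returns a Series aligned with d: [10, 10, 10, 5]
first = d.groupby("g")["v"].transform("first")

# each row's value relative to its group's first row
print((first / d["v"]).tolist())
```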
Then pivot the type column into column headers:
out = (df.pivot(index='ts_code', columns='type', values='close')
.drop(columns=1)
.add_prefix('l')
.add_suffix('d_close'))
# type l5d_close l20d_close l30d_close l60d_close l90d_close
# ts_code
# 861001.TI NaN 1.101644 1.042566 0.852458 0.950296
# ... ... ... ... ... ...
To chain it all together, assign a ratio column before the pivot:
(df.assign(ratio=df.groupby('ts_code').close.transform('first').div(df.close))
.pivot(index='ts_code', columns='type', values='ratio')
.drop(columns=1)
.add_prefix('l')
.add_suffix('d_close'))
# type l5d_close l20d_close l30d_close l60d_close l90d_close
# ts_code
# 861001.TI NaN 1.101644 1.042566 0.852458 0.950296
# ... ... ... ... ... ...
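Since the question asks for rounded ratios, a final .round(2) can be appended to the chain; a runnable sketch on the 861001.TI rows from the question (note the frame must already be sorted so that type 1 is the first row per group, as it is here):

```python
import pandas as pd

df = pd.DataFrame({
    "ts_code": ["861001.TI"] * 5,
    "type":    [1, 20, 30, 60, 90],
    "close":   [648.399, 588.574, 621.926, 760.623, 682.313],
})

out = (df.assign(ratio=df.groupby("ts_code").close.transform("first").div(df.close))
         .pivot(index="ts_code", columns="type", values="ratio")
         .drop(columns=1)                 # drop the type-1 column (always 1.0)
         .add_prefix("l")
         .add_suffix("d_close")
         .round(2))
print(out)
```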
Well, I don't think you need groupby here, as you aren't really grouping or using any aggregation function. I think it is easier to just build a new DataFrame with a plain loop:
ts_codes = df.ts_code.unique()
types = [5, 20, 30, 60, 90, 180, 200]
ts_results = []
for ts_code in ts_codes:
    ts_result = [ts_code]
    temp = df.loc[df.ts_code == ts_code]
    val_1 = temp.loc[temp.type == 1]['close'].iloc[0]  # to get the actual scalar value
    for t in types:  # `t`, to avoid shadowing the builtin `type`
        val = temp.loc[temp.type == t]
        if len(val) > 0:
            ts_result.append(val_1 / val['close'].iloc[0])
        else:
            ts_result.append(None)
    ts_results.append(ts_result)
columns = ['ts_code'] + [f'l{t}d_close' for t in types]
results_df = pd.DataFrame(ts_results, columns=columns)
I didn't run the code, as you didn't provide an easy way to generate your data, but I hope this helps. And sorry, but sometimes the simple solution is the best (I would have moved it into a function).
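Moving the loop above into a function, as suggested, gives a runnable sketch; the rounding follows the question's requirement, and generating the column names from the types list is an assumption:

```python
import pandas as pd

def close_ratios(df, types=(5, 20, 30, 60, 90, 180, 200)):
    """For each ts_code, divide the type-1 close by each other type's close."""
    rows = []
    for ts_code in df.ts_code.unique():
        temp = df.loc[df.ts_code == ts_code]
        base = temp.loc[temp.type == 1]["close"].iloc[0]  # close where type == 1
        row = [ts_code]
        for t in types:
            val = temp.loc[temp.type == t]
            row.append(round(base / val["close"].iloc[0], 2) if len(val) else None)
        rows.append(row)
    cols = ["ts_code"] + [f"l{t}d_close" for t in types]
    return pd.DataFrame(rows, columns=cols)

sample = pd.DataFrame({
    "ts_code": ["861001.TI"] * 5,
    "type":    [1, 20, 30, 60, 90],
    "close":   [648.399, 588.574, 621.926, 760.623, 682.313],
})
result = close_ratios(sample)
print(result)
```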
You can do this with pandas.DataFrame.pivot_table() (docs). As long as there's some data of each type, that column will be created.
pivoted = (
df.pivot_table(values=["close"], index="ts_code", columns="type")
# get rid of the first MultiIndex level
.droplevel(0, axis=1)
# divide type == 1 column values by every other column
.pipe(lambda f: f[[1]].values / f.iloc[:, 1:])
.round(2)
)
# format column names
pivoted.columns = "l" + pivoted.columns.astype(str) + "d_close"
pivoted
This returns:
type       l5d_close  l20d_close  l30d_close  l60d_close  l90d_close
ts_code
861001.TI        NaN        1.10        1.04        0.85        0.95
885933.TI        NaN         NaN         NaN         NaN         NaN
885934.TI       0.94         NaN         NaN         NaN         NaN
885935.TI       1.06         NaN         NaN         NaN         NaN
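To see this approach end to end, here is a runnable sketch on the 885934.TI / 885935.TI rows from the question (only type 1 and type 5 exist in this subset, so only l5d_close is produced):

```python
import pandas as pd

df = pd.DataFrame({
    "ts_code": ["885934.TI", "885934.TI", "885935.TI", "885935.TI"],
    "type":    [1, 5, 1, 5],
    "close":   [951.493, 1011.346, 1086.558, 1028.449],
})

pivoted = (
    df.pivot_table(values=["close"], index="ts_code", columns="type")
    # get rid of the first MultiIndex level ("close")
    .droplevel(0, axis=1)
    # divide type == 1 column values by every other column
    .pipe(lambda f: f[[1]].values / f.iloc[:, 1:])
    .round(2)
)
pivoted.columns = "l" + pivoted.columns.astype(str) + "d_close"
print(pivoted)
```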