
Divide grouped data based on selected column values?

Tags: python, pandas

df

   ts_code    type  close

  0 861001.TI   1   648.399
  1 861001.TI   20  588.574
  2 861001.TI   30  621.926
  3 861001.TI   60  760.623
  4 861001.TI   90  682.313
  ...   ... ... ...
  8328  885933.TI   5   1083.141
  8329  885934.TI   1   951.493
  8330  885934.TI   5   1011.346
  8331  885935.TI   1   1086.558
  8332  885935.TI   5   1028.449

Goal

ts_code    l5d_close l20d_close …… l90d_close
861001.TI   NaN       1.10          0.95
……           ……       ……            ……

I want to group by ts_code and compute the close of type 1 divided by the close of type N (N = 5, 20, 30, ……). Take 861001.TI as an example: l5d_close is NaN because there is no row with type 5; l20d_close is 648.399/588.574 = 1.10; l90d_close is 648.399/682.313 = 0.95. The results are rounded.

Try

df.groupby('ts_code')\
  .pipe(lambda x: x[x.type==1].close/x[x.type==10].close)

Got: KeyError: 'Column not found: False'

The type values are: 1, 5, 20, 30, 60, 90, 180, 200

Note: each type value appears at most once per ts_code.

asked Sep 21 '21 02:09 by Jack


3 Answers

Use sort_values to make sure type == 1 is the first row per group and extract them with groupby.transform('first'):

df = df.sort_values(['ts_code', 'type'])
close1 = df.groupby('ts_code')['close'].transform('first')
df['close'] = close1 / df['close']

#         ts_code  type     close
# 0     861001.TI     1  1.000000
# 1     861001.TI    20  1.101644
# 2     861001.TI    30  1.042566
# 3     861001.TI    60  0.852458
# ...         ...   ...       ...

Then pivot the type column into column headers:

out = (df.pivot(index='ts_code', columns='type', values='close')
         .drop(columns=1)
         .add_prefix('l')
         .add_suffix('d_close'))

# type       l5d_close  l20d_close  l30d_close  l60d_close  l90d_close
# ts_code
# 861001.TI        NaN    1.101644    1.042566    0.852458    0.950296
# ...              ...         ...         ...         ...         ...

To chain everything together, assign a ratio column before the pivot (this assumes df still holds the original close values and is sorted as above):

(df.assign(ratio=df.groupby('ts_code').close.transform('first').div(df.close))
   .pivot(index='ts_code', columns='type', values='ratio')
   .drop(columns=1)
   .add_prefix('l')
   .add_suffix('d_close'))

# type       l5d_close  l20d_close  l30d_close  l60d_close  l90d_close
# ts_code
# 861001.TI        NaN    1.101644    1.042566    0.852458    0.950296
# ...              ...         ...         ...         ...         ...
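
As a self-contained check, here is a minimal sketch on a small sample built from the rows shown in the question. It swaps transform('first') for an explicit lookup of the type == 1 close via Series.map, so the result does not depend on row order and a ts_code without a type 1 row yields NaN instead of a ratio against some other type. The 885933.TI row having no type 1 entry is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical sample: rows copied from the question; 885933.TI is
# assumed (for illustration) to have no type == 1 row.
df = pd.DataFrame({
    "ts_code": ["861001.TI"] * 5 + ["885933.TI"] + ["885934.TI"] * 2,
    "type": [1, 20, 30, 60, 90, 5, 1, 5],
    "close": [648.399, 588.574, 621.926, 760.623, 682.313,
              1083.141, 951.493, 1011.346],
})

# Map each ts_code to its type == 1 close (NaN when that row is missing).
base = df.loc[df["type"] == 1].set_index("ts_code")["close"]

out = (
    df.assign(ratio=df["ts_code"].map(base) / df["close"])
      .pivot(index="ts_code", columns="type", values="ratio")
      .drop(columns=1)          # close(1) / close(1) is always 1
      .add_prefix("l")
      .add_suffix("d_close")
      .round(2)
)
print(out)
```

With transform('first') the 885933.TI row would instead be divided by its own type 5 close, so the explicit lookup may be the safer choice when some groups lack a type 1 row.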

answered Oct 18 '22 04:10 by tdy


Well, I don't think you need groupby here, as you aren't really grouping or using any aggregate function.
I think it will be easier to just build a new DataFrame with a plain loop:

import pandas as pd

ts_codes = df.ts_code.unique()
types = [5, 20, 30, 60, 90, 180, 200]
ts_results = []
for ts_code in ts_codes:
    ts_result = [ts_code]
    temp = df.loc[df.ts_code == ts_code]  # rows for this ts_code only
    # scalar close for type 1 (assumes every ts_code has a type 1 row)
    val_1 = temp.loc[temp.type == 1, 'close'].iloc[0]
    for t in types:  # `t` avoids shadowing the built-in `type`
        val = temp.loc[temp.type == t, 'close']
        if len(val) > 0:
            ts_result.append(val_1 / val.iloc[0])
        else:
            ts_result.append(None)
    ts_results.append(ts_result)
# one output column per entry in `types`
columns = ['ts_code'] + [f'l{t}d_close' for t in types]
results_df = pd.DataFrame(ts_results, columns=columns)

I didn't run this code, since you didn't provide an easy way to generate your data. Hope this helps.

Sorry, but sometimes the simple solution is the best (I would move it into a function).
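
Moving the loop into a function might look like the sketch below. The name close_ratios and the sample rows are hypothetical, and the rounding to two decimals follows the question rather than this answer. Unlike a bare .iloc[0], it returns NaN when a ts_code has no type 1 row.

```python
import pandas as pd

def close_ratios(df, types=(5, 20, 30, 60, 90, 180, 200)):
    """Return one row per ts_code with close(type 1) / close(type N)."""
    rows = []
    for ts_code in df["ts_code"].unique():
        group = df.loc[df["ts_code"] == ts_code]
        closes = group.set_index("type")["close"]  # type -> close
        base = closes.get(1)                       # None if type 1 is missing
        row = {"ts_code": ts_code}
        for t in types:
            other = closes.get(t)                  # None if type t is missing
            has_both = base is not None and other is not None
            row[f"l{t}d_close"] = round(base / other, 2) if has_both else float("nan")
        rows.append(row)
    return pd.DataFrame(rows)

# Hypothetical sample built from rows shown in the question.
sample = pd.DataFrame({
    "ts_code": ["861001.TI"] * 5 + ["885934.TI"] * 2,
    "type": [1, 20, 30, 60, 90, 1, 5],
    "close": [648.399, 588.574, 621.926, 760.623, 682.313, 951.493, 1011.346],
})
result = close_ratios(sample)
print(result)
```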

answered Oct 18 '22 04:10 by gal peled


You can do this with pandas.DataFrame.pivot_table() (docs). As long as there's some data of each type, that column will be created.

pivoted = (
    df.pivot_table(values=["close"], index="ts_code", columns="type")
    # get rid of the first MultiIndex level
    .droplevel(0, axis=1)
    # divide type == 1 column values by every other column
    .pipe(lambda f: f[[1]].values / f.iloc[:, 1:])
    .round(2)
)

# format column names
pivoted.columns = "l" + pivoted.columns.astype(str) + "d_close"
pivoted

This returns (the values below are shown before .round(2) is applied):

type       l5d_close  l20d_close  l30d_close  l60d_close  l90d_close
ts_code                                                             
861001.TI        NaN    1.101644    1.042566    0.852458    0.950296
885933.TI        NaN         NaN         NaN         NaN         NaN
885934.TI   0.940818         NaN         NaN         NaN         NaN
885935.TI   1.056502         NaN         NaN         NaN         NaN
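
If you want a column for every type even when no row of that type exists, one option is to reindex the pivoted columns first. The sketch below is a variation under that assumption, not the code above: it uses DataFrame.rdiv with axis=0 for the division, and the sample rows are a hypothetical subset of the question's data.

```python
import pandas as pd

types = [1, 5, 20, 30, 60, 90, 180, 200]  # type values listed in the question

# Hypothetical sample built from rows shown in the question.
df = pd.DataFrame({
    "ts_code": ["861001.TI"] * 5 + ["885934.TI"] * 2,
    "type": [1, 20, 30, 60, 90, 1, 5],
    "close": [648.399, 588.574, 621.926, 760.623, 682.313, 951.493, 1011.346],
})

wide = (
    df.pivot_table(values="close", index="ts_code", columns="type")
      .reindex(columns=types)   # adds all-NaN columns for absent types
)

ratios = (
    wide.drop(columns=1)
        .rdiv(wide[1], axis=0)  # close(type 1) / close(type N), aligned on ts_code
        .round(2)
)
ratios.columns = [f"l{t}d_close" for t in ratios.columns]
print(ratios)
```

Passing values="close" as a string rather than a list keeps the columns single-level, so no droplevel step is needed here.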

answered Oct 18 '22 03:10 by onepan