How to divide grouped data based on selected column values?

Tags:

pandas

        ts_code  type     close
0     861001.TI     1   648.399
1     861001.TI    20   588.574
2     861001.TI    30   621.926
3     861001.TI    60   760.623
4     861001.TI    90   682.313
...         ...   ...       ...
8328  885933.TI     5  1083.141
8329  885934.TI     1   951.493
8330  885934.TI     5  1011.346
8331  885935.TI     1  1086.558
8332  885935.TI     5  1028.449

Goal

ts_code    l5d_close l20d_close …… l90d_close
861001.TI   NaN       1.10          0.95
……           ……       ……            ……

I want to group by ts_code and, within each group, divide the close where type is 1 by the close for each other type N (5, 20, 30, ……). Taking 861001.TI as an example: l5d_close is NaN because there is no row with type 5; l20d_close is 648.399/588.574 = 1.10; l90d_close is 648.399/682.313 = 0.95. The results are rounded.

Try

df.groupby('ts_code')\
  .pipe(lambda x: x[x.type==1].close/x[x.type==10].close)

Got: KeyError: 'Column not found: False'

The type values are: 1, 5, 20, 30, 60, 90, 180, 200

Note: each type value appears at most once per ts_code.

asked Sep 21 '21 02:09

Jack

3 Answers

Use sort_values to make sure the type == 1 row comes first in each group, then broadcast its close to every row of the group with groupby.transform('first'):

df = df.sort_values(['ts_code', 'type'])
close1 = df.groupby('ts_code')['close'].transform('first')
df['close'] = close1 / df['close']

#         ts_code  type     close
# 0     861001.TI     1  1.000000
# 1     861001.TI    20  1.101644
# 2     861001.TI    30  1.042566
# 3     861001.TI    60  0.852458
# ...         ...   ...       ...
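Note that transform('first') returns whatever row sorts first, so a ts_code with no type == 1 row would silently use its smallest type as the base. A minimal sketch of a label-based alternative (not part of the original answer) looks up the type-1 close with map instead. The code "000000.TI" is a hypothetical group added only to show the missing-base case, and the sketch assumes at most one type-1 row per ts_code:

```python
import pandas as pd

# Rows copied from the question, plus one hypothetical code ("000000.TI")
# that has no type-1 row.
df = pd.DataFrame({
    "ts_code": ["861001.TI", "861001.TI", "861001.TI",
                "885934.TI", "885934.TI", "000000.TI"],
    "type": [1, 20, 90, 1, 5, 5],
    "close": [648.399, 588.574, 682.313, 951.493, 1011.346, 100.0],
})

# Series of type-1 closes indexed by ts_code (must be unique per code).
base = df.loc[df["type"] == 1].set_index("ts_code")["close"]

# Map each row's ts_code to its base close; codes without one become NaN.
ratio = df["ts_code"].map(base) / df["close"]
print(ratio.round(2))
```

With this variant no sorting is needed, and a group without a type-1 row yields NaN rather than a ratio against the wrong base.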

Then pivot the type column into column headers:

out = (df.pivot(index='ts_code', columns='type', values='close')
         .drop(columns=1)
         .add_prefix('l')
         .add_suffix('d_close'))

# type       l5d_close  l20d_close  l30d_close  l60d_close  l90d_close
# ts_code
# 861001.TI        NaN    1.101644    1.042566    0.852458    0.950296
# ...              ...         ...         ...         ...         ...

To chain together, assign a ratio column before the pivot:

(df.assign(ratio=df.groupby('ts_code').close.transform('first').div(df.close))
   .pivot(index='ts_code', columns='type', values='ratio')
   .drop(columns=1)
   .add_prefix('l')
   .add_suffix('d_close'))

# type       l5d_close  l20d_close  l30d_close  l60d_close  l90d_close
# ts_code
# 861001.TI        NaN    1.101644    1.042566    0.852458    0.950296
# ...              ...         ...         ...         ...         ...
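For anyone who wants to run this end to end, here is a self-contained sketch of the same chain on a few rows taken from the question. Two small additions are assumptions on my part rather than part of the snippet above: the sort is folded into the chain (so it does not depend on df being sorted beforehand), and a final round(2) is added because the question asks for rounded results.

```python
import pandas as pd

# A few rows copied from the question's sample data.
df = pd.DataFrame({
    "ts_code": ["861001.TI"] * 5 + ["885934.TI"] * 2,
    "type": [1, 20, 30, 60, 90, 1, 5],
    "close": [648.399, 588.574, 621.926, 760.623, 682.313, 951.493, 1011.346],
})

out = (
    df.sort_values(["ts_code", "type"])  # type == 1 first within each code
      .assign(ratio=lambda d: d.groupby("ts_code")["close"]
                               .transform("first")
                               .div(d["close"]))
      .pivot(index="ts_code", columns="type", values="ratio")
      .drop(columns=1)                   # type 1 divided by itself is always 1
      .add_prefix("l")
      .add_suffix("d_close")
      .round(2)
)
print(out)
```

Using a lambda inside assign means the ratio is computed on the sorted frame, not on the original df.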

answered Oct 18 '22 04:10

tdy

Well, I don't think you need groupby here, as you aren't really grouping or using any aggregate function.
I think it will be easier to just build a new DataFrame directly:

import pandas as pd

types = [5, 20, 30, 60, 90, 180, 200]
ts_results = []
for ts_code in df.ts_code.unique():
    ts_result = [ts_code]
    temp = df.loc[df.ts_code == ts_code]
    # scalar close for type 1 (assumes every ts_code has a type-1 row)
    val_1 = temp.loc[temp.type == 1, 'close'].iloc[0]
    for t in types:
        val = temp.loc[temp.type == t, 'close']
        if len(val) > 0:
            ts_result.append(val_1 / val.iloc[0])
        else:
            ts_result.append(None)
    ts_results.append(ts_result)
# column names are derived from `types`: l5d_close, l20d_close, ...
results_df = pd.DataFrame(ts_results, columns=['ts_code'] + [f'l{t}d_close' for t in types])

I didn't run the code, since you didn't provide an easy way to generate your data. Hope this helps.

And sorry, but sometimes the easy solution is the best (I would move it into a function).
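As a rough sketch of what moving this into a function might look like: the name close_ratios and its parameters are hypothetical, and the rounding to two decimals is taken from the question rather than from this answer. It returns NaN whenever either the type-1 close or the target type is missing for a code.

```python
import pandas as pd

def close_ratios(df, types=(5, 20, 30, 60, 90, 180, 200)):
    """Hypothetical helper: type-1 close divided by each other type's close."""
    rows = []
    for ts_code in df["ts_code"].unique():
        grp = df.loc[df["ts_code"] == ts_code]
        closes = dict(zip(grp["type"], grp["close"]))  # type -> close
        base = closes.get(1)  # None when the code has no type-1 row
        row = {"ts_code": ts_code}
        for t in types:
            val = closes.get(t)
            if base is None or val is None:
                row[f"l{t}d_close"] = float("nan")
            else:
                row[f"l{t}d_close"] = round(base / val, 2)
        rows.append(row)
    return pd.DataFrame(rows)

# Usage on a few rows copied from the question.
df = pd.DataFrame({
    "ts_code": ["861001.TI"] * 3 + ["885934.TI"] * 2,
    "type": [1, 20, 90, 1, 5],
    "close": [648.399, 588.574, 682.313, 951.493, 1011.346],
})
results_df = close_ratios(df)
print(results_df)
```

This keeps the explicit loop but is slower than the vectorized approaches on large frames, since it filters the frame once per ts_code.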

answered Oct 18 '22 04:10

gal peled

You can do this with pandas.DataFrame.pivot_table(). As long as at least one row has a given type, a column for that type will be created.

pivoted = (
    df.pivot_table(values=["close"], index="ts_code", columns="type")
    # get rid of the first MultiIndex level
    .droplevel(0, axis=1)
    # divide type == 1 column values by every other column
    .pipe(lambda f: f[[1]].values / f.iloc[:, 1:])
    .round(2)
)

# format column names
pivoted.columns = "l" + pivoted.columns.astype(str) + "d_close"
pivoted

This returns (values shown before the final rounding step):

type       l5d_close  l20d_close  l30d_close  l60d_close  l90d_close
ts_code                                                             
861001.TI        NaN    1.101644    1.042566    0.852458    0.950296
885933.TI        NaN         NaN         NaN         NaN         NaN
885934.TI   0.940818         NaN         NaN         NaN         NaN
885935.TI   1.056502         NaN         NaN         NaN         NaN
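The pipe step above relies on type 1 being the first column after pivoting. A small variation, sketched here on rows from the question, selects the base column by label and divides with rdiv instead. This is an alternative I am suggesting, not the answer's original code. Keep in mind that pivot_table averages duplicate (ts_code, type) pairs by default, which is harmless if each pair occurs once.

```python
import pandas as pd

# A few rows copied from the question's sample data.
df = pd.DataFrame({
    "ts_code": ["861001.TI"] * 5 + ["885934.TI"] * 2,
    "type": [1, 20, 30, 60, 90, 1, 5],
    "close": [648.399, 588.574, 621.926, 760.623, 682.313, 951.493, 1011.346],
})

# Passing values as a string (not a list) avoids the extra column level.
wide = df.pivot_table(values="close", index="ts_code", columns="type")

# rdiv computes other / self, so this is (type-1 close) / (each other column),
# aligned row by row on ts_code.
ratios = wide.drop(columns=1).rdiv(wide[1], axis=0).round(2)
ratios.columns = [f"l{t}d_close" for t in ratios.columns]
print(ratios)
```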

answered Oct 18 '22 03:10

onepan
