Adding a time column to a DataFrame based on another DataFrame

I have a DataFrame, DataA, whose rows each hold the value of an item:

DataA
row  item_id  value
0    x        V1
1    y        V2
2    z        V3
3    y        V4
4    z        V5
5    x        V6
6    y        V7
7    z        V8
8    z        V9

There is also a second DataFrame, DataA_mapper, that maps a time value to a contiguous run of rows in DataA:

DataA_mapper
time  start_row  num_rows
0     0          3
1     3          2
3     5          2
5     8          1

For a given row in DataA_mapper, the rows of DataA in the range [start_row, start_row + num_rows) should all be assigned that row's DataA_mapper.time.
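
To make the half-open range rule concrete, here is a small sketch in plain Python that expands each mapper row into the row numbers it covers. The helper name `rows_for` and the dictionary `row_to_time` are hypothetical, introduced only for illustration.

```python
# Mapper rows as (time, start_row, num_rows), copied from the DataA_mapper table.
mapper = [(0, 0, 3), (1, 3, 2), (3, 5, 2), (5, 8, 1)]

def rows_for(start_row, num_rows):
    """Return the row numbers in the half-open range [start_row, start_row + num_rows)."""
    return list(range(start_row, start_row + num_rows))

# Map every covered row number of DataA to its time value.
row_to_time = {
    row: time
    for time, start_row, num_rows in mapper
    for row in rows_for(start_row, num_rows)
}
print(row_to_time)
```

With these mapper values, row 7 falls in no range, which is consistent with the expected output in the question skipping V8.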

DataFrame definitions in Python:

import pandas as pd

# each entry is [item_id, value]; item_id values are strings
dataA = [
    ['x', 'V1'], ['y', 'V2'], ['z', 'V3'], ['y', 'V4'],
    ['z', 'V5'], ['x', 'V6'], ['y', 'V7'], ['z', 'V8'], ['z', 'V9']]

# each entry is [time, start_row, num_rows]
DataA_mapper = [[0, 0, 3], [1, 3, 2], [3, 5, 2], [5, 8, 1]]

dataA_df = pd.DataFrame(dataA, columns=['item_id', 'value'])
DataA_mapper_df = pd.DataFrame(DataA_mapper, columns=['time', 'start_row', 'num_rows'])

I would like to generate the following DataFrame, but I'm not sure where to begin:

time  item_id   value
0     x         V1
0     y         V2
0     z         V3
1     y         V4
1     z         V5
3     x         V6
3     y         V7
5     z         V9

Lucinda Rigetti asked Jan 26 '26 05:01


1 Answer

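I think you need Series.repeat. Repeating each time value num_rows times gives one time per row, which can then be set as the index. This assumes the ranges in DataA_mapper_df are in order and together cover every row of dataA_df, so that the repeated series has the same length as the frame.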

dataA_df.index = DataA_mapper_df.time.repeat(DataA_mapper_df.num_rows)
dataA_df = dataA_df.reset_index()
print(dataA_df)

Output

   time item_id value
0     0       x    V1
1     0       y    V2
2     0       z    V3
3     1       y    V4
4     1       z    V5
5     3       x    V6
6     3       y    V7
7     3       z    V8
8     5       z    V9
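
The direct index assignment above relies on the repeated series having exactly as many entries as dataA_df has rows. With the mapper as written in the question, num_rows sums to 8 while dataA_df has 9 rows (row 7 is in no range), so that assignment would raise a length-mismatch error. Below is a minimal sketch of one alternative, assuming uncovered rows should simply be dropped as in the question's expected output: build the covered row positions explicitly and select them with iloc. The names `times`, `positions`, and `result` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

dataA_df = pd.DataFrame(
    [['x', 'V1'], ['y', 'V2'], ['z', 'V3'], ['y', 'V4'], ['z', 'V5'],
     ['x', 'V6'], ['y', 'V7'], ['z', 'V8'], ['z', 'V9']],
    columns=['item_id', 'value'])
DataA_mapper_df = pd.DataFrame(
    [[0, 0, 3], [1, 3, 2], [3, 5, 2], [5, 8, 1]],
    columns=['time', 'start_row', 'num_rows'])

# One time value per covered row, in mapper order.
times = DataA_mapper_df['time'].repeat(DataA_mapper_df['num_rows']).to_numpy()

# Row positions covered by each half-open range [start_row, start_row + num_rows).
positions = np.concatenate([
    np.arange(start, start + count)
    for start, count in zip(DataA_mapper_df['start_row'],
                            DataA_mapper_df['num_rows'])
])

# Select only the covered rows, then attach the time column in front.
result = dataA_df.iloc[positions].reset_index(drop=True)
result.insert(0, 'time', times)
print(result)
```

Because the positions are derived from start_row rather than assumed to be contiguous, this variant also tolerates gaps between ranges; rows that no range covers are left out of the result.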

ansev answered Jan 27 '26 18:01


