Inserting new rows in pandas data frame at specific indices

Tags:

python

pandas

dataframe

I have the following data frame df with three columns, "identifier", "values" and "subid":

     identifier   values    subid
0      1          101       1
1      1          102       1
2      1          103       2 #index in list x        
3      1          104       2
4      1          105       2
5      2          106       3   
6      2          107       3
7      2          108       3
8      2          109       4 #index in list x
9      2          110       4
10     3          111       5
11     3          112       5 
12     3          113       6 #index in list x
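To make the example reproducible, the sample frame above can be built as follows. This is a sketch; the "#index in list x" markers are annotations, not data.

```python
import pandas as pd

# Sample frame from the question (default integer index 0..12)
df = pd.DataFrame({
    'identifier': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
    'values': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113],
    'subid': [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6],
})
print(df)
```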

I have a list of indices, say

x = [2, 8, 12]

I want to insert a row just before each index in the list x. For example, the row inserted just before index 2 should have the same identifier as the row at index 2 (i.e. 1) and the same values as the row at index 2 (i.e. 103), but its subid should be (subid at index 2) - 1, which is simply the subid of the previous row, i.e. 1.

Below is the final df I expect:

   identifier   values    subid
0      1          101       1
1      1          102       1
2      1          103       1 #new row inserted     
3      1          103       2 #index in list x        
4      1          104       2
5      1          105       2
6      2          106       3   
7      2          107       3
8      2          108       3
9      2          109       3 #new row inserted
10     2          109       4 #index in list x
11     2          110       4
12     3          111       5
13     3          112       5 
14     3          113       5 #new row inserted
15     3          113       6 #index in list x

The code I have been trying:

m = df.index  # storing the indices of the df

for i in m:
    if i in x:  # x is the given list of indices
        df.iloc[i-1]["identifier"] = df.iloc[i]["identifier"]
        df.iloc[i-1]["values"] = df.iloc[i]["values"]
        df.iloc[i-1]["subid"] = (df.iloc[i]["subid"]-1)
df

The code above just overwrites the rows at index i-1 instead of inserting new rows with these values. Please help.

Please let me know if anything is unclear.


asked Jun 17 '17 00:06

Liza

2 Answers

Preserving the index order is the tricky part. I'm not sure this is the most efficient way to do this, but it should work.

import pandas as pd

x = [2, 8, 12]
rows = []

for i in df.index:
    if i in x:
        # copy the row at position i with subid reduced by 1;
        # 'index' records where the copy should be inserted
        rows.append({'index': i,
                     'identifier': df.iloc[i]['identifier'],
                     'values': df.iloc[i]['values'],
                     'subid': df.iloc[i]['subid'] - 1})
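Each insertion boils down to splitting the frame at a position and concatenating the new row in between. A minimal sketch of that single step, using a hypothetical helper insert_row (it is not part of pandas; DataFrame.insert adds columns, not rows):

```python
import pandas as pd

def insert_row(df, pos, row):
    """Return a copy of df with `row` (a dict) inserted before position `pos`.

    Hypothetical helper for illustration only.
    """
    new_row = pd.DataFrame([row], columns=df.columns)
    out = pd.concat([df.iloc[:pos], new_row, df.iloc[pos:]])
    return out.reset_index(drop=True)  # renumber 0..n-1

df = pd.DataFrame({'identifier': [1, 1, 1],
                   'values': [101, 102, 103],
                   'subid': [1, 1, 2]})
df2 = insert_row(df, 2, {'identifier': 1, 'values': 103, 'subid': 1})
print(df2)
```

The original df is left untouched; the caller reassigns the result if it wants to keep it.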

Then iterate through the list of new rows and concatenate incrementally, inserting each new row at the correct position.

offset = 0  # number of rows already inserted, so later positions shift correctly

for d in rows:
    pos = d['index'] + offset
    # build the new row without the helper 'index' key
    new_row = pd.DataFrame([{k: v for k, v in d.items() if k != 'index'}])
    df = pd.concat([df.iloc[:pos], new_row, df.iloc[pos:]])
    offset += 1

df = df.reset_index(drop=True)
df

    identifier  values  subid
0            1     101      1
1            1     102      1
2            1     103      1
3            1     103      2
4            1     104      2
5            1     105      2
6            2     106      3
7            2     107      3
8            2     108      3
9            2     109      3
10           2     109      4
11           2     110      4
12           3     111      5
13           3     112      5
14           3     113      5
15           3     113      6
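If the loop is not a requirement, a vectorized alternative (not part of this answer; a sketch assuming df has the default 0..n-1 integer index and x holds labels from it) is to copy the target rows, give each copy a fractional label just below its source row, and let sort_index put it in place:

```python
import pandas as pd

df = pd.DataFrame({
    'identifier': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
    'values': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113],
    'subid': [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6],
})
x = [2, 8, 12]

# Copy the target rows and decrement subid on the copies only
new_rows = df.loc[x].copy()
new_rows['subid'] -= 1
# Label each copy just below its source row (2 -> 1.5, 8 -> 7.5, ...)
new_rows.index = new_rows.index - 0.5

# One concat plus a sort places every copy right before its source row
result = pd.concat([df, new_rows]).sort_index().reset_index(drop=True)
print(result)
```

Because each copy's label sits between its neighbours, a single concat and sort replaces the incremental offset bookkeeping.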

answered Oct 20 '22 20:10

bdiamante

Subtract 1 from values wherever the identifier differs from the one in the prior row:

# edit in place
df['values'] -= df.identifier.ne(df.identifier.shift().bfill())
df

    identifier  values
0            1     101
1            1     102
2            1     103
3            1     104
4            1     105
5            2     105
6            2     107
7            2     108
8            2     109
9            2     110
10           3     110
11           3     112
12           3     113

Or, to leave df unchanged and return a new dataframe:

# new dataframe
df.assign(values=df['values'] - df.identifier.ne(df.identifier.shift().bfill()))

    identifier  values
0            1     101
1            1     102
2            1     103
3            1     104
4            1     105
5            2     105
6            2     107
7            2     108
8            2     109
9            2     110
10           3     110
11           3     112
12           3     113
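To see what the mask in this answer computes, here is a small sketch on a made-up five-row frame (illustrative data, not from the question):

```python
import pandas as pd

df = pd.DataFrame({'identifier': [1, 1, 2, 2, 3],
                   'values': [101, 102, 103, 104, 105]})

prev = df.identifier.shift()      # previous row's identifier; NaN for row 0
prev = prev.bfill()               # row 0 takes row 1's shifted value, i.e. its own identifier
changed = df.identifier.ne(prev)  # True where the identifier differs from the prior row
print(changed.tolist())

# Subtracting a boolean Series subtracts 1 where True and 0 where False
out = df.assign(values=df['values'] - changed)
print(out['values'].tolist())
```

The bfill step is what keeps row 0 from being flagged, since without it the comparison against NaN would be True.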

answered Oct 20 '22 21:10

piRSquared
