I have the following data frame df with three columns, "identifier", "values" and "subid":
identifier values subid
0 1 101 1
1 1 102 1
2 1 103 2 #index in list x
3 1 104 2
4 1 105 2
5 2 106 3
6 2 107 3
7 2 108 3
8 2 109 4 #index in list x
9 2 110 4
10 3 111 5
11 3 112 5
12 3 113 6 #index in list x
I have a list of indices, say
x = [2, 8, 12]
I want to insert rows just before the indices listed in x. For example, the row inserted just before index 2 should have the same identifier as the row at index 2 (i.e. 1) and the same values as the row at index 2 (i.e. 103), but its subid should be ((subid at index 2) - 1), or simply the subid from the previous row, i.e. 1.
Below is the final resultant df I expect:
identifier values subid
0 1 101 1
1 1 102 1
2 1 103 1 #new row inserted
3 1 103 2 #index in list x
4 1 104 2
5 1 105 2
6 2 106 3
7 2 107 3
8 2 108 3
9 2 109 3 #new row inserted
10 2 109 4 #index in list x
11 2 110 4
12 3 111 5
13 3 112 5
14 3 113 5 #new row inserted
15 3 113 6 #index in list x
The code I have been trying:
m = df.index  # storing the indices of the df
#m
for i in m:
    if i in x:  # x is the given list of indices
        df.iloc[i-1]["identifier"] = df.iloc[i]["identifier"]
        df.iloc[i-1]["values"] = df.iloc[i]["values"]
        df.iloc[i-1]["subid"] = (df.iloc[i]["subid"]-1)
df
The above code is simply replacing the rows at (i-1) indices and not inserting the additional rows with the above values. Please help.
Please let me know if anything is unclear.
A common way to add a new row to a pandas DataFrame used to be the .append() method, a helper around the pandas concat() function. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so in current versions use pd.concat() instead.
If you'd like to select rows by integer position, you can use the .iloc indexer. If you'd like to select rows by label, you can use the .loc indexer.
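You can add a row to a pandas DataFrame with the statement df.loc[i] = ['col-1-value', 'col-2-value', 'col-3-value'], where i is a label that is not yet in the index. Note that .iloc cannot be used this way, because it cannot enlarge the DataFrame.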
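As a minimal sketch of the two row-adding approaches mentioned above, the snippet below appends one row with pd.concat() and one by assigning to a new label through .loc. The small frame and the names new_row, appended and enlarged are made up for illustration only.

```python
import pandas as pd

# Small illustrative frame (not the full df from the question)
df = pd.DataFrame({"identifier": [1, 1], "values": [101, 102], "subid": [1, 1]})

# Option 1: build a one-row DataFrame and concatenate it to the end
new_row = pd.DataFrame([{"identifier": 1, "values": 103, "subid": 2}])
appended = pd.concat([df, new_row], ignore_index=True)

# Option 2: assign to a label that does not exist yet; .loc enlarges the frame
enlarged = df.copy()
enlarged.loc[len(enlarged)] = [1, 103, 2]
```

Both options add the row at the end; neither inserts a row in the middle of the frame, which is why the question needs extra work to keep the order.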
Preserving the index order is the tricky part. I'm not sure this is the most efficient way to do this, but it should work.
x = [2, 8, 12]
rows = []
cur = {}
for i in df.index:
    if i in x:
        cur['index'] = i
        cur['identifier'] = df.iloc[i].identifier
        cur['values'] = df.iloc[i]['values']
        cur['subid'] = df.iloc[i].subid - 1
        rows.append(cur)
        cur = {}
Then, iterate through the new rows list, and perform an incremental concat, inserting each new row into the correct spot.
import pandas as pd

offset = 0  # number of rows already inserted, so later rows land in the correct position
for d in rows:
    pos = d['index'] + offset  # position in the current (already grown) df
    df = pd.concat([df.head(pos), pd.DataFrame([d]), df.tail(len(df) - pos)])
    offset += 1
df.reset_index(inplace=True)
df.drop('index', axis=1, inplace=True)
df
level_0 identifier subid values
0 0 1 1 101
1 1 1 1 102
2 0 1 1 103
3 2 1 2 103
4 3 1 2 104
5 4 1 2 105
6 5 2 3 106
7 6 2 3 107
8 7 2 3 108
9 0 2 3 109
10 8 2 4 109
11 9 2 4 110
12 10 3 5 111
13 11 3 5 112
14 0 3 5 113
15 12 3 6 113
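The output above still carries a leftover level_0 column from reset_index. As an alternative sketch (not the answerer's method), the same insertion can be done without a loop: copy the rows at the labels in x, decrement their subid, give them fractional index labels so they sort just before the originals, then concatenate and sort. This assumes df has the default 0..n-1 integer index shown in the question; new_rows and result are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "identifier": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3],
    "values": list(range(101, 114)),
    "subid": [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6],
})
x = [2, 8, 12]

# Copy the target rows and lower their subid by 1
new_rows = df.loc[x].copy()
new_rows["subid"] -= 1

# Labels 1.5, 7.5, 11.5 sort immediately before 2, 8, 12
new_rows.index = new_rows.index - 0.5

# Combine, restore order by label, then renumber the index from 0
result = pd.concat([df, new_rows]).sort_index().reset_index(drop=True)
```

Using reset_index(drop=True) discards the temporary fractional labels, so no helper column is left behind.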
Subtract 1 from values wherever the identifier in the current row differs from the identifier in the prior row:
# edit in place
df['values'] -= df.identifier.ne(df.identifier.shift().bfill())
df
identifier values
0 1 101
1 1 102
2 1 103
3 1 104
4 1 105
5 2 105
6 2 107
7 2 108
8 2 109
9 2 110
10 3 110
11 3 112
12 3 113
Or
# new dataframe
df.assign(values=df['values'] - df.identifier.ne(df.identifier.shift().bfill()))
identifier values
0 1 101
1 1 102
2 1 103
3 1 104
4 1 105
5 2 105
6 2 107
7 2 108
8 2 109
9 2 110
10 3 110
11 3 112
12 3 113
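To make the one-liner above easier to follow, here is a step-by-step sketch on a smaller, made-up frame. The intermediate names prev, changed and adjusted are illustrative, and the explicit astype(int) is an assumption added for readability; the original relies on True being treated as 1 in the subtraction.

```python
import pandas as pd

# Small illustrative frame
df = pd.DataFrame({
    "identifier": [1, 1, 2, 2, 3],
    "values": [101, 102, 103, 104, 105],
})

# Identifier of the previous row; bfill fills the NaN that shift leaves in the first row
prev = df["identifier"].shift().bfill()

# True where the identifier changes compared with the previous row
changed = df["identifier"].ne(prev)

# Subtract 1 only in the rows where the identifier changed
adjusted = df["values"] - changed.astype(int)
```

Because the first row is compared against its own back-filled value, it is never flagged as a change.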