I'm looking for an efficient way to add a row to an existing DataFrame when a row iteration finds a specific character in a column. The new row should be a copy of the row currently being iterated over, with just one value modified.
Here is an example of what I'm looking for: while iterating over the rows, if the value in the "String" column contains an "M", create a copy of that row immediately after it, with 50 added to the value in the "Value" column.
What I have:
    Name   String  Value
0  name1     EXAN  100.1
1  name2    EXAN_  200.2
2  name3   EXAMPL  300.3
3  name4  EXAMPLE  400.4
4  name5     TEST  500.5
What I'm looking for:
    Name   String  Value
0  name1     EXAN  100.1
1  name2    EXAN_  200.2
2  name3   EXAMPL  300.3
3  name3   EXAMPL  350.3
4  name4  EXAMPLE  400.4
5  name4  EXAMPLE  450.4
6  name5     TEST  500.5
I have tried:
for i, row in df.iterrows():
    if "M" in row['String']:
        df.add_row([row.Name, row.String, row.Value+50])
I get:
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1843, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'add_row'
Here is some more information:
>>> df.dtypes
Name object
String object
Value float64
>>> type(df)
<class 'pandas.core.frame.DataFrame'>
Any help would be greatly appreciated
One method would be to add the new values in a column, Value2, then use
lreshape to merge the Value and Value2 columns into one:
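import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Name': ['name1', 'name2', 'name3', 'name4', 'name5'],
     'String': ['EXAN', 'EXAN_', 'EXAMPL', 'EXAMPLE', 'TEST'],
     'Value': [100.1, 200.2, 300.3, 400.4, 500.5]})
# Value2 holds Value + 50 where String contains "M", NaN elsewhere
df['Value2'] = np.where(df['String'].str.contains(r'M'), df['Value']+50, np.nan)
# keep the original row position in an 'index' column
df = df.reset_index(drop=False)
# stack Value and Value2 into one Value column; NaN rows are dropped by default
df = pd.lreshape(df, {'Value': ['Value', 'Value2']})
# a stable sort keeps each copy after its original row
df = df.sort_values(by='index', kind='mergesort')
df = df.drop('index', axis=1)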
yields
    Name   String  Value
0  name1     EXAN  100.1
1  name2    EXAN_  200.2
2  name3   EXAMPL  300.3
5  name3   EXAMPL  350.3
3  name4  EXAMPLE  400.4
6  name4  EXAMPLE  450.4
4  name5     TEST  500.5
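If lreshape is unfamiliar, roughly the same result can be sketched with pd.concat and a stable sort on the original index. This is an alternative to the method above, not a restatement of it; the names extra and result are only illustrative, and it assumes the original index is sortable (here the default 0..4).

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['name1', 'name2', 'name3', 'name4', 'name5'],
     'String': ['EXAN', 'EXAN_', 'EXAMPL', 'EXAMPLE', 'TEST'],
     'Value': [100.1, 200.2, 300.3, 400.4, 500.5]})

# Rows whose String contains an "M", copied with 50 added to Value.
extra = df[df['String'].str.contains('M')].copy()
extra['Value'] += 50

# Stack originals and copies, then sort on the original index.
# A stable sort (mergesort) keeps each original ahead of its copy,
# because the originals come first in the concatenated frame.
result = (pd.concat([df, extra])
            .sort_index(kind='mergesort')
            .reset_index(drop=True))
print(result)
```

The final reset_index(drop=True) renumbers the rows 0 through 6; leave it off to keep the duplicated original labels.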
EDIT: It turns out this can be done with DataFrames directly (though not in place), and unutbu's method is much, much faster than iterating. I'll leave this answer here in case you're interested in doing this with the same sort of iteration you were planning to use, building a list of rows instead of inserting in place. Note, though, that unutbu's version appears to be around 100 times faster:
df = pd.DataFrame({'Name': [1, 2, 3], 'String': ['M', 'N', 'M'], 'Value': [4, 5, 6]})
l = []
for _, row in df.iterrows():
    # always keep the original row unchanged
    l.append([row.Name, row.String, row.Value])
    # add a modified copy right after it when String contains "M"
    if "M" in row['String']:
        l.append([row.Name, row.String, row.Value+50])
df = pd.DataFrame(l, columns=['Name', 'String', 'Value'])
df
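For comparison with the loop, here is a rough loop-free sketch on the same toy frame that duplicates the matching rows with Index.repeat and then adds 50 to the second copy of each. It assumes the original index has no duplicate labels; mask, out and is_copy are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({'Name': [1, 2, 3], 'String': ['M', 'N', 'M'], 'Value': [4, 5, 6]})

mask = df['String'].str.contains('M')

# Repeat matching rows twice and every other row once.
out = df.loc[df.index.repeat(mask.astype(int) + 1)]

# Within each repeated pair, the second occurrence of a label is the copy.
is_copy = out.index.duplicated(keep='first')

out = out.reset_index(drop=True)
out.loc[is_copy, 'Value'] += 50
print(out)
```

Because the rows are repeated in place, each copy already sits directly after its original and no sorting step is needed.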