
Python: Adding a row to a DataFrame when row iteration finds a letter in a string column

I'm looking for an efficient way to add a row to an existing dataframe when row iteration finds a specific character in a column. The new row should be a copy of the row currently being iterated over, with just one value modified.

Here is an example of what I'm looking for: during row iteration, if the value in the "String" column contains an "M", create a copy of that row just after it, with 50 added to the value in the "Value" column.

What I have:

        Name               String        Value
0      name1                 EXAN        100.1
1      name2                EXAN_        200.2
2      name3               EXAMPL        300.3 
3      name4              EXAMPLE        400.4 
4      name5                 TEST        500.5 

What I'm looking for:

        Name               String        Value
0      name1                 EXAN        100.1
1      name2                EXAN_        200.2
2      name3               EXAMPL        300.3
3      name3               EXAMPL        350.3
4      name4              EXAMPLE        400.4
5      name4              EXAMPLE        450.4 
6      name5                 TEST        500.5 

I have tried:

for i, row in df.iterrows():
    if "M" in row['String']:
        df.add_row([row.Name, row.String, row.Value+50])

I get:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1843, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'add_row'

Here is some more information:

>>> df.dtypes
Name              object
String            object
Value             float64

>>> type(df)
<class 'pandas.core.frame.DataFrame'>

Any help would be greatly appreciated

Jérémz asked Feb 17 '26 06:02


2 Answers

One method would be to add the new values in a column, Value2, then use lreshape to merge the Value and Value2 columns into one:

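import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Name': ['name1', 'name2', 'name3', 'name4', 'name5'],
     'String': ['EXAN', 'EXAN_', 'EXAMPL', 'EXAMPLE', 'TEST'],
     'Value': [100.1, 200.2, 300.3, 400.4, 500.5]})
# Value2 holds Value + 50 where String contains "M", NaN elsewhere
df['Value2'] = np.where(df['String'].str.contains(r'M'), df['Value']+50, np.nan)
# keep the original row position in an 'index' column
df = df.reset_index(drop=False)
# stack Value and Value2 into one column; NaN rows are dropped by default
df = pd.lreshape(df, {'Value': ['Value', 'Value2']})
# a stable sort keeps each original row ahead of its copy
df = df.sort_values(by='index', kind='mergesort')
df = df.drop('index', axis=1)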

yields

    Name   String  Value
0  name1     EXAN  100.1
1  name2    EXAN_  200.2
2  name3   EXAMPL  300.3
5  name3   EXAMPL  350.3
3  name4  EXAMPLE  400.4
6  name4  EXAMPLE  450.4
4  name5     TEST  500.5
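
Note that the index above is not consecutive (0, 1, 2, 5, 3, 6, 4), while the question asks for 0 to 6. As a rough alternative sketch, not part of the original answer, the same idea can be written with pd.concat instead of lreshape: copy the matching rows, bump their Value, then interleave them with a stable sort on the old index. The variable names extra and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['name1', 'name2', 'name3', 'name4', 'name5'],
     'String': ['EXAN', 'EXAN_', 'EXAMPL', 'EXAMPLE', 'TEST'],
     'Value': [100.1, 200.2, 300.3, 400.4, 500.5]})

# Copy the rows whose String contains "M" and add 50 to their Value.
extra = df[df['String'].str.contains('M')].copy()
extra['Value'] += 50

# Stack originals and copies. Each copy shares its original's index label,
# so a stable sort (mergesort) on the index places it right after the original.
result = (pd.concat([df, extra])
            .sort_index(kind='mergesort')
            .reset_index(drop=True))  # renumber rows 0..n-1
print(result)
```

The final reset_index(drop=True) is what renumbers the rows; drop it if you would rather keep the original labels.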
unutbu answered Feb 18 '26 18:02


EDIT: It turns out this can be done on the dataframe directly (though not in place), and unutbu's method is much faster than iterating: it appears to be around 100 times faster. I'll leave this answer here in case you're interested in doing it with the same kind of iteration you were planning to use, collecting rows in a list instead of inserting in place:

import pandas as pd

df = pd.DataFrame( {'Name': [1,2,3], 'String': ['M','N','M'], 'Value': [4,5,6]} )
l = []
for _, row in df.iterrows():
    # always keep the original row unchanged
    l.append([row.Name, row.String, row.Value])
    if "M" in row['String']:
        # then add the copy with 50 added to Value
        l.append([row.Name, row.String, row.Value+50])
df = pd.DataFrame( l, columns=['Name','String','Value'])
df
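
For reference, here is a minimal sketch of the same list-building loop applied to the data from the question. It uses itertuples rather than iterrows, which is my substitution and is usually a bit quicker for plain row access. The helper name duplicate_matching_rows and its letter and bump parameters are hypothetical, made up for this illustration.

```python
import pandas as pd

def duplicate_matching_rows(df, letter='M', bump=50):
    """Hypothetical helper: return a new frame in which every row whose
    String contains `letter` is followed by a copy with `bump` added to Value."""
    rows = []
    for row in df.itertuples(index=False):
        # keep the original row as-is
        rows.append([row.Name, row.String, row.Value])
        if letter in row.String:
            # append the modified copy right after it
            rows.append([row.Name, row.String, row.Value + bump])
    return pd.DataFrame(rows, columns=['Name', 'String', 'Value'])

df = pd.DataFrame(
    {'Name': ['name1', 'name2', 'name3', 'name4', 'name5'],
     'String': ['EXAN', 'EXAN_', 'EXAMPL', 'EXAMPLE', 'TEST'],
     'Value': [100.1, 200.2, 300.3, 400.4, 500.5]})
result = duplicate_matching_rows(df)
print(result)
```

Because the rows are collected in a list and the frame is built once at the end, the new frame gets a fresh 0..n-1 index and the original df is left untouched.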
cge answered Feb 18 '26 18:02