 

Change pandas data frame column values inplace

Tags:

python

pandas

pan

I have a pandas data frame.

keyword                     adGroup     goal6Value   adCost
aaaa                        (not set)   0            0.0
+bbbb                       (not set)   0            0.0
+cccc                       (not set)   2072         0.0
dddd                        (not set)   0            0.0

I changed the values in the first column to add brackets around the keywords based on a condition: if a keyword has no "+" symbol, wrap it in brackets.

keyword                     adGroup     goal6Value   adCost
[aaaa]                      (not set)   0            0.0
+bbbb                       (not set)   0            0.0
+cccc                       (not set)   2072         0.0
[dddd]                      (not set)   0            0.0
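The same rule can also be expressed without a loop. The sketch below is not from the thread. It rebuilds the sample frame from the table above and uses numpy.where together with Series.str.contains as one vectorized alternative. regex=False matters here because "+" on its own is not a valid regular expression.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "keyword": ["aaaa", "+bbbb", "+cccc", "dddd"],
    "adGroup": ["(not set)"] * 4,
    "goal6Value": [0, 0, 2072, 0],
    "adCost": [0.0] * 4,
})

# True where the keyword contains a literal "+" (regex=False: "+" alone is an invalid regex)
has_plus = df["keyword"].str.contains("+", regex=False)

# Keep keywords that contain "+", wrap every other keyword in brackets
df["keyword"] = np.where(has_plus, df["keyword"], "[" + df["keyword"] + "]")
print(df)
```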

This is the function I wrote to add the brackets:

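import pandas as pd

def add_bracket(df):
    # Make sure every keyword is a string before calling .find()
    df["keyword"] = df["keyword"].astype('str')
    keyword_list = list()

    # Wrap each keyword in brackets unless it contains "+"
    for index, row in df.iterrows():
        keyword = row["keyword"]
        if keyword.find("+") < 0:
            keyword = "[" + keyword + "]"
        keyword_list.append(keyword)

    # Reuse df.index so concat(axis=1) lines the new column up with the right rows
    kw = pd.DataFrame(keyword_list, columns=['Keyword2'], index=df.index)
    df2 = pd.concat([df, kw], axis=1).drop(columns=["keyword"]).rename(columns={'Keyword2': 'keyword'})
    # Restore the original column order
    df2 = df2[['keyword', 'adGroup', 'goal6Value', 'adCost']]
    return df2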
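One pitfall of building the new column separately and concatenating it: pd.concat(..., axis=1) aligns rows by index label, not by position. If df has been filtered or sorted, its labels are no longer 0..n-1, while a frame built from a plain list always gets 0..n-1. The minimal sketch below uses hypothetical two-row data to show the mismatch and how passing index=df.index avoids it.

```python
import pandas as pd

# A filtered frame keeps its original row labels (10 and 11 here)
df = pd.DataFrame({"keyword": ["aaaa", "+bbbb"]}, index=[10, 11])
keyword_list = ["[aaaa]", "+bbbb"]

# Built without an index, the new frame gets labels 0 and 1, so the rows cannot be matched
bad = pd.concat([df, pd.DataFrame(keyword_list, columns=["Keyword2"])], axis=1)
print(len(bad))  # 4 rows, each with a NaN in one of the two columns

# Reusing df.index keeps both frames aligned row by row
good = pd.concat(
    [df, pd.DataFrame(keyword_list, columns=["Keyword2"], index=df.index)], axis=1
)
print(len(good))  # 2
```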

The function produces the result I want, but is there a neater way in pandas to change the keyword column in place, without creating df2 and concatenating the new column onto it?

Solution: Based on @Inder's suggested answer, this whole function can be written in one line.

df["keyword"] = df.keyword.apply(lambda x: "[" + x + "]" if x.find("+") < 0 else x)
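A quick, self-contained check of the one-liner on the four keywords from the question. This is a sketch: it writes the condition as "+" in x with the branches swapped, which is equivalent to x.find("+") < 0.

```python
import pandas as pd

df = pd.DataFrame({"keyword": ["aaaa", "+bbbb", "+cccc", "dddd"]})

# Same rule as x.find("+") < 0, written with the `in` operator
df["keyword"] = df["keyword"].apply(lambda x: x if "+" in x else "[" + x + "]")
print(df["keyword"].tolist())  # ['[aaaa]', '+bbbb', '+cccc', '[dddd]']
```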

Alternatively, based on @RafaelC's answer:

mask = df.keyword.str.contains('+', regex=False)
df.loc[~mask, 'keyword'] = "[" + df.loc[~mask, 'keyword'] + "]"
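Another vectorized option, not taken from either answer, is Series.where, which keeps values where the condition is True and substitutes the replacement elsewhere. A minimal sketch, assuming the same keyword column:

```python
import pandas as pd

df = pd.DataFrame({"keyword": ["aaaa", "+bbbb", "+cccc", "dddd"]})

kw = df["keyword"].astype(str)
# where() keeps a value where the condition is True and uses the aligned replacement elsewhere
df["keyword"] = kw.where(kw.str.contains("+", regex=False), "[" + kw + "]")
print(df["keyword"].tolist())  # ['[aaaa]', '+bbbb', '+cccc', '[dddd]']
```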
azmirfakkri asked Aug 05 '18 20:08




1 Answer

Just "sum" (concatenate) the strings, only on the rows that do not contain a "+":

mask = df.keyword.str.contains('+', regex=False)
df.loc[~mask, 'keyword'] = "[" + df.loc[~mask, 'keyword'] + "]"

   keyword
0   [aaaa]
1    +bbbb
2    +cccc
3   [dddd]
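Why a single .loc[~mask, 'keyword'] call rather than something like df[~mask]['keyword']? Boolean indexing returns a new frame, so a chained assignment writes into that temporary copy and df itself is left unchanged (pandas emits a warning for this pattern). A small sketch on the question's keywords, with the warning silenced for brevity:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"keyword": ["aaaa", "+bbbb", "+cccc", "dddd"]})
mask = df["keyword"].str.contains("+", regex=False)

# Chained indexing: df[~mask] is a copy, so this write never reaches df
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about chained assignment
    df[~mask]["keyword"] = "[" + df.loc[~mask, "keyword"] + "]"
print(df["keyword"].tolist())  # ['aaaa', '+bbbb', '+cccc', 'dddd']

# One .loc call with (row mask, column label) writes into df itself
df.loc[~mask, "keyword"] = "[" + df.loc[~mask, "keyword"] + "]"
print(df["keyword"].tolist())  # ['[aaaa]', '+bbbb', '+cccc', '[dddd]']
```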

Why is this better than apply?

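Take a look at the timings (the apply version below, lambda x: [x], wraps each keyword in a list rather than building a bracketed string, so this is only a rough comparison):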

%timeit "[" + df.loc[mask, 'keyword'] + "]"
348 µs ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.keyword.apply(lambda x:[x])
112 µs ± 3.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Whoa, so apply is faster?

Not quite. That may hold for a very small df, but look at the same operations on a df with 100,000 times as many rows:

df = pd.concat([df]*100000)

%timeit "[" + df.loc[mask, 'keyword'] + "]"
4.54 ms ± 135 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.keyword.apply(lambda x:[x])
129 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So apply slows down dramatically as the frame grows (112 µs to 129 ms here), while the vectorized operation scales far better (348 µs to 4.54 ms).
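%timeit is an IPython magic, so outside a notebook a similar comparison can be run with the standard timeit module. The sketch below is a hypothetical benchmark, not the answer's original one: unlike lambda x: [x] above, its apply version builds the same bracketed strings as the vectorized version, and the two outputs are checked for equality before timing. Absolute numbers will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# The question's four keywords, repeated to 400,000 rows as in the answer
small = pd.DataFrame({"keyword": ["aaaa", "+bbbb", "+cccc", "dddd"]})
big = pd.concat([small] * 100_000, ignore_index=True)


def with_apply(df):
    # One Python-level lambda call per row
    return df["keyword"].apply(lambda x: x if "+" in x else "[" + x + "]")


def with_mask(df):
    # Build bracketed strings only for rows without "+", on a copy of the column
    out = df["keyword"].copy()
    mask = out.str.contains("+", regex=False)
    out[~mask] = "[" + out[~mask] + "]"
    return out


# Both approaches must produce the same keywords before timing them
assert with_apply(big).tolist() == with_mask(big).tolist()

for name, fn in [("apply", with_apply), ("mask", with_mask)]:
    seconds = timeit.timeit(lambda: fn(big), number=5) / 5
    print(f"{name}: {seconds * 1000:.1f} ms per run")
```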

rafaelc answered Sep 22 '22 06:09