I have a pandas DataFrame:
keyword    adGroup  goal6Value  adCost
  aaaa   (not set)           0     0.0
 +bbbb   (not set)           0     0.0
 +cccc   (not set)        2072     0.0
  dddd   (not set)           0     0.0
I changed the values in the first column to add brackets around the keywords based on a condition: if a keyword does not contain a "+" symbol, wrap it in brackets.
keyword    adGroup  goal6Value  adCost
[aaaa]   (not set)           0     0.0
 +bbbb   (not set)           0     0.0
 +cccc   (not set)        2072     0.0
[dddd]   (not set)           0     0.0
This is the function I created to add the brackets:
import pandas as pd

def add_bracket(df):
    # Make sure the keyword column is string-typed
    df["keyword"] = df["keyword"].astype('str')
    keyword_list = list()
    # Wrap every keyword that does not contain a "+" in brackets
    for index, row in df.iterrows():
        keyword = row["keyword"]
        if keyword.find("+") < 0:
            keyword = "[" + keyword + "]"
        keyword_list.append(keyword)
    # Attach the new column, drop the old one, and restore the original name and order
    kw = pd.DataFrame(keyword_list, columns=['Keyword2'])
    df2 = pd.concat([df, kw], axis=1).drop(columns=["keyword"]).rename(columns={'Keyword2': 'keyword'})
    df2 = df2[['keyword', 'adGroup', 'goal6Value', 'adCost']]
    return df2
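For reference, a minimal sketch of how the function is called on the sample data above (the DataFrame construction below is just my reproduction of that table):

# Reproduction of the sample data shown above
df = pd.DataFrame({
    'keyword': ['aaaa', '+bbbb', '+cccc', 'dddd'],
    'adGroup': ['(not set)'] * 4,
    'goal6Value': [0, 0, 2072, 0],
    'adCost': [0.0, 0.0, 0.0, 0.0],
})

df = add_bracket(df)
print(df['keyword'].tolist())
# ['[aaaa]', '+bbbb', '+cccc', '[dddd]']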
The function produces the result I want, but is there a neater way in pandas so that I don't need to create df2 just to replace the first column (basically making the change in place)?
Solution: Based on @Inder's suggested answer, this whole function can be written in one line:
df["keyword"] = df.keyword.apply(lambda x: "[" + x + "]" if x.find("+") < 0 else x)
Or, based on @RafaelC's answer:
mask = df.keyword.str.contains('+', regex=False)
df.loc[~mask, 'keyword'] = "[" + df.loc[~mask, 'keyword'] + "]"
Just sum
mask = df.keyword.str.contains('+', regex=False)
df.loc[~mask, 'keyword'] = "[" + df.loc[~mask, 'keyword'] + "]"
  keyword
0  [aaaa]
1   +bbbb
2   +cccc
3  [dddd]
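Here is a self-contained sketch of the same thing, assuming the keywords from the question (the "sum" is just element-wise string concatenation over the Series):

import pandas as pd

df = pd.DataFrame({'keyword': ['aaaa', '+bbbb', '+cccc', 'dddd']})

# regex=False so "+" is treated as a literal character, not a regex quantifier
mask = df.keyword.str.contains('+', regex=False)

# Adding a string to a string Series concatenates element-wise, so this
# wraps only the rows without a "+" in a single vectorized step
df.loc[~mask, 'keyword'] = "[" + df.loc[~mask, 'keyword'] + "]"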
Why is this better than apply?
Take a look at the timings:
%timeit "[" + df.loc[mask, 'keyword'] + "]"
348 µs ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.keyword.apply(lambda x:[x])
112 µs ± 3.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Whoa, so apply is faster?
Not quite. Maybe in a very, very small df, but take a look at the same operation on a bigger df with 100,000 times more rows:
df = pd.concat([df]*100000)
%timeit "[" + df.loc[mask, 'keyword'] + "]"
4.54 ms ± 135 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.keyword.apply(lambda x:[x])
129 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So apply gets very slow very quickly, but vectorized operations don't.
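If you want to reproduce these timings outside IPython, here is a rough sketch using the standard-library timeit module (the exact numbers will of course vary by machine and pandas version):

import timeit
import pandas as pd

df = pd.DataFrame({'keyword': ['aaaa', '+bbbb', '+cccc', 'dddd']})
df = pd.concat([df] * 100000, ignore_index=True)
mask = df.keyword.str.contains('+', regex=False)

# Vectorized string concatenation vs. a per-row Python call through apply
vectorized = timeit.timeit(lambda: "[" + df.loc[mask, 'keyword'] + "]", number=100)
applied = timeit.timeit(lambda: df.keyword.apply(lambda x: [x]), number=100)

print(f"vectorized: {vectorized / 100 * 1000:.3f} ms per loop")
print(f"apply:      {applied / 100 * 1000:.3f} ms per loop")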