I have a pandas DataFrame:
keyword    adGroup  goal6Value  adCost
  aaaa   (not set)           0     0.0
 +bbbb   (not set)           0     0.0
 +cccc   (not set)        2072     0.0
  dddd   (not set)           0     0.0
I changed the values in the first column to add brackets around the keywords based on a condition: if a keyword does not contain a "+" symbol, wrap it in brackets.
keyword    adGroup  goal6Value  adCost
[aaaa]   (not set)           0     0.0
 +bbbb   (not set)           0     0.0
 +cccc   (not set)        2072     0.0
[dddd]   (not set)           0     0.0
This is the function I created to add the brackets:
import pandas as pd

def add_bracket(df):
    # Make sure the keyword column is string-typed
    df["keyword"] = df["keyword"].astype('str')
    keyword_list = list()
    # Wrap every keyword that does not contain a "+" in brackets
    for index, row in df.iterrows():
        keyword = row["keyword"]
        if keyword.find("+") < 0:
            keyword = "[" + keyword + "]"
        keyword_list.append(keyword)
    # Attach the new column, drop the old one, and restore the original name and order
    kw = pd.DataFrame(keyword_list, columns=['Keyword2'])
    df2 = pd.concat([df, kw], axis=1).drop(columns=["keyword"]).rename(columns={'Keyword2': 'keyword'})
    df2 = df2[['keyword', 'adGroup', 'goal6Value', 'adCost']]
    return df2
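For reference, a minimal sketch of how the function is called on the sample data above (the DataFrame construction below is just my reproduction of that table):

# Reproduction of the sample data shown above
df = pd.DataFrame({
    'keyword': ['aaaa', '+bbbb', '+cccc', 'dddd'],
    'adGroup': ['(not set)'] * 4,
    'goal6Value': [0, 0, 2072, 0],
    'adCost': [0.0, 0.0, 0.0, 0.0],
})

df = add_bracket(df)
print(df['keyword'].tolist())
# ['[aaaa]', '+bbbb', '+cccc', '[dddd]']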
The function produces the result I want, but is there a neater way in pandas so that I don't need to create df2 just to replace the first column (basically making the change in place)?
Solution: Based on @Inder's suggested answer, this whole function can be written in one line:
df["keyword"] = df.keyword.apply(lambda x: "[" + x + "]" if x.find("+") < 0 else x)
Or, based on @RafaelC's answer:
mask = df.keyword.str.contains('+', regex=False)
df.loc[~mask, 'keyword'] = "[" + df.loc[~mask, 'keyword'] + "]"
Just sum
mask = df.keyword.str.contains('+', regex=False)
df.loc[~mask, 'keyword'] = "[" + df.loc[~mask, 'keyword'] + "]"
  keyword
0  [aaaa]
1   +bbbb
2   +cccc
3  [dddd]
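Here is a self-contained sketch of the same thing, assuming the keywords from the question (the "sum" is just element-wise string concatenation over the Series):

import pandas as pd

df = pd.DataFrame({'keyword': ['aaaa', '+bbbb', '+cccc', 'dddd']})

# regex=False so "+" is treated as a literal character, not a regex quantifier
mask = df.keyword.str.contains('+', regex=False)

# Adding a string to a string Series concatenates element-wise, so this
# wraps only the rows without a "+" in a single vectorized step
df.loc[~mask, 'keyword'] = "[" + df.loc[~mask, 'keyword'] + "]"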
Why is this better than apply?
Take a look at the timings:
%timeit "[" + df.loc[mask, 'keyword'] + "]"
348 µs ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.keyword.apply(lambda x:[x])
112 µs ± 3.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Whoa, so apply is faster?
Not quite. Maybe in a very, very small df, but take a look at the same operation on a bigger df with 100,000 times more rows:
df = pd.concat([df]*100000)
%timeit "[" + df.loc[mask, 'keyword'] + "]"
4.54 ms ± 135 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.keyword.apply(lambda x:[x])
129 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So apply gets very slow very quickly, but vectorized operations don't.
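If you want to reproduce these timings outside IPython, here is a rough sketch using the standard-library timeit module (the exact numbers will of course vary by machine and pandas version):

import timeit
import pandas as pd

df = pd.DataFrame({'keyword': ['aaaa', '+bbbb', '+cccc', 'dddd']})
df = pd.concat([df] * 100000, ignore_index=True)
mask = df.keyword.str.contains('+', regex=False)

# Vectorized string concatenation vs. a per-row Python call through apply
vectorized = timeit.timeit(lambda: "[" + df.loc[mask, 'keyword'] + "]", number=100)
applied = timeit.timeit(lambda: df.keyword.apply(lambda x: [x]), number=100)

print(f"vectorized: {vectorized / 100 * 1000:.3f} ms per loop")
print(f"apply:      {applied / 100 * 1000:.3f} ms per loop")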