I have a pandas DataFrame with data that looks like this:
code fx category
AXD AXDG.R cat1
AXF AXDG_e.FE cat1
333 333.R cat1
....
There are other categories but I am only interested in cat1.
I want to combine everything from the code column with everything after the . in the fx column, and replace the code column with the new combination without affecting the other rows. The expected result:
code fx category
AXD.R AXDG.R cat1
AXF.FE AXDG_e.FE cat1
333.R 333.R cat1
.....
Here is my code. I think I have to use regex, but I'm not sure how to combine the two columns this way.
df.loc[df['category']== 'cat1', 'code'] = df[df['category'] == 'cat1']['code'].str.replace(r'[a-z](?=\.)', '', regex=True).str.replace(r'_?(?=\.)','', regex=True).str.replace(r'G(?=\.)', '', regex=True)
I'm not sure how to select the second column also. Any help would be greatly appreciated.
You can use str.split with Series.where to add the extension for cat1 rows only:
df['code'] = (df['code'].astype(str).add("."+df['fx'].str.split(".").str[-1])
.where(df['category'].eq("cat1"),df['code']))
print(df)
code fx category
0 AXD.R AXDG.R cat1
1 AXF.FE AXDG_e.FE cat1
2 333.R 333.R cat1
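To check that rows outside cat1 really are left alone, here is a minimal self-contained sketch of the same str.split plus where idea. The fourth row (category cat2) is a hypothetical addition, not part of the question's data.

```python
import pandas as pd

# Sample data modelled on the question; the cat2 row is an assumed extra
# row, added only to show that other categories are not modified.
df = pd.DataFrame({
    "code": ["AXD", "AXF", "333", "ZZZ"],
    "fx": ["AXDG.R", "AXDG_e.FE", "333.R", "ZZZ.Q"],
    "category": ["cat1", "cat1", "cat1", "cat2"],
})

# Text after the last "." in fx
ext = df["fx"].str.split(".").str[-1]

# Build code + "." + ext for every row, then keep that value only where
# category is cat1; elsewhere fall back to the original code.
combined = df["code"].astype(str) + "." + ext
df["code"] = combined.where(df["category"].eq("cat1"), df["code"])

print(df)
```

Because where keeps the combined value only where the condition is true, the cat2 row keeps its original code.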
You can extract the part of fx from the . onward and append it to code:
df['code'] += df['fx'].str.extract(r'(\..*$)')[0]
output:
code fx category
0 AXD.R AXDG.R cat1
1 AXF.FE AXDG_e.FE cat1
2 333.R 333.R cat1
To limit this to cat1 only:
df.loc[df['category'].eq('cat1'), 'code'] += df['fx'].str.extract(r'(\..*$)')[0]
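One thing to watch with the extract approach: if an fx value contains no . at all, extract returns NaN, and adding NaN would turn that row's code into NaN. A hedged sketch that guards against this is below. The cat2 row and the NOEXT row are hypothetical test rows, and the names mask and suffix are only illustrative.

```python
import pandas as pd

# Sample data modelled on the question, plus two assumed rows:
# one in another category and one cat1 row whose fx has no "."
df = pd.DataFrame({
    "code": ["AXD", "AXF", "333", "ZZZ", "NOEXT"],
    "fx": ["AXDG.R", "AXDG_e.FE", "333.R", "ZZZ.Q", "NOEXT"],
    "category": ["cat1", "cat1", "cat1", "cat2", "cat1"],
})

mask = df["category"].eq("cat1")

# expand=False returns a Series for a single capture group;
# fillna("") leaves code unchanged when fx has no "." to extract.
suffix = df["fx"].str.extract(r"(\..*$)", expand=False).fillna("")

# Update only the cat1 rows
df.loc[mask, "code"] = df.loc[mask, "code"] + suffix[mask]

print(df)
```

Note that this pattern captures from the first . to the end of the string, whereas splitting on . and taking the last piece keeps only the text after the last dot. The two agree for the sample data but differ when fx contains more than one dot.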
You can use Series.str.extract with np.where:
import numpy as np
df['code'] = df['code'].astype(str) + np.where(df['category'].eq('cat1'), df['fx'].astype(str).str.extract(r'(\..+)')[0], '')
Output:
>>> df
code fx category
0 AXD.R AXDG.R cat1
1 AXF.FE AXDG_e.FE cat1
2 333.R 333.R cat1