
Combine 2 string columns in pandas with different conditions in both columns

I have two string columns in a pandas DataFrame, code and fx, with data that looks like this:

code fx         category
AXD  AXDG.R     cat1
AXF  AXDG_e.FE  cat1 
333  333.R      cat1
....

There are other categories but I am only interested in cat1.

For the cat1 rows, I want to take the value in the code column, append the . and everything after it from the fx column, and write the result back to the code column, without affecting rows in other categories. The result should look like this:

code    fx         category
AXD.R   AXDG.R     cat1
AXF.FE  AXDG_e.FE  cat1
333.R   333.R      cat1
.....

Here is my code. I think I need a regex, but I'm not sure how to combine the two columns this way.

df.loc[df['category']== 'cat1', 'code'] = df[df['category'] == 'cat1']['code'].str.replace(r'[a-z](?=\.)', '', regex=True).str.replace(r'_?(?=\.)','', regex=True).str.replace(r'G(?=\.)', '', regex=True)

I'm also not sure how to bring the second column into the assignment. Any help would be greatly appreciated.

anarchy asked Dec 19 '21 17:12



3 Answers

There are other categories but I am only interested in cat1

You can use str.split with Series.where to add the extension for cat1 rows only:

df['code'] = (df['code'].astype(str).add("."+df['fx'].str.split(".").str[-1])
             .where(df['category'].eq("cat1"),df['code']))

print(df)

     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1
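
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same split-and-where idea. The fourth row (ZZZ, cat2) is a hypothetical addition, not part of the question's data, included only to show that rows outside cat1 keep their original code.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical "cat2" row
# to show that rows outside cat1 are left alone.
df = pd.DataFrame({
    "code": ["AXD", "AXF", "333", "ZZZ"],
    "fx": ["AXDG.R", "AXDG_e.FE", "333.R", "ZZZ.Q"],
    "category": ["cat1", "cat1", "cat1", "cat2"],
})

# Text after the last "." in fx.
suffix = df["fx"].str.split(".").str[-1]

# Build the combined value for every row, then keep it only where
# category is cat1; everywhere else fall back to the original code.
combined = df["code"].astype(str) + "." + suffix
df["code"] = combined.where(df["category"].eq("cat1"), df["code"])

print(df)
```

The astype(str) call is only a guard in case code was read in as a numeric column (for example the 333 row); with all-string data it changes nothing.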
anky answered Sep 30 '22 01:09


You can extract the part of fx from the . onward and append it to code (note the raw string, so that \. is not treated as a string escape):

df['code'] += df['fx'].str.extract(r'(\..*$)')[0]

Output:

     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1

To limit this to cat1 only:

df.loc[df['category'].eq('cat1'), 'code'] += df['fx'].str.extract(r'(\..*$)')[0]
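
One thing to watch with the extract approach: a row whose fx has no . yields NaN from str.extract, and adding NaN to code would turn that code into NaN as well. Below is a sketch of one way to guard against that, assuming an empty suffix is what you want in that case. The NOP/NODOT and ZZZ/cat2 rows are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "code": ["AXD", "AXF", "333", "NOP", "ZZZ"],
    "fx": ["AXDG.R", "AXDG_e.FE", "333.R", "NODOT", "ZZZ.Q"],
    "category": ["cat1", "cat1", "cat1", "cat1", "cat2"],
})

mask = df["category"].eq("cat1")

# expand=False returns a Series instead of a one-column DataFrame.
# Rows without a "." give NaN, which fillna("") turns into an empty suffix.
suffix = df["fx"].str.extract(r"(\..*)$", expand=False).fillna("")

# Only the cat1 rows are rewritten; the cat2 row keeps its code.
df.loc[mask, "code"] = df.loc[mask, "code"] + suffix[mask]

print(df)
```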
mozway answered Sep 30 '22 03:09


You can use Series.str.extract:

import numpy as np

df['code'] = df['code'].astype(str) + np.where(df['category'].eq('cat1'), df['fx'].astype(str).str.extract(r'(\..+)')[0], '')

Output:

>>> df
     code         fx category
0   AXD.R     AXDG.R     cat1
1  AXF.FE  AXDG_e.FE     cat1
2   333.R      333.R     cat1
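
The split-based and regex-based answers agree on the sample data, but they can differ if an fx value contains more than one dot. The sketch below compares the two on a hypothetical value A.B.C (not in the question's data): the regex starts at the first dot, while split(".").str[-1] keeps only the last segment. Which behavior is right depends on what your fx values actually look like.

```python
import pandas as pd

# The second value is hypothetical, used only to show the difference.
fx = pd.Series(["AXDG.R", "A.B.C"])

# Regex: matches from the first "." to the end of the string.
from_first_dot = fx.str.extract(r"(\..+)", expand=False)

# Split: keeps only the text after the last ".".
last_segment = "." + fx.str.split(".").str[-1]

print(from_first_dot.tolist())  # ['.R', '.B.C']
print(last_segment.tolist())    # ['.R', '.C']
```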
richardec answered Sep 30 '22 03:09