Python Pandas DataFrame str contains merge if

Tags:

I want to merge the rows of the two dataframes hereunder, when the strings in Test1 column of DF2 contain a substring of Test1 column of DF1.

Click to copy

DF1 = pd.DataFrame({'Test1':list('ABC'),
                   'Test2':[1,2,3]})

print (DF1)
  Test1  Test2
0     A      1
1     B      2
2     C      3

DF2 = pd.DataFrame({'Test1':['ee','bA','cCc','D'],
                   'Test2':[1,2,3,4]})

print (DF2)
  Test1  Test2
0    ee      1
1    bA      2
2   cCc      3
3     D      4

For that, I am able with "str contains" to identify the substring of DF1.Test1 available in the strings of DF2.Test1

INPUT:

Click to copy

for i in DF1.Test1:

ok = DF2[Df2.Test1.str.contains(i)]

print(ok)

OUPUT:

enter image description here

Now, I would like to add in the output, the merge of the substrings of Test1 which match with the strings of Test2

OUPUT:

enter image description here

For that, I tried with "pd.merge" and "if" but i am not able to find the right code yet.. Do you have suggestions please?

Click to copy

for i in DF1.Test1:

if DF2.Test1.str.contains(i) == 'True':

    ok = pd.merge(DF1, DF2, on= ['Test1'[i]], how='outer') 

    print(ok)

Thank you for your ideas :)

878

asked Mar 20 '18 11:03

Xav Dou

2 Answers

I could not respnd to jezrael's comment because of my reputation. But I changed his answer to a function to merge on non-capitalized text.

Click to copy

def str_merge(part_string_df,full_string_df, merge_column):
    merge_column_lower = 'merge_column_lower'
    part_string_df[merge_column_lower] = part_string_df[merge_column].str.lower()
    full_string_df[merge_column_lower] = full_string_df[merge_column].str.lower()   
    pat = '|'.join(r"{}".format(x) for x in part_string_df[merge_column_lower])
    full_string_df['Test3'] = full_string_df[merge_column_lower].str.extract('('+ pat + ')', expand=True)
    DF = pd.merge(part_string_df, full_string_df, left_on= merge_column_lower, right_on='Test3').drop([merge_column_lower + '_x',merge_column_lower + '_y','Test3'],axis=1)
    return DF

Used with example:

Click to copy

DF1 = pd.DataFrame({'Test1':list('ABC'),
                   'Test2':[1,2,3]})

DF2 = pd.DataFrame({'Test1':['ee','bA','cCc','D'],
                   'Test2':[1,2,3,4]})

print(str_merge(DF1,DF2, 'Test1'))
  Test1_x  Test2_x Test1_y  Test2_y
0       B        2      bA        2
1       C        3     cCc        3

172

answered Sep 24 '22 11:09

Programmeer_meneer

I believe you need extract values to new column and then merge, last remove helper column Test3:

Click to copy

pat = '|'.join(r"{}".format(x) for x in DF1.Test1)
DF2['Test3'] = DF2.Test1.str.extract('('+ pat + ')', expand=False)
DF = pd.merge(DF1, DF2, left_on= 'Test1', right_on='Test3').drop('Test3', axis=1)
print (DF)
  Test1_x  Test2_x Test1_y  Test2_y
0       A        1      bA        2
1       C        3     cCc        3

Detail:

Click to copy

print (DF2)
  Test1  Test2 Test3
0    ee      1   NaN
1    bA      2     A
2   cCc      3     C
3     D      4   NaN

answered Sep 22 '22 11:09

jezrael

Related questions
                            
                                How to load a SQLite3 extension in SQLAlchemy?
                            
                                Get data set as numpy array from TFRecordDataset
                            
                                Printing utf8 strings in Sublime Text's console with Windows
                            
                                How to feed sound as input to neural networks? [closed]
                            
                                How to train a network with multiple output layers in CNTK?
                            
                                Function that returns an accumulator in Python
                            
                                How to generate all combinations of a set of characters without repetitions?
                            
                                Why doesn't python3's print statement flush output when end keyword is specified?
                            
                                Python: Loop to open multiple folders and files in python
                            
                                Pythonic cumulative map
                            
                                Keras - method on_batch_end is slow but only callback I have is checkpoint
                            
                                Jupyter Notebook: (OperationalError('disk I/O error',))
                            
                                cant iterate nested for loop as wanted -python -maybe a simple mistake
                            
                                Creating a "white" image in numpy (2-D image)
                            
                                how to import resource module?
                            
                                Django channels: No module named 'asgiref.sync'
                            
                                Find the nearest location using numpy
                            
                                Asyncio in Django
                            
                                python decorators *args and ** kwargs
                            
                                CountVectorizer converts words to lower case

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Python Pandas DataFrame str contains merge if

Tags:

python

merge

pandas

dataframe

Xav Dou

People also ask

2 Answers

Programmeer_meneer

jezrael

Recent Activity

Donate For Us