
Remove a specific word from a string

I am using Oracle 10g.

I want to remove all occurrences of a particular word from a sentence, but I don't want to remove any other word that contains it together with other letters (a-z or A-Z).

For example, the following is a sentence from which I want to remove some:

some text, 123 someone, another text some1

Expected output:

 text, 123 someone, another text

Note that I also want to remove the word when some is preceded or followed by any characters other than A-Z and a-z (for example, some1).

This is what I have tried so far:

select replace('some text, 123 someone, another text some1','some','') 
from dual;

I am getting output:

 text, 123 one, another text 1

In the above output, I expect someone not to be replaced, and some1 to be removed entirely.

How should I achieve this? Any suggestion will be appreciated.

Edit: For clarity this is another example of what I am looking for:

some other text someone other text, someB some1 some.

output should be:

 other text someone other text, someB 

In the above sentence, someB is not removed because it contains additional letters (a-z or A-Z).
some1 and some. are removed because they don't contain any additional letters.

Edit2

If I use regex:

select REGEXP_REPLACE('some text, 123 someone, another text some1','[^a-zA-Z]','')
from dual

I am getting output:

sometextsomeoneanothertextsome

Expected output:

sometextsomeoneanothertext

Note that I want some1 to be removed from the string as well, because it contains a character other than A-Z.

Answers using regex are also appreciated.

Bhushan asked Dec 25 '22 14:12


2 Answers

Because Oracle's implementation of regular expressions lacks support for lookbehind/lookahead and the word boundary (\b), it seems impossible to meet all the requirements in a single REGEXP_REPLACE call. This is especially true for the case pointed out by Egor Skriptunoff: matches that follow one another with only a single separator between them, like some some some some ....

Leaving that case aside, all such strings can be matched with this call:

regexp_replace(
  source_string,                                       -- source string
  '([^[:alnum:]]|^)((\d)*some(\d)*)([^[:alnum:]]|$)',  -- pattern
  '\1\5',                                              -- leave separators in place
  1,                                                   -- start from beginning
  0,                                                   -- replace all occurrences
  'im'                                                 -- case-insensitive and multiline 
);

Pattern parts:

(                -- start of Group #1
  [^[:alnum:]]   -- any non-alphanumeric character 
  |              -- or 
  ^              -- start of string or start of line 
)                -- end of Group #1
(                -- start of Group #2
  (              -- start of Group #3 
    \d           -- any digit
  )              -- end of Group #3
  *              -- include in previous group zero or more consecutive digits
  some           -- core string to match
  (              -- start of group #4
    \d           -- any digit
  )              -- end of group #4  
  *              -- include in previous group zero or more consecutive digits
)                -- end of Group #2
(                -- start of Group #5
  [^[:alnum:]]   -- any non-alphanumeric character 
  |              -- or
  $              -- end of string or end of line
)                -- end of Group #5

Because the separators used for matching (Group #1 and Group #5) are part of the matched pattern, they would be removed from the source string on a successful match, so we need to restore them by referencing both groups in the third regexp_replace parameter.
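As a rough illustration of both points, the query below runs the same pattern once over the question's sentence and once over a string of consecutive matches. The sample strings and column aliases are only illustrative. In the second string, the space after the first some is consumed as Group #5 of the first match, so it is no longer available as Group #1 for the next some, and a single pass is expected to leave that one behind.

```sql
-- Sketch only: same pattern as the call above, applied to two sample strings.
select regexp_replace(
         'some text, 123 someone, another text some1',
         '([^[:alnum:]]|^)((\d)*some(\d)*)([^[:alnum:]]|$)',
         '\1\5', 1, 0, 'im'
       ) as single_pass,        -- "some" and "some1" are removed, "someone" is kept
       regexp_replace(
         'some some some',
         '([^[:alnum:]]|^)((\d)*some(\d)*)([^[:alnum:]]|$)',
         '\1\5', 1, 0, 'im'
       ) as consecutive_case    -- the middle "some" is expected to survive a single pass
from dual;
```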

Based on this solution, it's possible to replace all occurrences, even consecutive ones, by repeating the call in a loop.

For example, you can define a function like this:

create or replace function delete_str_with_digits(
  pSourceString in varchar2, 
  pReplacePart  in varchar2  -- base string (like 'some' in question)
)
  return varchar2
is
  C_PATTERN_START constant varchar2(100) := '([^[:alnum:]]|^)((\d)*';
  C_PATTERN_END   constant varchar2(100) := '(\d)*)([^[:alnum:]]|$)';

  vPattern         varchar2(4000);
  vCurValue        varchar2(4000);
  vPatternPosition binary_integer;
begin

  vPattern := C_PATTERN_START || pReplacePart || C_PATTERN_END;
  vCurValue := pSourceString;

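  -- use the same match flags ('im') as the replace call, so the loop
  -- condition and the replacement agree on what counts as a match
  vPatternPosition := regexp_instr(vCurValue, vPattern, 1, 1, 0, 'im');

  while(vPatternPosition > 0) loop
    vCurValue := regexp_replace(vCurValue, vPattern,'\1\5',1,0,'im');
    vPatternPosition := regexp_instr(vCurValue, vPattern, 1, 1, 0, 'im');
  end loop;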

  return vCurValue;  

end;

and use it with SQL or other PL/SQL code:

SELECT 
  delete_str_with_digits(
    'some text, -> awesome <- 123 someone, 3some3
     line of 7 :> some some some some some some some <
222some  another some1? some22 text 0some000', 
    'some'
  )  as result_string
FROM 
  dual
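
Calling it from PL/SQL could look roughly like the anonymous block below. This is a minimal sketch that assumes the function above has been compiled in the current schema and that server output is enabled (for example with SET SERVEROUTPUT ON in SQL*Plus); the variable name and the brackets around the printed value are only illustrative.

```sql
-- Sketch: call delete_str_with_digits from an anonymous PL/SQL block.
declare
  vResult varchar2(4000);  -- hypothetical local variable for the cleaned string
begin
  vResult := delete_str_with_digits(
               'some text, 123 someone, another text some1',
               'some'
             );
  -- brackets make leading and trailing spaces visible in the output
  dbms_output.put_line('[' || vResult || ']');
end;
/
```

Note that pReplacePart is concatenated into the pattern as-is, so a value containing regular expression metacharacters would be interpreted as part of the pattern.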


ThinkJet answered Dec 28 '22 06:12


Here is an approach that doesn't use regular expressions:

select trim(replace(' '||'some text, 123 someone, another text some1'||' ',
                    ' some ',' '
                   ) 
           )
from dual;

Gordon Linoff answered Dec 28 '22 08:12