
How to filter out a set of strings A from a set of strings B using Bash

I have a list of strings that I want to remove from a superset of other strings, in no particular order, thus constructing a new set. Is that doable in Bash?

Ilyes Gouta asked Oct 23 '09 21:10





2 Answers

It looks like you want something with better than O(nm) running time, so here is an answer along those lines. fgrep (or grep -F) uses the Aho-Corasick algorithm to build a single finite-state machine out of a list of fixed strings, so checking each word in SET2 takes O(length of word) time. This means the whole running time of this script is O(n+m), where n and m are the numbers of words in the two sets.

(Obviously, the running times also depend on the length of the words.)
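
To make the role of each grep flag concrete, here is a small sketch. The word lists and variable names are made up for illustration and are not taken from the script below. It compares the result with and without -x, which seems to be the flag that turns substring matching into whole-word matching here.

```shell
#!/bin/bash
# Illustrative data only: these word lists are assumptions for this sketch.
words=$(printf '%s\n' hello helloworld world rocks)
patterns=$(printf '%s\n' hello world)   # one fixed string per line

# -F: treat patterns as fixed strings, -v: keep lines that do NOT match.
# Without -x a pattern may match anywhere in a line, so "helloworld" is dropped too.
without_x=$(printf '%s\n' "$words" | grep -Fv -e "$patterns")

# -x: a pattern must match the whole line, so "helloworld" is kept.
with_x=$(printf '%s\n' "$words" | grep -Fxv -e "$patterns")

printf 'without -x: %s\n' "$(printf '%s\n' "$without_x" | paste -s -d ' ' -)"
printf 'with -x:    %s\n' "$(printf '%s\n' "$with_x" | paste -s -d ' ' -)"
```

With this data, the first line should list only rocks, while the second should list helloworld and rocks.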

[meatmanek@yggdrasil ~]$ cat subtract.sh 
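#!/bin/bash
# Print the words of $2 that are not in $1 (SET3 = SET2 - SET1).
subtract()
{
  SET1=( $1 )   # words to remove
  SET2=( $2 )   # superset to filter
  OLDIFS="$IFS"
  IFS=$'\n'     # "${X[*]}" now joins with newlines: one word per line
  # -F: fixed strings, -x: whole-line match, -v: keep non-matching lines
  SET3=( $(grep -Fxv "${SET1[*]}" <<< "${SET2[*]}") )
  IFS="$OLDIFS"
  echo "${SET3[*]}"
}
# Also runs the function when the file is executed or sourced.
subtract "$@"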
[meatmanek@yggdrasil ~]$ . subtract.sh 

[meatmanek@yggdrasil ~]$ subtract "package-x86 test0 hello world" "computer hello sizeof compiler world package-x86 rocks"
computer sizeof compiler rocks
[meatmanek@yggdrasil ~]$ 
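
The same subtraction can probably also be written without changing IFS or relying on unquoted expansion. The following is only a sketch and not part of the script above: the function name set_difference is hypothetical, and it assumes the words are separated by spaces and contain no newlines.

```shell
#!/bin/bash
# Hypothetical helper (the name is an assumption): prints the words of $2
# that do not appear in $1, joined by single spaces.
set_difference() {
  # Turn the space-separated removal list into one fixed string per line.
  patterns=$(printf '%s\n' "$1" | tr -s ' ' '\n')
  # Split the superset the same way, drop whole-line matches, re-join with spaces.
  printf '%s\n' "$2" | tr -s ' ' '\n' | grep -Fxv -e "$patterns" | paste -s -d ' ' -
}
```

Called as set_difference "package-x86 test0 hello world" "computer hello sizeof compiler world package-x86 rocks", it should print the same four words as the transcript above. Because the arguments stay quoted, characters such as * in a word are not expanded against file names.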
Evan Krall answered Nov 24 '22 18:11



> echo "aa b1 c b2 d" |xargs -d' ' -n 1
aa
b1
c
b2
d

> echo "aa b1 c b2 d" |xargs -d' ' -n 1| grep "^b"
b1
b2
Andreas Otto answered Nov 24 '22 18:11
