
An alternative: cut -d <string>?

When I type ls I get:

aedes_aegypti_upstream_dremeready_all_simpleMasked_random.fasta
anopheles_albimanus_upstream_dremeready_all_simpleMasked_random.fasta
anopheles_arabiensis_upstream_dremeready_all_simpleMasked_random.fasta
anopheles_stephensi_upstream_dremeready_all_simpleMasked_random.fasta
culex_quinquefasciatus_upstream_dremeready_all_simpleMasked_random.fasta

I want to pipe this into cut (or use some alternative) so that I only get:

aedes_aegypti
anopheles_albimanus
anopheles_arabiensis
anopheles_stephensi
culex_quinquefasciatus

If cut accepted a string (multiple characters) as its delimiter, then I could use:

cut -d "_upstream_" -f1

But that is not permitted as cut only takes single characters as delimiters.

hello_there_andy asked Dec 05 '22 07:12


1 Answer

awk does allow a string as delimiter:

$ awk -F"_upstream_" '{print $1}' file
aedes_aegypti
anopheles_albimanus
anopheles_arabiensis
anopheles_stephensi
culex_quinquefasciatus
drosophila_melanogaster
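
Since the question asks about piping names in rather than reading a file, here is a minimal sketch of the same awk call reading from standard input. The function name `species_prefix` is made up for illustration, and the sample names are copied from the question's listing. Note that awk treats a multi-character `-F` value as a regular expression; `_upstream_` contains no special characters, so it matches literally.

```shell
# Hypothetical helper (not from the answer): print the part of each
# input line that precedes the first "_upstream_".
species_prefix() {
    awk -F'_upstream_' '{ print $1 }'
}

# Feed two of the names from the question through the helper.
printf '%s\n' \
    'aedes_aegypti_upstream_dremeready_all_simpleMasked_random.fasta' \
    'culex_quinquefasciatus_upstream_dremeready_all_simpleMasked_random.fasta' |
    species_prefix
```

In a real directory the input could come from `ls | species_prefix`, assuming none of the file names contain newlines.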

Note that for the given input you can also use cut with _ as the delimiter and print the first two fields:

$ cut -d'_' -f-2 file
aedes_aegypti
anopheles_albimanus
anopheles_arabiensis
anopheles_stephensi
culex_quinquefasciatus
drosophila_melanogaster

sed and grep can also do it. For example, this grep uses a look-ahead to print everything from the beginning of the line up to (but not including) _upstream:

$ grep -Po '^\w*(?=_upstream)' file
aedes_aegypti
anopheles_albimanus
anopheles_arabiensis
anopheles_stephensi
culex_quinquefasciatus
drosophila_melanogaster
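
The answer mentions sed without showing a command. One possible form, as a sketch rather than the answerer's own command, deletes everything from `_upstream_` to the end of each line. The function name `strip_with_sed` is hypothetical.

```shell
# Hypothetical helper (not from the answer): remove "_upstream_" and
# everything after it from each input line.
strip_with_sed() {
    sed 's/_upstream_.*//'
}

# Sample name copied from the question's listing.
printf '%s\n' \
    'anopheles_stephensi_upstream_dremeready_all_simpleMasked_random.fasta' |
    strip_with_sed
```

Lines that do not contain `_upstream_` pass through unchanged, which differs from the grep version above: that one prints nothing for lines without a match.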
fedorqui 'SO stop harming' answered Jan 08 '23 05:01