How to loop a variable range in cut command

Question

I have a file with two columns, and I want to use the values in the second column to set the range passed to the cut command, so that I can select a range of characters from another file. The range I want starts at the position given by the value in the second column and covers the next 10 characters. An example follows.

My files look like this:

File with 2 columns and no blank lines between lines (file1.txt):

NAME1 10
NAME2 25
NAME3 48
NAME4 66

File from which I want to extract the variable ranges of characters (a single very long line with no spaces and no bold font) (file2.txt):

GATCGAGCGGGATTCTTTTTTTTTAGGCGAGTCAGCTAGCATCAGCTACGAGAGGCGAGGGCGGGCTATCACGACTACGACTACGACTACAGCATCAGCATCAGCGCACTAGAGCGAGGCTAGCTAGCTACGACTACGATCAGCATCGCACATCGACTACGATCAGCATCAGCTACGCATCGAAGAGAGAGC

Desired resulting file, one sequence per line (result.txt):

GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT

The resulting file would contain the characters in the ranges 10-20, 25-35, 48-58 and 66-76, each range on its own line. The length of the range is always 10; only the starting points differ, and they are set by the values in the second column of the first file.

I tried the command:

for i in $(awk '{print $2}' file1.txt);
do
        p1=$i;
        p2=`expr "$1" + 10`
        cut -c$p1-$2 file2.txt > result.txt;
done

I don't get any output or error message.

I also tried:

while read line; do
    set $line
    p2=`expr "$2" + 10`
    cut -c$2-$p2 file2.txt > result.txt;
done <file1.txt

This last command gives me an error message:

cut: invalid range with no endpoint: -
Try 'cut --help' for more information.
expr: non-integer argument
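
Two slips are visible in the attempts above: the first loop reads `$1` and `$2` (the shell's positional parameters, which are empty here) where `$i` and `$p2` were meant, and both loops redirect with `>` inside the loop body, so result.txt is truncated on every iteration. What follows is a minimal sketch of a cut-based loop with those points addressed, not a definitive fix. It assumes, going by the desired output, that the value in column 2 is the number of characters to skip, so the range runs from value+1 to value+10. The variable names `start` and `end` are illustrative only, and the sample files are recreated in a scratch directory so the sketch can be run as-is.

```shell
# Work in a scratch directory and recreate the sample inputs from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'NAME1 10' 'NAME2 25' 'NAME3 48' 'NAME4 66' > file1.txt
printf '%s\n' 'GATCGAGCGGGATTCTTTTTTTTTAGGCGAGTCAGCTAGCATCAGCTACGAGAGGCGAGGGCGGGCTATCACGACTACGACTACGACTACAGCATCAGCATCAGCGCACTAGAGCGAGGCTAGCTAGCTACGACTACGATCAGCATCGCACATCGACTACGATCAGCATCAGCTACGCATCGAAGAGAGAGC' > file2.txt

while read -r name index _; do
  start=$((index + 1))   # assumption: column 2 is a count of characters to skip
  end=$((index + 10))    # ten characters in total, start..end inclusive
  cut -c"${start}-${end}" file2.txt
done < file1.txt > result.txt   # redirect once, outside the loop
```

Each call to cut still reads the whole of file2.txt, so this is only reasonable when that file is small.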

Charles Duffy · Accepted Answer

There's no need for cut here; dd can do the job of indexing into a file and reading only the number of bytes you want. (Note that status=none is a GNUism; on other platforms you may need to leave it out and redirect stderr instead if you want to suppress dd's informational logging.)

while read -r name index _; do
  dd if=file2.txt bs=1 skip="$index" count=10 status=none
  printf '\n'
done <file1.txt >result.txt

This approach avoids the memory cost of reading the whole of file2.txt (which matters if it is large), and its overhead is bounded: one copy of dd is started per sequence to extract.
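
Where `status=none` is not available, the same loop can be written with dd's stderr redirected, as the note above suggests. The following is a minimal sketch under that assumption. It recreates the sample files from the question in a scratch directory so that it can be run as-is. Be aware that `2>/dev/null` also hides genuine dd error messages.

```shell
# Work in a scratch directory and recreate the sample inputs from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'NAME1 10' 'NAME2 25' 'NAME3 48' 'NAME4 66' > file1.txt
printf '%s\n' 'GATCGAGCGGGATTCTTTTTTTTTAGGCGAGTCAGCTAGCATCAGCTACGAGAGGCGAGGGCGGGCTATCACGACTACGACTACGACTACAGCATCAGCATCAGCGCACTAGAGCGAGGCTAGCTAGCTACGACTACGATCAGCATCGCACATCGACTACGATCAGCATCAGCTACGCATCGAAGAGAGAGC' > file2.txt

while read -r name index _; do
  # bs=1 makes skip= and count= operate on single bytes, so dd skips
  # $index bytes and then copies the next 10 to stdout.
  # 2>/dev/null replaces the GNU-only status=none.
  dd if=file2.txt bs=1 skip="$index" count=10 2>/dev/null
  printf '\n'   # dd emits no newline, so add one after each sequence
done < file1.txt > result.txt
```

Because `skip="$index"` skips that many bytes, the first byte copied is the one at 1-based position index+1, which is what produces the desired output for the sample data.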

Rahul Verma · Answer

Using awk

$ awk 'FNR==NR{a=$0; next} {print substr(a,$2+1,10)}' file2.txt file1.txt
GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT
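
The one-liner can be unpacked as follows. This is a commented sketch of the same logic, run against the sample data from the question under the question's file names (file2.txt and file1.txt). The variable name `seq` is illustrative. `FNR==NR` holds only while awk reads the first file named on the command line, and `substr` counts from 1, which is why the index gets `+1`.

```shell
# Work in a scratch directory and recreate the sample inputs from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'NAME1 10' 'NAME2 25' 'NAME3 48' 'NAME4 66' > file1.txt
printf '%s\n' 'GATCGAGCGGGATTCTTTTTTTTTAGGCGAGTCAGCTAGCATCAGCTACGAGAGGCGAGGGCGGGCTATCACGACTACGACTACGACTACAGCATCAGCATCAGCGCACTAGAGCGAGGCTAGCTAGCTACGACTACGATCAGCATCGCACATCGACTACGATCAGCATCAGCTACGCATCGAAGAGAGAGC' > file2.txt

awk '
  # First file (file2.txt): FNR equals NR only here, so store the sequence.
  FNR == NR { seq = $0; next }
  # Second file (file1.txt): $2 is the number of characters to skip.
  # substr() is 1-based, so start at $2 + 1 and take 10 characters.
  { print substr(seq, $2 + 1, 10) }
' file2.txt file1.txt > result.txt
```

This starts a single awk process for all sequences, at the cost of holding the whole line from file2.txt in memory.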

Tags: bash, environment-variables, cut

Fernanda Costa