I am trying to download files from a database using wget and a URL, e.g.
wget "http://www.rcsb.org/pdb/files/1BXS.pdb"
So the format of the URL is: "http://www.rcsb.org/pdb/files/($idnumber).pdb"
But I have many files to download, so I wrote a bash script that reads the ID numbers from a text file, forms the URL string, and downloads each file with wget.
#!/bin/bash
while read line
do
url="http://www.rcsb.org/pdb/files/$line.pdb"
echo -e $url
wget $url
done < id_numbers.txt
However, the url string is formed as
.pdb://www.rcsb.org/pdb/files/4H80
So http is replaced with .pdb. I cannot figure out why. Does anyone have an idea?
How can I format it so the url is "http://www.rcsb.org/pdb/files/($idnumber).pdb"? Thanks a lot.
Note: This question was marked as a duplicate of 'How to concatenate strings in bash?', but I was actually asking something else. I read that question before asking this one, and it turns out my problem was with preparing the txt file in Windows, not really string concatenation. I edited the question title; I hope it is clearer now.
It sounds like your id_numbers.txt file has DOS/Windows-style line endings (carriage return followed by linefeed characters) instead of plain unix line endings (just linefeed). The result is that read thinks the line ends with a carriage return, $line actually has a carriage return at the end, and that gets embedded in the url, causing various confusion. When the url is printed, the terminal obeys the carriage return by moving the cursor back to the start of the line, so .pdb is written over http.
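One way to confirm this diagnosis is to print the string in a form that makes the hidden character visible. The following is a minimal sketch, assuming bash; it simulates a single line as read would return it from a CRLF file, using the sample ID from the question, and the variable name stripped is only illustrative.

```shell
#!/bin/bash
# Simulate one line as `read` would return it from a CRLF file.
line=$'4H80\r'
url="http://www.rcsb.org/pdb/files/$line.pdb"

# %q prints the string with shell quoting, so the carriage return
# shows up as \r instead of moving the cursor.
printf '%q\n' "$url"

# Count the carriage returns by comparing lengths before and after
# removing them (0 means the line is clean).
stripped=${url//$'\r'/}
cr_count=$(( ${#url} - ${#stripped} ))
echo "carriage returns found: $cr_count"
```

Running od -c id_numbers.txt on the real file should show the same thing: a \r before each \n.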
There are several ways to solve this. You could have bash trim the carriage return from the variable when you use it:
url="http://www.rcsb.org/pdb/files/${line%$'\r'}.pdb"
Or you could have read trim it by telling it that carriage return counts as whitespace (read will trim leading and trailing whitespace from what it reads):
while IFS=$'\r' read line
Or you could use a command like dos2unix (or whatever the equivalent is on your OS) to convert the id_numbers.txt file.
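Putting the first option into a complete loop might look like the sketch below. The helper names pdb_url and list_urls are hypothetical, not part of the original script, and the functions only print the URLs so that they can be checked before anything is downloaded.

```shell
#!/bin/bash
# pdb_url and list_urls are hypothetical helper names used for illustration.

# Build the download URL for one ID, dropping a trailing carriage return.
pdb_url() {
    local id
    id=${1%$'\r'}
    printf 'http://www.rcsb.org/pdb/files/%s.pdb\n' "$id"
}

# Print one URL per ID found in the file named by $1.
# `|| [ -n "$line" ]` also handles a last line without a trailing newline.
list_urls() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "${line%$'\r'}" ] || continue   # skip blank lines
        pdb_url "$line"
    done < "$1"
}

# Usage, assuming id_numbers.txt is in the current directory:
#   list_urls id_numbers.txt | while IFS= read -r url; do wget "$url"; done
```

Because the carriage return is stripped inside pdb_url, the same script should work whether the ID file was saved with Windows or unix line endings.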
The -e option of echo enables interpretation of backslash escapes (it is -n that suppresses the trailing newline); you do not need it here.
Also, I suspect the file containing your ids is malformed. On which OS did you create it?
Anyway, you can simplify your script this way:
#!/bin/bash
while read line
do
wget "http://www.rcsb.org/pdb/files/$line.pdb"
done < id_numbers.txt
I was able to successfully test it with an id_numbers.txt file generated like so:
for i in $(seq 0 9) ; do echo "$i" >> id_numbers.txt ; done
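A test file generated on Linux has unix line endings, so it will not reproduce the problem from the question. The sketch below, which assumes seq and tr are available and uses made-up file names in a temporary directory, writes the same ten IDs with Windows-style line endings and then converts them back.

```shell
#!/bin/bash
# Work in a temporary directory so no existing file is touched.
workdir=$(mktemp -d)
crlf_file="$workdir/id_numbers_crlf.txt"
unix_file="$workdir/id_numbers_unix.txt"

# Write the IDs 0..9 with Windows-style CRLF line endings.
for i in $(seq 0 9); do
    printf '%s\r\n' "$i"
done > "$crlf_file"

# Delete every carriage return to get unix line endings.
tr -d '\r' < "$crlf_file" > "$unix_file"

# Each line shrinks from 3 bytes (digit, CR, LF) to 2 (digit, LF).
crlf_bytes=$(( $(wc -c < "$crlf_file") ))
unix_bytes=$(( $(wc -c < "$unix_file") ))
echo "CRLF: $crlf_bytes bytes, unix: $unix_bytes bytes"
```

Feeding the CRLF file to the script should reproduce the mangled url, while the converted file should behave like the one generated above.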