I have a file that contains sequence data, where each new paragraph (separated by two blank lines) contain a new sequence:
#example
ASDHJDJJDMFFMF
AKAKJSJSJSL---
SMSM-....SKSKK
....SK
SKJHDDSNLDJSCC
AK..SJSJSL--HG
AHSM---..SKSKK
-.-GHH
and I want to end up with a file looking like:
ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH
each sequence is the same length (if that helps).
I would also be looking to do this over multiple files stored in different directiories.
I have just tried
sed -e '/./{H;$!d;}' -e 'x;/regex/!d' ./text.txt
however this just deleted the entire file :S
any help would bre appreciated - doesn't have to be in sed, if you know how to do it in perl or something else then that's also great.
Thanks.
All you're asking to do is convert a file of blank-lines-separated records (RS) where each field is separated by newlines into a file of newline-separated records where each field is separated by nothing (OFS). Just set the appropriate awk variables and recompile the record:
$ awk '{$1=$1}1' RS= OFS= file
ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH
awk '
/^[[:space:]]*$/ {if (line) print line; line=""; next}
{line=line $0}
END {if (line) print line}
'
perl -00 -pe 's/\n//g; $_.="\n"'
For multiple files:
# adjust your glob pattern to suit,
# don't be shy to ask for assistance
for file in */*.txt; do
newfile="/some/directory/$(basename "$file")"
perl -00 -pe 's/\n//g; $_.="\n"' "$file" > "$newfile"
done
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With