 

Put each paragraph of a file on a separate line

I have a file that contains sequence data, where each new paragraph (separated by two blank lines) contains a new sequence:

#example

ASDHJDJJDMFFMF
AKAKJSJSJSL---
SMSM-....SKSKK
....SK


SKJHDDSNLDJSCC
AK..SJSJSL--HG
AHSM---..SKSKK
-.-GHH

I want to end up with a file that looks like this:

ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH

Each sequence is the same length, if that helps.
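Because every sequence has the same length, one alternative not used in the answers is to delete every newline and re-wrap the stream at that fixed width. This is a sketch under two assumptions: the width is known in advance (48 characters in the example), and the file holds nothing but sequence lines and blank lines, so a header such as #example would have to be removed first. The function name join_fixed_width is made up for illustration.

```shell
# join_fixed_width WIDTH: hypothetical helper; reads sequence data on stdin.
# Assumes every sequence is exactly WIDTH characters long once joined.
join_fixed_width() {
    # Delete every newline (blank lines included), then re-wrap at WIDTH.
    tr -d '\n' | fold -w "$1"
    # fold leaves the final line without a newline, so add one.
    echo
}

# Small made-up input: two 6-character sequences split over two lines each.
printf 'ABCD\nEF\n\n\nGHIJ\nKL\n' | join_fixed_width 6
```

With the question's data this would be called as join_fixed_width 48 < file. If a single sequence were ever shorter or longer than the width, every later record would be misaligned, which is why the paragraph-based approaches are more robust.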

I would also like to do this over multiple files stored in different directories.

I tried:

sed -e '/./{H;$!d;}' -e 'x;/regex/!d' ./text.txt

However, this just deleted everything (no output at all) :S
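The sed attempt is the well-known one-liner for printing paragraphs that contain a pattern, and /regex/ in it is a placeholder: since no paragraph contains the literal text "regex", /regex/!d deletes every paragraph and nothing is printed. The file itself is not modified, because sed only writes to standard output unless -i is given. A minimal sketch that keeps the same hold-space idea but joins each paragraph instead of filtering it, assuming a sed that matches \n against embedded newlines (GNU sed and POSIX sed both do); the function name join_paragraphs_sed is hypothetical:

```shell
# join_paragraphs_sed [FILE...]: hypothetical wrapper around the fixed sed.
join_paragraphs_sed() {
    # 1st expression: append each non-blank line to the hold space and,
    #   unless it is the last line, delete it so nothing is printed yet.
    # 2nd expression (blank lines and the last line): swap the collected
    #   paragraph into the pattern space, strip its embedded newlines, and
    #   drop the result if it is empty (the second of two blank lines).
    sed -e '/./{H;$!d;}' -e 'x;s/\n//g;/./!d' "$@"
}

# Made-up input: two paragraphs separated by two blank lines.
printf 'AB\nCD\n\n\nEF\nGH\n' | join_paragraphs_sed
```

Lines containing only spaces count as non-blank for /./ here, so they would be joined into the neighbouring sequence rather than treated as separators.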

Any help would be appreciated. It doesn't have to be in sed; if you know how to do it in perl or something else, that's also great.

Thanks.

brucezepplin asked Dec 20 '12 12:12



2 Answers

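All you're asking to do is convert a file of blank-line-separated records (RS) whose fields are separated by newlines into a file of newline-separated records whose fields are separated by nothing (OFS). Just set the appropriate awk variables and rebuild the record: with RS set to the empty string awk reads one paragraph per record and treats newlines as field separators, assigning $1=$1 rebuilds $0 with OFS between the fields, and the trailing 1 prints it: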

$ awk '{$1=$1}1' RS= OFS= file
ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH
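A self-contained run of the same one-liner on the question's sample data, to show the effect of RS= and OFS=. The temporary file and the variable names are made up for the demo; RS= and OFS= are ordinary awk command-line assignments, applied before the file that follows them is read.

```shell
# Hypothetical sample file holding the two sequences from the question.
tmp=$(mktemp)
printf '%s\n' ASDHJDJJDMFFMF 'AKAKJSJSJSL---' 'SMSM-....SKSKK' '....SK' '' '' \
    SKJHDDSNLDJSCC 'AK..SJSJSL--HG' 'AHSM---..SKSKK' '-.-GHH' > "$tmp"

# RS=  : paragraph mode; records are separated by runs of blank lines and
#        newline always acts as a field separator.
# OFS= : fields are joined with the empty string when $0 is rebuilt.
joined=$(awk '{$1=$1}1' RS= OFS= "$tmp")
printf '%s\n' "$joined"
rm -f "$tmp"
```

With the default FS, spaces and tabs also separate fields, so any whitespace inside a sequence line would be removed too; the sample data contains none.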
Ed Morton answered Sep 28 '22 04:09



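awk '
    # a blank (or whitespace-only) line ends the current sequence
    /^[[:space:]]*$/ {if (line) print line; line=""; next}
    # any other line is appended to the sequence being built
    {line=line $0}
    # print the last sequence when the file does not end with a blank line
    END {if (line) print line}
' file

or, using perl paragraph mode (-00):

perl -00 -pe 's/\n//g; $_.="\n"' file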

For multiple files:

# adjust your glob pattern to suit,
# don't be shy to ask for assistance
for file in */*.txt; do
    [ -f "$file" ] || continue   # skip the literal pattern if nothing matched
    newfile="/some/directory/$(basename "$file")"
    perl -00 -pe 's/\n//g; $_.="\n"' "$file" > "$newfile"
done
glenn jackman answered Sep 28 '22 04:09
