Last Friday I ran into a problem: I need to transform a text file into another format. On that machine only GNU sed is available, no awk (strange, I know), and I know nothing about Perl, so I am looking for a sed-only solution.
The file content is:
a yao.com sina.com
b kongu.com
c polm.com unee.net 21cn.com iop.com foo.com bar.com baz.net happy2all.com
d kinge.net
The required output (which should be a new file) is:
a yao.com
a sina.com
b kongu.com
c polm.com
c unee.net
c 21cn.com
c iop.com
c foo.com
c bar.com
c baz.net
c happy2all.com
d kinge.net
I tried a lot and also searched the famous sed one-liners, but I cannot make it work. Can someone help me?
Interesting problem:
$ sed -r 's/(\w+\.\w+)/> &/2g;:a s/^([a-z]+)(.*)>/\1\2\n\1/g;ta' file
a yao.com
a sina.com
b kongu.com
c polm.com
c unee.net
c 21cn.com
c iop.com
c foo.com
c bar.com
c baz.net
c happy2all.com
d kinge.net
Edit:
It works by using two substitutions.
The first puts a > before the URLs that need flattening, as a placeholder character:
$ sed -r 's/(\w+\.\w+)/> &/2g' file
a yao.com > sina.com
b kongu.com
c polm.com > unee.net > 21cn.com > iop.com > foo.com > bar.com ...
d kinge.net
The second replaces each placeholder > with a newline followed by the key, and uses a conditional branch (ta) to loop until no placeholder is left:
$ sed -r ':a s/^([a-z]+)(.*)>/\1\2\n\1/g;ta'
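For readers who find the one-liner dense, here is a minimal sketch of the same two steps spread over several lines with comments. It is a slight variation, not the exact command above: the second substitution also consumes the spaces around the > marker so that no trailing space is left behind. The function name flatten and the sample lines are only illustrative, and GNU sed is assumed (-r, \w and \n in the replacement are GNU extensions).

```shell
#!/bin/bash
# Illustrative helper (hypothetical name): reads "key domain domain ..." lines on stdin.
flatten() {
  sed -r '
    # Step 1: put a ">" marker in front of the 2nd and later domains.
    s/(\w+\.\w+)/> &/2g
    :a
    # Step 2: turn the last remaining " > " into newline + key + space.
    s/^([a-z]+)(.*) > /\1\2\n\1 /
    # Jump back to :a as long as step 2 replaced something.
    ta
  '
}

printf '%s\n' 'a yao.com sina.com' 'b kongu.com' | flatten
```

Because (.*) is greedy, each pass through the loop converts the last marker on the line, so a line with N extra domains needs N passes.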
This is not an easy job for sed, particularly as a one-liner. However, you mentioned "GNU sed". I see the light!
GNU sed supports s/.../.../ge, which is useful in this situation:
kent$ sed -r 's@(^[a-z]+) (.*)@echo "\2"\|sed "s# #\\n\1 #g"\|sed "/^$/d"@ge' file
a yao.com
a sina.com
b kongu.com
c polm.com
c unee.net
c 21cn.com
c iop.com
c foo.com
c bar.com
c baz.net
c happy2all.com
d kinge.net
Short explanation:
sed -r 's@..x..@..y..@ge' file
The ge flags let us pass the matched parts to external commands: with e, GNU sed executes the text produced by the substitution as a shell command and replaces the pattern space with its output.
The ..y.. part is done by the magic of ge. I pass \2 to another sed (via echo): sed "s# #\\n\1 #g". This sed replaces every space with \n + \1 + space.
This puts an extra \n in the output, so there are empty lines in the result of the previous step; we need to remove those empty lines with "/^$/d".
Check info sed for s/../../ge.
Edit: added the double spaces, as the OP commented.
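To see what the e flag actually runs, it can help to print the generated command first. The sketch below is only an illustration: it assumes GNU sed and an input with two spaces after the key (as the edit note above mentions), and the variable names input, subst and result are made up for this example. Keep in mind that e hands text from the input file to a shell, so it should only be used on input you trust.

```shell
#!/bin/bash
# Assumption: two spaces follow the key, so \2 starts with a space.
input='a  yao.com sina.com
b  kongu.com'

# The substitution from the answer, without its trailing flags.
subst='s@(^[a-z]+) (.*)@echo "\2"\|sed "s# #\\n\1 #g"\|sed "/^$/d"@'

# With only the g flag, sed prints the shell pipeline it built for each line.
printf '%s\n' "$input" | sed -r "${subst}g"

# With ge, GNU sed runs that pipeline and keeps its output instead.
result=$(printf '%s\n' "$input" | sed -r "${subst}ge")
printf '%s\n' "$result"
```

The first run is a handy way to debug the quoting before letting sed execute anything.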
As others have noted, a sed solution is tricky, so I thought I would post a bash equivalent:
#!/bin/bash
# Read each line into an array: element 0 is the key, the rest are domains.
while read -r -a array
do
    for i in "${array[@]:1}"
    do
        echo "${array[0]} $i"
    done
done < input
output:
a yao.com
a sina.com
b kongu.com
c polm.com
c unee.net
c 21cn.com
c iop.com
c foo.com
c bar.com
c baz.net
c happy2all.com
d kinge.net
Here is a true sed-only script that works. I've written it below as a file that is called by sed on the command line, but it could all be typed on the command line or all entered into a separate script as well:
Save the following as sedscript (or whatever you want to call it). Explanation follows the output.
:start
h
s/\(.\ \ [^ ]*\).*/\1/
t continue
d
:continue
p
x
s/\(.\ \)\ [^ ]*\(\ .*\)/\1\2/
t start
d
Now run sed -f sedscript myfile.txt
With your example above saved as myfile.txt, the following is output:
a yao.com
a sina.com
b kongu.com
c polm.com
c unee.net
c 21cn.com
c iop.com
c foo.com
c bar.com
c baz.net
c happy2all.com
d kinge.net
Sed has a pattern buffer (where you normally work with s/a/b/ kinds of commands) and a hold buffer. In this script, the line is copied and swapped between the two buffers so that the unedited part of a line is retained while another part is being worked on.
:start = label to enable jumping
h = copy the pattern buffer (current line) into the hold buffer
s/\(.\ \ [^ ]*\).*/\1/ = while the full line is safe in the hold buffer, strip everything after the first domain, leaving the first desired line (e.g. "a yao.com"). Note that the pattern expects one character, two spaces, and then the domain.
t continue = if the previous command resulted in a substitution, jump to the "continue" label
d = if we didn't jump, that means we're done. Delete the pattern buffer and proceed to the next line of the file.
:continue = label for the previous jump
p = print out the pattern buffer (e.g. "a yao.com")
x = swap the pattern buffer with the hold buffer (could also use g to simply copy the hold buffer over the pattern buffer)
s/\(.\ \)\ [^ ]*\(\ .*\)/\1\2/ = the full original string has now been swapped into the pattern buffer; strip off the domain we just dealt with (e.g. "yao.com")
t start = if that wasn't the last domain, start the script over with the new, shortened string.
d = if that was the last domain, delete the pattern buffer and continue to the next line in the file.
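The script above matches two spaces after the key. For input that uses a single space between every field, the same hold-buffer loop can be sketched as below. This is a hypothetical variant, not the script from this answer: the patterns are rewritten to expect exactly one space between fields, and the names script and flatten_hold are made up for the example.

```shell
#!/bin/bash
# Hypothetical single-space variant of the hold-buffer loop.
script='
:start
# Save the whole line in the hold buffer.
h
# Keep only the key and the first domain.
s/^\([^ ]* [^ ]*\).*/\1/
t continue
d
:continue
p
# Bring the saved line back, then drop the domain that was just printed.
x
s/^\([^ ]*\) [^ ]*\( .*\)/\1\2/
t start
d
'

flatten_hold() { sed "$script"; }

printf '%s\n' 'a yao.com sina.com' 'b kongu.com' | flatten_hold
```

Every path through the script ends in d, so nothing is auto-printed and the only output comes from the explicit p command.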