
Add two lines every four lines between patterns - SED

Tags:

sed

I need some help with sed. I'm using it on Windows and Mac OS X. I need sed to add the two lines

</tr>
<tr>

every 4 lines, starting after the first <tr> it finds, and to stop doing so when it reaches </tr>.

I just can't find a way to do this. Every file will have up to 20 tables, so I need to do it automatically.

Changing from this:

<div class="titulo"> TERMINAL CAPAO DA IMBUIA</div>
<div class="dataedia">
Válido a partir de: 30/07/2012 - 
DIA ÚTIL</div>
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>

to this:

<div class="titulo"> TERMINAL CAPAO DA IMBUIA</div>
<div class="dataedia">
Válido a partir de: 30/07/2012 - 
DIA ÚTIL</div>
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
</tr>
<tr>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
</tr>
<tr>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
</tr>
<tr>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
</tr>
<tr>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
</tr>
<tr>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
</tr>
<tr>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
</tr>
<tr>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
</tr>
<tr>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>

Is it possible with sed? If not, what tool should I use?

Thanks

asked Mar 26 '26 06:03 by ghaschel


2 Answers

I don't like the idea of using sed to handle HTML code. That said, try this:

Content of script.sed:

## Uses only POSIX features (no GNU '\?' and no '\n' in a replacement),
## so it should also run with the BSD sed shipped with Mac OS X.

## For every line between '<tr>' and '</tr>' do ...
/<tr>/,/<\/tr>/ {

    ## Skip the range edges ('<tr>' and '</tr>'); they are re-added per row below.
    /<\/*tr>/ b

    ## Append the '<td>...</td>' line to the Hold Space (HS).
    H

    ## Swap HS into the Pattern Space (PS) to work with it.
    x

    ## Four newline characters mean four '<td>' lines have been collected,
    ## so add a '<tr>' before them and a '</tr>' after them, print them,
    ## and delete them (already processed).
    /\n.*\n.*\n.*\n/ {
        s/^\(\n\)/<tr>\1/
        s/$/\
<\/tr>/
        p
        s/.*//
    }

    ## Swap back so the pending '<td>'s are saved in HS again, then read the next line.
    x
    b
}

## Print all lines outside the range.
p

Assuming infile contains the data posted in the question, run the script like this:

sed -nf script.sed infile

That yields:

<div class="titulo"> TERMINAL CAPAO DA IMBUIA</div>
<div class="dataedia">
Válido a partir de: 30/07/2012 - 
DIA ÚTIL</div>
<table>
<tr>
<td>05:50</td>
<td>05:58</td>
<td>06:04</td>
<td>06:08</td>
</tr>
<tr>
<td>06:12</td>
<td>06:15</td>
<td>06:17</td>
<td>06:20</td>
</tr>
<tr>
<td>06:22</td>
<td>06:25</td>
<td>06:27</td>
<td>06:30</td>
</tr>
<tr>
<td>06:32</td>
<td>06:35</td>
<td>06:37</td>
<td>06:39</td>
</tr>
<tr>
<td>06:42</td>
<td>06:44</td>
<td>06:47</td>
<td>06:49</td>
</tr>
<tr>
<td>06:52</td>
<td>06:54</td>
<td>06:57</td>
<td>06:59</td>
</tr>
<tr>
<td>07:01</td>
<td>07:04</td>
<td>07:06</td>
<td>07:09</td>
</tr>
<tr>
<td>07:11</td>
<td>07:14</td>
<td>07:16</td>
<td>07:18</td>
</tr>
<tr>
<td>07:21</td>
<td>07:23</td>
<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
</table>
</div>
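One caveat with the script above: if a row's cell count is not a multiple of four, the leftover cells stay in the hold space, so they get merged into the first row of the next table, or are silently dropped at the end of the file. If that can happen in your files, the same buffer-then-emit idea can be sketched in awk so that a short final row is flushed at each </tr>. This is an illustrative alternative, not part of the answer above; the function name regroup_buffered is hypothetical, and it assumes one tag per line with bare <tr> and </tr> lines, as in the question's sample.

```shell
# Hypothetical helper: regroup_buffered [file...]
# Regroups the lines inside each <tr> ... </tr> block into rows of at most
# four lines. Assumes one tag per line and bare <tr>/</tr> lines, as in the
# question. Reads the named files, or standard input if none are given.
regroup_buffered() {
    awk '
    function emit_row(   k) {          # print the buffered cells as one row
        if (n == 0) return
        print "<tr>"
        for (k = 1; k <= n; k++) print buf[k]
        print "</tr>"
        n = 0                          # empty the buffer
    }
    /<tr>/   { inrow = 1; next }                      # drop the original <tr>; rows are rebuilt by emit_row
    /<\/tr>/ { emit_row(); inrow = 0; next }          # flush a short final row, if any
    inrow    { buf[++n] = $0; if (n == 4) emit_row(); next }  # collect cells, like sed H
    { print }                                         # lines outside the rows pass through unchanged
    ' "$@"
}
```

For example, regroup_buffered infile > outfile. On the question's sample (40 cells) it should give the same output as the sed script; with, say, 6 cells it emits a row of 4 followed by a row of 2 inside the same table.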
answered Apr 01 '26 07:04 by Birei


Try awk:

awk '{print}; /<td>/ && ++i==4 {print "</tr>\n<tr>"; i=0}' file
  • print the line
  • if it's a <td> line, increment i
  • if i reaches 4, print </tr> and <tr> on separate lines and reset i

Testing with the given input returns the desired output; the only "problem" is that an extra <tr></tr> pair appears at the end of the list. This is fixable, but I'm running out of time here. When I get back I can look into it if you think it is needed.

... the last part of the resulting file:

<td>07:26</td>
<td>07:28</td>
</tr>
<tr>
<td>07:31</td>
<td>07:33</td>
<td>07:36</td>
<td>07:38</td>
</tr>
<tr>             <-- extra <tr></tr> here
</tr>
</table>
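A minimal sketch of one possible fix for the extra row, assuming the same one-tag-per-line layout: instead of closing the row as soon as the 4th <td> is printed, open a new row only when a 5th <td> actually arrives, and reset the counter at each original </tr> so every table starts fresh. The helper name regroup_counter is hypothetical.

```shell
# Hypothetical helper: regroup_counter [file...]
# Prints every line; when a fifth <td> arrives in the current row, it first
# closes the row and opens a new one, so no empty <tr></tr> pair is left
# before the original </tr>. Reads the named files, or standard input.
regroup_counter() {
    awk '
    /<td>/ && n == 4 { print "</tr>"; print "<tr>"; n = 0 }  # a 5th cell follows: start a new row
    { print }                                                # always print the current line
    /<td>/   { n++ }                                         # count cells in the current row
    /<\/tr>/ { n = 0 }                                       # original row closed: reset for the next table
    ' "$@"
}
```

Run it as regroup_counter file > newfile. Because the last row is closed by the original </tr>, the output on the question's input should match the desired output exactly.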
answered Apr 01 '26 07:04 by c00kiemon5ter


