I have a document in which I need to dynamically create/update the indexes. I am trying to accomplish this with awk. I have a partially working example, but now I'm stumped.
The example document is as follows.
numbers.txt:
#) Title
#) Title
#) Title
#.#) Subtitle
#.#.#) Section
#.#) Subtitle
#) Title
#) Title
#.#) Subtitle
#.#.#) Section
#.#) Subtitle
#.#.#) Section
#.#.#.#) Subsection
#) Title
#) Title
#.#) Subtitle
#.#.#) Section
#.#.#.#) Subsection
#.#.#.#) Subsection
The desired output would be:
1) Title
2) Title
3) Title
3.1) Subtitle
3.1.1) Section
3.2) Subtitle
4) Title
5) Title
5.1) Subtitle
5.1.1) Section
5.2) Subtitle
5.2.1) Section
5.2.1.1) Subsection
6) Title
7) Title
7.1) Subtitle
7.1.1) Section
7.1.1.1) Subsection
7.1.1.2) Subsection
The awk code I have so far, which only partially works, is as follows:
numbers.sh:
awk '{for(w=1;w<=NF;w++)if($w~/^#\)/){sub(/^#/,++i)}}1' number.txt
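Tracing the attempt on a few lines shows where it stops working: the test `$w~/^#\)/` only matches a field that begins with a top-level `#)` placeholder, so `#)` lines get a running counter while `#.#)` and deeper lines pass through untouched. A minimal check, assuming a POSIX `sh`; the function name `partial` is only for illustration:

```shell
# Hypothetical wrapper around the attempt from numbers.sh; reads stdin.
partial() {
  awk '{for(w=1;w<=NF;w++)if($w~/^#\)/){sub(/^#/,++i)}}1'
}

# Only the "#)" lines are numbered; "#.#)" is left as it was.
printf '%s\n' '#) Title' '#.#) Subtitle' '#) Title' | partial
# 1) Title
# #.#) Subtitle
# 2) Title
```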
Any help with this would be greatly appreciated.
I have implemented an AWK script for you, and it even works for indexes deeper than four levels! ;)
I will try to explain it a little with inline comments:
#!/usr/bin/awk -f
# Resets every element of "array" from index "from" onwards to 0.
# The extra parameter "w" keeps the loop counter local (an awk idiom).
function cleanArray(array,from,    w){
for(w=from;w<=length(array);w++){
array[w]=0
}
}
# This is executed only one time at beginning.
BEGIN {
# Each element holds the current counter for one level of the index,
# i.e., an array with (1 2 2) means the index "1.2.2)".
array[1]=0
}
# This block will be executed for every line.
{
# Amount of "#" found.
amount=0
# This variable will hold the rewritten line.
line=""
# Let's save the entire line in a variable to modify it.
rest_of_line=$0
# While the line still starts with "#"...
while(rest_of_line ~ /^#/){
# Remove the first 2 characters ("#." or "#)").
rest_of_line=substr(rest_of_line, 3, length(rest_of_line))
# We found one "#", let's count it!
amount++
# The line still starts with "#"?
if(rest_of_line ~ /^#/){
# Yes, it still does.
# Append this level's current number and a ".".
line=line""array[amount]
line=line"."
}else{
# no, so we must add 1 to the old value of the array.
array[amount]++
# And we must clean the array if it stores more values
# starting from amount plus 1. We don't want to keep
# storing garbage numbers that may harm our accounting
# for the next line.
cleanArray(array,amount + 1)
# Append the appropriate number and a ")".
line=line""array[amount]
line=line")"
}
}
# Great! We have the line with the appropriate indexes!
print line""rest_of_line
}
If you save it as script.awk, you can make it executable by adding execute permission to the file:
chmod u+x script.awk
Then run it:
./script.awk <path_to_number.txt>
For example, if script.awk is in the same directory as number.txt, change to that directory and run:
./script.awk number.txt
So, given this number.txt:
#) Title
#) Title
#) Title
#.#) Subtitle
#.#.#) Section
#.#) Subtitle
#) Title
#) Title
#.#) Subtitle
#.#.#) Section
#.#) Subtitle
#.#.#) Section
#.#.#.#) Subsection
#) Title
#) Title
#.#) Subtitle
#.#.#) Section
#.#.#.#) Subsection
#.#.#.#.#) Subsection
#.#.#.#.#) Subsection
#.#.#.#.#) Subsection
#.#.#.#.#.#) Subsection
#.#.#.#.#) Subsection
#.#.#.#.#.#) Subsection
#.#.#.#.#.#) Subsection
#.#.#.#.#.#) Subsection
#.#.#.#.#.#) Subsection
#.#.#.#.#) Subsection
#.#.#.#) Subsection
#.#.#) Section
This will be the output (note that the solution is not limited by the number of "#" levels):
1) Title
2) Title
3) Title
3.1) Subtitle
3.1.1) Section
3.2) Subtitle
4) Title
5) Title
5.1) Subtitle
5.1.1) Section
5.2) Subtitle
5.2.1) Section
5.2.1.1) Subsection
6) Title
7) Title
7.1) Subtitle
7.1.1) Section
7.1.1.1) Subsection
7.1.1.1.1) Subsection
7.1.1.1.2) Subsection
7.1.1.1.3) Subsection
7.1.1.1.3.1) Subsection
7.1.1.1.4) Subsection
7.1.1.1.4.1) Subsection
7.1.1.1.4.2) Subsection
7.1.1.1.4.3) Subsection
7.1.1.1.4.4) Subsection
7.1.1.1.5) Subsection
7.1.1.2) Subsection
7.1.2) Section
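`cleanArray` relies on `length()` applied to an array, which is not supported by every awk implementation. A sketch of the same per-level counter idea that instead remembers the deepest level seen so far in a variable `max` and zeroes only up to it; the wrapper name `renumber_portable` is hypothetical:

```shell
# Hypothetical wrapper: same algorithm as script.awk, but deeper counters are
# reset up to "max" (deepest level seen) instead of calling length() on an array.
renumber_portable() {
  awk '
  {
    depth = 0; line = ""; rest = $0
    while (rest ~ /^#/) {
      rest = substr(rest, 3)          # drop "#." or "#)"
      depth++
      if (rest ~ /^#/) {
        line = line count[depth] "."  # a parent level: reuse its counter
      } else {
        count[depth]++                # the deepest level on this line
        for (k = depth + 1; k <= max; k++) count[k] = 0   # reset deeper levels
        if (depth > max) max = depth
        line = line count[depth] ")"
      }
    }
    print line rest                   # lines without "#" are printed as-is
  }' "$@"
}

printf '%s\n' '#) Title' '#.#) Subtitle' '#.#.#) Section' '#) Title' | renumber_portable
# 1) Title
# 1.1) Subtitle
# 1.1.1) Section
# 2) Title
```

Usage is the same as for the script: `renumber_portable number.txt`.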
I hope it helps you!
awk to the rescue!
I'm not sure this is the optimal way of doing this, but it works:
awk 'BEGIN{d="."}
/#\.#\.#\.#/ {sub("#.#.#.#", i d a[i] d b[i d a[i]] d (++c[i d a[i] d b[i d a[i]]]))}
/#\.#\.#/ {sub("#.#.#" , i d a[i] d (++b[i d a[i]]))}
/#\.#/ {sub("#.#" , i d (++a[i]))}
/#/ {sub("#" , (++i))} 1' file
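The rules run top to bottom on every line, which is why the longest pattern comes first: once `#.#.#.#` has been replaced, the shorter patterns no longer find a `#` on that line. The compound keys such as `i d a[i]` (e.g. `"5.2"`) give every parent its own child counter, so no explicit reset is needed. A quick way to try it, wrapping the program in a hypothetical `renumber4` function:

```shell
# Hypothetical wrapper around the four-level program above; reads files or stdin.
renumber4() {
  awk 'BEGIN{d="."}
  /#\.#\.#\.#/ {sub("#.#.#.#", i d a[i] d b[i d a[i]] d (++c[i d a[i] d b[i d a[i]]]))}
  /#\.#\.#/    {sub("#.#.#",   i d a[i] d (++b[i d a[i]]))}
  /#\.#/       {sub("#.#",     i d (++a[i]))}
  /#/          {sub("#",       (++i))} 1' "$@"
}

printf '%s\n' '#) Title' '#.#) Subtitle' '#.#) Subtitle' | renumber4
# 1) Title
# 1.1) Subtitle
# 1.2) Subtitle
```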
UPDATE: The above is limited to four levels. Here is a better one for an unlimited number of levels:
awk '{d=split($1,a,"#")-1; # find the depth
c[d]++; # increment the counter for the current depth
for(i=pd+1;i<=d;i++) c[i]=1; # reset when depth increases
for(i=1;i<=d;i++) {sub(/#/,c[i])}; # replace placeholders one by one
pd=d} 1' file # set previous depth and print
Perhaps the reset step can be combined with the main loop, but I think it is clearer this way.
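The depth calculation works because splitting a field that contains n `#` characters on `"#"` yields n+1 pieces (the first one empty, since the field starts with `#`), so the `split()` result minus one is the nesting level. A quick check, assuming the placeholder is always the first whitespace-separated field; `depth_of` is a hypothetical name:

```shell
# Hypothetical helper: print the depth computed by split($1, a, "#") - 1.
depth_of() {
  awk '{print split($1, a, "#") - 1}'
}

printf '%s\n' '#) Title' '#.#) Subtitle' '#.#.#.#) Subsection' | depth_of
# 1
# 2
# 4
```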
UPDATE 2:
With this logic, I think the following is the shortest possible version:
$ awk '{d=split($1,_,"#")-1; # find the depth
c[d]++; # increment counter for current depth
for(i=1;i<=d;i++) # start replacement
{if(i>pd)c[i]=1; # reset the counters
sub(/#/,c[i]) # replace placeholders with counters
}
pd=d} 1' file # set the previous depth
Or as a one-liner:
$ awk '{d=split($1,_,"#")-1;c[d]++;for(i=1;i<=d;i++){if(i>pd)c[i]=1;sub(/#/,c[i])}pd=d}1' file
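Both updates treat a line whose first field contains no `#` as depth 0, so if the real document also has body text between headings, the next heading after such a line gets all of its counters reset to 1 (a second `#.#)` under the same title would come out as `1.1` again). A sketch of a guard that passes such lines through untouched, assuming headings always start with `#`; `renumber_text` is a hypothetical name:

```shell
# Hypothetical variant of UPDATE 2: lines whose first field does not start
# with "#" (including empty lines) are printed unchanged and leave c[] and pd alone.
renumber_text() {
  awk '$1 !~ /^#/ {print; next}
  {d=split($1,_,"#")-1; c[d]++; for(i=1;i<=d;i++){if(i>pd)c[i]=1; sub(/#/,c[i])} pd=d} 1' "$@"
}

printf '%s\n' '#) Intro' 'Some text.' '#.#) Detail' 'More text.' '#.#) Detail' | renumber_text
# 1) Intro
# Some text.
# 1.1) Detail
# More text.
# 1.2) Detail
```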