
Split text file into array based on an empty line or any non used character

I have a text file that contains blocks of text lines separated by an empty line. I want to load the content of that file into an array, using the empty line as the separator. I tried IFS="\n" (and "\r\n", etc.) but couldn't get it to work, so instead I decided to replace every empty line with a character that doesn't appear in the file. I picked the Spanish inverted question mark (\xBF):

sed 's/^$/'$(echo -e "\xBF")'/'

So that works: I now have a character that I can use to slice my file and put it into an array. (A bit of a random trick, but hey, that's just one way of doing it.)

Now I need to change $IFS so that it uses the inverted question mark to split the data for the array.

If I type

IFS=$(echo -e "\xBF")

on the command line, it works just fine:

 echo "$IFS"
¿

But if I type that assignment followed by read -a on the same line, it seems to do nothing:

[user@machine ~]$ IFS=$(echo -e "\xBF") read -a array <<< "$var"
[user@machine ~]$ echo "$IFS"
[user@machine ~]$

So that's weird because $var has a value.

Even more surprising, when I check the value of IFS right afterwards, I get:

[user@machine ~]$ echo -n "$IFS" | od -abc
0000000  sp  ht  nl
        040 011 012
             \t  \n
0000003
[user@machine ~]$ 

Which is the default value for IFS.

I am pretty sure one can use any character for IFS, no?

Alternatively, if you have any trick up your sleeve to split a file into an array based on empty lines, I am interested! (I'd still like to get to the bottom of this, for comprehension's sake.)

Thanks very much, and have a good weekend :)

Bluz asked Aug 30 '13 18:08


2 Answers

This script should do what you want:

#!/bin/bash

i=1
s=1
declare -a arr
while read -r line 
do
    # If we find an empty line, then we increase the counter (i), 
    # set the flag (s) to one, and skip to the next line
    [[ $line == "" ]] && ((i++)) && s=1 && continue 

    # If the flag (s) is zero, then we are not on the first line of a block,
    # so we set the array element to its previous value concatenated
    # with a newline and the current line
    [[ $s == 0 ]] && arr[$i]="${arr[$i]}
$line" || { 
            # Otherwise we are in the first line of the block, so we set the value
            # of the array to the current line, and then we reset the flag (s) to zero 
            arr[$i]="$line"
            s=0; 
    }
done < file

for i in "${arr[@]}"
do
   echo "================"
   echo "$i"
done 

Test file:

$ cat file
asdf dsf s dfsdaf s
sadfds fdsa fads f dsaf as

fdsafds f dsf ds afd f saf dsf
sdfsfs dfadsfsaf

sdfsafds fdsafads fd saf adsfas
sdfdsfds fdsfd saf dsa fds fads f

Output:

================
asdf dsf s dfsdaf s
sadfds fdsa fads f dsaf as
================
fdsafds f dsf ds afd f saf dsf
sdfsfs dfadsfsaf
================
sdfsafds fdsafads fd saf adsfas
sdfdsfds fdsfd saf dsa fds fads f

Update:

In order to ignore lines beginning with #, you can add this line after the do:

[[ $line =~ ^# ]] && continue
user000001 answered Sep 24 '22 02:09


First of all, by design, a variable set using the form "var=foo command" is only made available to that one command; it is not set for the rest of the script. That is why IFS is back to its default value when you check it afterwards.
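A minimal sketch of that scoping rule, assuming bash. The sample string and the names default_ifs and parts are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical demo: a prefix assignment lasts only for the command it precedes.
default_ifs=$IFS

# IFS is ':' only while this one read runs.
IFS=':' read -r -a parts <<< "a:b:c"

# The split used ':' ...
printf 'fields: %d\n' "${#parts[@]}"

# ... but the shell's own IFS is unchanged afterwards.
[[ $IFS == "$default_ifs" ]] && echo "IFS is back to its previous value"
```

So seeing the default space, tab and newline in IFS after the read is expected, not a sign that the assignment failed.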

As for your problem, read reads a record up to the first delimiter (set with -d, default: line feed), and then splits that record into fields by $IFS. Since your data spans several lines, read only ever gets to see the first one.
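To see the two steps separately, here is a small sketch, assuming bash. The strings a and b are hypothetical, and LC_ALL=C is my own addition so that \xBF is handled as one raw byte:

```shell
#!/usr/bin/env bash
export LC_ALL=C    # assumption of this sketch: treat \xBF as one raw byte

# Hypothetical data: in 'a' the separator byte is on the second line,
# in 'b' it is on the first line.
a=$'one\ntwo\xBFthree'
b=$'one\xBFtwo\nthree'

# read stops at the first newline, then splits what it got on IFS.
IFS=$'\xBF' read -r -a from_a <<< "$a"
IFS=$'\xBF' read -r -a from_b <<< "$b"

printf 'a: %d field(s): %s\n' "${#from_a[@]}" "${from_a[*]}"
printf 'b: %d field(s): %s\n' "${#from_b[@]}" "${from_b[*]}"
```

This should report one field for a and two fields for b: IFS is honoured in both cases, but everything after the first newline is never read.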

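To loop over your items, you can use the following (the || [[ -n $var ]] part makes sure the last block, which is not followed by a delimiter, is still processed):

sed -e 's/^$/\xBF/' | while read -r -d $'\xBF' var || [[ -n $var ]]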
do
    printf "Value: %s\n-----\n" "$var"
done

To read them all into an array from a string, you can read up until some character you hopefully don't have, like a NUL byte:

IFS=$'\xBF' read -d '' -a array <<< "$var"
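Putting the pieces together, a rough end-to-end sketch could look like this, assuming bash and that the \xBF byte never occurs in the data. The sample input, the names sep and item, the LC_ALL=C setting and the trimming loop are my additions for illustration:

```shell
#!/usr/bin/env bash
export LC_ALL=C            # assumption: handle \xBF as a single raw byte
sep=$'\xBF'                # separator byte, assumed absent from the input

# Hypothetical sample input: three blocks separated by empty lines.
input=$'a1\na2\n\nb1\n\nc1\nc2'

# Replace each empty line with the separator, as in the question.
var=$(sed "s/^\$/$sep/" <<< "$input")

# Read the whole string (-d '') and split it on the separator only.
# read returns non-zero because it reaches end of input, hence "|| true".
IFS=$sep read -r -d '' -a array <<< "$var" || true

# Each element still carries the newlines that surrounded the separator.
for i in "${!array[@]}"; do
    item=${array[i]}
    item=${item#$'\n'}     # drop one leading newline
    item=${item%$'\n'}     # drop one trailing newline
    array[i]=$item
done

printf 'blocks: %d\n' "${#array[@]}"
```

The trimming loop is optional. Without it, the elements keep the newline that sat on either side of each separator line.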
that other guy answered Sep 21 '22 02:09