If I have a string variable whose value is "john is 17 years old", how do I tokenize it using spaces as the delimiter? Would I use awk?
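awk can do this too. With its default field separator, awk splits each input line on runs of spaces and tabs, and NF holds the number of fields. Here is a minimal sketch that prints one token per line. The variable name `words` is only for illustration.

```bash
string="john is 17 years old"

# awk's default field splitting breaks the line on runs of blanks;
# loop over fields $1..$NF and print each one on its own line.
words=$(printf '%s\n' "$string" | awk '{ for (i = 1; i <= NF; i++) print $i }')
printf '%s\n' "$words"

# For a different delimiter, set the field separator with -F (here ';')
# and pick out a single field, e.g. the third one.
printf '%s\n' "john;is;17;years;old" | awk -F';' '{ print $3 }'
```

awk runs as a separate process, so for a single variable the pure-shell approaches are usually simpler. awk is more attractive when you are processing many lines of input.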
In bash, a string can also be split without using the $IFS variable. The readarray builtin (also available as mapfile) splits the string data when given the -d option, which was added in bash 4.4. The -d option defines the separator character, much as $IFS does for word splitting. A loop can then print each piece on its own line.
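A minimal sketch of the readarray approach, assuming bash 4.4 or newer. `printf '%s'` feeds the string instead of a here-string, because `<<<` appends a newline that would otherwise end up in the last element. The -t option strips the delimiter from each element.

```bash
#!/usr/bin/env bash
string="john is 17 years old"

# -d ' ' : use a space (instead of newline) as the record delimiter
# -t     : strip the trailing delimiter from each element
# printf '%s' passes the string without the newline a here-string would add
readarray -d ' ' -t tokens < <(printf '%s' "$string")

# Print each token on its own line
for token in "${tokens[@]}"; do
    echo "$token"
done
```

Unlike $IFS-based splitting, every delimiter ends a record here, so two consecutive spaces produce an empty element between them.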
Tokenization is the act of breaking up a string (a sequence of characters) into pieces called tokens, such as words, keywords, phrases, symbols and other elements. Tokens can be individual words, phrases or even whole sentences. During tokenization, some characters, such as punctuation marks, may be discarded.
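To illustrate discarding punctuation, here is a sketch that deletes punctuation characters with `tr -d '[:punct:]'` before letting the shell split on whitespace. The sample sentence is made up.

```bash
sentence="Hello, world! John is 17 years old."

# Delete every character in the POSIX [:punct:] class...
cleaned=$(printf '%s' "$sentence" | tr -d '[:punct:]')
# ...then split the result on whitespace (default IFS) into an array
read -r -a tokens <<< "$cleaned"

printf '%s\n' "${tokens[@]}"
```

This also removes apostrophes and hyphens, so a word like "don't" becomes "dont". Whether that is acceptable depends on what the tokens are used for.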
Many programming languages have a built-in trim() function. Bash has no built-in function for trimming strings, but several tools can remove unwanted characters from string data, such as parameter expansion, sed, awk and xargs.
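One pure-bash way to trim leading and trailing whitespace uses only parameter expansion. A sketch follows; the variable name is arbitrary.

```bash
s="   john is 17 years old   "

# ${s%%[![:space:]]*} leaves only the leading whitespace; remove that prefix
s="${s#"${s%%[![:space:]]*}"}"
# ${s##*[![:space:]]} leaves only the trailing whitespace; remove that suffix
s="${s%"${s##*[![:space:]]}"}"

printf '[%s]\n' "$s"
```

The inner expansions are quoted so that the whitespace they produce is matched literally rather than as a pattern.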
$ string="john is 17 years old"
$ tokens=( $string )
$ echo ${tokens[*]}
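Once the words are in an array, individual tokens and the token count are available through ordinary array syntax. Here is a short sketch. `"${tokens[@]}"` is quoted so each element stays a separate word.

```bash
string="john is 17 years old"
tokens=( $string )          # unquoted: split on $IFS (space, tab, newline by default)

echo "${tokens[0]}"         # first token: john
echo "${tokens[2]}"         # third token (arrays are zero-indexed): 17
echo "${#tokens[@]}"        # number of tokens: 5

# Iterate over the elements, one per line
for t in "${tokens[@]}"; do
    echo "$t"
done
```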
For other delimiters, such as ';', set IFS:
$ string="john;is;17;years;old"
$ IFS=';' tokens=( $string )
$ echo ${tokens[*]}
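Because `IFS=';' tokens=( $string )` consists only of variable assignments, bash applies both of them to the current shell, so IFS is still ';' afterwards. If that is not wanted, one alternative is to put the IFS assignment in front of `read`, where it applies only to that command. Here is a sketch:

```bash
string="john;is;17;years;old"

# IFS=';' is a prefix assignment here: it only affects this read command.
# -r keeps backslashes literal; -a stores the fields in the array "tokens".
IFS=';' read -r -a tokens <<< "$string"

printf '%s\n' "${tokens[@]}"
printf 'IFS is still: %q\n' "$IFS"
```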
Use the shell's automatic tokenization of unquoted variables:
$ string="john is 17 years old"
$ for word in $string; do echo "$word"; done
john
is
17
years
old
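Relying on unquoted expansion has one caveat. After splitting, each word also undergoes pathname expansion, so a token such as * would be replaced by the file names in the current directory. Here is a sketch that disables globbing with `set -f` around the loop. The sample string is made up.

```bash
string="5 * 3"

set -f                      # disable pathname expansion (globbing)
words=()
for word in $string; do     # still split on $IFS, but * is left alone
    words+=("$word")
done
set +f                      # re-enable globbing

printf '%s\n' "${words[@]}"
```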
If you want to change the delimiter, you can set the $IFS variable, which stands for internal field separator. The default value of $IFS is " \t\n" (space, tab, newline).
$ string="john_is_17_years_old"
$ (IFS='_'; for word in $string; do echo "$word"; done)
john
is
17
years
old
(Note that in this second example I added parentheses around the second line. This creates a subshell so that the change to $IFS doesn't persist. You generally don't want to permanently change $IFS, as it can wreak havoc on unsuspecting shell commands.)
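Inside a function, `local IFS` is another way to keep the change from leaking, without a subshell. Here is a sketch using a hypothetical helper named `print_tokens` (not a builtin), which takes the delimiter and the string as arguments:

```bash
# print_tokens DELIM STRING  (hypothetical helper for illustration)
print_tokens() {
    local IFS="$1"          # this IFS value is only in effect during the call
    local word
    for word in $2; do      # unquoted $2 is split on the local IFS
        printf '%s\n' "$word"
    done
}

print_tokens '_' "john_is_17_years_old"
printf 'IFS after the call: %q\n' "$IFS"
```

The function is called directly rather than inside `$(...)`, so the final line shows that IFS in the calling shell is unchanged.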