Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

extract part of a string using bash/cut/split

Tags:

string

bash

People also ask

How do I trim a string in bash?

Use sed 's/^ *//g', to remove the leading white spaces. There is another way to remove whitespaces using `sed` command. The following commands removed the spaces from the variable, $Var by using `sed` command and [[:space:]].

How do you slice a string in Linux?

The cut command in UNIX is a command for cutting out the sections from each line of files and writing the result to standard output. It can be used to cut parts of a line by byte position, character and field. Basically the cut command slices a line and extracts the text.

How do you get extract a word from a line in bash shell script?

If "name=foo" is always in the same position in the line you could simply use: awk '{print($3)}' , that will extract the third word in your line which is name=foo in your case. He told "also other parameters may or may not be there" hence you may not use a positional logic.


To extract joebloggs from this string in bash using parameter expansion without any extra processes...

MYVAR="/var/cpanel/users/joebloggs:DNS9=domain.com" 

NAME=${MYVAR%:*}  # retain the part before the colon
NAME=${NAME##*/}  # retain the part after the last slash
echo $NAME

Doesn't depend on joebloggs being at a particular depth in the path.


Summary

An overview of a few parameter expansion modes, for reference...

${MYVAR#pattern}     # delete shortest match of pattern from the beginning
${MYVAR##pattern}    # delete longest match of pattern from the beginning
${MYVAR%pattern}     # delete shortest match of pattern from the end
${MYVAR%%pattern}    # delete longest match of pattern from the end

So # means match from the beginning (think of a comment line) and % means from the end. One instance means shortest and two instances means longest.

You can get substrings based on position using numbers:

${MYVAR:3}   # Remove the first three chars (leaving 4..end)
${MYVAR::3}  # Return the first three characters
${MYVAR:3:5} # The next five characters after removing the first 3 (chars 4-9)

You can also replace particular strings or patterns using:

${MYVAR/search/replace}

The pattern is in the same format as file-name matching, so * (any characters) is common, often followed by a particular symbol like / or .

Examples:

Given a variable like

MYVAR="users/joebloggs/domain.com" 

Remove the path leaving file name (all characters up to a slash):

echo ${MYVAR##*/}
domain.com

Remove the file name, leaving the path (delete shortest match after last /):

echo ${MYVAR%/*}
users/joebloggs

Get just the file extension (remove all before last period):

echo ${MYVAR##*.}
com

NOTE: To do two operations, you can't combine them, but have to assign to an intermediate variable. So to get the file name without path or extension:

NAME=${MYVAR##*/}      # remove part before last slash
echo ${NAME%.*}        # from the new var remove the part after the last period
domain

Define a function like this:

getUserName() {
    echo $1 | cut -d : -f 1 | xargs basename
}

And pass the string as a parameter:

userName=$(getUserName "/var/cpanel/users/joebloggs:DNS9=domain.com")
echo $userName

What about sed? That will work in a single command:

sed 's#.*/\([^:]*\).*#\1#' <<<$string
  • The # are being used for regex dividers instead of / since the string has / in it.
  • .*/ grabs the string up to the last backslash.
  • \( .. \) marks a capture group. This is \([^:]*\).
    • The [^:] says any character _except a colon, and the * means zero or more.
  • .* means the rest of the line.
  • \1 means substitute what was found in the first (and only) capture group. This is the name.

Here's the breakdown matching the string with the regular expression:

        /var/cpanel/users/           joebloggs  :DNS9=domain.com joebloggs
sed 's#.*/                          \([^:]*\)   .*              #\1       #'

Using a single Awk:

... | awk -F '[/:]' '{print $5}'

That is, using as field separator either / or :, the username is always in field 5.

To store it in a variable:

username=$(... | awk -F '[/:]' '{print $5}')

A more flexible implementation with sed that doesn't require username to be field 5:

... | sed -e s/:.*// -e s?.*/??

That is, delete everything from : and beyond, and then delete everything up until the last /. sed is probably faster too than awk, so this alternative is definitely better.


Using a single sed

echo "/var/cpanel/users/joebloggs:DNS9=domain.com" | sed 's/.*\/\(.*\):.*/\1/'