
How to use regex with cut at the command line?

Tags: sed, cut, centos

I have some output like this from ls -alth:

drwxr-xr-x    5 root    admin   170B Aug  3  2016 ..
drwxr-xr-x    5 root    admin    70B Aug  3  2016 ..
drwxr-xr-x    5 root    admin     3B Aug  3  2016 ..
drwxr-xr-x    5 root    admin     9M Aug  3  2016 ..

Now, I want to parse out the 170B part, which is obviously the size in human readable format. I wanted to do this using cut or sed, because I don't want to use tools that are any more complicated/difficult to use than necessary.

Ideally, I want it to be robust enough to handle the B, M, or K suffix that comes with the size, and multiply by 1, 1000000, or 1000 respectively. I haven't found a good way to do that, though.

I've tried a few things without really knowing the best approach:

ls -alth | cut -f 5 -d \s+

I was hoping that would work because I'd be able to just delimit it on one or more spaces.

But that doesn't work. How do I supply cut with a regex delimiter? Or is there an easier way to extract only the size of the file from ls -alth?

I'm using CentOS 6.4.

makansij asked Apr 09 '17 21:04




1 Answer

This answer tackles the question as asked, but consider George Vasiliou's helpful find solution as a potentially superior alternative.

  • cut only supports a single, literal character as the delimiter (-d), so it isn't the right tool to use.

  • For extracting tokens (fields) that are separated with a variable amount of whitespace per line, awk is the best tool, so the solution proposed by George Vasiliou is the simplest one:
    ls -alth | awk '{print $5}'
    extracts the 5th whitespace-separated field ($5), which is the size.

  • Rather than use -h first and then convert the human-readable suffixes (such as B, M, and G) back to plain byte counts (incidentally, the multipliers would have to be powers of 1024, not 1000), simply omit -h from the ls command, which then outputs the raw byte counts by default:
    ls -alt | awk '{print $5}'
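
If only the human-readable output is available, the points above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: size_field and to_bytes are hypothetical helper names, the sample lines are copied from the question, and the K, M, and G suffixes are assumed to denote powers of 1024. The tr -s ' ' step squeezes each run of spaces into one, which is what lets cut's single-character delimiter work at all.

```shell
#!/bin/sh
# Sample lines in the format shown in the question (from `ls -alth`).
sample='drwxr-xr-x    5 root    admin   170B Aug  3  2016 ..
drwxr-xr-x    5 root    admin     9M Aug  3  2016 ..'

# Squeeze each run of spaces into a single space so that cut's
# single-character delimiter lines up with the fields.
size_field() {
    tr -s ' ' | cut -d ' ' -f 5
}

# Convert a size such as 170B, 4.0K, or 9M to a byte count.
# Assumption: K, M, and G mean powers of 1024.
to_bytes() {
    awk '{
        n = $1 + 0                      # leading numeric part, e.g. 170 from "170B"
        unit = substr($1, length($1))   # last character of the field
        if (unit == "K") n *= 1024
        else if (unit == "M") n *= 1024 * 1024
        else if (unit == "G") n *= 1024 * 1024 * 1024
        printf "%.0f\n", n
    }'
}

printf '%s\n' "$sample" | size_field | to_bytes
```

On the sample lines this prints 170 and 9437184. Real ls -l output starts with a "total" line that has no fifth field, so that line would need to be skipped first; omitting -h, as suggested above, remains the simpler route.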

mklement0 answered Nov 10 '22 13:11