How to use regex with cut at the command line?

Tags:

sed

cut

centos

I have some output like this from ls -alth:

drwxr-xr-x    5 root    admin   170B Aug  3  2016 ..
drwxr-xr-x    5 root    admin    70B Aug  3  2016 ..
drwxr-xr-x    5 root    admin     3B Aug  3  2016 ..
drwxr-xr-x    5 root    admin     9M Aug  3  2016 ..

Now, I want to parse out the 170B part, which is obviously the size in human-readable format. I wanted to do this using cut or sed, because I don't want to use tools that are any more complicated or difficult to use than necessary.

Ideally, I want it to be robust enough to handle the B, M, or K suffix that comes with the size and multiply by 1, 1000000, or 1000, respectively. I haven't found a good way to do that, though.

I've tried a few things without really knowing the best approach:

ls -alth | cut -f 5 -d \s+

I was hoping that would work because I'd be able to just delimit it on one or more spaces.

But that doesn't work. How do I supply cut with a regex delimiter? Or is there an easier way to extract only the size of the file from ls -alth?

I'm using CentOS 6.4.

602

asked Apr 09 '17 21:04

makansij

1 Answer

This answer tackles the question as asked, but consider George Vasiliou's helpful find solution as a potentially superior alternative.

cut only supports a single, literal character as the delimiter (-d), so it isn't the right tool to use.
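If you do want to stay with the tools the question asks about, the variable whitespace can be worked around. The following is a minimal sketch (the helper names size_via_cut and size_via_sed are made up for illustration) that either squeezes runs of spaces with tr -s before calling cut, or uses a POSIX basic regular expression in sed to capture the fifth space-separated token. Both assume the fields are separated by spaces only, as in the sample output in the question.

```shell
#!/bin/sh
# Hypothetical helper names; both read ls -l style lines on stdin.

# tr -s ' ' squeezes every run of spaces into a single space, so the
# one-character delimiter that cut requires now separates each field.
size_via_cut() {
  tr -s ' ' | cut -d ' ' -f 5
}

# POSIX basic regex: skip four "token followed by spaces" groups and keep
# the fifth token. With -n and the p flag, only matching lines are printed,
# so a short line such as the "total" line of ls -l produces no output.
size_via_sed() {
  sed -n 's/^\([^ ][^ ]*  *\)\{4\}\([^ ][^ ]*\).*/\2/p'
}

sample='drwxr-xr-x    5 root    admin   170B Aug  3  2016 ..
drwxr-xr-x    5 root    admin     9M Aug  3  2016 ..'

printf '%s\n' "$sample" | size_via_cut   # prints 170B, then 9M
printf '%s\n' "$sample" | size_via_sed   # prints 170B, then 9M
```

Both variants are more fragile than awk: the cut pipeline misnumbers the fields if a line starts with a space, and neither copes with tab-separated fields.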
For extracting tokens (fields) that are separated by a variable amount of whitespace per line, awk is the best tool, so the solution proposed by George Vasiliou is the simplest one:
ls -alth | awk '{print $5}'
This extracts the 5th whitespace-separated field ($5), which is the size.
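To illustrate why awk's default field splitting suits this job, here is a small sketch (size_col is a hypothetical wrapper around the awk command) fed with lines in the question's format, one of which deliberately starts with blanks and contains a tab:

```shell
#!/bin/sh
# size_col is a hypothetical wrapper around: awk '{print $5}'
# With the default field separator, awk treats any run of spaces and tabs as
# one separator and ignores leading blanks, so $5 is always the size column.
size_col() {
  awk '{print $5}'
}

# The second line deliberately starts with blanks and contains a tab.
printf 'drwxr-xr-x    5 root    admin   170B Aug  3  2016 ..\n  drwxr-xr-x\t5 root admin 9M Aug 3 2016 ..\n' | size_col
# prints 170B, then 9M
```

On a real directory listing, ls -l also prints a leading "total" line whose $5 is empty; awk 'NR > 1 {print $5}' skips it.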
Rather than using -h first and then converting the human-readable suffixes (such as B, M, and G) back to plain byte counts (incidentally, the multipliers would have to be powers of 1024, not 1000), simply omit -h from the ls command; ls then outputs the raw byte counts by default:
ls -alt | awk '{print $5}'
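If all you have is already-captured -h output, so re-running ls without -h is not an option, the suffixes can still be expanded with awk. This is a minimal sketch under stated assumptions: to_bytes is a hypothetical helper, each input line holds one size with an optional single-letter suffix (B, K, M, G, or T; GNU ls -h prints small sizes as plain numbers without a B), and the multipliers are powers of 1024.

```shell
#!/bin/sh
# to_bytes (hypothetical helper): reads one size per line, such as 170B, 1.5K
# or 9M, and prints the size in bytes. Assumes base-1024 multipliers, which is
# what ls -h uses, and an optional single-letter suffix B, K, M, G or T.
to_bytes() {
  awk 'NF {
    size = $1
    unit = toupper(substr(size, length(size), 1))  # last character, e.g. "M"
    num  = substr(size, 1, length(size) - 1) + 0    # numeric part, e.g. 9
    if (unit ~ /[0-9]/) {                           # no suffix: already bytes
      num  = size + 0
      unit = "B"
    }
    mult = 1
    if (unit == "K")      mult = 1024
    else if (unit == "M") mult = 1024 ^ 2
    else if (unit == "G") mult = 1024 ^ 3
    else if (unit == "T") mult = 1024 ^ 4
    printf "%.0f\n", num * mult
  }'
}

# Sizes taken from the question's sample output.
printf '170B\n70B\n3B\n9M\n' | to_bytes   # prints 170, 70, 3 and 9437184
```

For a live listing this would be used as ls -alth | awk '{print $5}' | to_bytes. Because -h rounds sizes for display, the reconstructed values are only approximations, which is why omitting -h is the better option.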

126

answered Nov 10 '22 13:11

mklement0
