
Substring in PowerShell to truncate string length

Is it possible in PowerShell to truncate a string (using SubString()?) to a given maximum number of characters, even if the original string is already shorter than that?

For example:

foreach ($str in "hello", "good morning", "hi") { $str.subString(0, 4) }

The truncation is working for hello and good morning, but I get an error for hi.

I would like the following result:

hell
good
hi
Jérôme asked Jan 14 '15 13:01


5 Answers

You need to check the length of the current item. If it is less than 4, pass that length to Substring instead of 4.

foreach ($str in "hello", "good morning", "hi") {
    $str.subString(0, [System.Math]::Min(4, $str.Length)) 
}
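
The same idea can be wrapped in a small reusable helper. This is only a sketch: the function name Limit-String and its parameter names are hypothetical and do not come from the answer above.

```powershell
# Hypothetical helper (illustration only): return at most $MaxLength characters of $Text.
function Limit-String {
    param(
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $Text,
        [int] $MaxLength = 4
    )
    # Clamp the requested length so Substring is never asked for more than exists.
    $Text.Substring(0, [Math]::Min($MaxLength, $Text.Length))
}

foreach ($str in "hello", "good morning", "hi") {
    Limit-String -Text $str -MaxLength 4
}
```

Keeping the clamp in one place means callers do not have to repeat the length check.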
Eduard Uta answered Oct 30 '22 00:10


Or you could just keep it simple, using PowerShell's alternative to a ternary operator:

foreach ($str in "hello", "good morning", "hi") {
  $(if ($str.length -gt 4) { $str.substring(0, 4) } else { $str })
}
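
For readers on PowerShell 7.0 or later, which added a real ternary operator, the same check can be written more compactly. A minimal sketch, assuming PowerShell 7+ (it will not parse in Windows PowerShell 5.1); the $truncated variable is only there for illustration:

```powershell
# Requires PowerShell 7.0+: <condition> ? <if-true> : <if-false>
$truncated = foreach ($str in "hello", "good morning", "hi") {
    # Parentheses around the condition keep the expression unambiguous.
    ($str.Length -gt 4) ? $str.Substring(0, 4) : $str
}
$truncated
```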

While all the other answers are "correct", their efficiency ranges from sub-optimal to potentially horrendous. The following is not a critique of the other answers; it is intended as an instructive comparison of their underlying operation. After all, scripting is more about getting it running soon than getting it running fast.

In order:

  1.  

    foreach ($str in "hello", "good morning", "hi") {
        $str.subString(0, [System.Math]::Min(4, $str.Length))
    }
    

    This is basically the same as my offering except that, instead of just returning $str when it is too short, we call Substring and tell it to return the whole string. Hence, sub-optimal. It is still doing the if..then..else, just inside Min, viz.

    if (4 -lt $str.length) {4} else {$str.length}
    
  2.  

    foreach ($str in "hello", "good morning", "hi") { $str -replace '(.{4}).+','$1' }
    

    Using regular expression matching to grab the first four characters and then replace the whole string with them means that the entire (possibly very long) string must be scanned by the matching engine of unknown complexity/efficiency.

    While a person can see that the '.+' is simply there to match the entire remainder of the string, the matching engine could be building up a large list of backtracking alternatives, since the pattern is not anchored (no ^ at the beginning). The (not described) clever bit here is that if the string has fewer than five characters (four times . followed by one or more .), the whole match fails and -replace returns $str unaltered.

  3.  

    foreach ($str in "hello", "good morning", "hi") {
      try {
        $str.subString(0, 4)
      }
      catch [ArgumentOutOfRangeException] {
        $str
      }
    }
    

    Deliberately throwing exceptions instead of programmatic boundary checking is an interesting solution, but who knows what is going on as the exception bubbles up from the try block to the catch. Probably not much in this simple case, but it would not be a recommended general practice except in situations where there are many possible sources of errors (making it cumbersome to check for all of them), but only a few responses.
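
The approaches discussed above can be checked against each other and roughly timed. The harness below is only a sketch: the names $approaches and $summary and the iteration count are made up for illustration, and the timings it prints depend on the machine and PowerShell version, so no figures are claimed here.

```powershell
# Illustrative harness: $approaches and $summary are not from the original answers.
$inputs = "hello", "good morning", "hi"
$approaches = [ordered]@{
    'if/else'   = { param($s) if ($s.Length -gt 4) { $s.Substring(0, 4) } else { $s } }
    'Math.Min'  = { param($s) $s.Substring(0, [Math]::Min(4, $s.Length)) }
    'regex'     = { param($s) $s -replace '(.{4}).+', '$1' }
    'try/catch' = { param($s) try { $s.Substring(0, 4) } catch [ArgumentOutOfRangeException] { $s } }
}

$summary = [ordered]@{}
foreach ($name in $approaches.Keys) {
    $block = $approaches[$name]
    # Correctness: every approach should yield the same truncated strings.
    $summary[$name] = @(foreach ($s in $inputs) { & $block $s }) -join ','
    # Rough timing only; script-block invocation overhead is included in each figure.
    $elapsed = Measure-Command {
        foreach ($i in 1..1000) { foreach ($s in $inputs) { $null = & $block $s } }
    }
    '{0,-10} {1,-14} {2,10:N1} ms' -f $name, $summary[$name], $elapsed.TotalMilliseconds
}
```

Because each call goes through a script block, the numbers are best read as a relative comparison rather than the cost of the expression itself.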

Interestingly, an answer to a similar question elsewhere using -join and array slices (which don't cause errors on index out of range, just ignore the missing elements):

$str[0..3] -join ""   # Infix

(or more simply)

-join $str[0..3]      # Prefix

could be the most efficient (with appropriate optimisation) given the strong similarity between the storage of string and char[]. Optimisation would be required since, by default, $str[0..3] is an object[], each element being a single char, and so bears little resemblance to a string (in memory). Giving PowerShell a little hint could be useful,

-join [char[]]$str[0..3]

However, maybe just telling it what you actually want,

new-object string (,$str[0..3]) # Need $str[0..3] to be a member of an array of constructor arguments

thereby directly invoking

new String(char[])

is best.
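
The slice-based variants can be tried side by side on the inputs from the question. A minimal sketch, assuming PowerShell 5.0 or later for the [string]::new() shorthand (used here in place of New-Object); the $rows variable is illustrative:

```powershell
$rows = foreach ($str in "hello", "good morning", "hi") {
    # The slice simply stops at the end of a short string instead of raising an error.
    $joined = -join $str[0..3]
    # Cast the slice to char[] and hand it to the String(char[]) constructor.
    $built = [string]::new([char[]]($str[0..3]))
    '{0} | {1}' -f $joined, $built
}
$rows
```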

uberkluger answered Oct 30 '22 00:10


You could trap the exception:

foreach ($str in "hello", "good morning", "hi") { 
  try { 
    $str.subString(0, 4) 
  }
  catch [ArgumentOutOfRangeException] {
    $str
  }
}
arco444 answered Oct 30 '22 01:10


More regex love, using lookbehind:

PS > 'hello','good morning','hi' -replace '(?<=(.{4})).+'
hell
good
hi
Nicolas Melay answered Oct 30 '22 00:10


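I'm late to the party as always! I have used the PadRight string function to address such an issue. I cannot comment on its relative efficiency compared to the other suggestions. Note that short strings come back padded to four characters ("hi" followed by two spaces), so append .TrimEnd() if the padding is unwanted (that also trims any trailing whitespace that was already in the string):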

foreach ($str in "hello", "good morning", "hi") { $str.PadRight(4, " ").SubString(0, 4) }
ThePennyDrops answered Oct 30 '22 00:10