I want to split each line of a pipe on spaces, and then print each token on its own line.
I realise that I can get this result using:
(cat someFileInsteadOfAPipe).split(" ")
But I want more flexibility. I want to be able to do just about anything with each token. (I used to use AWK on Unix, and I'm trying to get the same functionality.)
I currently have:
echo "Once upon a time there were three little pigs" | %{$data = $_.split(" "); Write-Output "$($data[0]) and whatever I want to output with it"}
Which, obviously, only prints the first token. Is there a way for me to for-each over the tokens, printing each in turn?
Also, I got the %{$data = $_.split(" "); Write-Output "$($data[0])"} part from a blog, and I really don't understand what I'm doing or how the syntax works.
I want to google for it, but I don't know what to call it. Please help me out with a word or two to Google, or a link explaining what the % and all the $ symbols do, as well as the significance of the opening and closing brackets.
I realise I can't actually use (cat someFileInsteadOfAPipe).split(" "), since the file (or preferably the incoming pipe) contains more than one line.
Regarding some of the answers:
If you are using Select-String to filter the output before tokenizing, keep in mind that the output of the Select-String command is not a collection of strings, but a collection of MatchInfo objects. To get to the string you want to split, you need to access the Line property of the MatchInfo object, like so:
cat someFile | Select-String "keywordFoo" | %{$_.Line.Split(" ")}
The .Split() method splits the input string into substrings based on the given delimiters and returns them as an array. When called without arguments, it splits on whitespace characters such as spaces, tabs, and line breaks.
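Unary and binary split operators: the unary split operator (-split <string>) has higher precedence than a comma, so if you pass it a comma-separated list of strings, only the first string is split. Use one of the following patterns to split more than one string: use the binary split operator (<string[]> -split <delimiter>); enclose all the strings in parentheses; or store the strings in a variable and then submit the variable to the split operator.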
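The three patterns can be sketched as follows. This is only an illustration; the sample strings and the variable names ($unary, $grouped, $binary, $stored) are made up for the example.

```powershell
# Unary -split binds tighter than the comma, so only 'a b' is split here;
# 'c d' is left intact.
$unary = -split 'a b', 'c d'

# Pattern 1: the binary operator takes the whole array on its left-hand side.
$binary = 'a b', 'c d' -split ' '

# Pattern 2: parentheses make the unary operator see the whole list.
$grouped = -split ('a b', 'c d')

# Pattern 3: store the strings in a variable first, then split the variable.
$strings = 'a b', 'c d'
$stored = -split $strings
```

In the last three cases every input string is split, giving the four tokens a, b, c and d.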
"Once upon a time there were three little pigs".Split(" ") | ForEach { "$_ is a token" }
The key is $_, which stands for the current object in the pipeline.
About the code you found online:
% is an alias for ForEach-Object. Anything enclosed in the braces (a script block) is run once for each object it receives. In this case, it runs only once, because you're sending it a single string.
$_.Split(" ") takes the current object and splits it on spaces. The current object is whatever ForEach-Object is currently looping over.
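To handle several input lines and do something with every token, one ForEach-Object can be nested inside another. The following is a minimal sketch rather than the only way to do it; the sample lines and the names $lines, $line, $tokens and $result are hypothetical.

```powershell
# Hypothetical sample input; in practice this could come from
# `Get-Content someFile` or from any command's output.
$lines = 'Once upon a time', 'there were three little pigs'

$result = $lines | ForEach-Object {
    # Here $_ is the current line. Keep it in a named variable, because the
    # inner ForEach-Object rebinds $_ to each token.
    $line = $_
    $tokens = $line.Split(' ')

    $tokens | ForEach-Object {
        # Here $_ is a single token; any per-token processing can go here.
        "$_ (from a line with $($tokens.Count) tokens)"
    }
}

$result
```

Because the outer script block runs once per line, this also works when the input has more than one line, which the single-string example does not show.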
To complement Justus Thane's helpful answer:
As Joey notes in a comment, PowerShell has a powerful, regex-based -split operator.
In its unary form (-split '...'), -split behaves like awk's default field splitting, which means that leading and trailing whitespace is ignored and any run of whitespace is treated as a single separator.
In PowerShell v4+ an expression-based, and therefore faster, alternative to the ForEach-Object cmdlet became available: the .ForEach() array (collection) method, as described in this blog post (alongside the .Where() method, a more powerful, expression-based alternative to Where-Object).
Here's a solution based on these features:
PS> (-split '  One     for the money  ').ForEach({ "token: [$_]" })
token: [One]
token: [for]
token: [the]
token: [money]
Note that the leading and trailing whitespace was ignored, and that the multiple spaces between One
and for
were treated as a single separator.
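The same two features can approximate picking a single field per line, roughly what awk '{print $2}' does. This is a sketch under assumptions: the sample lines and the names $lines and $second are invented for illustration, and every line is assumed to have at least two fields.

```powershell
# Hypothetical multi-line input with irregular whitespace.
$lines = '  alpha   1  ', 'beta 2'

# For each line, split on runs of whitespace and keep the second field
# (index 1, since PowerShell arrays are zero-based).
$second = $lines.ForEach({ (-split $_)[1] })

$second
```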