
PowerShell join lines

Tags:

powershell

I have text files that look like this:

1.
SometextSometextSometextSometext

2.
SometextSometextSometextSometext

3.
SometextSometextSometextSometext

4.
SometextSometextSometextSometext

I need to remove the line break between each number and the text below it, leaving a single space between the number (with its period) and the moved text.

Right now I'm trying:

$x =  Get-Content *FILENAME*
$x |  Foreach-Object {$_ | select-string "^\d{1,2}\.\s+" }

(Note: I can match with select-string "^\d{1,2}.\s+", but after that I don't know how to remove the line break or join the lines.)

Final Outcome I'm trying for:

  1. SometextSometextSometextSometext
  2. SometextSometextSometextSometext
  3. SometextSometextSometextSometext
  4. SometextSometextSometextSometext
niz100 asked Aug 12 '14 22:08


4 Answers

$x = Get-Content $filename -Raw
$x -replace '(\d{1,2}\.)\s*\r?\n(.+?)(\r?\n|$){2,}','$1 $2$3'

How this works:

  1. Calling Get-Content with the -Raw parameter returns the file as a single string instead of individual lines. In this case, since you're working with line breaks, it's easier to see it all as one string.
  2. The regular expression works as follows:
    1. Find 1 or 2 digits followed by a ., and capture this in group 1.
    2. Continue matching on any amount of whitespace, followed by an optional carriage return, followed by a single linefeed (this should work for windows/non-windows line endings).
    3. Match 1 or more characters (non-greedy) and capture in group 2.
    4. Match a CRLF or LF line ending, or the end of the string, 2 or more times. Because the group is repeated, group 3 keeps only one repetition (the last one matched) rather than all of them.
  3. So now we have 3 captured groups: the number and the . after it, the line you want, and a single line ending if it existed.
  4. We replace the entire thing we matched with group 1, a single space, then group 2 and group 3.
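
To try the steps above without touching a file, here is a minimal sketch that applies the same pattern to an in-memory string. The `$sample` text is made up for illustration and stands in for the output of `Get-Content $filename -Raw`; it assumes LF line endings and a blank line after every entry.

```powershell
# Hypothetical sample standing in for (Get-Content $filename -Raw).
# Assumes LF line endings and a blank line after every entry.
$sample = "1.`nAlpha`n`n2.`nBeta`n`n3.`nGamma`n`n"

# Same pattern and replacement string as in the answer above.
$pattern = '(\d{1,2}\.)\s*\r?\n(.+?)(\r?\n|$){2,}'
$joined  = $sample -replace $pattern, '$1 $2$3'

# Split back into lines for inspection; TrimEnd() drops any trailing line ending.
$lines = $joined.TrimEnd() -split '\r?\n'
$lines
```

With this sample, `$lines` should hold three entries: `1. Alpha`, `2. Beta` and `3. Gamma`.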
briantist answered Oct 11 '22 13:10


Since the pipeline only works one line at a time, it's probably easiest to save the number in a buffer, and output it when you get to the next line:

$x | ForEach-Object { if ($_ -match '^\d{1,2}\.\s*$') { $num = $_.Trim() } elseif ($_.Trim()) { "$num $_"; $num = '' } }
zdan answered Oct 11 '22 14:10


I'll try something shorter:

Get-Content $my_file -ReadCount 3 | ForEach{$_ -Join " "}

That reads the file in groups of three lines and joins each group with a space. Not sure why it's 3 and not 2 to be honest, I just know it works when I tested it against the sample you provided. Below is my test (I saved the sample to a text file at C:\Temp\Test.txt):

PS C:\> gc C:\temp\test.txt -ReadCount 3 | %{$_  -join " "}
1. SometextSometextSometextSometext 
2. SometextSometextSometextSometext 
3. SometextSometextSometextSometext 
4. SometextSometextSometextSometext

Edit: Oh, duh, it's 3 not 2 because there are blank lines in the text file. So technically this adds a space at the end of each piece of text. That can be avoided by filtering out the blank lines:

Get-Content $my_file -ReadCount 3 | ForEach{($_ | Where{![String]::IsNullorEmpty($_)}) -Join " "}
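
To make the chunking visible, here is a small sketch that writes a made-up sample to a temporary file and looks at what `-ReadCount 3` sends down the pipeline. The file contents and the `$tmp`, `$sizes` and `$joined` names are assumptions for illustration only; `New-TemporaryFile` needs PowerShell 5.0 or later.

```powershell
# Hypothetical temp file reproducing the question's layout: number, text, blank line.
$tmp = New-TemporaryFile
Set-Content -Path $tmp.FullName -Value '1.', 'Alpha', '', '2.', 'Beta', '', '3.', 'Gamma'

# With -ReadCount 3, $_ is an array of up to three lines rather than a single line.
$sizes = Get-Content -Path $tmp.FullName -ReadCount 3 | ForEach-Object { $_.Count }

# Drop empty strings from each chunk before joining, so no trailing space is left.
$joined = Get-Content -Path $tmp.FullName -ReadCount 3 |
    ForEach-Object { ($_ | Where-Object { $_ }) -join ' ' }

Remove-Item -Path $tmp.FullName
$sizes
$joined
```

Note that this only lines up when every entry occupies exactly three lines; a missing or extra blank line shifts all later chunks.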
TheMadTechnician answered Oct 11 '22 15:10


Here's a solution. It uses the buffering approach, but instead of += to concatenate onto a string, it uses a StringBuilder, which can perform better. (See this blog post)

$source = (
"1.",
"SometextSometextSometextSometext",
"",
"2.",
"SometextSometextSometextSometext",
"3.",
"",
"SometextSometextSometextSometext"
);


$stringBuilder = New-Object System.Text.StringBuilder

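$source | % {
    # A number line such as "1." starts a new entry; buffer it with a trailing space.
    if ($_ -match [regex]'^\d+\.') {
        $null = $stringBuilder.Append("{0} " -f $_)
    }
    # A text line completes the entry: emit the joined line, then reset the buffer.
    if ($_ -match [regex]'^[A-Za-z]') {
        $null = $stringBuilder.Append($_)
        $stringBuilder.ToString();
        $stringBuilder.Length = 0;
    }
}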

This outputs:

  1. SometextSometextSometextSometext
  2. SometextSometextSometextSometext
  3. SometextSometextSometextSometext
Andrew Shepherd answered Oct 11 '22 15:10