I have text file(s) that look like below:
1.
SometextSometextSometextSometext
2.
SometextSometextSometextSometext
3.
SometextSometextSometextSometext
4.
SometextSometextSometextSometext
I need to remove the line break between each number and the text below it, leaving a single space between the number (with its period) and the moved text.
Right now I'm trying:
$x = Get-Content *FILENAME*
$x | Foreach-Object {$_ | select-string "^\d{1,2}\.\s+" }
(Note: I can match on select-string "^\d{1,2}.\s+" but after that I don't know how to remove the line break or join the lines)
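Final outcome I'm trying for:
1. SometextSometextSometextSometext
2. SometextSometextSometextSometext
3. SometextSometextSometextSometext
4. SometextSometextSometextSometext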
$x = Get-Content $filename -Raw
$x -replace '(\d{1,2}\.)\s*\r?\n(.+?)(\r?\n|$){2,}','$1 $2$3'
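A quick way to check the replacement above without touching a file is to run it on an in-memory string. This is only a sketch: the sample text (Alpha, Beta, Gamma) and the $result variable are made up for illustration, and it assumes entries are separated by a blank line, since the {2,} quantifier appears to need two line endings (or the end of the string) after each text line.

```powershell
# Stand-in for Get-Content -Raw: the whole sample as one string,
# with a blank line between entries (assumed layout).
$x = "1.`nAlpha`n`n2.`nBeta`n`n3.`nGamma"

# Same pattern and replacement as above.
$result = $x -replace '(\d{1,2}\.)\s*\r?\n(.+?)(\r?\n|$){2,}', '$1 $2$3'

# Show the joined text, one "N. text" entry per line.
$result
```

To write the result back, piping $result to Set-Content with the target path should work.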
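Get-Content with the -Raw parameter returns the file as a single string instead of individual lines. In this case, since you're working with line breaks, it's easier to treat it all as one string.
The regex matches a one- or two-digit number followed by a period, and captures this in group 1. It then matches the line break after the number, captures the following line in group 2, and captures the line ending after that in group 3. The replacement writes the number, a space after it, the line you want, and a single line ending if it existed.
Since the pipeline only works one line at a time, it's probably easiest to save the number in a buffer, and output it when you get to the next line: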
$x | ForEach-Object { if ($_ -match '^\d{1,2}\.\s*$') { $num = $_.TrimEnd() + ' ' } else { $num + $_; $num = '' } }
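The buffering idea can also be written out in expanded form, which makes it easier to see what happens on each line. This is a minimal sketch on a made-up in-memory sample ($lines and $result are illustrative names). Unlike the one-liner, it drops blank lines rather than passing them through, which is an assumption about what you want.

```powershell
# Illustrative sample: number line, text line, optional blank line.
$lines = '1.', 'Alpha', '', '2.', 'Beta', '', '3.', 'Gamma'

$num = ''
$result = $lines | ForEach-Object {
    if ($_ -match '^\d{1,2}\.\s*$') {
        # A line holding only "N.": remember it and emit nothing yet.
        $num = $_.TrimEnd()
    }
    elseif ($_.Trim()) {
        # A text line: emit "N. text", then clear the buffer.
        # TrimStart covers a text line that had no number before it.
        "$num $_".TrimStart()
        $num = ''
    }
    # Blank lines match neither branch and are skipped.
}

$result
```

ForEach-Object runs its script block in the calling scope, so $num carries over from one line to the next.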
I'll try something shorter:
Get-Content $my_file -ReadCount 3 | ForEach{$_ -Join " "}
That splits it into groups of lines, and joins them with a space. Not sure why it's 3 and not 2 to be honest, I just know it works when I tested it against the sample you provided. Below is my test (I saved that to a text file at C:\Temp\Test.txt):
PS C:\> gc C:\temp\test.txt -ReadCount 3 | %{$_ -join " "}
1. SometextSometextSometextSometext
2. SometextSometextSometextSometext
3. SometextSometextSometextSometext
4. SometextSometextSometextSometext
Edit: Oh, duh, it's 3 not 2 because there are blank lines in the text file. So I suppose technically this is adding a space at the end of each piece of text. That could be avoided by filtering out the blank lines:
Get-Content $my_file -ReadCount 3 | ForEach{($_ | Where{![String]::IsNullorEmpty($_)}) -Join " "}
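To see what -ReadCount is doing here, this sketch writes a small sample to a temporary file and runs the filtered version against it. The file contents and the $tmp and $joined names are made up for illustration. Note that the approach relies on every entry taking up exactly three lines (number, text, blank); a file with a different layout would need a different count.

```powershell
# Temporary file holding an illustrative sample, one array element per line.
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value '1.', 'Alpha', '', '2.', 'Beta', '', '3.', 'Gamma'

# -ReadCount 3 sends arrays of up to three lines down the pipeline,
# so $_ in ForEach-Object is a whole group rather than a single line.
$joined = Get-Content -Path $tmp -ReadCount 3 | ForEach-Object {
    # Drop the blank line in each group, then join what is left with a space.
    ($_ | Where-Object { -not [string]::IsNullOrEmpty($_) }) -join ' '
}

$joined
Remove-Item -Path $tmp
```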
Here's a solution. It uses the buffering approach, but instead of += to concatenate onto a string, it uses a StringBuilder, which can perform better. (See this blog post)
$source = (
    "1.",
    "SometextSometextSometextSometext",
    "",
    "2.",
    "SometextSometextSometextSometext",
    "3.",
    "",
    "SometextSometextSometextSometext"
);
$stringBuilder = New-Object System.Text.StringBuilder
$source | % {
    # Number line: buffer it with a trailing space.
    if ($_ -match [regex]'^\d+\.') {
        $null = $stringBuilder.Append("{0} " -f $_)
    }
    # Text line: append it, output the joined line, then reset the buffer.
    if ($_ -match [regex]'^[A-Za-z]') {
        $null = $stringBuilder.Append($_)
        $stringBuilder.ToString();
        $stringBuilder.Length = 0;
    }
}
This outputs:
1. SometextSometextSometextSometext
2. SometextSometextSometextSometext
3. SometextSometextSometextSometext