PowerShell script to remove double quotes from CSV unless a comma exists inside the double quotes

I have a .csv in the following file format:

In: "bob","1234 Main St, New York, NY","cool guy"

I am looking to remove the double quotes around fields that don't contain a comma:

Out: bob,"1234 Main St, New York, NY",cool guy

Is there a way to do this in PowerShell?

I have checked:

  1. How to remove double quotes on specific column from CSV file using Powershell script
  2. http://blogs.technet.com/b/heyscriptingguy/archive/2011/11/02/remove-unwanted-quotation-marks-from-csv-files-by-using-powershell.aspx
  3. https://social.technet.microsoft.com/Forums/windowsserver/en-US/f6b610b6-bfb2-4140-9529-e61ad30b8927/how-to-export-csv-without-doublequote?forum=winserverpowershell
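For comparison, simply stripping every double quote (the blanket approach that often comes up for this problem) loses exactly what the question wants to keep: the commas inside the address turn into column separators. A minimal sketch on the sample line from the question; the variable names are illustrative only:

```powershell
# Sample line from the question (single-quoted so the double quotes are literal)
$line = '"bob","1234 Main St, New York, NY","cool guy"'

# Blanket approach: remove every double quote
$naive = $line -replace '"', ''
$naive   # bob,1234 Main St, New York, NY,cool guy

# The line now splits into 5 comma-separated columns instead of the original 3
($naive -split ',').Count   # 5
```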
asked May 14 '15 20:05 by jgaw



2 Answers

Adapting the code from "How to remove double quotes on specific column from CSV file using Powershell script":

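$csv = 'C:\path\to\your.csv'
# (Get-Content ...) reads the whole file first, so Set-Content can safely overwrite it;
# the -replace unquotes every field that has no comma inside
(Get-Content $csv) -replace '(?m)"([^,]*?)"(?=,|$)', '$1' |
    Set-Content $csv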

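The regex (?m)"([^,]*?)"(?=,|$) matches a " followed by zero or more non-comma characters and a closing ", but only when that closing quote is followed by a comma or the end of the line (checked with a positive look-ahead, so the comma itself is not consumed). The replacement $1 puts back the captured text without the quotes. The (?m) multiline option makes $ match at the end of every line, not just at the end of the whole string. Because Get-Content returns the file as an array of lines, $ already matches at the end of each line here, so (?m) only makes a difference if the file is read as a single string (e.g. with Get-Content -Raw).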
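Applying the same pattern to a couple of in-memory lines shows its effect without touching a file. The second line is made up for illustration; as with Get-Content output, each array element is one line:

```powershell
# Two sample lines, shaped like Get-Content output (one string per line);
# the second line is hypothetical
$lines = @(
    '"bob","1234 Main St, New York, NY","cool guy"',
    '"alice","99 Elm St","nice person"'
)

# Same pattern as the answer; -replace runs on each array element separately
$result = $lines -replace '(?m)"([^,]*?)"(?=,|$)', '$1'
$result
# bob,"1234 Main St, New York, NY",cool guy
# alice,99 Elm St,nice person
```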


answered Sep 28 '22 08:09 by Wiktor Stribiżew


I don't know exactly what the rest of your script looks like, but try something along these lines:

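$line = '"bob","1234 Main St, New York, NY","cool guy"'  # one raw CSV line, every field quoted
# Trim the outer quotes, split on the "," separators, re-quote only fields that contain a comma
($line.Trim('"') -split '","' |
  ForEach-Object { if ($_ -match ',') { '"' + $_ + '"' } else { $_ } }) -join ','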
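Both answers work on the raw text, so a field containing escaped quotes ("") can be mishandled. One alternative, sketched here under assumptions and not taken from either answer, is to let a real CSV parser split the fields and then re-quote only where needed. It uses the .NET TextFieldParser class from Microsoft.VisualBasic.FileIO; the helper names Format-CsvField and Convert-CsvQuoting and the file paths are hypothetical. It also quotes fields that contain a double quote (doubling the inner quotes), which goes slightly beyond the question's comma-only rule so the output stays valid CSV:

```powershell
# Assumes the Microsoft.VisualBasic assembly is available (it ships with .NET)
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical helper: quote a field only if it contains a comma or a double quote;
# inner double quotes are doubled so the output stays valid CSV
function Format-CsvField([string]$Value) {
    if ($Value -match '[,"]') { '"' + ($Value -replace '"', '""') + '"' }
    else { $Value }
}

# Hypothetical helper: parse CSV text from a TextReader and re-emit each record
function Convert-CsvQuoting([System.IO.TextReader]$Reader) {
    $parser = [Microsoft.VisualBasic.FileIO.TextFieldParser]::new($Reader)
    try {
        $parser.TextFieldType = 'Delimited'
        $parser.SetDelimiters(',')
        $parser.HasFieldsEnclosedInQuotes = $true
        $parser.TrimWhiteSpace = $false   # keep any spaces around fields as-is
        while (-not $parser.EndOfData) {
            # ReadFields returns the unquoted field values of one record
            ($parser.ReadFields() | ForEach-Object { Format-CsvField $_ }) -join ','
        }
    }
    finally { $parser.Close() }
}

$sample = '"bob","1234 Main St, New York, NY","cool guy"'
$out = Convert-CsvQuoting ([System.IO.StringReader]::new($sample))
$out   # bob,"1234 Main St, New York, NY",cool guy

# Usage on a file (hypothetical paths; use full paths, since .NET resolves
# relative paths against the process directory, not the PowerShell location):
# $reader = [System.IO.StreamReader]::new('C:\path\to\your.csv')
# Convert-CsvQuoting $reader | Set-Content 'C:\path\to\your-unquoted.csv'
```

Parsing first means the "does this field contain a comma?" decision is made on the real field values rather than on fragments of the raw line.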
answered Sep 28 '22 08:09 by markg