
Find & replace line breaks between double quotes

I have a CSV file containing some line breaks in values. For example:

"Chiffre","Nom","Descriptif court","Tarifs en clair","Période en clair","Adresse 1","CP","Tel","Site","Facebook","Pictos","@Saveurs","@Famille plus","@Img","Accessible en fauteuil roulant en autonomie","Handicapes","Terrasse","Wifi","Chèque Vacances","Titre Restaurant"
6,"Le Chalet d'en Ô","Cuisine traditionnelle (foie gras, magret myrtilles, croustillant chocolat)
Spécialités savoyardes (fondue, tartiflette.. )
Garanti sans burgers.
Tout est fait maison.Cuisine traditionnelle (foie gras, magret myrtilles, croustillant chocolat)
Spécialités savoyardes (fondue, tartiflette.. )
Garanti sans burgers.
Tout est fait maison.","Menu adulte : de 20 à 30 €
Menu enfant : 10 €.

Suggestion du jour le midi en semaine : entrée,plat, dessert : 20€.Menu adulte : de 20 à 30 €
Menu enfant : 10 €.

Suggestion du jour le midi en semaine : entrée,plat, dessert : 20€.","Midi et soir du jeudi au samedi + midi le dimanche et jours fériés.
Juillet et août, midi et soir du mardi au samedi et midi uniquement dimanche et jours fériés.Midi et soir du jeudi au samedi + midi le dimanche et jours fériés.
Juillet et août, midi et soir du mardi au samedi et midi uniquement dimanche et jours fériés.","Le Cropt - Route de Serraval","74230 Les Clefs","+33 4 50 02 09 00","www.lechaletdeno.com",,"A R J X x",,,,,"A","R","J","X","x"
7,"La Cabane - Pisciculture de Montremont","Bar - restaurant au bord de la rivière dans un cadre champêtre avec sa spécialité la truite.","Menu adulte : de 26 à 35 €.","Juin et septembre : du mercredi au dimanche.
Juillet et août : tous les jours à midi + mercredi à samedi le soir.Juin et septembre : du mercredi au dimanche.
Juillet et août : tous les jours à midi + mercredi à samedi le soir.","Pisciculture de Montremont - 100 impasse des Pesetz","74230 Thônes","+33 4 50 02 00 85","pisciculture-montremont.fr",,"A R  X x",,,,,"A","R",,"X","x"

I just need to replace every line break inside a value with a space (or anything else).

I tried many existing solutions, but using \\n as the search term does not seem to work, among other issues.

Any idea? Thanks.

Sublime Text would suffice but if another tool is easier, no problem.

Edit: \n works fine when it is the only search term, but I need to match line breaks only between double quotes. The best I have so far matches all text between " with: (?<=")[^"]*.

Nico Prat asked Jan 30 '23 01:01


2 Answers

You cannot do this safely in a text editor because the starting and ending delimiters are identical, so neither a lookaround-based nor a \G-based solution will work reliably.
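
To illustrate the problem with identical delimiters, here is a small Python sketch that applies the lookbehind pattern from the question to a made-up two-field row. Python's re module stands in for the editor's regex engine here, which is an assumption; the row itself is hypothetical.

```python
import re

# Hypothetical row with two quoted fields; the pattern is the one from the question.
row = '"a","b"'

# Every position preceded by a double quote is a candidate match,
# whether that quote opens a field or closes one.
matches = re.findall(r'(?<=")[^"]*', row)
print(matches)
```

The comma that separates the two fields appears among the matches, because it also follows a double quote. The pattern has no way to tell an opening quote from a closing one.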

Instead, use a programming language whose regex replace function accepts a callback as the replacement argument, and match any double-quoted substring with

"[^"]*(?:""[^"]*)*"

A simplified version, if you do not need to handle literal (doubled) double quotes inside values, is "[^"]+".

Details

  • " - a double quote
  • [^"]* - 0+ chars other than double quotes
  • (?:""[^"]*)* - a non-capturing group repeated 0+ times, matching:
    • "" - 2 double quotes (an escaped literal quote)
    • [^"]* - 0+ chars other than double quotes
  • " - the closing double quote.
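
As a quick check of the pattern described above, the following Python sketch runs it against a made-up line. The sample string and the name QUOTED are assumptions for illustration only.

```python
import re

# The pattern from the answer: a quoted field that may contain doubled quotes.
QUOTED = re.compile(r'"[^"]*(?:""[^"]*)*"')

# Hypothetical sample: the second quoted field contains an escaped quote ("")
# and a line break.
sample = '1,"plain","say ""hi""\nthere",4'

fields = QUOTED.findall(sample)
print(fields)
```

Each quoted field comes back as one match, including the one that spans the line break and contains doubled quotes. The unquoted values 1 and 4 are not matched.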

To use this regex, read the file into a string s and apply one of the following:

  • python: re.sub(r'"[^"]*(?:""[^"]*)*"', lambda m: m.group(0).replace("\n", ""), s)
  • javascript: s = s.replace(/"[^"]*(?:""[^"]*)*"/g, function(m) { return m.replace(/\n/g, ''); })
  • php: $s = preg_replace_callback('~"[^"]*(?:""[^"]*)*"~', function($m) { return str_replace("\n", "", $m[0]); }, $s)
  • c#: s = Regex.Replace(s, "\"[^\"]*(?:\"\"[^\"]*)*\"", m => m.Value.Replace("\n", ""))

If you need to remove both \r and \n, use .replace(/[\r\n]+/g, '') in JS, preg_replace('~\R+~', '', $m[0]) in PHP, m.Value.Replace("\r", "").Replace("\n", "") in C#, and m.group(0).replace("\r", "").replace("\n", "") in Python inside the callback.
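
Since the question asks for a space rather than plain removal, here is a minimal Python sketch of the callback approach that collapses each run of \r and \n inside a quoted field into a replacement string. The function name flatten_quoted, its replacement parameter, and the sample text are hypothetical.

```python
import re

QUOTED = re.compile(r'"[^"]*(?:""[^"]*)*"')

def flatten_quoted(text, replacement=" "):
    """Replace each run of CR/LF inside double-quoted fields with `replacement`."""
    # The callback only sees the quoted substring, so line breaks that
    # separate rows (outside quotes) are left alone.
    return QUOTED.sub(
        lambda m: re.sub(r'[\r\n]+', replacement, m.group(0)), text
    )

# Hypothetical sample with Windows line endings.
sample = 'id,"line one\r\nline two","x"\r\n7,"ok"\r\n'
result = flatten_quoted(sample)
print(repr(result))
```

Passing replacement="" gives the removal behaviour of the one-liners above.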

In C#, the full solution will look like

using System.IO;
using System.Text;
using System.Text.RegularExpressions;
...
var file = "path_to_file";
var path_to_save = "path_to_save";
var contents = string.Empty;
// true = detect the encoding from a byte order mark, if present
using (var sr = new StreamReader(file, true))
{
    contents = sr.ReadToEnd();
}
// Strip CR and LF only inside double-quoted fields
contents = Regex.Replace(contents, "\"[^\"]*(?:\"\"[^\"]*)*\"", 
    m => m.Value.Replace("\n", "").Replace("\r", ""));
using (var sw = new StreamWriter(path_to_save, false, Encoding.UTF8))
{
    sw.Write(contents); // the using block closes the writer
}
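
For comparison, a rough Python equivalent of the same read, replace, and write steps might look like the sketch below. The function name clean_file and its parameters are hypothetical, and UTF-8 is assumed as the file encoding.

```python
import re

QUOTED = re.compile(r'"[^"]*(?:""[^"]*)*"')

def clean_file(src, dst, encoding="utf-8"):
    """Copy src to dst, removing CR/LF found inside double-quoted fields."""
    # newline="" disables newline translation, so \r\n between rows
    # is read and written back unchanged.
    with open(src, encoding=encoding, newline="") as f:
        contents = f.read()
    contents = QUOTED.sub(
        lambda m: m.group(0).replace("\r", "").replace("\n", ""), contents
    )
    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(contents)
```

Like the C# version, this loads the whole file into memory, which should be fine for typical CSV exports but is an assumption worth checking for very large files.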
Wiktor Stribiżew answered Jan 31 '23 14:01


Use Notepad++ regex Find-and-Replace:

Find what:

(,"[^"]*?)[\r\n]+

Replace with:

$1 

(There is a single space after $1)

Repeatedly click "Replace All" until no more matches are found.
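
To show why more than one pass is needed, the sketch below simulates the repeated "Replace All" in Python. It assumes Python's re module behaves like Notepad++'s regex engine for this particular pattern; the function name replace_until_stable and the sample text are hypothetical.

```python
import re

# Same pattern as above; \1 is Python's spelling of $1.
PATTERN = re.compile(r'(,"[^"]*?)[\r\n]+')

def replace_until_stable(text):
    """Apply the replacement repeatedly; return the text and the number of passes that changed it."""
    passes = 0
    while True:
        text, count = PATTERN.subn(r'\1 ', text)
        if count == 0:
            return text, passes
        passes += 1

# Hypothetical row whose second field contains two line breaks.
sample = '6,"a\nb\nc","x"\n7,"y"\n'
cleaned, passes = replace_until_stable(sample)
print(repr(cleaned), passes)
```

Each pass removes only the first line break in a field, because the next search resumes after the text just matched and needs a new ," to start from. Note also that the pattern requires a comma before the opening quote, so a quoted value in the first column would not be touched.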

Josh Withee answered Jan 31 '23 16:01