
How could I remove newlines from all quoted pieces of text in a file?

Tags: python, bash, csv

I have exported a CSV file from a database. Some fields hold longer chunks of text and can contain newlines. What is the simplest way to remove only the newlines that are inside double quotes, while preserving all others?

I don't care whether it is a Bash command-line one-liner or a simple script, as long as it works.

For example,

"Value1", "Value2", "This is a longer piece
    of text with
    newlines in it.", "Value3"
"Value4", "Value5", "Another value", "value6"

The newlines inside of the longer piece of text should be removed, but not the newline separating the two rows.

davidscolgan asked Dec 05 '22 19:12



1 Answer

In Python:

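import csv

# newline="" leaves line endings untouched so the csv module can parse
# newlines that occur inside quoted fields.
with open("input.csv", newline="") as infile, \
        open("output.csv", "w", newline="") as outfile:
    w = csv.writer(outfile)
    for record in csv.reader(infile):
        # Delete the newlines embedded in each field; csv.writer adds the
        # row separator itself.
        w.writerow(tuple(s.replace("\n", "") for s in record))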
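
One caveat about the sample in the question: it has a space after each comma, and with the default reader settings a quote that follows a space does not open a quoted field. The following is a minimal sketch, assuming the real file looks like that sample. It passes skipinitialspace=True to csv.reader, and it also strips carriage returns in case the export uses Windows line endings. The helper name remove_quoted_newlines is hypothetical, and the io.StringIO buffers stand in for real files only to keep the demo self-contained.

```python
import csv
import io


def remove_quoted_newlines(text):
    """Return CSV text with newlines inside quoted fields removed.

    Hypothetical helper for illustration; it works on a string instead of
    files so that it can be tried without touching the disk.
    """
    source = io.StringIO(text, newline="")
    result = io.StringIO(newline="")
    # Assumption: fields are separated by ", " as in the question's sample,
    # so the space after each comma has to be skipped before the quote.
    reader = csv.reader(source, skipinitialspace=True)
    writer = csv.writer(result, lineterminator="\n")
    for record in reader:
        # Drop both "\r" and "\n" so Windows-style line breaks are covered too.
        writer.writerow(
            [field.replace("\r", "").replace("\n", "") for field in record]
        )
    return result.getvalue()


sample = (
    '"Value1", "Value2", "This is a longer piece\n'
    '    of text with\n'
    '    newlines in it.", "Value3"\n'
    '"Value4", "Value5", "Another value", "value6"\n'
)

cleaned = remove_quoted_newlines(sample)
print(cleaned, end="")
```

The writer quotes a field only when it needs to, so the output may carry fewer quotes than the input. Passing quoting=csv.QUOTE_ALL to csv.writer keeps every field quoted if that matters for the next tool in the chain.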
Sven Marnach answered Dec 08 '22 01:12
