Convert CSV to YAML, with Unicode?

I'm trying to convert a CSV file, containing Unicode strings, to a YAML file using Python 3.4.

Currently, the YAML dumper writes my Unicode text as ASCII escape sequences. I want it to export the Unicode strings as they are, without the escape characters. I'm clearly misunderstanding something here, and I'd appreciate any assistance.

Bonus points: how might this be done with Python 2.7?

CSV input

id, title_english, title_russian
1, A Title in English, Название на русском
2, Another Title, Другой Название

current YAML output

- id: 1
  title_english: A Title in English
  title_russian: "\u041D\u0430\u0437\u0432\u0430\u043D\u0438\u0435 \u043D\u0430\
    \ \u0440\u0443\u0441\u0441\u043A\u043E\u043C"
- id: 2
  title_english: Another Title
  title_russian: "\u0414\u0440\u0443\u0433\u043E\u0439 \u041D\u0430\u0437\u0432\u0430\
      \u043D\u0438\u0435"

desired YAML output

- id: 1
  title_english: A Title in English
  title_russian: Название на русском
- id: 2
  title_english: Another Title
  title_russian: Другой Название

Python conversion code

import csv
import yaml
in_file  = open('csv_file.csv', "r")
out_file = open('yaml_file.yaml', "w")
items = []

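def convert_to_yaml(line, counter):
    # Columns follow the CSV header: id, title_english, title_russian.
    # strip() drops the space that follows each comma in the input.
    item = {
        'id': int(line[0]),
        'title_english': line[1].strip(),
        'title_russian': line[2].strip()
    }
    items.append(item)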

try:
    reader = csv.reader(in_file)
    next(reader) # skip headers
    for counter, line in enumerate(reader):
        convert_to_yaml(line, counter)
    out_file.write( yaml.dump(items, default_flow_style=False) )

finally:
    in_file.close()
    out_file.close()

Thanks!

aljabear asked Nov 12 '14 13:11

2 Answers

I ran into the same issue. Based on your example above, this is how I resolved it:

out_file.write(yaml.dump(items, default_flow_style=False, allow_unicode=True))

Including allow_unicode=True fixes the issue.

Also, specifically for Python 2, use safe_dump instead of dump to keep the !!python/unicode tag from appearing alongside the Unicode text.

out_file.write(yaml.safe_dump(items, default_flow_style=False, allow_unicode=True))
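
To see the difference side by side, here is a minimal sketch for Python 3 with PyYAML installed. The items list is hand-written sample data that mirrors the question's CSV, and the variable names escaped and readable are only illustrative.

```python
import yaml

# Sample rows mirroring the question's CSV data (typed in by hand here).
items = [
    {"id": 1, "title_english": "A Title in English",
     "title_russian": "Название на русском"},
    {"id": 2, "title_english": "Another Title",
     "title_russian": "Другой Название"},
]

# Default behaviour: non-ASCII characters are written as \u escapes.
escaped = yaml.safe_dump(items, default_flow_style=False)

# With allow_unicode=True the Cyrillic text is emitted unchanged.
readable = yaml.safe_dump(items, default_flow_style=False, allow_unicode=True)

print(escaped)
print(readable)
```

Both strings load back to the same data; only the on-disk representation differs.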

gbozee answered Oct 03 '22 12:10


In Python 2.x, you should use a Unicode-aware CSV reader, because the built-in csv module doesn't support Unicode input. You can use unicodecsv for this purpose.

In your current Python 3.x code you should explicitly pass the file encoding when opening it:

import csv
with open('some.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

It may be that your system is already doing the right thing but you're relying on defaults in that case.

Lastly, you need to make sure the YAML file is opened with the correct encoding: open("yaml_file.yaml", "w", encoding="utf-8"). And this encoding should be used later when reading the YAML file.
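
A rough sketch of that write-then-read round trip in Python 3 might look like the following. The temporary directory and the one-row items list are assumptions made for the example, and safe_dump with allow_unicode=True is used so the text lands in the file unescaped.

```python
import os
import tempfile

import yaml

# One sample row; the real data would come from the CSV reader.
items = [{"id": 1, "title_russian": "Название на русском"}]

# Hypothetical output location, chosen only for this example.
path = os.path.join(tempfile.mkdtemp(), "yaml_file.yaml")

# Write: open the file as UTF-8 text and let PyYAML stream into it.
with open(path, "w", encoding="utf-8") as out_file:
    yaml.safe_dump(items, out_file,
                   default_flow_style=False, allow_unicode=True)

# Read: pass the same encoding, otherwise the platform default is used.
with open(path, "r", encoding="utf-8") as in_file:
    raw_text = in_file.read()

loaded = yaml.safe_load(raw_text)
print(raw_text)
```

Passing the open file as the second argument lets PyYAML write to it directly instead of building the whole string first.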

I'm not sure what the yaml library does when given Python objects but you also need to check that line[0] and line[1] are Unicode strings when you're setting them inside convert_to_yaml.
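
In Python 3, csv.reader yields str fields when it is given a text-mode file, so a quick check along these lines should confirm it. This is only a sketch: io.StringIO stands in for the real file, and skipinitialspace=True is an addition of mine to drop the space after each comma in the sample data.

```python
import csv
import io

# In-memory stand-in for the question's CSV file.
csv_text = (
    "id, title_english, title_russian\n"
    "1, A Title in English, Название на русском\n"
    "2, Another Title, Другой Название\n"
)

# skipinitialspace=True ignores the blank that follows each delimiter.
reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
next(reader)  # skip the header row
rows = list(reader)

# Check that every field came back as a (Unicode) str.
all_text = all(isinstance(field, str) for row in rows for field in row)
print(all_text, rows[0])
```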

Simeon Visser answered Oct 03 '22 11:10