
How to process a YAML stream in Python

I have a command-line app that continuously outputs YAML data in the form:

- col0: datum0
  col1: datum1
  col2: datum2
- col0: datum0
  col1: datum1
  col2: datum2
...

It does this for all of eternity. I would like to write a Python script that continuously reads each of these records.
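Because every record starts with `- ` at column 0, each record is one item of a single top-level YAML block sequence. A quick check with PyYAML's `yaml.safe_load`, using one record copied from the sample above (the `datum` values are the same placeholders), shows that a record parsed on its own comes back as a one-element list holding a dict:

```python
import yaml

# One record copied from the sample output above
record = "- col0: datum0\n  col1: datum1\n  col2: datum2\n"

parsed = yaml.safe_load(record)
print(parsed)  # [{'col0': 'datum0', 'col1': 'datum1', 'col2': 'datum2'}]
```

So a reader that cuts the output at each unindented `- ` line can parse every chunk independently and take element `[0]` to get the record's dict.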

The PyYAML library seems best at taking fully loaded strings and interpreting those as a complete YAML document. Is there a way to put PyYAML into a "streaming" mode?

Or is my only option to chunk the data myself and feed it bit by bit into PyYAML?
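PyYAML does not advertise a streaming mode for items inside one document, but its pure-Python loader is built as a pull pipeline (reader, scanner, parser, composer, constructor), so the items of one endless top-level sequence can in principle be pulled out one at a time. The sketch below assumes the input is exactly one block sequence, as in the sample. `iter_sequence_items` is a hypothetical name, and `compose_node`, `construct_object` and `constructed_objects` are internals of PyYAML's `SafeLoader`, not a documented API, so they may change between versions.

```python
import io
import yaml

def iter_sequence_items(stream):
    """Yield each item of a top-level YAML block sequence as soon as it is composed.

    Sketch only: compose_node, construct_object and constructed_objects are
    PyYAML Composer/Constructor internals, not a documented public API.
    """
    loader = yaml.SafeLoader(stream)
    try:
        # Consume the events that open the stream, the implicit document
        # and the top-level sequence.
        for expected in (yaml.StreamStartEvent, yaml.DocumentStartEvent,
                         yaml.SequenceStartEvent):
            event = loader.get_event()
            if not isinstance(event, expected):
                raise ValueError(f"expected {expected.__name__}, got {event!r}")
        # Each iteration composes exactly one sequence item (here, one mapping)
        # and turns it into a plain Python object.
        while not loader.check_event(yaml.SequenceEndEvent):
            node = loader.compose_node(None, None)
            yield loader.construct_object(node, deep=True)
            # Drop the node -> object cache so an endless stream does not grow it.
            loader.constructed_objects = {}
    finally:
        loader.dispose()

# Example: an in-memory stand-in for the app's output
sample = io.StringIO("- col0: a\n  col1: b\n- col0: c\n  col1: d\n")
for record in iter_sequence_items(sample):
    print(record)  # one dict per record
```

Two caveats, as far as I can tell from PyYAML's source: an item is only emitted once the scanner sees the `- ` that starts the next one, because it needs that lookahead to know the mapping has ended; and the reader pulls input with `stream.read()` in fixed-size blocks rather than line by line, so on a slow pipe records may surface late. For the live case you would pass the app's output as a text stream, for example `sys.stdin` when piping the app into the script.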

Asked by Frank Krueger, Jan 09 '09 18:01



1 Answer

Here is what I've ended up using since there does not seem to be a built-in method for accomplishing what I want. This function should be generic enough that it can read in a stream of YAML and return top-level objects as they are encountered.

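import yaml

def streamInYAML(stream):
    # Buffer the lines of one top-level item; yield it when the next item starts.
    y = stream.readline()
    for l in iter(stream.readline, ''):
        if l.startswith(' '):
            # Indented line: continuation of the current item
            y += l
        else:
            # Unindented line: a new item begins, so parse the buffered one
            yield yaml.safe_load(y)
            y = l
    # End of stream: emit the last buffered item too
    if y:
        yield yaml.safe_load(y)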

Can anyone do better?
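One alternative, assuming the app can be made to start every record with a `---` document marker instead of a sequence dash (whether it has such an option is an assumption; the input below is hypothetical): each record then becomes its own YAML document, and PyYAML's documented `yaml.safe_load_all` already returns a generator that parses one document per iteration, with no hand-rolled chunking.

```python
import io
import yaml

# Hypothetical producer output: one YAML document per record, separated by "---"
docs = io.StringIO(
    "---\ncol0: datum0\ncol1: datum1\n"
    "---\ncol0: datum0\ncol1: datum1\n"
)

# safe_load_all yields documents lazily, one per loop iteration
records = []
for record in yaml.safe_load_all(docs):
    records.append(record)
    print(record)
```

Each yielded value is the record's dict directly rather than a one-element list, since the top-level node of every document is the mapping itself.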

Answered by Frank Krueger, Oct 02 '22 08:10
