Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I stream JSON from a file?

Tags:

json

stream

perl

I will have a possibly very large JSON file and I want to stream from it instead of load it all into memory. Based on the following statement (I added the emphasis) from JSON::XS, I believe it won't suit my needs. Is there a Perl 5 JSON module that will stream the results from the disk?

In some cases, there is the need for incremental parsing of JSON texts. While this module always has to keep both JSON text and resulting Perl data structure in memory at one time, it does allow you to parse a JSON stream incrementally. It does so by accumulating text until it has a full JSON object, which it then can decode. This process is similar to using decode_prefix to see if a full JSON object is available, but is much more efficient (and can be implemented with a minimum of method calls).

To clarify, the JSON will contain an array of objects. I want to read one object at a time from the file.

like image 989
Chas. Owens Avatar asked Sep 17 '12 13:09

Chas. Owens


People also ask

Can JSON be streamed?

JSON streaming comprises communications protocols to delimit JSON objects built upon lower-level stream-oriented protocols (such as TCP), that ensures individual JSON objects are recognized, when the server and clients use the same one (e.g. implicitly coded in).

How do I access a JSON file?

JSON files are human-readable means the user can read them easily. These files can be opened in any simple text editor like Notepad, which is easy to use. Almost every programming language supports JSON format because they have libraries and functions to read/write JSON structures.

How can we extract JSON data?

To extract the name and projects properties from the JSON string, use the json_extract function as in the following example. The json_extract function takes the column containing the JSON string, and searches it using a JSONPath -like expression with the dot . notation. JSONPath performs a simple tree traversal.

How can I get JSON data from URL?

To read a JSON response there is a widely used library called urllib in python. This library helps to open the URL and read the JSON response from the web. To use this library in python and fetch JSON response we have to import the json and urllib in our code, The json. loads() method returns JSON object.


2 Answers

In terms of ease of use and speed, JSON::SL seems to be the winner:

#!/usr/bin/perl

use strict;
use warnings;

use JSON::SL;

my $p = JSON::SL->new;

#look for everthing past the first level (i.e. everything in the array)
$p->set_jsonpointer(["/^"]);

local $/ = \5; #read only 5 bytes at a time
while (my $buf = <DATA>) {
    $p->feed($buf); #parse what you can
    #fetch anything that completed the parse and matches the JSON Pointer
    while (my $obj = $p->fetch) {
        print "$obj->{Value}{n}: $obj->{Value}{s}\n";
    }
}

__DATA__
[
    { "n": 0, "s": "zero" },
    { "n": 1, "s": "one"  },
    { "n": 2, "s": "two"  }
]

JSON::Streaming::Reader was okay, but it is slower and suffers from too verbose an interface (all of these coderefs are required even though many do nothing):

#!/usr/bin/perl

use strict;
use warnings;

use JSON::Streaming::Reader;

my $p = JSON::Streaming::Reader->for_stream(\*DATA);

my $obj;
my $attr;
$p->process_tokens(
    start_array    => sub {}, #who cares?
    end_array      => sub {}, #who cares?
    end_property   => sub {}, #who cares?
    start_object   => sub { $obj = {}; },     #clear the current object
    start_property => sub { $attr = shift; }, #get the name of the attribute
    #add the value of the attribute to the object
    add_string     => sub { $obj->{$attr} = shift; },
    add_number     => sub { $obj->{$attr} = shift; },
    #object has finished parsing, it can be used now
    end_object     => sub { print "$obj->{n}: $obj->{s}\n"; },
);

__DATA__
[
    { "n": 0, "s": "zero" },
    { "n": 1, "s": "one"  },
    { "n": 2, "s": "two"  }
]

To parse 1,000 records it took JSON::SL .2 seconds and JSON::Streaming::Reader 3.6 seconds (note, JSON::SL was being fed 4k at a time, I had no control over JSON::Streaming::Reader's buffer size).

like image 56
Chas. Owens Avatar answered Sep 23 '22 17:09

Chas. Owens


Have you looked at JSON::Streaming::Reader which shows up as first while searching for 'JSON Stream' on search.cpan.org?

Alternatively JSON::SL found by searching for 'JSON SAX' - not quite as obvious search terms, but what you describe sounds like a SAX parsers for XML.

like image 32
pmakholm Avatar answered Sep 22 '22 17:09

pmakholm