JRuby 9.2, Rails 4.2
I've been work with reading/dumping large JSON objects, 100+MB files, which would routinely hang for a few minutes then run out of memory when writing them out.
I noticed it specifically happens when using to_json
. Below is with a 4MB file so it actually completes:
pry(main)> f = File.open('nvdcve-1.0-recent.json'){|f| JSON.load(f) }
pry(main)> puts Benchmark.measure{ JSON.dump(f) }
0.230000 0.010000 0.240000 ( 0.081894)
=> nil
pry(main)> puts Benchmark.measure{ f.to_json }
1.020000 0.020000 1.040000 ( 0.820851)
=> nil
The source of the problem being that ActiveSupport is overriding to_json
on a bunch of things:
pry(main)> f.method(:to_json)
=> #<Method: Hash#to_json(to_json_with_active_support_encoder)>
pry(main)> puts Benchmark.measure{ f.to_json_without_active_support_encoder }
0.040000 0.000000 0.040000 ( 0.035408)
pry(main)> puts Benchmark.measure{ f.to_json_with_active_support_encoder }
1.170000 0.010000 1.180000 ( 0.812674)
See here: https://apidock.com/rails/Object/to_json_with_active_support_encoder
So what exactly is ActiveSupport::JSON.encode
doing differently from the JSON gem's to_json
that is causing it to be so much slower, even on an unremarkable JSON file that was loaded from a JSON file to begin with?
Would I be breaking any Rails functionality if I overrode to_json
to not use the ActiveSupport variant?
According to the source ActiveSupport::JSON is pretty much just an additional layer on top of the JSON gem - which actually does most of the heavy lifting. What it mainly does is:
ESCAPED_CHARS = {
"\u2028" => '\u2028',
"\u2029" => '\u2029',
">" => '\u003e',
"<" => '\u003c',
"&" => '\u0026',
}
The comments don't state which the problematic browsers in question are and this might be a total non-issue for modern browsers.
# Convert an object into a "JSON-ready" representation composed of
# primitives like Hash, Array, String, Numeric,
# and +true+/+false+/+nil+.
# Recursively calls #as_json to the object to recursively build a
# fully JSON-ready object.
#
# This allows developers to implement #as_json without having to
# worry about what base types of objects they are allowed to return
# or having to remember to call #as_json recursively.
#
# Note: the +options+ hash passed to +object.to_json+ is only passed
# to +object.as_json+, not any of this method's recursive +#as_json+
# calls.
def jsonify(value)
case value
when String
EscapedString.new(value)
when Numeric, NilClass, TrueClass, FalseClass
value.as_json
when Hash
Hash[value.map { |k, v| [jsonify(k), jsonify(v)] }]
when Array
value.map { |v| jsonify(v) }
else
jsonify value.as_json
end
end
This is probably the most important part as the #as_json
is pretty much key to how models present themselves as JSON. It also handles casting dates and timestamps and other non-primitives (ActiveSupport provides #as_json
methods for these). Removing this would most likely break a lot of things.
Depending your use case you could perhaps manually create ruby objects (arrays, hashes, etc) from your model objects and use JSON.generate(data)
- I'm not sure it will perform better though.
tl;dr: ActiveSupport::JSON#encode
is doing three extra things.
as_json
asking each object to provide a data structure suitable for serialization.JSON.dump
decides how things will become JSON, you have no control. The as_json
callback allows you to control exactly how your objects are dumped. For simple objects this adds a bit of overhead. For everything else it's very important.
jsonify
makes dumping objects easier. It chews up about 29% of the time.
This allows developers to implement #as_json without having to worry about what base types of objects they are allowed to return or having to remember to call #as_json recursively.
Escaping HTML allows one to just plop JSON into an HTML page. That's about 38% of the time. That's of limited use for chewing up so much time. Unfortunately it can't be easily turned off. That might be a worthwhile patch.
Just 10% is spent inside the real JSON encoder. This roughly concurs with my Benchmark results.
[12] pry(main)> puts Benchmark.measure{ ActiveSupport::JSON.encode(h) }
1.241299 0.009459 1.250758 ( 1.253938)
=> nil
[13] pry(main)> puts Benchmark.measure{ JSON.dump(h) }
0.117118 0.003333 0.120451 ( 0.120881)
Note: This is with Ruby 2.6.3 and Rails 5.2.3. That's what I've got, and performance issues with old versions are less interesting. You can run them yourself, I expect it to be about the same.
Running this with ruby-prof
we can see that JSON.dump
is spending all its time in just a few functions.
$ rails -e 'require "json"; require "ruby-prof"; h = File.open("nvdcve-1.0-recent.json"){|f| JSON.load(f) }; result = RubyProf.profile { JSON.dump(h) }; printer = RubyProf::GraphPrinter.new(result); printer.print(STDOUT, {})' > json.prof
------------------------------------------------------------------------------------------------------------------------------------------------------
0.275 0.134 0.000 0.141 1/1 <Module::JSON>#generate
99.98% 48.78% 0.275 0.134 0.000 0.141 1 JSON::Ext::Generator::State#generate
0.097 0.097 0.000 0.000 401143/401143 String#encode
0.021 0.021 0.000 0.000 101047/101047 Hash#keys
0.021 0.021 0.000 0.000 213358/213358 String#to_s
0.001 0.001 0.000 0.000 2058/2058 Float#to_s
------------------------------------------------------------------------------------------------------------------------------------------------------
0.097 0.097 0.000 0.000 401143/401143 JSON::Ext::Generator::State#generate
35.26% 35.26% 0.097 0.097 0.000 0.000 401143 String#encode
------------------------------------------------------------------------------------------------------------------------------------------------------
0.021 0.021 0.000 0.000 101047/101047 JSON::Ext::Generator::State#generate
7.79% 7.79% 0.021 0.021 0.000 0.000 101047 Hash#keys
------------------------------------------------------------------------------------------------------------------------------------------------------
0.021 0.021 0.000 0.000 213358/213358 JSON::Ext::Generator::State#generate
7.77% 7.77% 0.021 0.021 0.000 0.000 213358 String#to_s
------------------------------------------------------------------------------------------------------------------------------------------------------
String#encode
, presumably to ensure all strings are UTF-8
as required.Now with ActiveSupport::JSON.encode(h)
. For some reason h.to_json
doesn't go through encode
from the command line, but it does in console.
$ rails -e 'require "json"; require "ruby-prof"; h = File.open("nvdcve-1.0-recent.json"){|f| JSON.load(f) }; result = RubyProf.profile { ActiveSupport::JSON.encode(h) }; printer = RubyProf::GraphPrinter.new(result); printer.print(STDOUT, {})' > as.prof
It's profile is more complicated. Starting from the top.
------------------------------------------------------------------------------------------------------------------------------------------------------
3.020 0.000 0.000 3.020 1/1 <Module::ActiveSupport::JSON>#encode
96.58% 0.00% 3.020 0.000 0.000 3.020 1 ActiveSupport::JSON::Encoding::JSONGemEncoder#encode /Users/schwern/.rvm/gems/ruby-2.6.3/gems/activesupport-5.2.3/lib/active_support/json/encoding.rb:34
1.483 0.000 0.000 1.483 1/1 ActiveSupport::JSON::Encoding::JSONGemEncoder#stringify
0.894 0.000 0.000 0.894 1/519065 ActiveSupport::JSON::Encoding::JSONGemEncoder#jsonify
0.643 0.000 0.000 0.643 1/101047 Hash#as_json
0.000 0.000 0.000 0.000 1/305710 Kernel#dup
------------------------------------------------------------------------------------------------------------------------------------------------------
Its spending 95% of its time in ActiveSupport::JSON::Encoding::JSONGemEncoder#encode
.
def encode(value)
stringify jsonify value.as_json(options.dup)
end
value.as_json
asks the value to return a data structure it wants serialized as JSON. This is scattered amongst different methods.
These don't take up a lot of time themselves, but they do recurse eating up up 21% of the total time.
------------------------------------------------------------------------------------------------------------------------------------------------------
0.634 0.009 0.000 0.625 9800/101047 Hash#each
0.642 0.075 0.000 0.567 91246/101047 Array#map
0.643 0.000 0.000 0.643 1/101047 ActiveSupport::JSON::Encoding::JSONGemEncoder#encode
20.56% 2.69% 0.643 0.084 0.000 0.559 101047 *Hash#as_json /Users/schwern/.rvm/gems/ruby-2.6.3/gems/activesupport-5.2.3/lib/active_support/core_ext/object/json.rb:157
0.643 0.031 0.000 0.612 101047/202094 Enumerable#map
0.036 0.036 0.000 0.000 101047/202094 <Class::Hash>#[]
------------------------------------------------------------------------------------------------------------------------------------------------------
0.643 0.003 0.000 0.639 10311/10311 Hash#each
20.56% 0.11% 0.643 0.003 0.000 0.639 10311 *Array#as_json /Users/schwern/.rvm/gems/ruby-2.6.3/gems/activesupport-5.2.3/lib/active_support/core_ext/object/json.rb:151
0.643 0.051 0.000 0.592 10311/20623 Array#map
------------------------------------------------------------------------------------------------------------------------------------------------------
0.000 0.000 0.000 0.000 1102/187785 Array#map
0.033 0.033 0.000 0.000 186683/187785 Hash#each
1.05% 1.05% 0.033 0.033 0.000 0.000 187785 String#as_json /Users/schwern/.rvm/gems/ruby-2.6.3/gems/activesupport-5.2.3/lib/active_support/core_ext/object/json.rb:86
------------------------------------------------------------------------------------------------------------------------------------------------------
0.001 0.001 0.000 0.000 2058/4116 Hash#each
0.001 0.001 0.000 0.000 2058/4116 ActiveSupport::JSON::Encoding::JSONGemEncoder#jsonify
0.05% 0.04% 0.002 0.001 0.000 0.000 4116 Float#as_json /Users/schwern/.rvm/gems/ruby-2.6.3/gems/activesupport-5.2.3/lib/active_support/core_ext/object/json.rb:106
0.000 0.000 0.000 0.000 4116/4116 Float#finite?
------------------------------------------------------------------------------------------------------------------------------------------------------
jsonify
converts the as_json
structure into JSON primitives. "This allows developers to implement #as_json without having to worry about what base types of objects they are allowed to return or having to remember to call #as_json recursively."
This eats up about 29% of the total time.
------------------------------------------------------------------------------------------------------------------------------------------------------
0.894 0.107 0.000 0.787 92348/519065 Array#map
0.894 0.224 0.000 0.670 426716/519065 Hash#each
0.894 0.000 0.000 0.894 1/519065 ActiveSupport::JSON::Encoding::JSONGemEncoder#encode
28.60% 10.59% 0.894 0.331 0.000 0.563 519065 *ActiveSupport::JSON::Encoding::JSONGemEncoder#jsonify /Users/schwern/.rvm/gems/ruby-2.6.3/gems/activesupport-5.2.3/lib/active_support/json/encoding.rb:85
0.894 0.036 0.000 0.858 101047/202094 Enumerable#map
0.894 0.029 0.000 0.866 10311/20623 Array#map
0.193 0.129 0.000 0.064 401143/401153 Class#new
0.130 0.130 0.000 0.000 1104275/1104283 Module#===
0.031 0.031 0.000 0.000 101047/202094 <Class::Hash>#[]
0.001 0.001 0.000 0.000 2058/4116 Float#as_json
0.001 0.001 0.000 0.000 2533/5066 FalseClass#as_json
0.000 0.000 0.000 0.000 1973/3946 TrueClass#as_json
------------------------------------------------------------------------------------------------------------------------------------------------------
stringify
hands all that to JSON.generate
.------------------------------------------------------------------------------------------------------------------------------------------------------
1.483 0.000 0.000 1.483 1/1 ActiveSupport::JSON::Encoding::JSONGemEncoder#encode
47.42% 0.00% 1.483 0.000 0.000 1.483 1 ActiveSupport::JSON::Encoding::JSONGemEncoder#stringify /Users/schwern/.rvm/gems/ruby-2.6.3/gems/activesupport-5.2.3/lib/active_support/json/encoding.rb:101
1.483 0.000 0.000 1.483 1/1 <Module::JSON>#generate
------------------------------------------------------------------------------------------------------------------------------------------------------
JSON.generate
calls JSON::Ext::Generator::State#generate
which eats 47% of the total time.
But 38% is spent in ActiveSupport::JSON::Encoding::JSONGemEncoder::EscapedString#to_json
, this method escapes HTML.
That means the time spent actually generating JSON is just 10% of the run time. This jives with the Benchmark results.
------------------------------------------------------------------------------------------------------------------------------------------------------
1.483 0.228 0.000 1.254 1/1 <Module::JSON>#generate
47.42% 7.31% 1.483 0.228 0.000 1.254 1 JSON::Ext::Generator::State#generate
1.176 0.271 0.000 0.905 401143/401143 ActiveSupport::JSON::Encoding::JSONGemEncoder::EscapedString#to_json
0.040 0.040 0.000 0.000 213358/213358 ActiveSupport::JSON::Encoding::JSONGemEncoder::EscapedString#to_s
0.037 0.037 0.000 0.000 101047/101047 Hash#keys
0.001 0.001 0.000 0.000 2058/2058 Float#to_s
------------------------------------------------------------------------------------------------------------------------------------------------------
1.176 0.271 0.000 0.905 401143/401143 JSON::Ext::Generator::State#generate
37.63% 8.68% 1.176 0.271 0.000 0.905 401143 ActiveSupport::JSON::Encoding::JSONGemEncoder::EscapedString#to_json /Users/schwern/.rvm/gems/ruby-2.6.3/gems/activesupport-5.2.3/lib/active_support/json/encoding.rb:55
0.614 0.209 0.000 0.405 401143/401143 ActiveSupport::ToJsonWithActiveSupportEncoder#to_json
0.292 0.292 0.000 0.000 401143/401143 String#gsub
------------------------------------------------------------------------------------------------------------------------------------------------------```
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With