Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

optimizing JSON querying performance in javascript

I have a 10MB JSON file of the following structure (10k entries):

{
entry_1: {
    description: "...",
    offset: "...",
    value: "...",
    fields: {
        field_1: {
            offset: "...",
            description: "...",
        },
        field_2: {
            offset: "...",
            description: "...",
        }   
    }
},
entry_2:
...
...
...

}

I want to implement an autocomplete input field that will fetch suggestions from this file, as fast as possible while searching multiple attributes. For example, finding all entry names,field names and descriptions that contain some substring.

Method 1:

I tried to flatten the nesting into an array of strings:

"entry_1|descrption|offset|value|field1|offset|description",
"entry_1|descrption|offset|value|field2|offset|description",
"entry2|..."

and perform case insensitive partial string match, query took about 900ms.

Method 2

I tried Xpath-based JSON querying (using defiant.js).

  var snapshot = Defiant.getSnapshot(DATA);
  found = JSON.search(snapshot, '//*[contains(fields, "substring")]');

query took about 600ms (just for a single attribute, fields).

Are there other options that will get me to sub 100ms? I have control of the file format so I can turn it into XML or any other format, the only requirement is speed.

like image 421
susdu Avatar asked Nov 26 '17 08:11

susdu


People also ask

How big should JSON response be?

There is a fixed number (1GB), but the ideal size is going to be less than this value. If you need a number, my recommendation is usually to try to keep JSON documents to a few MB in size. This makes the most sense.

Is JSON parse slow?

parse is a slow way to create a copy of an object. But can it actually improve the performance of our code? This post requires a basic knowledge about Shapes and Inline Cache.

Does JSON compress well?

As text data, JSON data compresses nicely. That's why gzip is our first option to reduce the JSON data size. Moreover, it can be automatically applied in HTTP, the common protocol for sending and receiving JSON.

Is Minified JSON faster?

Minification does improve performance for two reasons: Reduced file-size (because it removes comments and unnecessary white spaces), so your script loads faster.


1 Answers

Since you are trying to search for a substring of values it is not a good idea to use indexeddb as suggested. You can try flattening the values of the fields to text where fields seperated by :: and each key in the object is a line in the text file:

{
  key1:{
    one:"one",
    two:"two",
    three:"three"
  },
  key2:{
    one:"one 2",
    two:"two 2",
    three:"three 2"
  }
}

Will be:

key1::one::two::three
key2::one 2::two 2::three

Then use regexp to search for text after the keyN:: part and store all keys that match. Then map all those keys to the objects. So if key1 is the only match you'd return [data.key1]

Here is an example with sample data of 10000 keys (search on laptop takes couple of milliseconds but have not tested when throttling to mobile):

//array of words, used as value for data.rowN
const wordArray = ["actions","also","amd","analytics","and","angularjs","another","any","api","apis","application","applications","are","arrays","assertion","asynchronous","authentication","available","babel","beautiful","been","between","both","browser","build","building","but","calls","can","chakra","clean","client","clone","closure","code","coherent","collection","common","compiler","compiles","concept","cordova","could","created","creating","creation","currying","data","dates","definition","design","determined","developed","developers","development","difference","direct","dispatches","distinct","documentations","dynamic","easy","ecmascript","ecosystem","efficient","encapsulates","engine","engineered","engines","errors","eslint","eventually","extend","extension","falcor","fast","feature","featured","fetching","for","format","framework","fully","function","functional","functionality","functions","furthermore","game","glossary","graphics","grunt","hapi","has","having","help","helps","hoisting","host","how","html","http","hybrid","imperative","include","incomplete","individual","interact","interactive","interchange","interface","interpreter","into","its","javascript","jquery","jscs","json","kept","known","language","languages","library","lightweight","like","linked","loads","logic","majority","management","middleware","mobile","modular","module","moment","most","multi","multiple","mvc","native","neutral","new","newer","nightmare","node","not","number","object","objects","only","optimizer","oriented","outside","own","page","paradigm","part","patterns","personalization","plugins","popular","powerful","practical","private","problem","produce","programming","promise","pure","refresh","replace","representing","requests","resolved","resources","retaining","rhino","rich","run","rxjs","services","side","simple","software","specification","specifying","standardized","styles","such","support","supporting","syntax","text","that","the","their","they","toolkit","top","tracking","transformation","type","underlying","universal","until","use","used","user","using","value","vuejs","was","way","web","when","which","while","wide","will","with","within","without","writing","xml","yandex"];
//get random number
const rand = (min,max) =>
  Math.floor(
    (Math.random()*(max-min))+min
  )
;
//return object: {one:"one random word from wordArray",two:"one rand...",three,"one r..."}
const threeMembers = () =>
  ["one","two","three"].reduce(
    (acc,item)=>{
      acc[item] = wordArray[rand(0,wordArray.length)];
      return acc;
    }
    ,{}
  )
;
var i = -1;
data = {};
//create data: {row0:threeMembers(),row1:threeMembers()...row9999:threeMembers()}
while(++i<10000){
  data[`row${i}`] = threeMembers();
}
//convert the data object to string "row0::word::word::word\nrow1::...\nrow9999..."
const dataText = Object.keys(data)
  .map(x=>`${x}::${data[x].one}::${data[x].two}::${data[x].three}`)
  .join("\n")
;
//search for someting (example searching for "script" will match javascript and ecmascript)
//  i in the regexp "igm" means case insensitive
//return array of data[matched key]
window.searchFor = search => {
  const r = new RegExp(`(^[^:]*).*${search}`,"igm")
  ,ret=[];
  var result = r.exec(dataText);
  while(result !== null){
    ret.push(result[1]);
    result = r.exec(dataText);
  }
  return ret.map(x=>data[x]);
};
//example search for "script"
console.log(searchFor("script"));
like image 113
HMR Avatar answered Sep 28 '22 06:09

HMR