
How to map a multi-level object to indexedDB for best efficiency

Tags:

indexeddb

My question concerns laying out a data structure within indexedDB. I started out building a small web page feature that grew into a web learning tool and is now closer to a stand-alone progressive web application. Using localStorage has worked well, but since the tool has grown, the 5MB limit may become a problem for some users; so, there is a need to switch to indexedDB.

The application is for desktops only and allows the user to build a portfolio of modules and save the data to the hard drive as a JSON string. When the user opens (uploads) the file in the application, the string is parsed and the entire portfolio written to localStorage again but only one module is written to a run-time object at any one time. There isn't a need for a "genuine" database from the perspective of searching for data by different fields and indexing, but only a need for a greater amount of storage because it would be too confusing for the user if each module in a portfolio had to be a separate file.

Most of the data saved to localStorage is from a three-level object, and a key is made based on the object path to save and retrieve the data. For example, object.level_1[key_1].level_2[key_2].level_3[key_3].height = 10 is saved as localStorage.setItem( 'k1.k2.k3.h', 10).
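To illustrate, here is a minimal sketch of that mapping; the helper name is my own invention, and localStorage only stores strings, so the number comes back as text and is converted on read:

    // Build a dotted key from the path segments, e.g. 'k1.k2.k3.h'.
    function buildKey(k1, k2, k3, prop) {
      return [k1, k2, k3, prop].join('.');
    }

    localStorage.setItem(buildKey('k1', 'k2', 'k3', 'h'), 10);
    const height = Number(localStorage.getItem('k1.k2.k3.h')); // 10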

My question is, when moving to indexedDB, which is more efficient: a single objectStore much like the localStorage setup, or a separate objectStore for each of the three levels of the portfolio?

If a single objectStore can be viewed as being similar to a two-column table with one row (a key and a value) for each individual data point, its row count would be greater than the sum of the row counts for the three objectStores, where each row is a key and an object of multiple data points. On the other hand, to update an individual data point in one of the three objectStores, the stored object has to be read into a temporary object, the data point updated, and the object written back to the objectStore.

The question is, then, which is more efficient: searching through a single table of many rows for a single unique key pointing to one less-complex value, or searching through one of three tables with fewer rows but having to perform what I think is equivalent to a JSON parse, value update, and JSON stringify to update the same value in the database?
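For comparison, here is a hedged sketch of that read-modify-write cycle against one of the per-level stores, alongside the single put needed by a flat, localStorage-style store. The store names are assumptions, and both stores are assumed to use out-of-line keys:

    // Updating one data point that lives inside a larger level_2 object:
    // the whole object is read, modified, and written back.
    function updateHeight(db, key, newHeight) {
      const tx = db.transaction('level_2', 'readwrite');
      const store = tx.objectStore('level_2');
      const req = store.get(key);                 // read the whole object
      req.onsuccess = () => {
        const obj = req.result;
        if (!obj) return;
        obj.height = newHeight;                   // change one property
        store.put(obj, key);                      // write the whole object back
      };
      tx.onerror = () => console.error('update failed', tx.error);
    }

    // The flat layout needs only a single put of the small value.
    function updateFlat(db, key, newHeight) {
      db.transaction('flat', 'readwrite').objectStore('flat').put(newHeight, key);
    }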

Although no limit is explicitly set, an expected maximum number of level_1 objects in a single portfolio is about 25, where each could likely contain up to 100 level_2 objects, which in turn could each contain a maximum of around 5 level_3 objects. Anything larger than this would most likely lead the user to simply build separate portfolios.

So, the level_1 objectStore would be about 25 rows, the level_2 objectStore about 2500 rows, and the level_3 objectStore about 12,500 rows. Each level_1 object has about 40 data points; each level_2 object has about 100 data points; and each level_3 object has about 20 data points. So, I think a single objectStore would have the equivalent of (25)(40) + (2500)(100) + (12,500)(20) = 501,000 rows.

I'm semi-experienced at extracting data using SQL from very large databases but know absolutely nothing about how a database is set up to locate data by key. If it had to search from top to bottom, checking each of the 501,000 rows until a matching key is found, then one objectStore appears a rather ridiculous choice compared to three objectStores. But if indexedDB employs a more efficient method, then one objectStore could possibly be more efficient, depending on how efficient it is to update a property value in an object in one of the three objectStores.

I am not a programmer by trade, so I apologize if some of my terminology is inexact, and I realize that my question is of a rather basic level; but I have been unable to locate any information addressing how to "map" an object to an object database in an efficient manner.

Thank you for reading my question and for any direction you may be able to provide.

EDIT / UPDATE:

Thank you, Josh, for taking the time to respond to my question and for providing a number of items to think about. I had not yet considered how the points in the application at which different types of data are written to browser storage influence the choice of the number of object stores.

There are two large data movements that generally occur only once each during a user's session: the upload from hard disk of a JSON string to be parsed and written to browser storage, and the reading of browser storage into an object to be stringified and downloaded to hard disk. Users most likely expect these two steps to take long enough to warrant some form of brief progress indicator. The timings that matter are how long it takes to store data edits and to create new data elements.

Following Josh's comments, perhaps a good way to set up object stores is to consider when and what data gets written to browser storage by screens, for lack of a better term. In my application, only one module (level_1 object in the portfolio) is ever loaded into a run-time object at any one time. There is one screen for module-level data. When that screen is exited, any changes in the module-level data are written to storage.

Each level_2 object in a module has its own screen, and as the user navigates between level_2 object screens, the content in the screen's input elements is checked against the run-time object's values for changes, and any changes are written to storage.

While on a level_2 object screen, a user adds level_3 objects to specific level_2 elements by opening a window that appears on top of the level_2 screen. When each window is closed, a similar check is performed and any data changes are written to storage.

Creating object stores that align with the data displayed and collected on each screen appears to make sense and, of course, aligns with the object levels. However, it still doesn't answer which data structure would ultimately be the most efficient, providing the best user experience time-wise.

Apart from some type of rule of thumb for database efficiency, the likely best approach for my particular question and circumstance is to code it both ways, fill the portfolio with more than the expected maximum number of modules, level_2, and level_3 objects, and test the performance of writing and reading data to indexedDB. The first method of a single object store ought to be fairly easy to code since it is set up almost exactly like localStorage. The second approach of using at least three object stores will take more time, but it will likely be a necessary and worthwhile learning experience for someone with my limited background in these areas.
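If it helps, a rough timing harness along those lines might look like the sketch below; the store name and the shape of the records are placeholders, not part of the real application, and the store is assumed to use out-of-line keys:

    // Write a batch of [key, value] pairs in one transaction and report how long it took.
    function timedBulkWrite(db, storeName, records) {
      return new Promise((resolve, reject) => {
        const start = performance.now();
        const tx = db.transaction(storeName, 'readwrite');
        const store = tx.objectStore(storeName);
        for (const [key, value] of records) store.put(value, key);
        tx.oncomplete = () => {
          console.log(storeName, (performance.now() - start).toFixed(1), 'ms');
          resolve();
        };
        tx.onerror = () => reject(tx.error);
      });
    }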

If I am successful, I will share the results here in the near future. Thank you.

EDIT:

Thanks for the further explanation. I'm not going to be querying the database in that type of manner but am storing data for retrieval based on the unique key only. However, your earlier comments about storing the same data in multiple tables finally registered in my mind and I think greatly simplified my entire question and approach. I was thinking too much from a local storage perspective.

What I think will work well is multiple object stores: one object store that contains one complete object for each module (level_1 data) in the portfolio, and three or four object stores that contain subsets of data for the "active" or loaded module only.

When the user selects a module to load, it will be loaded in its entirety from the module object store in one step, and subsets (different object levels) of that module will be written to a number of different object stores. When the user makes edits to the module data at any level, the edits will be stored in the appropriate subset object store since that will be much quicker.

If the user properly exits/closes a module, then at that time the loaded object will be written in its entirety to the module object store, and the subset object stores will be emptied. The subset object stores are there to preserve the changes in the event that the user fails to exit properly or there is a power or OS failure.
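As a sketch of that "proper close" step, with invented store names ('modules', 'level_2', 'level_3'), the write-back of the module and the clearing of the subset stores can share a single transaction so they succeed or fail together:

    function closeModule(db, moduleId, moduleObj) {
      return new Promise((resolve, reject) => {
        const tx = db.transaction(['modules', 'level_2', 'level_3'], 'readwrite');
        tx.objectStore('modules').put(moduleObj, moduleId);  // whole module in one record
        tx.objectStore('level_2').clear();                   // empty the subset stores
        tx.objectStore('level_3').clear();
        tx.oncomplete = () => resolve();
        tx.onerror = () => reject(tx.error);
      });
    }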

When the application is opened, browser storage will be tested to determine whether there is a database and, if so, whether or not the subset object stores are empty. If empty, then a proper close and save of the module was performed. If not empty, then edits to the module did not make it into the module object store for whatever reason, and the user will be prompted to either recover or discard the edits saved in the subset object stores. If the user chooses to recover, then the data in the subset object stores must be gathered together into a complete module and written to the module object store.
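A sketch of that startup check, again with invented store names: if any subset store still holds records, the previous session did not close cleanly and recovery can be offered.

    function needsRecovery(db) {
      return new Promise((resolve, reject) => {
        const tx = db.transaction(['level_2', 'level_3'], 'readonly');
        const counts = [];
        for (const name of ['level_2', 'level_3']) {
          const req = tx.objectStore(name).count();
          req.onsuccess = () => counts.push(req.result);
        }
        // Non-empty subset stores mean unsaved edits are waiting to be recovered.
        tx.oncomplete = () => resolve(counts.some((n) => n > 0));
        tx.onerror = () => reject(tx.error);
      });
    }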

This ought to work fine for the anticipated maximum size of any single module in this application; but if the size of a module were to become too much for the browser when loaded in its entirety, then the subset object stores could be used to populate the screens, and when the user exits the module, the subsets could be gathered together to build a complete set of module data and written to the module object store, just as for a recovery.

Of course, there is no way to test during run time if the browser is running too slowly due to an overly large module and change the approach at that time. I just mean that if during my testing of large sample modules, it is observed that the browser runs too slowly, then the second approach will need to be implemented.

I realize that my particular question is not as interesting as the items listed in the response. However, reading about those general concepts helped me to better understand how to address my less interesting use of indexedDB and to avoid a considerable amount of messing about coding unnecessary complexity to a simple problem. Thanks again.

Gary asked May 27 '18

1 Answer

I think you are on to your own answer, so my response here is only intended to push you along.

The main difference between nosql and a traditional sql database is the lack of query-planning. Query planning is the functionality provided by an sql database, where it accepts your query, parses it, and then converts it into an algorithm that finds matching records and returns them to you in a result set. Query planning involves choosing the most optimal approach, generally by trying to minimize the number of steps involved, the amount of memory involved, or the amount of time that will elapse. On the other hand, you are on your own with nosql. You have to become an overnight query-planning expert.

That's both a boon and a burden. Query planning is a complexity cliff for some, and you can quickly find yourself reading some confusing stuff. But if you are looking for a more technical answer then it would be in this direction, of learning more about how databases do query planning.

To speed that up, I would apply the same conventional knowledge about normalization and denormalization. Boyce-Codd and normal forms 1-5 and all that. nosql is on the extreme denormalization end. The 'logical' structure of the items you store is irrelevant. With nosql your objective is not a nice traditional and intuitive schema. Your objective is to efficiently perform your storage operations, your queries.

So to answer the question you have to start with a simple analysis of your operations. Enumerate the operations your app performs. Which are the most frequent operations? Which do you assume will take the longest to complete? By operations, I am not talking about low level queries here, nor the schema of your db in nosql/sql. That is a level too deep of abstraction. Think more abstractly. Enumerate things like "load the info for all the people that meet these conditions", "delete those people over there". I picked up on some of the queries you mention, but I didn't pick up on a clear list, and this list is important criteria in a proper answer.

Once you have enumerated those operations, then I think you are closer to answering your question. As a toy example, think about updates. Are updates frequent? Frequent updates would suggest one object store is bad, because you have to load a ton of irrelevant things just to change one property of an object. Think about granularity. Do you need all of an object's properties, or only some? Think about what is the most frequent operation? Is it loading a list of objects according to some criteria? Is it deleting or updating things? Think about what things are loaded at the same time (co-location). When you load one instance of a level 2 object, are the other instances typically also loaded? If not, then why store them together? Step away from your normalized schema and just forget about it. You want a denormalized schema where you are storing data in a manner so as to optimize your queries. The end result may be nothing like what you imagine.

Maybe a good thought experiment would be this. Pseudocode the function that would do the actual heavy lifting. You will run straight into the problems and identify the parts of the function that will probably be really slow. The answer to your question then is essentially what data structure would really speed those parts up, or at least slow them down less than other data structures.

Edit: one little followup. A rather counterintuitive feature of nosql databases and denormalization is that you may end up storing data multiple times. Sometimes it makes sense to store the same data in multiple places. Because it speeds up queries. And yes it introduces room for inconsistencies, and violates the no-functional-dependencies rule of sql. But you can enforce data integrity (consistency) through the use of multi-store transactions and a bit of care. To elaborate further, the stores you want might just be the literal results of the queries you plan to perform. Yes. Create an object store for each query you plan to perform. Store data redundantly among all of them. Yes that sounds nutty and extreme. And it is a tad exaggerated. But this approach is common, and promoted, when using nosql.
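As a hedged illustration of keeping redundant copies consistent with a multi-store transaction (the store names are made up), every put below commits or none of them do:

    function saveEverywhere(db, key, record) {
      return new Promise((resolve, reject) => {
        const tx = db.transaction(['byLevel1', 'byLevel2', 'flat'], 'readwrite');
        tx.objectStore('byLevel1').put(record, key);   // same record stored redundantly
        tx.objectStore('byLevel2').put(record, key);
        tx.objectStore('flat').put(record, key);
        tx.oncomplete = () => resolve();               // all writes committed together
        tx.onerror = () => reject(tx.error);           // or none at all
      });
    }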

Edit: here is a rough first attempt, just brainstorming a bit. This is an attempt to give you a more concrete answer based on a guess at what you are actually trying to do.

What you want is an object store called 'settings'. Each object in the store represents a Settings object. A single settings object has properties like settings id, settings property name, settings property value, level 1 property, level 2 property, and level 3 property.

Your basic read queries might look like SELECT * from Settings WHERE level1 = 'a' && level2 = 'b'.

Taking this further, you could then optimize for certain views using indices. We could create an index on the level1 property, an index on the level2 property, and an index on the level1+level2 properties combined.

Let's say your most frequent operation, that needs to be fastest, is to load all settings belonging to a particular combination of levels 1, 2, and 3. Create an index on all 3, and then it is just a matter of iterating over that index.
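A sketch of that schema and the compound-index lookup; all database, store, index, and property names here are invented for illustration:

    const open = indexedDB.open('portfolio', 1);

    open.onupgradeneeded = () => {
      const db = open.result;
      const settings = db.createObjectStore('settings', { keyPath: 'id' });
      settings.createIndex('level1', 'level1');
      settings.createIndex('level1_level2', ['level1', 'level2']);
      settings.createIndex('level1_level2_level3', ['level1', 'level2', 'level3']);
    };

    open.onsuccess = () => {
      const db = open.result;
      const tx = db.transaction('settings', 'readonly');
      const idx = tx.objectStore('settings').index('level1_level2_level3');
      // Roughly: SELECT * FROM Settings WHERE level1 = 'a' AND level2 = 'b' AND level3 = 'c'
      const req = idx.getAll(['a', 'b', 'c']);
      req.onsuccess = () => console.log(req.result);
    };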

The schema in this brainstorming example is a single object store, along with some indices to speed certain queries up. Given that indices are basically derived object stores, you could make the conceptual argument you are practically using multiple stores although you are actually only using one. Anyway that might be getting pedantic. The point of this example is just to demonstrate that the schema of your object store has nothing at all to do with how you conceptualize the hierarchy of portfolios and levels. It only has to do with making the queries you need to perform fast.

Josh answered Nov 07 '22