
How to handle large data sets for server-side simulation --> client browser

Sorry for the somewhat confusing title; I'm not sure how else to phrase it. My situation is this: I have an academic simulation tool that I am in the process of developing a web front-end for. While the C++-based simulator is computationally quite efficient for small systems (runtimes of several hundredths to a tenth of a second), it can generate a significant (in web-app terms) amount of data (~4-6 MB).

Currently the setup is as follows:

  1. The user accesses index.html. The left side of this page has an interactive form where the user can input simulation parameters. The right side shows a representation of the system they are creating, along with some greyed-out tabs for various plots of the simulation data.
  2. The user clicks "Run simulation." This submits the requested sim parameters to runSimulation.php via an AJAX call. runSimulation.php creates an input file based on the submitted data, then runs the simulator on it. The simulator spits out 4-6 MB of data across several output files.
  3. Once the simulation finishes, the response triggers a JavaScript callback that requests returnData.php. That script packages the data in the output files as JSON, returns it to the browser, and then deletes the data files (a rough sketch of such a script appears after this list).
  4. The response data is then fed to a few plotting objects in the browser-side JavaScript, and the plot tabs become active. The user can then open and interact with the plotted data.
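For concreteness, step 3 might look roughly like the sketch below. This is a minimal, hypothetical version, not my actual code: the per-session temp directory, the `.out` extension, and the payload shape are all assumptions.

```php
<?php
// returnData.php -- minimal, hypothetical sketch of step 3. The per-session
// temp directory and the ".out" extension are assumptions, not the real layout.
session_start();
$runDir = sys_get_temp_dir() . '/sim_' . session_id();

$payload = [];
foreach (glob($runDir . '/*.out') as $file) {
    // Key each output file by its base name, e.g. "energy.out" => "energy"
    $payload[basename($file, '.out')] = file_get_contents($file);
    unlink($file); // delete once packaged, per step 3
}

header('Content-Type: application/json');
ob_start('ob_gzhandler'); // gzip the response if the client accepts it
echo json_encode($payload);
```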

This setup is working OK; however, I am running into two issues:

  • The return data is slow: 4-6 MB coming back can take a while to load. (The data is being gzipped, which reduces its size considerably, but it can still take 20+ seconds on a slower connection.)
  • The next goal is to allow the user to plot multiple simulation runs so that they can compare the results.

My thought is that I might want to keep the data files on the server while the user's session is active. This would make it possible to load only the data for the plot the user wants to view (and perhaps load other data in the background while they view the current plot). For multiple runs, I can have several data sets sitting on the server, ready for the user to download if/when they are needed. A sketch of such a per-plot endpoint follows.
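Something like the following is what I have in mind, assuming (hypothetically) one directory per session holding one subdirectory per run; the endpoint name, query parameters, and file layout are all made up for illustration.

```php
<?php
// getPlotData.php?run=2&plot=energy
// Hypothetical per-plot endpoint; the query parameters, directory layout,
// and file naming are assumptions.
session_start();

$run  = basename($_GET['run'] ?? '');   // basename() blocks "../" traversal
$plot = basename($_GET['plot'] ?? '');
$file = sys_get_temp_dir() . '/sim_' . session_id() . "/$run/$plot.out";

if ($run === '' || $plot === '' || !is_file($file)) {
    http_response_code(404);
    exit;
}

header('Content-Type: application/json');
ob_start('ob_gzhandler');
readfile($file); // send only this plot's data; the rest stays on disk
```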

However, I have a big issue with this line of thinking: how do I recognize (in PHP) that the user has left the site, so that I can delete the data? I don't want users eating up the drive space on the machine. Any thoughts on best practices for this kind of web app?


1 Answer

For problem #1, you don't really have many options. You are already gzipping the data and using JSON, which is a relatively lightweight format; 4-6 MB is simply a lot of data. Incidentally, if you think PHP is taking too long to package the data, you can have your C++ program generate the output directly and serve it through PHP. You can use exec() to do that (a sketch follows).
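To be concrete: exec() captures the program's output into a PHP array, while its close cousin passthru() streams stdout straight to the client, which suits multi-megabyte payloads better. A rough sketch, assuming (hypothetically) the simulator can be told to emit JSON on stdout; the binary path, the --json flag, and the input file location are all invented for illustration:

```php
<?php
// Hypothetical: the simulator binary path, the --json flag, and the input
// file location are assumptions about a program we haven't seen.
header('Content-Type: application/json');
ob_start('ob_gzhandler');

$input = escapeshellarg('/tmp/sim_input.txt');
// passthru() relays stdout directly into PHP's output buffer, avoiding a
// second multi-MB copy in a PHP string (which exec() would build).
passthru("/opt/sim/simulator --json $input", $exitCode);

if ($exitCode !== 0) {
    // Nothing has been flushed yet (full output buffering), so the status
    // code can still be changed.
    ob_clean();
    http_response_code(500);
}
```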

However, I am not sure how your simulations work, but JavaScript is a Turing-complete language, so you could generate some, most, or all of this data on the client side (whatever makes more sense). In that case you would save a lot of bandwidth and decrease loading times significantly, but mind that JS can be really slow.

For problem #2, if you leave data on the server you'll need to keep track of active sessions (i.e., when the user last interacted with the server) and set a timeout that makes sense for your application. After the timeout, you can delete the data (see the cron-style sketch below).
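A minimal cleanup sketch, assuming the hypothetical sim_<session id> directory layout used in the sketches above; the 30-minute timeout is arbitrary:

```php
<?php
// cleanup.php -- run from cron every few minutes (hypothetical layout:
// one sim_<session id> directory per session under the system temp dir).
$timeout = 30 * 60; // seconds of inactivity before data is dropped; tune this
$now = time();

foreach (glob(sys_get_temp_dir() . '/sim_*') as $dir) {
    // The heartbeat endpoint below touch()es the directory, so its mtime
    // records the last time the user was known to be active.
    if (is_dir($dir) && $now - filemtime($dir) > $timeout) {
        array_map('unlink', glob("$dir/*")); // assumes flat files, no subdirs
        rmdir($dir);
    }
}
```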

To keep track of interaction, you can use JS to signal that a user is still active, by sending periodic heartbeats to a small endpoint like the one sketched below.
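The server side of the heartbeat can be tiny. A sketch (the endpoint name is made up), which the page would ping periodically, e.g. once a minute from a JS timer:

```php
<?php
// heartbeat.php -- hypothetical endpoint the browser pings while the page
// is open (e.g. a fetch() fired from setInterval on the client).
session_start();

$dir = sys_get_temp_dir() . '/sim_' . session_id();
if (is_dir($dir)) {
    touch($dir); // bump mtime so the cron cleanup keeps this session's data
}
http_response_code(204); // success, no body needed
```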
