Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best practice for storing GPS data of a tracking app in mysql database

I have a datamodel question for a GPS tracking app. When someone uses our app it will save latitude, longitude, current speed, timestamp and burned_calories every 5 seconds. When a workout is completed, the average speed, total time/distance and burned calories of the workout will be stored in a database. So far so good..

What we want is to also store the data that is saved those every 5 seconds, so we can utilize this later on to plot graphs/charts of a workout for example.

How should we store this amount of data in a database? A single workout can contain 720 rows if someone runs for an hour. Perhaps a serialised/gzcompressed data array in a single row. I'm aware though that this is bad practice..

A relational one/many to many model would be undone? I know MySQL can easily handle large amounts of data, but we are talking about 720 * workouts twice a week * 7000 users = over 10 million rows a week. (Ofcourse we could only store the data of every 10 seconds to halve the no. of rows, or every 20 seconds, etc... but it would still be a large amount of data over time + the accuracy of the graphs would decrease)

How would you do this? Thanks in advance for your input!

like image 858
24creative Avatar asked Nov 02 '12 14:11

24creative


1 Answers

Just some ideas:

  1. Quantize your lat/lon data. I believe that for technical reasons, the data most likely will be quantized already, so if you can detect that quantization, you might use it. The idea here is to turn double numbers into reasonable integers. In the worst case, you may quantize to the precision double numbers provide, which means using 64 bit integers, but I very much doubt your data is even close to that resolution. Perhaps a simple grid with about one meter edge length is enough for you?
  2. Compute differences. Most numbers will be fairly large in terms of absolute values, but also very close together (unless your members run around half the world…). So this will result in rather small numbers. Furthermore, as long as people run with constant speed into a constant direction, you will quite often see the same differences. The coarser your spatial grid in step 1, the more likely you get exactly the same differences here.
  3. Compute a Huffman code for these differences. You might try encoding lat and long movement separately, or computing a single code with 2d displacement vectors at its leaves. Try both and compare the results.
  4. Store the result in a BLOB, together with the dictionary to decode your Huffman code, and the initial position so you can return data to absolute coordinates.

The result should be a fairly small set of data for each data set, which you can retrieve and decompress as a whole. Retrieving individual parts from the database is not possible, but it sounds like you wouldn't be needing that.

The benefit of Huffman coding over gzip is that you won't have to artificially introduce an intermediate byte stream. Directly encoding the actual differences you encounter, with their individual properties, should work much better.

like image 89
MvG Avatar answered Nov 17 '22 18:11

MvG