
How to structure data for optimal query searches in MongoDB?

I have a db in the format of:

{
    "location_id": "11670",
    "readings": [
        {
            "S1": "10.77",
            "S2": "7.20",
            "humidity": "99.90",
            "temperature": "12.80",
            "timestamp": "1565597160"
        },
        {
            "S1": "3.70",
            "S2": "6.17",
            "humidity": "99.90",
            "temperature": "12.90",
            "timestamp": "1565597520"
        },
      ... 600 000 readings in total
    ]
}

I have a collection with 15 sensors, each with 600 000 readings (99% of each document is the readings array) in JSON format. How can I make this optimal for searching in MongoDB? Should I parse the data into buckets for every hour? Can I have some sample code in Python for this sample JSON? What is the most optimal way of structuring the data in the readings list for queries? Should I make 15 collections, one per sensor, or just use buckets? Also, the maximum size of a document is 16 MB; does this mean my readings have to stay below that?

heyhey asked Jan 20 '26 03:01

1 Answer

I would structure the data into a single readings collection in MongoDB with a document schema like:

{
    "location_id": "11670",
    "S1": 3.70,
    "S2": 6.17,
    "humidity": 99.90,
    "temperature": 12.90,
    "timestamp": ISODate("2019-08-12T08:12:00Z")
}

Avoid storing your data as strings; use a double for your numbers and a Date for your timestamp. This makes range queries much easier. You should also create an index on the fields you filter your queries on.
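A minimal sketch of that conversion in Python (pymongo is assumed; the database/collection names "sensors" and "readings" are made up for this example):

```python
from datetime import datetime, timezone

def convert_reading(location_id, raw):
    """Turn one raw reading with string values into a typed document."""
    return {
        "location_id": location_id,
        "S1": float(raw["S1"]),
        "S2": float(raw["S2"]),
        "humidity": float(raw["humidity"]),
        "temperature": float(raw["temperature"]),
        # Unix seconds -> timezone-aware datetime (stored by MongoDB as a Date)
        "timestamp": datetime.fromtimestamp(int(raw["timestamp"]), tz=timezone.utc),
    }

doc = convert_reading("11670", {
    "S1": "10.77", "S2": "7.20", "humidity": "99.90",
    "temperature": "12.80", "timestamp": "1565597160",
})
print(doc["S1"], doc["timestamp"].isoformat())

# With pymongo you would then insert and index, e.g.:
# from pymongo import MongoClient, ASCENDING
# readings = MongoClient()["sensors"]["readings"]
# readings.insert_one(doc)
# readings.create_index([("location_id", ASCENDING), ("timestamp", ASCENDING)])
```

The compound index on (location_id, timestamp) supports the typical query "all readings of one sensor in a time range" without a collection scan.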

If you plan to add more sensors in the future, your schema is more flexible when you store the location_id as a field in the documents and use just one collection. Otherwise you always have to add a new collection for every new location.

My approach stores each reading as one small document. This makes it simple to shard your data, e.g. on the timestamp, if it grows too big for one server. If you stored all readings inside one document per sensor, it would only be a question of time before you hit the 16 MB document limit and could no longer add new readings to the existing document. The one-document-per-reading schema above avoids this.
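A migration from the old one-document-per-sensor layout into this schema could look roughly like the following sketch (again assuming pymongo; all names, and the batch size of 1000, are illustrative):

```python
from datetime import datetime, timezone
from itertools import islice

def flatten(old_doc):
    """Yield one small typed document per entry in old_doc['readings']."""
    for raw in old_doc["readings"]:
        yield {
            "location_id": old_doc["location_id"],
            "S1": float(raw["S1"]),
            "S2": float(raw["S2"]),
            "humidity": float(raw["humidity"]),
            "temperature": float(raw["temperature"]),
            "timestamp": datetime.fromtimestamp(int(raw["timestamp"]), tz=timezone.utc),
        }

def batches(iterable, size):
    """Split an iterable into lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

old_doc = {
    "location_id": "11670",
    "readings": [
        {"S1": "10.77", "S2": "7.20", "humidity": "99.90",
         "temperature": "12.80", "timestamp": "1565597160"},
        {"S1": "3.70", "S2": "6.17", "humidity": "99.90",
         "temperature": "12.90", "timestamp": "1565597520"},
    ],
}

# With pymongo the actual migration would be e.g.:
# for chunk in batches(flatten(old_doc), 1000):
#     new_collection.insert_many(chunk)
for chunk in batches(flatten(old_doc), 1000):
    print(len(chunk), chunk[0]["timestamp"].isoformat())
```

Inserting in batches with insert_many keeps the migration of 15 × 600 000 readings reasonably fast instead of issuing one insert per reading.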

Simulant answered Jan 22 '26 17:01