
Storing a list of 1 million key-value pairs in Python

I need to store a list of 1 million key-value pairs in Python. The key would be a string or an integer, and the value would be a list of float values. For example:

{"key":36520193,"value":[[36520193,16.946938],[26384600,14.44005],[27261307,12.467529],[16456022,11.316026],[26045102,8.891106],[148432817,8.043456],[36670593,7.111857],[43959215,7.0957513],[50403486,6.95],[18248919,6.8106747],[27563337,6.629243],[18913178,6.573106],[42229958,5.3193846],[17075840,5.266625],[17466726,5.2223654],[47792759,4.9141016],[83647115,4.6122775],[56806472,4.568034],[16752451,4.39949],[69586805,4.3642135],[23207742,3.9822476],[33517555,3.95],[30016733,3.8994896],[38392637,3.8642135],[16165792,3.6820507],[14895431,3.5713203],[48865906,3.45],[20878230,3.45],[17651847,3.3642135],[24484188,3.1820507],[74869104,3.1820507],[15176334,3.1571069],[50255841,3.1571069],[103712319,3.1571069],[20706319,2.9571068],[33542647,2.95],[17636133,2.95],[66690914,2.95],[19812372,2.95],[21178962,2.95],[37705610,2.8642135],[20812260,2.8642135],[25887809,2.8642135],[18815472,2.8642135],[17405810,2.8642135],[46598192,2.8642135],[20592734,2.6642137],[44971871,2.5],[27610701,2.45],[92788698,2.45],[52164826,2.45],[17425930,2.2],[60194002,2.1642137],[122136476,2.0660255],[205325522,2.0],[117521212,1.9820508],[33953887,1.9820508],[22704346,1.9571068],[26176058,1.9071068],[39512661,1.9071068],[43141485,1.8660254],[16401281,1.7],[31495921,1.7],[14599628,1.7],[74596964,1.5],[55821372,1.5],[109073560,1.4142135],[91897348,1.4142135],[25756071,1.25],[25683960,1.25],[17303288,1.25],[42065448,1.25],[72148532,1.2],[19192100,1.2],[85941613,1.2],[77325396,1.2],[18266218,1.2],[114005403,1.2],[16346823,1.2],[43441850,1.2],[60660643,1.2],[41463847,1.2],[33804454,1.2],[20757729,1.2],[18271440,1.2],[51507708,1.2],[104856807,1.2],[24485743,1.2],[16075381,1.2],[68991517,1.2],[96193545,1.2],[63675003,1.2],[70735999,1.2],[25708416,1.2],[80593161,1.2],[42982108,1.2],[120368215,1.2],[24379982,1.2],[14235673,1.2],[20172395,1.2],[161441314,1.2],[37996201,1.2],[35638883,1.2],[46164502,1.2],[74047763,1.2],[19681494,1.2],[95938476,1.2],[20443787,1.2],[87258609,1.2],[34784832,1.2],[30346151,1.2],[40885
516,1.2],[197129344,1.2],[14266331,1.2],[15112466,1.2],[26867986,1.2],[82726479,1.2],[23825810,1.2],[14662121,1.2],[32707312,1.2],[17477917,1.2],[123462351,1.2],[5745462,1.2],[16544178,1.2],[23284384,1.2],[45526985,1.2],[23109303,1.2],[26046257,1.2],[53654203,1.2],[133026438,1.2],[25139051,1.2],[65077694,1.2],[17469289,1.2],[15130494,1.2],[148525895,1.2],[15176360,1.2],[44853617,1.2],[9115332,1.2],[16878570,1.2],[132421452,1.2],[6273762,1.2],[124360757,1.2],[21643452,1.2],[9890492,1.2],[16305494,1.2],[18484474,1.2],[22643607,1.2],[60753586,1.2],[9200012,1.2],[30042254,1.2],[8374622,1.2],[15894834,1.2],[18438022,1.2],[78038442,1.2],[22097386,1.2],[21018755,1.2],[20845703,1.2],[164462136,1.2],[19649167,1.2],[24746288,1.2],[27690898,1.2],[42822760,1.2],[160935289,1.2],[178814456,1.2],[53574205,1.2],[41473578,1.2],[82176632,1.2],[82918057,1.2],[102257360,1.2],[17504315,1.2],[18363508,1.2],[50735431,1.2],[80647070,1.2],[40879040,1.2],[17790497,1.2],[191364080,1.2],[14429823,1.2],[22078893,1.2],[121338184,1.2],[113341318,1.2],[48900101,1.2],[38547066,1.2],[20484157,1.2],[16228699,1.2],[21179292,1.2],[15317594,1.2],[55777010,1.2],[15318882,1.2],[182109160,1.2],[45238537,1.2],[19701986,1.2],[32484918,1.2],[18244358,1.2],[18479513,1.2],[19081775,1.2],[21117305,1.2],[19325724,1.2],[136844568,1.2],[32398651,1.2],[20482993,1.2],[14063937,1.2],[91324381,1.2],[20528275,1.2],[14803917,1.2],[16208245,1.2],[17419051,1.2],[31187903,1.2],[54043787,1.2],[167737676,1.2],[24431712,1.2],[24707301,1.2],[24420092,1.2],[15469536,1.2],[26322385,1.2],[77330594,1.2],[82925252,1.2],[28185335,1.0],[24510384,1.0],[24407244,1.0],[41229669,1.0],[16305330,1.0],[26246555,1.0],[28183026,1.0],[49880016,1.0],[104621640,1.0],[36880083,1.0],[19705747,1.0],[22830942,1.0],[21440766,1.0],[54639609,1.0],[49077908,1.0],[29588859,1.0],[23523447,1.0],[20803216,1.0],[20221159,1.0],[1416611,1.0],[3744541,1.0],[21271656,1.0],[68956490,1.0],[96851347,1.0],[39479083,1.0],[27778893,1.0],[18785448,1.0],[39010580,1.0],[6
5796371,1.0],[124631720,1.0],[27039286,1.0],[18208354,1.0],[51080209,1.0],[37388787,1.0],[18462037,1.0],[31335156,1.0],[21346320,1.0],[23911410,1.0],[73134924,1.0],[807095,1.0],[44465330,1.0],[16732482,1.0],[37344334,1.0],[734753,1.0],[23006794,1.0],[33549858,1.0],[102693093,1.0],[51219631,1.0],[20695699,1.0],[4081171,1.0],[27268078,1.0],[80116664,1.0],[32959253,1.0],[85772748,1.0],[27109019,1.0],[28706024,1.0],[59701568,1.0],[23559586,1.0],[15693493,1.0],[56908710,1.0],[6541402,1.0],[15855538,1.0],[126169000,1.0],[24044209,1.0],[80700514,1.0],[21500333,1.0],[18431316,1.0],[44496963,1.0],[68475722,1.0],[15202472,1.0],[19329393,1.0],[39706174,1.0],[22464533,1.0],[81945172,1.0],[22101236,1.0],[19140282,1.0],[31206614,1.0],[15429857,1.0],[27711339,1.0],[14939981,1.0],[62591681,1.0],[52551600,1.0],[40359919,1.0],[27828234,1.0],[21414413,1.0],[156132825,1.0],[21586867,1.0],[23456995,1.0],[25434201,1.0],[30107143,1.0],[34441838,1.0],[37908934,1.0],[47010618,1.0],[139903189,1.0],[17833574,1.0],[758608,1.0],[15823236,1.0],[37006875,1.0],[10302152,1.0],[40416155,1.0],[21813730,1.0],[18785600,1.0],[30715906,1.0],[428333,1.0],[22059385,1.0],[15155074,1.0],[11061902,1.0],[1177521,1.0],[20449160,1.0],[197117628,1.0],[42423692,1.0],[24963961,1.0],[19637934,1.0],[35960001,1.0],[43269420,1.0],[43283406,1.0],[20269113,1.0],[59409413,1.0],[25548759,1.0],[23779324,1.0],[21449197,1.0],[14327149,1.0],[15429316,1.0],[16159485,1.0],[18785846,1.0],[67651295,1.0],[28389815,1.0],[19780922,1.0],[23841181,1.0],[78391198,1.0],[60765383,1.0],[37689397,1.0],[6447142,1.0],[31332871,1.0],[30364057,1.0],[14120151,1.0],[16303064,1.0],[23023236,1.0],[103610974,1.0],[108382988,1.0],[19791811,1.0],[17121755,1.0],[46346811,1.0],[45618045,1.0],[25587721,1.0],[25362775,1.0],[20710218,1.0],[20223138,1.0],[21035409,1.0],[101894425,1.0],[38314814,1.0],[24582667,1.0],[21181713,1.0],[15901190,1.0],[18197299,1.0],[38802447,1.0],[19668592,1.0],[14515734,1.0],[16870853,1.0],[16488614,1.0],[95955871,1.0],[14780915,
1.0],[21188490,1.0],[24243022,1.0],[27150723,1.0],[29425265,1.0],[36370563,1.0],[36528126,1.0],[43789332,1.0],[82773533,1.0],[19726043,1.0],[20888549,1.0],[30271564,1.0],[14874125,1.0],[121436823,1.0],[56405314,1.0],[46954727,1.0],[25675498,1.0],[12803352,1.0],[23888081,1.0],[18498684,1.0],[38536306,1.0],[22851295,1.0],[20140595,1.0],[22311506,1.0],[31121729,1.0],[53717630,1.0],[100101137,1.0],[24753205,1.0],[24523660,1.0],[19544133,1.0],[20823773,1.0],[22677790,1.0],[15227791,1.0],[57525419,1.0],[28562317,1.0],[9629222,1.0],[24047612,1.0],[30508215,1.0],[59084417,1.0],[71088774,1.0],[142157505,1.0],[15284851,1.0],[17164788,1.0],[17885166,1.0],[18420140,1.0],[19695929,1.0],[20572844,1.0],[23479429,1.0],[26642006,1.0],[43469093,1.0],[50835878,1.0],[172049453,1.0],[20604508,1.0],[21681591,1.0],[20052907,1.0],[21271938,1.0],[17842661,1.0],[6365162,1.0],[18130749,1.0],[19249062,1.0],[24193336,1.0],[25913173,1.0],[28647246,1.0],[26072121,1.0],[14522546,1.0],[16409683,1.0],[18785475,1.0],[28969818,1.0],[52757166,1.0],[7120172,1.0],[112237392,1.0],[116779546,1.0],[57107167,1.0],[26347170,1.0],[26565946,1.0],[44409004,1.0],[21105244,1.0],[14230524,1.0],[44711134,1.0],[101753075,1.0],[783214,1.0],[22885110,1.0],[39367703,1.0],[23042739,1.0],[682903,1.0],[38082423,1.0],[16194263,1.0],[2425151,1.0],[52544275,1.0],[21380763,1.0],[18948541,1.0],[34954261,1.0],[34848331,1.0],[29245563,1.0],[19499974,1.0],[16089776,1.0],[77040291,1.0],[18197476,1.0],[1704551,1.0],[15002838,1.0],[17428652,1.0],[20702626,1.0],[29049111,1.0],[34004383,1.0],[34900333,1.0],[48156959,1.0],[50906836,1.0],[15742480,1.0],[41073372,1.0],[37338814,1.0],[1344951,1.0],[8320242,1.0],[14719153,1.0],[20822636,1.0],[168841922,1.0],[19877186,1.0],[14681605,1.0],[15033883,1.0],[23121582,1.0],[23670204,1.0],[41466869,1.0],[18753325,1.0],[21358050,1.0],[78132538,1.0],[132386271,1.0],[86194654,1.0],[17225211,1.0],[107179714,1.0],[18785430,1.0],[19408059,1.0],[19671129,1.0],[24347716,1.0],[24444592,1.0],[25873045,1.0],[
7871252,1.0],[14138300,1.0],[16873300,1.0],[14546496,1.0],[165964253,1.0],[15529287,1.0],[95956928,1.0],[19404587,1.0],[21506437,1.0],[22832029,1.0],[19542638,1.0],[30827536,1.0],[5748622,1.0],[22757990,1.0],[41259253,1.0],[23738945,1.0],[19030602,1.0],[21410102,1.0],[28206360,1.0],[136411179,1.0],[17499805,1.0],[26107245,1.0],[127311408,1.0],[77023233,1.0],[20448733,1.0],[20683840,1.0],[22482597,1.0],[15485441,1.0],[28220280,1.0],[55351351,1.0],[70942325,1.0],[9763482,1.0],[15732001,1.0],[27750488,1.0],[18286352,1.0],[122216533,1.0],[19562228,1.0],[5380672,1.0],[22293700,1.0],[59974874,1.0],[44455025,1.0],[90420314,1.0],[22657153,1.0],[16660662,1.0],[14583400,1.0],[16689545,1.0],[94242867,1.0],[44527648,1.0],[40366319,1.0],[33616007,1.0],[23438958,1.0],[15317676,1.0],[14075928,1.0],[1978331,1.0],[33347901,1.0],[16570090,1.0],[32347966,1.0],[26671992,1.0],[101907019,1.0],[24986014,1.0],[23235056,1.0],[40001164,1.0],[21891032,1.0],[18139329,1.0],[9648652,1.0],[16105942,1.0],[3004231,1.0],[20762929,1.0],[28061932,1.0],[39513172,1.0],[15012305,1.0],[18349404,1.0],[22196210,1.0],[110509537,1.0],[20318494,1.0],[21816984,1.0],[22456686,1.0],[62290422,1.0],[93472506,0.8660254],[52305889,0.70710677],[67337055,0.70710677],[122768292,0.5],[35060854,0.5],[43289205,0.5],[87271142,0.5],[28096898,0.5],[79297090,0.5],[24016107,0.5],[48736472,0.5],[109982897,0.5],[98367357,0.5],[21816847,0.5],[73129588,0.5],[23807734,0.5],[76724998,0.5],[63153228,0.5],[21628966,0.5],[14465428,0.5],[42609851,0.5],[30213342,0.5],[17021966,0.5],[96616361,0.5],[97546740,0.5],[67613930,0.5],[21234391,0.5],[87245558,0.5],[36841912,0.5]]}

I would be performing lookups on this data structure. What would be the most appropriate data structure for this purpose? I have heard Redis recommended. Would it be worth looking into instead of a traditional Python data structure? If not, please suggest other mechanisms.

Edit

The 'value' field is a list of lists. In most cases it may contain up to 1000 inner lists, each of size 2.
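For reference, the structure described above could be held in a plain Python dict. This is only a minimal sketch: the two records are made-up samples in the same shape as the example, and `build_index` is a hypothetical helper name, not part of any library.

```python
import json

# Hypothetical records in the same shape as the example above (shortened).
raw_lines = [
    '{"key": 36520193, "value": [[36520193, 16.946938], [26384600, 14.44005]]}',
    '{"key": 26384600, "value": [[26384600, 12.5], [36520193, 3.95]]}',
]

def build_index(lines):
    """Map each record's key to its list of (id, score) pairs."""
    index = {}
    for line in lines:
        record = json.loads(line)
        # Store each size-2 inner list as a tuple; in CPython a 2-tuple
        # takes less memory than a 2-element list.
        index[record["key"]] = [(int(i), float(s)) for i, s in record["value"]]
    return index

index = build_index(raw_lines)

# Dict lookups are O(1) on average; .get() avoids a KeyError for unknown keys.
pairs = index.get(36520193, [])
top_id, top_score = pairs[0]
print(top_id, top_score)
```

Whether this fits in memory depends on how long the value lists really are, which is what the answers below try to estimate.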

Dexter asked Jan 19 '12 09:01





2 Answers

Redis would be appropriate if...

  • You want to share the data between multiple processes or instances of your app.
  • You want the data to be persistent, so if your app goes down it can pick up where it left off.
  • You want a super fast, easy solution.
  • Memory usage is a concern.

I'm not sure about the last one, but I'm guessing that a dict or some other collection type in Python is likely to have a higher memory footprint than storing all your key/values in a single Redis hash.
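To make the "single Redis hash" idea concrete, here is a minimal sketch in Python. It assumes a client object exposing `hset` and `hget` the way redis-py's `redis.Redis` does; the helper names `store_pairs` and `load_pairs`, the hash name `"scores"`, and the choice of JSON are all illustrative assumptions.

```python
import json

def store_pairs(client, hash_name, key, pairs):
    """Serialize one value list to JSON and store it as a field of a Redis hash."""
    client.hset(hash_name, str(key), json.dumps(pairs))

def load_pairs(client, hash_name, key):
    """Fetch one field of the hash and decode it; returns None if the key is absent."""
    raw = client.hget(hash_name, str(key))
    # redis-py returns bytes by default; json.loads accepts bytes as well as str.
    return None if raw is None else json.loads(raw)

# With redis-py installed and a server running, usage might look like this
# (host and port are assumptions):
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   store_pairs(client, "scores", 36520193, [[36520193, 16.946938]])
#   load_pairs(client, "scores", 36520193)
```

Keeping every record as a field of one hash means a lookup is a single `HGET` round trip.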

update

I tested the memory usage by storing the example array 1 million times, both in memory and in Redis. Storing all the values in a Redis hash requires serializing the array. I chose JSON serialization, but this could easily have been a more efficient binary format, which Redis supports.

  • 1 million copies of the array provided, stored in a Ruby Hash (should be comparable to Python's dict) indexed by an integer key similar to the one shown. Memory usage increased by ~350 MB (similar to the Python results by @gnibbler).
  • 1 million copies of the array, each serialized to a JSON string and stored in a Redis hash indexed by the same numbers. Memory usage increased by ~250 MB.
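As an illustration of the "more efficient binary format" mentioned above, the sketch below packs each pair with Python's `struct` module and compares the size against compact JSON. It assumes every id fits in an unsigned 32-bit integer and that 32-bit float precision is acceptable, which looks plausible for the sample data but is not confirmed by the question. This is not the format used in the test above.

```python
import json
import struct

pairs = [[36520193, 16.946938], [26384600, 14.44005], [27261307, 12.467529]]

# Little-endian unsigned 32-bit id followed by a 32-bit float: 8 bytes per pair.
PAIR = struct.Struct("<If")

def pack_pairs(pairs):
    """Concatenate the fixed-size binary encoding of every (id, score) pair."""
    return b"".join(PAIR.pack(i, s) for i, s in pairs)

def unpack_pairs(blob):
    """Inverse of pack_pairs; scores come back rounded to float32 precision."""
    return [list(t) for t in PAIR.iter_unpack(blob)]

as_json = json.dumps(pairs, separators=(",", ":")).encode("utf-8")
as_binary = pack_pairs(pairs)
print(len(as_json), "bytes as JSON,", len(as_binary), "bytes packed")
```

Redis strings are binary safe, so the packed bytes can be stored as a hash field value directly.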

Both were very fast. When I measured 10,000 random lookups against each, Redis was slightly faster than the native collection. I know it's not Python, but this should be at least illustrative.
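A rough Python analogue of the native-collection half of that measurement can be sketched with `timeit`. This is not the benchmark described above (that one was in Ruby): the sizes are scaled down so it runs quickly, and the names `N_KEYS`, `N_LOOKUPS` and `run_lookups` are made up for the example.

```python
import random
import timeit

# Smaller than the real data set so the sketch runs quickly.
N_KEYS = 100_000
N_LOOKUPS = 10_000

# Each value repeats one inner list 20 times; that is enough for timing
# lookups, but it is not a realistic memory footprint.
data = {n: [[n, 1.0]] * 20 for n in range(N_KEYS)}

rng = random.Random(0)  # fixed seed so runs probe the same keys
probe_keys = [rng.randrange(N_KEYS) for _ in range(N_LOOKUPS)]

def run_lookups():
    """Look up every probe key once and count how many were found."""
    hits = 0
    for k in probe_keys:
        if data.get(k) is not None:
            hits += 1
    return hits

seconds = timeit.timeit(run_lookups, number=1)
print(f"{N_LOOKUPS} random dict lookups took {seconds:.6f} s")
```

A fair comparison with Redis would also have to include the network round trip and deserialization cost of each `HGET`.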

Also, to answer the OP's other concern, Redis has no trouble handling very large string values. Its maximum string size is currently 512 MB.

Carl Zulauf answered Sep 27 '22 17:09



This really shouldn't be a problem:

>>> d = dict((str(n), list(range(20))) for n in range(1000000))  # list() so Python 3 stores real lists

That took ~350 MB to create. Your keys and values may be much larger, of course.
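One way to check a figure like this on your own data is the standard library's `tracemalloc` module. The sketch below is scaled down to 100,000 entries so it runs quickly; it only counts allocations made by Python itself, so the number will not match what the operating system reports for the process, and it will differ between Python versions.

```python
import tracemalloc

N = 100_000  # scaled down from 1,000,000 so the sketch runs quickly

tracemalloc.start()
d = {str(n): list(range(20)) for n in range(N)}
current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
tracemalloc.stop()

print(f"{N} entries: ~{current / 1e6:.1f} MB currently allocated")
```

Replacing `list(range(20))` with a value shaped like your real records gives a more honest estimate than extrapolating from this toy case.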

John La Rooy answered Sep 27 '22 16:09
