Importing data from mysql to druid

Tags: mysql, druid

I have been using MySQL for all my data storage and querying. But now that the tables have grown so large, some queries take hours to return results. I have taken every possible measure, such as optimizing queries and indexing the tables properly.

So I have been thinking of using Druid. I have not worked with Druid before. I think I have to import all the tables and data from MySQL to Druid, but I cannot figure out where to start. If anyone could offer any guidance, I would be really grateful. Thanks in advance.

asked Jan 17 '14 by Joy

2 Answers

First note that Druid ingests timeseries data, so each row of your data will have to have a timestamp. If that's possible, read on.

Output your data to CSV or TSV. Those are two of the formats supported for batch ingestion. So your data will look something like this:

2013-08-31T01:02:33Z,"someData","true","true","false","false",57,200,-143
2013-08-31T03:32:45Z,"moreData","false","true","true","false",459,129,330
...
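On the MySQL side, one way to produce such a file is SELECT ... INTO OUTFILE. A minimal sketch, assuming a hypothetical events table with a created_at datetime column and a few data columns (adjust the names to your schema, and note the file is written on the MySQL server host, which needs permission to write there):

SELECT
  DATE_FORMAT(created_at, '%Y-%m-%dT%H:%i:%sZ'),  -- ISO-8601 timestamp for Druid
  data1, data2, data3
INTO OUTFILE '/tmp/my.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM events;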

Then you can create an index task which has a firehose section where you specify the location of the file, format, and columns:

"firehose" : {
   "type" : "local",
   "baseDir" : "my/directory/",
   "filter" : "my.csv",
   "parser" : {
     "timestampSpec" : {
       "column" : "timestamp"
     },
     "data" : {
       "type" : "csv",
       "columns" : ["timestamp","data1","data2","data3",...,"datan"],
       "dimensions" : ["data1","data2","data3",...,"datan"]
     }
   }
 }

Note the special handling given to the timestamp column.
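For context, the firehose section sits inside a full index task spec. A hedged sketch of the surrounding task for a Druid release of that era (field names have changed between releases, so check your version's batch-ingestion docs; the datasource name, interval, and aggregator here are made up for illustration):

{
  "type" : "index",
  "dataSource" : "my_datasource",
  "granularitySpec" : {
    "type" : "uniform",
    "gran" : "DAY",
    "intervals" : [ "2013-08-31/2013-09-01" ]
  },
  "aggregators" : [
    { "type" : "count", "name" : "count" }
  ],
  "firehose" : { ... the firehose section above ... }
}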

Now run the indexing service (the Druid docs contain info on how to start the cluster you'll need) and feed the task to it as described in the section Batch Ingestion Using the Indexing Service. The data will be ingested and processed into Druid segments that you can query.
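Submitting the task is an HTTP POST to the indexing service (overlord). A minimal sketch, assuming the task JSON is saved as my_index_task.json and the overlord is on localhost with its default port (verify the port for your version):

curl -X POST -H 'Content-Type: application/json' \
     -d @my_index_task.json \
     http://localhost:8090/druid/indexer/v1/task

The response contains a task ID you can use to check the task's status.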

answered by user766353

First of all, I don't think Druid works best as a storage system; it's more accurate to call it a real-time query and processing tool. Moreover, I would prefer a Hive + HDFS combo to replace your MySQL, since your table sizes keep increasing. What's more, Druid can use HDFS as its deep storage.
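If you do go the HDFS route, pointing Druid's deep storage at HDFS comes down to a couple of runtime properties. A minimal sketch (the namenode address and path are placeholders, and you may also need Druid's HDFS storage extension loaded):

# common runtime.properties
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://namenode:9000/druid/segments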

answered by shen