
Indexing and mapping log data using solr 6

Tags:

solr6

Currently I am using Solr 6 and I want to index log data like the line shown below:

2016-06-22T03:00:04Z|INFO|ip-10-11-0-241|1301|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider.CheckValidGameDataRequestFilter|Invalid UserAgent=%E3%83%94%E3%82%B3/1.07.41149 CFNetwork/758.2.8 Darwin/15.0.0, PlayerId=player_a2a7d1a4-0a31-4c4d-b5bf-10be67dc85d6|

I am unsure how to separate the data on the pipe character. The layout I use in NLog is this:

${date:universalTime=True:format=yyyy-MM-ddTHH\:mm\:ssZ}|${level:uppercase=true}|${machinename}|${processid}|${logger}|${callsite:className=true:methodName=true}|${message}|${exception:format=tostring}${newline}
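Before involving Solr, it can help to confirm that the layout really yields eight pipe-delimited fields. A minimal sketch (the field names are assumptions read off the layout string above, not from the post):

```python
from urllib.parse import unquote

# Field names inferred from the NLog layout above (illustrative only).
FIELDS = ["timestamp", "level", "machine", "processid",
          "logger", "callsite", "message", "exception"]

def parse_log_line(line: str) -> dict:
    # Naive split: assumes the message itself contains no "|" characters.
    parts = line.rstrip("\n").split("|")
    record = dict(zip(FIELDS, parts))
    # The message may contain percent-encoded text (e.g. the UserAgent);
    # unquote() decodes it for readability.
    record["message"] = unquote(record.get("message", ""))
    return record

line = ("2016-06-22T03:00:04Z|INFO|ip-10-11-0-241|1301|"
        "DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider|"
        "DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider"
        ".CheckValidGameDataRequestFilter|"
        "Invalid UserAgent=%E3%83%94%E3%82%B3/1.07.41149|")
rec = parse_log_line(line)
print(rec["level"])      # INFO
print(rec["processid"])  # 1301
```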

I tried the CSV upload, but Solr gives me the JSON response below, which is not conducive to querying. Please help.

{
  "responseHeader":{
    "status":0,
    "QTime":77,
    "params":{
      "q":"*:*",
      "indent":"on",
      "wt":"json",
      "_":"1466745065000"}},
  "response":{"numFound":8,"start":0,"docs":[
      {
        "id":"b28049bb-d49e-4b4d-80db-d7d77351527b",
        "2016-06-23T02_37_18Z_INFO_web.chubi.development1_6326_DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider_DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider.CheckValidGameDataRequestFilter_Invalid_UserAgent_PIKO_0.00.41269_CFNetwork_711.5.6_Darwin_14.0.0":["2016-06-23T02:37:28Z|INFO|web.chubi.development1|6326|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider.CheckValidGameDataRequestFilter|Invalid UserAgent=PIKO/0.00.41269 CFNetwork/711.5.6 Darwin/14.0.0"],
        "_PlayerId_player_407defcf-7032-4ef4-81a6-91bb62b9150b_":[" PlayerId=player_905266b2-9ce3-4fa1-b0a7-4663b9509731|"],
        "_version_":1537919142165741568}]}
Asked by Moses Liao GZ on Jun 22 '16


1 Answer

It looks like you want to extract clean data from the logs so it can be indexed and searched without ambiguity. Why not analyze your data with a custom analyzer that uses a regex to split the fields for you? I would strongly suggest solr.PatternTokenizerFactory to tokenize your text on the pipe character. You can also use the Analysis tab in the Solr admin UI for an exhaustive view of how the analyzer treats your log data.

For the encoded text, as in the Invalid UserAgent field, you could try the ASCII folding filter for indexing encoded characters. You may also need to tokenize on dots; I don't know whether that is a requirement for you. For your data, the PatternTokenizer does the trick, and if you still need further refinement, you can use solr.WordDelimiterFilterFactory to tune your index. Maybe I'll edit this answer with some analyzer settings for you :)
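A sketch of what such a field type might look like in the schema (the type name and the exact filter chain are illustrative assumptions, not settings from the post):

```xml
<!-- Illustrative only: a text type that tokenizes log lines on "|". -->
<fieldType name="log_pipe" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- PatternTokenizerFactory splits the input at each pipe character -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\|"/>
    <!-- Fold accented characters to their ASCII equivalents where possible -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

You can paste a sample log line into the Analysis tab against this type to see exactly which tokens it produces before committing to the schema change.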

Answered by Saurabh Chaturvedi on Oct 31 '22