Currently I am using Solr 6 and I want to index log data like the line shown below:
2016-06-22T03:00:04Z|INFO|ip-10-11-0-241|1301|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider.CheckValidGameDataRequestFilter|Invalid UserAgent=%E3%83%94%E3%82%B3/1.07.41149 CFNetwork/758.2.8 Darwin/15.0.0, PlayerId=player_a2a7d1a4-0a31-4c4d-b5bf-10be67dc85d6|
I am unsure how to separate the data on the pipe character. The layout I use in NLog is this:
${date:universalTime=True:format=yyyy-MM-ddTHH\:mm\:ssZ}|${level:uppercase=true}|${machinename}|${processid}|${logger}|${callsite:className=true:methodName=true}|${message}|${exception:format=tostring}${newline}
I tried the CSV upload, but Solr gives me the JSON response below, which is not conducive to querying. Please help.
{
  "responseHeader":{
    "status":0,
    "QTime":77,
    "params":{
      "q":"*:*",
      "indent":"on",
      "wt":"json",
      "_":"1466745065000"}},
  "response":{"numFound":8,"start":0,"docs":[
      {
        "id":"b28049bb-d49e-4b4d-80db-d7d77351527b",
        "2016-06-23T02_37_18Z_INFO_web.chubi.development1_6326_DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider_DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider.CheckValidGameDataRequestFilter_Invalid_UserAgent_PIKO_0.00.41269_CFNetwork_711.5.6_Darwin_14.0.0":["2016-06-23T02:37:28Z|INFO|web.chubi.development1|6326|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider.CheckValidGameDataRequestFilter|Invalid UserAgent=PIKO/0.00.41269 CFNetwork/711.5.6 Darwin/14.0.0"],
        "_PlayerId_player_407defcf-7032-4ef4-81a6-91bb62b9150b_":[" PlayerId=player_905266b2-9ce3-4fa1-b0a7-4663b9509731|"],
        "_version_":1537919142165741568}]}}
It looks like you want to extract clean data out of the logs so it can be indexed and searched without any ambiguity. Why don't you try analyzing your data with a custom Analyzer that uses a regex to split the data for you? I would strongly suggest solr.PatternTokenizerFactory to break your text on the pipe character.
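An untested sketch of such a field type for your schema (the names pipe_delimited and logline are mine, pick whatever fits your schema):

```xml
<!-- Hypothetical field type: PatternTokenizerFactory splits the incoming
     text wherever the regex matches, here on the literal pipe character. -->
<fieldType name="pipe_delimited" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\|"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Example field using it; adjust indexed/stored to your needs. -->
<field name="logline" type="pipe_delimited" indexed="true" stored="true"/>
```

With this, each pipe-separated segment of a log line becomes its own token instead of the whole line being treated as one value.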
Also, you can use the Analysis tab in the Solr admin UI for an exhaustive view of how your log data is treated by the Analyzer. For the encoded text, like in the Invalid UserAgent field, you can use solr.ASCIIFoldingFilterFactory when indexing those characters. You may also need to tokenize the data at dots; I don't know whether that's a requirement for you or not. In your data, PatternTokenizer does the trick, and if you still need further refinement, you can use solr.WordDelimiterFilterFactory to tune your index better.
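As a quick sanity check of the pipe pattern before wiring it into the schema, you can reproduce the split client-side. This is just an illustration; the field labels below are my own names, not anything Solr or NLog defines:

```python
import re

# Sample line in the asker's NLog layout (message shortened for readability).
line = ("2016-06-22T03:00:04Z|INFO|ip-10-11-0-241|1301|"
        "DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider|"
        "DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider.CheckValidGameDataRequestFilter|"
        "Invalid UserAgent=PIKO/1.07.41149, PlayerId=player_a2a7d1a4-0a31-4c4d-b5bf-10be67dc85d6|")

# Same regex PatternTokenizerFactory would get: split on the literal pipe.
# The trailing pipe produces an empty segment, so drop empties.
fields = [f for f in re.split(r"\|", line) if f]

# Illustrative labels only, matching the order of the NLog layout.
labels = ["timestamp", "level", "machine", "pid", "logger", "callsite", "message"]
doc = dict(zip(labels, fields))
print(doc["level"])
```

If the split yields exactly one token per layout segment here, the same pattern should behave the same inside the analyzer.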