I'm using Splunk to parse some logs that have our "hub" and "comp" IDs embedded in them, down in the body of the message. I need to use a field extraction RegEx to pull them out in the form: HHHH-CCCC where the data appears like this:
Hub:[HHHH] Comp: [CCCC]
Here's an example record:
RecordID:[00UJ9ANUHO5551212] TrackingID:[1234ANUHO5551212] Hub:[0472] Comp:[N259] Some event occurred, the log is in here::[\server\share\0472\N258\blah\blah\blah\somefile.txt], No exceptions raised.
From that, I'd like to return:
0472-N259
I'm trying to learn (re-learn! I learned this stuff 30 years ago!) capturing groups, and came up with this:
(?<=Hub:\[)([A-Z0-9]{4})
From that I can get the 4 characters for the hub, but it won't let me do:
(?<=Hub:\[)([A-Z0-9]{4}) (?<=Comp:\[)([A-Z0-9]{4})
I'm kind of close, but am getting frustrated and it's time to go home, so I thought maybe SO could help me out overnight. 100 bounty for the best answer (please explain the solution). I promise to come back and award when this question is eligible. Answer doesn't have to be in Splunk form (with <fieldname>), but that's helpful too.
It's helpful if the RegEx can be pasted into http://gskinner.com/RegExr/ so I can experiment further.
There are two ways you can achieve what you're looking to do:
Using search
Extract the fields with rex and use eval to concatenate the values.
| rex field=_raw "Hub:\[(?<Hub>[^\]]*)\]\sComp:\[(?<Comp>[^\]]*)\]" | eval someNewField=Hub."-".Comp
The rex command allows you to run a regular expression against a field; _raw is a special field name that contains the entire event data. The regex itself captures any characters between [ and ] and extracts them to the field named within the <>.
This is the easiest way as you don't need to modify any configuration to do it, but the drawback is that you'll need to add this to your search string to get the values extracted and formatted the way you want.
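If you want to experiment with the same extract-then-concatenate idea outside Splunk, here is a minimal Python sketch. It is not Splunk syntax: Python's re module spells named groups (?P<name>...) instead of (?<name>...), the function name hub_comp is just something I made up, and the event text is a shortened version of the sample record.

```python
import re

# Shortened version of the sample record from the question.
event = (
    "RecordID:[00UJ9ANUHO5551212] TrackingID:[1234ANUHO5551212] "
    "Hub:[0472] Comp:[N259] Some event occurred, No exceptions raised."
)

# Same pattern as the rex command above, with Python's named-group spelling.
PATTERN = re.compile(r"Hub:\[(?P<Hub>[^\]]*)\]\sComp:\[(?P<Comp>[^\]]*)\]")


def hub_comp(raw):
    """Return 'Hub-Comp' for the first match in raw, or None if there is none."""
    m = PATTERN.search(raw)
    if m is None:
        return None
    # Equivalent of: eval someNewField=Hub."-".Comp
    return m.group("Hub") + "-" + m.group("Comp")


print(hub_comp(event))  # prints 0472-N259
```

Because [^\]]* takes everything up to the closing bracket, this version does not care whether the IDs are exactly four characters long.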
Using search time extraction with props.conf and transforms.conf
In transforms.conf, add a transform to extract the fields:
[hubCompExtract]
REGEX = Hub:\[(?<Hub>[^\]]*)\]\sComp:\[(?<Comp>[^\]]*)\]
In props.conf, execute the extract and concatenate the values using an eval:
[yourSourceTypeName]
REPORT-fieldExtract = hubCompExtract
EVAL-yourNewFieldName = Hub."-".Comp
No need to add anything to your search string, but it does require config file changes.
Regex example
gSkinner example (without the capture group names).
I'm not familiar with Splunk, but I assume its regexp engine supports named groups.
To create a fully proper regexp I need to know a couple of things: is the format always Hub:[HHHH] Comp:[CCCC]? Always Hub, a single space, then Comp?
This is my regex: Hub:\s*\[(?<Hub>.{4})\]\s+Comp:\s*\[(?<Comp>.{4})\]
And a sample in C# (assuming the str variable contains a line with one record):
var regEx = new Regex(@"Hub:\s*\[(?<Hub>.{4})\]\s+Comp:\s*\[(?<Comp>.{4})\]");
var m = regEx.Match(str);
Console.WriteLine(String.Format("{0}-{1}", m.Groups["Hub"], m.Groups["Comp"]));
Explanation:
If you use Match, you only care about your IDs, so nothing but the IDs needs to go in parentheses. To locate them easily, we use named groups: (?<someName>pattern).
Assuming the IDs are always 4 characters long, we use {4}. Any character is allowed, so .{4}.
If you want to ensure there are only letters and numbers, you can change it to [A-Z0-9]{4}.
If you don't know how many letters/numbers there will be, you could change {4} to +, which is the same as {1,} (from 1 to infinity).
When posting your example, you placed an extra space between the colon and the bracket, so I used :\s*\[. This matches :[, : [, or any other amount of white space in between.
Assuming that Comp comes right after the closing bracket of Hub, I used \]\s+Comp: one or more white-space characters between them.
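To see what the \s* and \s+ parts actually buy you, here is a small Python sketch that runs the same pattern against a few spacing variants. The sample lines are made up for illustration, and the named groups use Python's (?P<name>...) spelling rather than (?<name>...).

```python
import re

# The pattern from this answer, with Python's named-group spelling.
TOLERANT = re.compile(r"Hub:\s*\[(?P<Hub>.{4})\]\s+Comp:\s*\[(?P<Comp>.{4})\]")

# Hypothetical spacing variants of the Hub/Comp part of a record.
samples = [
    "Hub:[0472] Comp:[N259]",     # spacing from the example record
    "Hub:[0472] Comp: [N259]",    # extra space after the colon
    "Hub: [0472]   Comp:[N259]",  # several spaces between the two fields
    "Hub:[0472]Comp:[N259]",      # no space between fields, so \s+ rejects it
]


def extract(line):
    """Return 'Hub-Comp' if the line matches, otherwise None."""
    m = TOLERANT.search(line)
    return f"{m.group('Hub')}-{m.group('Comp')}" if m else None


results = [extract(s) for s in samples]
for sample, result in zip(samples, results):
    print(repr(sample), "->", result)
```

The first three variants give 0472-N259; the last one gives None, because \s+ requires at least one white-space character between the closing bracket and Comp. Change it to \s* if that case should match too.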
FYI: If you are planning to use this with a replace method, add .* at the beginning and at the end, meaning anything else.
var regEx = new Regex(@".*Hub:\s*\[(?<Hub>.{4})\]\s+Comp:\s*\[(?<Comp>.{4})\].*");
Console.WriteLine(regEx.Replace(str, @"${Hub}-${Comp}"));
But using a replace method instead of match may give unexpected results: when the string does not match the pattern, the output string is the same as the input. So in cases like this (extracting values), always use the Match method.
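The same pitfall is easy to reproduce in other regex engines. A quick Python sketch, assuming two made-up input lines (one with IDs, one without); Python uses \g<name> in the replacement string where C# uses ${name}:

```python
import re

# Whole-line pattern with .* on both ends, Python named-group spelling.
WHOLE_LINE = re.compile(
    r".*Hub:\s*\[(?P<Hub>.{4})\]\s+Comp:\s*\[(?P<Comp>.{4})\].*"
)

good = "TrackingID:[1234] Hub:[0472] Comp:[N259] Some event occurred"
bad = "TrackingID:[1234] nothing useful on this line"

# Replace: the whole line collapses to Hub-Comp when the pattern matches...
replaced_good = WHOLE_LINE.sub(r"\g<Hub>-\g<Comp>", good)
# ...but a non-matching line comes back untouched, which looks like a result.
replaced_bad = WHOLE_LINE.sub(r"\g<Hub>-\g<Comp>", bad)

# Match: a non-matching line gives None, so the failure is explicit.
match_bad = WHOLE_LINE.search(bad)

print(replaced_good)        # prints 0472-N259
print(replaced_bad == bad)  # prints True
print(match_bad)            # prints None
```

With replace you would have to compare the output against the input to know whether anything was extracted; with match you just check for None.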
You were close. Try capturing your targets:
Hub:\[([A-Z0-9]{4})\] Comp:\[([A-Z0-9]{4})
Then use groups in your output:
$1-$2
Note that I am unfamiliar with Splunk, so the syntax for groups may be the backslash variety, i.e. \1-\2
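For comparison, here is a small Python sketch that tries the two-lookbehind attempt from the question next to a numbered-group version. Python happens to use the backslash form (\1-\2) for group references. Note that the numbered pattern here includes \] after the hub ID, since the sample data has a closing bracket before the space.

```python
import re

line = "Hub:[0472] Comp:[N259]"

# The attempt from the question: two lookbehinds separated by a literal space.
# After the hub ID the next character is "]", not a space, and the second
# lookbehind would need "Comp:[" directly before the position anyway.
attempt = re.compile(r"(?<=Hub:\[)([A-Z0-9]{4}) (?<=Comp:\[)([A-Z0-9]{4})")

# Plain capturing groups: match the surrounding text, capture only the IDs.
numbered = re.compile(r"Hub:\[([A-Z0-9]{4})\] Comp:\[([A-Z0-9]{4})")

attempt_match = attempt.search(line)
m = numbered.search(line)
joined = m.expand(r"\1-\2")  # group 1, a hyphen, then group 2

print(attempt_match)  # prints None
print(joined)         # prints 0472-N259
```

A lookbehind only checks what sits immediately before the current position without consuming it, so it cannot be used to skip over the "] Comp:[" text between the two IDs. Matching that text normally and capturing just the IDs avoids the problem.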