Hi everyone. I am trying to parse SIP dialogs using Splunk.
Inside the dialog messages, there are TO and FROM lines. They can appear in any of the following formats:
What I am looking to get out of this is the content within the angle brackets. In cases where it is in the format of a URI (somestring @ someotherstring), I only want somestring to be captured. So for the first four examples 18005551212/unavailable/anonymous would be returned.
In the case where the format is just an FQDN, I need the full FQDN captured.
Right now I am using this methodology:
|rex field=_raw "From:.*\<sip:\+?(?<FROM_NUM>.*)@.*\>.*"
|rex field=_raw "From:.*\<sip:\+?(?<FROM_DOM>(\w+\.)+\w+);?.*\>.*"
|eval FROM_FIELD=coalesce(FROM_NUM,FROM_DOM)
This works but I'm hoping there is a cleaner way to do this without needing multiple regexes and an eval statement for what is essentially the same field. Any suggestions would be very much appreciated - as it stands I'll have to do 6 different operations to get the values of just 2 fields which is rather expensive processing-wise.
Off the top of my head:
| rex "From:.*?<sip[:+]+(?<FROM_FIELD>[\w.]+)"
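So that's:
"From:" followed by as few characters as possible, up to the first "<sip"
Then one or more ":" or "+" characters
Then we capture as many word characters (A-Z, a-z, 0-9, _) or "." as we can and put them in FROM_FIELD
Because "@", ";" and ">" are not in that character class, the capture stops at the "@" for a URI and at the ";" or ">" for a bare FQDN, so one rex covers both cases.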
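
If you want to sanity-check the pattern outside Splunk first, here is a minimal Python sketch. The sample From headers are made up for illustration (they are not the formats from the original post), and the function name extract_from_field is hypothetical. Note that Python writes named groups as (?P<name>...), whereas rex uses (?<name>...); the rest of the pattern is unchanged.

```python
import re

# Same pattern as the rex above; only the named-group syntax differs
# (Python: (?P<name>...), Splunk rex: (?<name>...)).
PATTERN = re.compile(r"From:.*?<sip[:+]+(?P<FROM_FIELD>[\w.]+)")

# Hypothetical From headers, invented for this example.
SAMPLES = [
    'From: "Alice" <sip:+18005551212@example.com>;tag=1a2b',
    'From: <sip:18005551212@10.0.0.1:5060>',
    'From: "Unavailable" <sip:unavailable@example.com>',
    'From: "Anonymous" <sip:anonymous@anonymous.invalid>;tag=9f',
    'From: <sip:gateway.example.com;transport=udp>',
]


def extract_from_field(line):
    """Return the FROM_FIELD capture for one header line, or None if no match."""
    match = PATTERN.search(line)
    return match.group("FROM_FIELD") if match else None


for sample in SAMPLES:
    print(extract_from_field(sample))
```

With these samples the URI-style headers should yield only the part before the "@", and the last one should yield the whole FQDN. User parts containing characters outside [\w.] (a hyphen, for example) would be cut short, so widen the character class if your data has them.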