cisco_ironport_web.log has the following events:
Event - 1
1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting <nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - http://www.flatbed-scanner-review.org/
Event - 2
1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting <IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -
I use the following field list to extract user, url and domain:
"field1","field2","field3","field4","field5","field6","url","user","field9","field10","field11","field12","field13","domain"
It doesn't work for the second event, because the domain field contains '-'. How do I fix it?
Hi jagadeeshm,
try
(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]*\s){4}(?<domain>[^ ]*)
you can test it at https://regex101.com/r/1qW58r/1
Bye.
Giuseppe
Instead of re-inventing the wheel, you could take some inspiration from the Splunk Add-on for Cisco WSA:
https://splunkbase.splunk.com/app/1747/
Looking at the sample data and props/transforms in that TA, it seems to support data very similar to what you have. The regex in there does not match perfectly (I think the part between <...> is causing some issues), but it might be a good start.
The regex with {4} doesn't actually extract the domain name, which is my core issue.
I tried the regex101 link; it extracts the domain field at the very end of the event. That field is not always populated, so I tried extracting the domain from the string right after "DIRECT/" instead. This would be my solution, but only if you don't need the field at the end.
(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]){6}\/(?<domain>[^ ]*)
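For anyone who wants to try the "DIRECT/" idea outside Splunk, here is a minimal Python sketch run against the two events from the question. A few assumptions to flag: Python writes named groups as (?P<name>...) instead of (?<name>...), the helper name extract is made up for illustration, and the fixed ([^ ]){6} is swapped for [^ /]+ so the match would not depend on the word before the slash being exactly six characters long. That swap is my own variation, not something from the thread.

```python
import re

# The two sample events from the question, each rebuilt as one line.
EVENT_1 = (
    "1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
    "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
    "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - "
    "http://www.flatbed-scanner-review.org/"
)
EVENT_2 = (
    "1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
    "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -"
)

# Assumption: the token before the slash (DIRECT here) has no '/' or spaces.
PATTERN = re.compile(
    r"(?:GET|POST)\s(?P<url>[^ ]*)\s(?P<user>[^ ]*)\s[^ /]+/(?P<domain>[^ ]*)"
)


def extract(event):
    """Hypothetical helper: return url, user and domain, or None if no match."""
    match = PATTERN.search(event)
    return match.groupdict() if match else None


for event in (EVENT_1, EVENT_2):
    print(extract(event))
```

Both events should yield a host name here, because the domain is taken from the field right after the user rather than from the last field, which is "-" in the second event.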
Hi jagadeeshm,
Sorry, the correct value is 5 instead of 4 (see https://regex101.com/r/1qW58r/2):
(GET|POST)\s(?<url>[^ ]*)\s(?<user>[^ ]*)\s([^ ]*\s){5}(?<domain>[^ ]*)
Bye.
Giuseppe
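To see what the change from {4} to {5} does, the regex can be replayed in Python against both events from the question. This is only a rough sketch: Python needs (?P<name>...) for named groups, and the function name extract and its skip parameter are invented for the comparison. With 4 skipped fields the capture lands on the lone "-" column, and with 5 it lands on the last field of the event.

```python
import re

# The two sample events from the question, each rebuilt as one line.
EVENT_1 = (
    "1489714117.601 56 27.1.11.11 TCP_REFRESH_HIT/200 54491 GET "
    "http://www.flatbed-scanner-review.org/inter-banner_flatbed.jpg "
    "bhussain@buttercupgames.com DIRECT/www.flatbed-scanner-review.org image/jpeg "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<nc,ns,0,-,-,-,-,0,-,-,-,-,-,-,-,nc,-> - "
    "http://www.flatbed-scanner-review.org/"
)
EVENT_2 = (
    "1489713615.376 809 211.166.11.101 TCP_MISS/200 147639 GET "
    "http://www.vindy.com/ myuan@buttercupgames.com DIRECT/www.vindy.com text/html "
    "DEFAULT_CASE-DefaultGroup-Demo_Clients-NONE-NONE-DefaultRouting "
    "<IW_news,3.4,0,-,-,-,-,0,-,-,-,-,-,-,-,IW_news,-> - -"
)


def extract(event, skip):
    """Hypothetical helper: skip `skip` space-delimited fields after the
    user, then capture the next field as `domain`."""
    pattern = (
        r"(?:GET|POST)\s(?P<url>[^ ]*)\s(?P<user>[^ ]*)\s"
        r"(?:[^ ]*\s){" + str(skip) + r"}(?P<domain>[^ ]*)"
    )
    match = re.search(pattern, event)
    return match.groupdict() if match else None


for skip in (4, 5):
    for event in (EVENT_1, EVENT_2):
        print(skip, extract(event, skip)["domain"])
```

Note that with {5} the second event still gives "-" for domain, since that is what the log contains in the last position; whether to treat "-" as an empty value is a separate decision.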