I would like to search for keywords (listed below) in the logs and create a report in the format shown.
Every keyword has a different pattern and sits in the middle of requests, which start with ?ptActivity=
?ptActivity=...............................................PreActivity=DCBClaimSearch&HeaderButtonSectionName.................HTTP/1.1" 200 4502
?ptActivity=...........................LanguageCode=&CountryCode=&PRODUCT_XXXX=XXXX=&LOB=&XXXXXCD=&Count=..........HTTP/1.1" 200 3402
?ptActivity=xxxxxxxxxxxxxxxxxxxxxxxxxxxxx%20&Request_Type=&xxxxxxxxxxxxxx_xxxxxxxxxxxxxxx&ELEMENT_CD=&LanguageCode=&CountryCode=&PRODUCT_LINE_CDXXXX=&LOB=&LOB_XXX_CD=&Count= HTTP/1.1" 200 5092
log format:
1x.xx.xxx.xxx - - 11xxxxx4 [03/Oct/2017:08:01:54 -0400] - /pxxx/Gxxxxt/uxxxxxxxxx4[/!TABTHREAD1 HTTP/1.1 oxxx-xxx.xxx.net TIME:0/123717 "POST /pxxxb/Gxxxxt/uxxxxxxxxxxxxxxxxx4%5B/!TABTHREAD1?ptActivity=Cxxxxxxxxx-xxxx.xxxxxx%20&Request_Type=&xxxxxTYPE_CD=COUNTRY&Exxxxxxxx_CD=&LanguageCode=&CountryCode=&PRODUCT_LINE_CD=&REGION_CD=&LOB=&LOB_SUB_CD=&Count= HTTP/1.1" 200 4011
1x.xx.xxx.xxx - - - [03/Oct/2017:08:01:54 -0400] - /pddddb/Gdddd/xxxxxxxxxxxxxxxxxx[/themeimages/h1expand_theme_ccddd.gif!!.gif HTTP/1.1 oxxxxxxxxxxx.aig.net TIME:0/12758 "GET / /pddddb/Gdddd/xxxxxxxxxxxxxxxxxx[/themeimages/h1expand_theme_ccddd.gif!!.gif HTTP/1.1" 200 69
1x.xx.xxx.xxx- - 1ssssss4 [03/Oct/2017:08:02:09 -0400] - /pxxxx/Gxxxxxt/uxxxxxxxxxxxxxxxxx4[/!TABTHREAD1 HTTP/1.1 oxxx-xxx.xx.net TIME:0/117091 "POST /pxxxb/Gxxxt/xxxxxxxxxxxxxxxxxxxxB/!TABTHREAD1?ptActivity=ReloadSection&pzIxxxd=xxxxxxxxxxxxxxxxxxx&pzFromFrame=pyxxxx&pzxxxxxxxxxxxe=pyxxxxxxxxe&pzxxxxxxx=false&StreamName=AddPropertyDetails&BaseReference=xxxxxxxxxx.xxxxxxxxxxe.Prxxxxxxx&Stxxxxxxxxxxxss=xxxxxxx-Section&bClientValidation=true&FieldError=ERRORTEXT&PreActivity=&xxxxxxxxxge=true&HexxxxxxxxnName=SubxxxxxxorkObjectHeaderB&inStandardsMode=true&AJAXTrackID=5&pzHarnessID=HIDxxxxxxxxx HTTP/1.1" 200 4512
reports to be generated:
Report 1 :
User Time Protocol server Elapsed Time (Seconds) Call Status Size logName
1ssssss4 17/Oct/04 01:15:00 HTTP/1.1 oxxxxxxxxxxx.net 0.201185 ptActivity=ReloadSection&pzIxxxd=xxxxxxxxxxxxxxxxxxx&pzFromFrame=pyxxxx&pzxxxxxxxxxxxe=pyxxxxxxxxe&pzxxxxxxx=false&StreamName=AddPropertyDetails&BaseReference=xxxxxxxxxx.xxxxxxxxxxe.Prxxxxxxx&Stxxxxxxxxxxxss=xxxxxxx-Section&bClientValidation=true&FieldError=ERRORTEXT&PreActivity=&xxxxxxxxxge=true&HexxxxxxxxnName=SubxxxxxxorkObjectHeaderB&inStandardsMode=true&AJAXTrackID=5&pzHarnessID=HIDxxxxxxxxxx HTTP/1.1 200 6188 \508\access_log_10_04_2017
This seems to work for me on those two events. Except for the elapsed bit, still not sure how that is being calculated. Also, in your second example event, the first dash (-) is right up against the IP. I'm assuming there is actually a space there like the first event.
... | rex "^(?<ip>\S+)(?:\s+\S+){2}\s+(?<user>\S+)\s+\[(?<time>[^\]]+)\](?:\s+\S+){2}\s+(?<protocol>\S+)\s+(?<server>\S+)\s+(?<elapsed>\S+)\s+\"(?<request>[^\"]+)\"\s+(?<status>\d+)\s+(?<bytes>\S+)"
| rex field=request "ptActivity=(?<call>.+)$"
| table user,time,protocol,server,call,status,bytes,source
If this is what you want, you could also put these field extractions in props.conf for whatever sourcetype you have this data on. That way the fields will automatically be extracted for you, so you wouldn't need to use the rex commands to create them.
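If you want to sanity-check the extraction outside Splunk, here's a rough Python sketch of the same two rex patterns run against a line modeled on the anonymized samples above (the log line and field values are made up; note Python needs `(?P<name>...)` where Splunk's rex uses `(?<name>...)`):

```python
import re

# Same pattern as the rex above, translated to Python's named-group syntax.
LOG_RE = re.compile(
    r'^(?P<ip>\S+)(?:\s+\S+){2}\s+(?P<user>\S+)\s+\[(?P<time>[^\]]+)\]'
    r'(?:\s+\S+){2}\s+(?P<protocol>\S+)\s+(?P<server>\S+)\s+(?P<elapsed>\S+)'
    r'\s+"(?P<request>[^"]+)"\s+(?P<status>\d+)\s+(?P<bytes>\S+)'
)
CALL_RE = re.compile(r'ptActivity=(?P<call>.+)$')

# A made-up line in the same shape as the anonymized samples in the question.
line = ('10.0.0.1 - - 11xxxxx4 [03/Oct/2017:08:01:54 -0400] - '
        '/pxxx/app[/!TABTHREAD1 HTTP/1.1 oxxx-xxx.example.net TIME:0/123717 '
        '"POST /pxxx/app?ptActivity=ReloadSection&Count= HTTP/1.1" 200 4011')

fields = LOG_RE.match(line).groupdict()
fields['call'] = CALL_RE.search(fields['request']).group('call')
print(fields['user'], fields['server'], fields['status'], fields['bytes'])
```

If the match fails on your real data, the usual culprit is the number of `-` tokens, which the `(?:\s+\S+){2}` groups assume is fixed.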
Thanks for your previous answer. I also need reports like the ones below.
Report2: summary report
Start Time End Time keyword Total # of executions Avg # of Executions per Hour Min Resp Time Max Resp Time Avg Resp Time 90th percentile Resp Time Std Dev Of Resp Time Min Size of Response Max Size of Response Avg Size of Response 90th percentile Size of Response Std Dev of Size of Response
sample:
Keyword1 17/Oct/04 00:11:46 17/Oct/04 23:24:05 2398 104 0.02 27.35 0.108 0.109 0.594 82 10342 4302.94 4543 424.21
Keyword2 17/Oct/04 00:11:46 17/Oct/04 23:24:05 2398 103 0.03 22.35 0.119 0.107 0.583 89 10332 43394 4523 4324.21
Report3: 24 hours
Start Time End Time keyword Total # of executions Avg # of Executions per Hour Min Resp Time Max Resp Time Avg Resp Time 90th percentile Resp Time Std Dev Of Resp Time Min Size of Response Max Size of Response Avg Size of Response 90th percentile Size of Response Std Dev of Size of Response
sample:
keyword_1 17/Oct/04 00:00:00 17/Oct/04 00:59:59 4 4 0.056125 0.070999 0.0613225 0.070999 0.00671778 3617 4533 3886.75 4533 437.5083809
keyword_1 17/Oct/04 01:00:00 17/Oct/04 01:59:59 3 3 0.058215 0.080105 0.066264 0.080105 0.012039662 3780 4548 4036 4548 443.4050067
keyword_1 17/Oct/04 02:00:00 17/Oct/04 02:59:59 9 9 0.039571 0.083275 0.058887778 0.083275 0.015465193 3628 4549 4018.777778 4549 400.1539634
keyword_1 17/Oct/04 03:00:00 17/Oct/04 03:59:59 8 8 0.038187 0.062873 0.053408625 0.062873 0.009202517 3615 4545 3834 4545 296.6532367
.
.
keyword_1 17/Oct/04 23:00:00 17/Oct/04 23:59:59 5 5 0.040078 0.07862 0.0598834 0.07862 0.013636071 3616 3628 3618.6 3628 5.272570531
Similarly for keyword_2, keyword_3, and so on.
Note: my log format is the same as shown initially in the question.
Are you familiar with the stats command? If not, you might want to play around with it. I'm not going to type all of these out, but hopefully this will give you the right idea:
... | rex "^(?<ip>\S+)(?:\s+\S+){2}\s+(?<user>\S+)\s+\[(?<time>[^\]]+)\](?:\s+\S+){2}\s+(?<protocol>\S+)\s+(?<server>\S+)\s+(?<elapsed>\S+)\s+\"(?<request>[^\"]+)\"\s+(?<status>\d+)\s+(?<bytes>\S+)"
| rex field=request "ptActivity=(?<call>.+)$"
| bucket _time span=1h
| stats count min(elapsed) as min_resp, max(elapsed) as max_resp, min(bytes) as min_size, max(bytes) as max_size by call _time
| stats sum(count) as total_events, avg(count) as avg_per_hour, min(min_resp) as min_resp, max(max_resp) as max_resp, min(min_size) as min_size, max(max_size) as max_size by call
The bucket command will essentially floor all of the timestamps to the hour. Next we get all of our stats by the keyword and hour, because we need to calculate avg events per hour. Now that we have those counts by the hour/keyword, we can get the average per hour and then all of the remaining numbers grouped to just the keyword with another stats command.
Hopefully that helps. Note, you could pretty much eliminate that last step for your last report, since it looks like you do want the data by hour. And in that case, I'm guessing total events and avg per hour would be the same number, if you're aggregating over the hour.
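The bucket-then-double-stats logic can be sketched in plain Python if it helps to see the mechanics. This uses made-up events (timestamp, keyword, elapsed seconds, bytes) and only the count/min/max aggregates; the other stats follow the same pattern:

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical parsed events: (timestamp, ptActivity keyword, elapsed secs, bytes).
events = [
    ('03/Oct/2017:08:01:54', 'ReloadSection', 0.123, 4011),
    ('03/Oct/2017:08:40:02', 'ReloadSection', 0.117, 4512),
    ('03/Oct/2017:09:05:10', 'ReloadSection', 0.201, 6188),
]

# Step 1: like "bucket _time span=1h" -- floor each timestamp to the hour.
hourly = defaultdict(list)
for ts, call, elapsed, size in events:
    t = datetime.strptime(ts, '%d/%b/%Y:%H:%M:%S')
    hourly[(call, t.replace(minute=0, second=0))].append(elapsed)

# Step 2: first stats -- per (keyword, hour) count and min/max response time.
per_hour = {k: {'count': len(v), 'min_resp': min(v), 'max_resp': max(v)}
            for k, v in hourly.items()}

# Step 3: second stats -- roll the hourly rows up to just the keyword.
summary = defaultdict(lambda: {'total': 0, 'counts': [],
                               'min_resp': float('inf'), 'max_resp': 0.0})
for (call, _), row in per_hour.items():
    s = summary[call]
    s['total'] += row['count']
    s['counts'].append(row['count'])
    s['min_resp'] = min(s['min_resp'], row['min_resp'])
    s['max_resp'] = max(s['max_resp'], row['max_resp'])
for s in summary.values():
    s['avg_per_hour'] = mean(s['counts'])   # avg events per hour, from hourly counts
```

Splunk does all of this for you, of course; the point is just that avg-per-hour has to come from the intermediate hourly counts, which is why two stats passes are needed.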
It worked fine, thanks.
I have a set of keywords for which I need to obtain the above results. Is there a way in Splunk to automate reading the csv files for each keyword, one at a time, and generate the output in the format shown above?
If the csv file is a lookup in splunk, then that could be doable using a subsearch I believe, but you'd probably want to create field extractions in props for that field, as opposed to using rex in the command.
Or if it's a pretty static list, you can just filter for those keywords in the base search too. Or you could create a dashboard with a dropdown of keywords and have the search update as you select a different keyword.
So you have a few different options, but as far as just "looping through a csv", that's not really how splunk works, no.
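To see why no per-keyword loop is needed, here's a Python sketch of the "filter by a keyword list" idea, with a hypothetical lookup-style CSV and made-up extracted ptActivity values:

```python
import csv
import io

# Hypothetical lookup CSV of keywords (in Splunk this would be a lookup file).
lookup_csv = "keyword\nReloadSection\nDCBClaimSearch\n"
keywords = [row['keyword'] for row in csv.DictReader(io.StringIO(lookup_csv))]

# ptActivity values already extracted from events (made-up examples).
calls = [
    'ReloadSection&pzFromFrame=pyxxxx&Count=',
    'SomeOtherActivity&Count=',
    'DCBClaimSearch&HeaderButtonSectionName=',
]

# One pass over the events covers every keyword at once -- roughly what
# filtering the base search against the lookup does inside Splunk.
by_keyword = {k: [c for c in calls if c.startswith(k)] for k in keywords}
```

Since the stats commands group `by call` anyway, a single search filtered to the keyword list produces one result row (or one set of hourly rows) per keyword in one go.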
Have you done any of these field extractions yet? If not, can you share the log format? We might be able to guess from the examples, but I see some instances where the user is a "-". But then there are also other dashes that probably represent something?
Also, is the timestamp in splunk for these events (_time) the same as the timestamp in the event?
By the timestamp in Splunk, do you mean the format?
I meant the timestamp. I was just wondering if these events in Splunk already have the correct timestamp or if that needs to be extracted as well.
I won't have time tonight, but I can try to put together the regex to pull the fields out of this data so you can create the report. I think that's really all you need, right?
And if this is a common format like apache or something, then there is already probably an add-on that knows how to parse the events.
Yes, you're correct, we need to convert the timestamp to the format (start_time,"%d/%m/%Y %I:%M:%S:%p").
Yes, you're right.
The log format is typically the same as shown above.
- - 11xxxxx4 is the same for all the lines.
- - - appears only for static content like css, js, and img, where the user column is (-).
When I ask about the log format, this is what I have in mind: not an example of the log, but exactly what parameters are being used to create the log.
http://httpd.apache.org/docs/current/mod/mod_log_config.html
Also, I'm not sure how you are calculating elapsed time with those examples. Does it come from this part: "TIME:0/117091"?
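If it does come from that token, one guess is that the number after the slash is microseconds (TIME:0/123717 would then be 0.123717 s, in the right ballpark for the sub-second values in the reports). The sample report numbers in the question don't line up exactly with the raw values, so treat the divisor as an assumption to verify. A sketch under that assumption:

```python
import re

def elapsed_seconds(token):
    """Parse a 'TIME:0/123717' style token into seconds.

    Assumes the number after the slash is microseconds -- this is a guess,
    not confirmed by the question, so verify the unit before relying on it.
    """
    m = re.match(r'TIME:\d+/(\d+)$', token)
    if not m:
        return None
    return int(m.group(1)) / 1_000_000
```

In Splunk the same split could be done with another rex on the elapsed field plus an eval to divide.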