Hi All,
I am looking for duplicate invoices, and have created a search that gives me the full list. However, I would like to summarize the data to show the total count of dupes per day over the last 30 days. Here is my initial search:
index=contract_gateway sourcetype=esb_audit svc_context_name= bp_bp_name=Invoice status=START PublishInvoice earliest=-30d@d
punct="--::._-,,{=..},{|||....|||,||.:|----|||},{||....|}"
| stats count earliest(_time) as start_time, latest(_time) as end_time, first(svc_context_name) as svc_context_name by batch_id
| eval _time=start_time
| eval start_time=strftime(start_time,"%m/%d/%Y %H:%M:%S")
| sort _time
| search NOT batch_id=PO81*
| where count > 1
| table start_time, count,svc_context_name, batch_id
| dedup batch_id
Is there a way to summarize this list? Maybe using the Splunk field "date_wday"?
Thank you in advance!
Total batch_ids that show duplicates, per day, over the last 30 days, split by svc_context_name:
index=contract_gateway sourcetype=esb_audit svc_context_name= bp_bp_name=Invoice status=START PublishInvoice
punct="--::._-,,{=..},{|||....|||,||.:|----|||},{||....|}" earliest=-30d@d
| stats count earliest(_time) as start_time, latest(_time) as end_time, first(svc_context_name) as svc_context_name by batch_id
| eval _time=start_time
| eval start_time=strftime(start_time,"%m/%d/%Y %H:%M:%S")
| search NOT batch_id=PO81*
| where count > 1
| timechart span=1d count by svc_context_name
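As a sanity check, the per-day counting that the stats + timechart pipeline performs can be sketched outside Splunk. This is a rough Python illustration with made-up sample events (the epochs and batch_ids are hypothetical, not real esb_audit data):

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Hypothetical events: (epoch _time, batch_id)
events = [
    (1717200000, "A1"), (1717200060, "A1"),                      # duplicate on 2024-06-01
    (1717200120, "B2"),                                          # single occurrence, not a dupe
    (1717286400, "C3"), (1717286460, "C3"), (1717286520, "C3"),  # duplicate on 2024-06-02
]

# stats count earliest(_time) as start_time by batch_id
per_batch = defaultdict(list)
for t, b in events:
    per_batch[b].append(t)

# where count > 1, then bucket by the day of the earliest event (eval _time=start_time)
dupes_per_day = Counter(
    datetime.fromtimestamp(min(times), tz=timezone.utc).strftime("%Y-%m-%d")
    for times in per_batch.values()
    if len(times) > 1
)
print(dupes_per_day)  # one duplicated batch_id on each of the two days
```

Note the counting key: each duplicated batch_id contributes 1 to the day of its earliest event, which is exactly what `eval _time=start_time | timechart span=1d count` does.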
If you want to do the same but count total duplicates across all batch_ids, change "count" to "sum(count) as count". We also have to subtract one from each count, because if there are N total events for a batch_id, only N-1 of them are, strictly speaking, "duplicates".
index=contract_gateway sourcetype=esb_audit svc_context_name= bp_bp_name=Invoice status=START PublishInvoice
punct="--::._-,,{=..},{|||....|||,||.:|----|||},{||....|}" earliest=-30d@d
| stats count earliest(_time) as start_time, latest(_time) as end_time, first(svc_context_name) as svc_context_name by batch_id
| eval _time=start_time
| eval start_time=strftime(start_time,"%m/%d/%Y %H:%M:%S")
| search NOT batch_id=PO81*
| where count > 1
| eval numberOfDuplicates=count-1
| timechart span=1d sum(numberOfDuplicates) as count by svc_context_name
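The N-1 arithmetic is easy to verify with a toy example. Here is a hedged Python sketch with hypothetical per-batch counts (the output shape of the stats step):

```python
# Hypothetical per-batch event counts, i.e. the rows produced by
# "stats count ... by batch_id"
batch_counts = {"A1": 2, "B2": 1, "C3": 3, "D4": 5}

# where count > 1 | eval numberOfDuplicates=count-1 | sum(numberOfDuplicates)
total_duplicates = sum(c - 1 for c in batch_counts.values() if c > 1)
print(total_duplicates)  # (2-1) + (3-1) + (5-1) = 7
```

B2 has a single event, so it contributes nothing; the other three batches contribute one less than their event count.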
And going back to the "total batch_ids showing any duplicates" version, we can use date_wday instead. Here date_wday makes the x-axis "day of week" rather than the 30 days spread out linearly.
index=contract_gateway sourcetype=esb_audit svc_context_name= bp_bp_name=Invoice status=START PublishInvoice
punct="--::._-,,{=..},{|||....|||,||.:|----|||},{||....|}" earliest=-30d@d
| stats count earliest(_time) as start_time, latest(_time) as end_time, first(svc_context_name) as svc_context_name by batch_id
| eval _time=start_time
| eval start_time=strftime(start_time,"%m/%d/%Y %H:%M:%S")
| search NOT batch_id=PO81*
| where count > 1
| chart count over date_wday by svc_context_name
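For reference, Splunk's date_wday holds the lowercase weekday name (e.g. "saturday"), so the chart above buckets the duplicated batch_ids into at most seven columns. A rough Python equivalent, using hypothetical start_time epochs and UTC for simplicity (Splunk derives date_wday from the event's raw timestamp, so your timezone may differ):

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical start_time epochs for duplicated batch_ids
start_times = [1717200000, 1717286400, 1717804800]  # 2024-06-01 Sat, 06-02 Sun, 06-08 Sat

# strftime("%A").lower() mimics Splunk's lowercase date_wday values
by_wday = Counter(
    datetime.fromtimestamp(t, tz=timezone.utc).strftime("%A").lower()
    for t in start_times
)
print(by_wday)  # Counter({'saturday': 2, 'sunday': 1})
```

Both Saturdays land in the same column, which is the point: the 30 days collapse onto a seven-value x-axis.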
You probably just want timechart, though. The span=1d option and all the other goodies are documented here:
http://docs.splunk.com/Documentation/Splunk/6.1.3/SearchReference/Timechart
Thank you!