Hi All,
I am looking for duplicate invoices, and have created a search that gives me the full list. However, I would like to summarize the data to show the total count of dupes per day over the last 30 days. Here is my initial search:
index=contract_gateway sourcetype=esb_audit svc_context_name= bp_bp_name=Invoice status=START PublishInvoice earliest=-30d@d
punct="--::._-,,{=..},{|||....|||,||.:|----|||},{||....|}"
| stats count earliest(_time) as start_time, latest(_time) as end_time, first(svc_context_name) as svc_context_name by batch_id
| eval _time=start_time
| eval start_time=strftime(start_time,"%m/%d/%Y %H:%M:%S")
| sort _time
| search NOT batch_id=PO81*
| where count > 1
| table start_time, count,svc_context_name, batch_id
| dedup batch_id
Is there a way to summarize this list? Maybe using the Splunk field "date_wday"?
Thank you in advance!
Total batch_ids that show duplicates, per day, over the last 30 days, split by svc_context_name:
index=contract_gateway sourcetype=esb_audit svc_context_name= bp_bp_name=Invoice status=START PublishInvoice
punct="--::._-,,{=..},{|||....|||,||.:|----|||},{||....|}" earliest=-30d@d
| stats count earliest(_time) as start_time, latest(_time) as end_time, first(svc_context_name) as svc_context_name by batch_id
| eval _time=start_time
| eval start_time=strftime(start_time,"%m/%d/%Y %H:%M:%S")
| search NOT batch_id=PO81*
| where count > 1
| timechart span=1d count by svc_context_name
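As a sanity check, the per-day counting that the stats + timechart pipeline performs can be sketched outside Splunk. This is a rough Python illustration with made-up sample events (the epochs and batch_ids are hypothetical, not real esb_audit data):

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Hypothetical events: (epoch _time, batch_id)
events = [
    (1717200000, "A1"), (1717200060, "A1"),                      # duplicate on 2024-06-01
    (1717200120, "B2"),                                          # single occurrence, not a dupe
    (1717286400, "C3"), (1717286460, "C3"), (1717286520, "C3"),  # duplicate on 2024-06-02
]

# stats count earliest(_time) as start_time by batch_id
per_batch = defaultdict(list)
for t, b in events:
    per_batch[b].append(t)

# where count > 1, then bucket by the day of the earliest event (eval _time=start_time)
dupes_per_day = Counter(
    datetime.fromtimestamp(min(times), tz=timezone.utc).strftime("%Y-%m-%d")
    for times in per_batch.values()
    if len(times) > 1
)
print(dupes_per_day)  # one duplicated batch_id on each of the two days
```

Note the counting key: each duplicated batch_id contributes 1 to the day of its earliest event, which is exactly what `eval _time=start_time | timechart span=1d count` does.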
If you want to do the same but count total duplicates across all batch_ids, change "count" to "sum(count) as count". We also have to subtract one from each count, because if there are N total events for a batch_id, only N-1 of them are, strictly speaking, "duplicates".
index=contract_gateway sourcetype=esb_audit svc_context_name= bp_bp_name=Invoice status=START PublishInvoice
punct="--::._-,,{=..},{|||....|||,||.:|----|||},{||....|}" earliest=-30d@d
| stats count earliest(_time) as start_time, latest(_time) as end_time, first(svc_context_name) as svc_context_name by batch_id
| eval _time=start_time
| eval start_time=strftime(start_time,"%m/%d/%Y %H:%M:%S")
| search NOT batch_id=PO81*
| where count > 1
| eval numberOfDuplicates=count-1
| timechart span=1d sum(numberOfDuplicates) as count by svc_context_name
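The N-1 arithmetic is easy to verify with a toy example. Here is a hedged Python sketch with hypothetical per-batch counts (the output shape of the stats step):

```python
# Hypothetical per-batch event counts, i.e. the rows produced by
# "stats count ... by batch_id"
batch_counts = {"A1": 2, "B2": 1, "C3": 3, "D4": 5}

# where count > 1 | eval numberOfDuplicates=count-1 | sum(numberOfDuplicates)
total_duplicates = sum(c - 1 for c in batch_counts.values() if c > 1)
print(total_duplicates)  # (2-1) + (3-1) + (5-1) = 7
```

B2 has a single event, so it contributes nothing; the other three batches contribute one less than their event count.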
And going back to the "total batch_ids showing any duplicates" version, we can use date_wday instead. Here date_wday makes the x-axis "day of week" rather than the 30 days spread out linearly.
index=contract_gateway sourcetype=esb_audit svc_context_name= bp_bp_name=Invoice status=START PublishInvoice
punct="--::._-,,{=..},{|||....|||,||.:|----|||},{||....|}" earliest=-30d@d
| stats count earliest(_time) as start_time, latest(_time) as end_time, first(svc_context_name) as svc_context_name by batch_id
| eval _time=start_time
| eval start_time=strftime(start_time,"%m/%d/%Y %H:%M:%S")
| search NOT batch_id=PO81*
| where count > 1
| chart count over date_wday by svc_context_name
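For reference, Splunk's date_wday holds the lowercase weekday name (e.g. "saturday"), so the chart above buckets the duplicated batch_ids into at most seven columns. A rough Python equivalent, using hypothetical start_time epochs and UTC for simplicity (Splunk derives date_wday from the event's raw timestamp, so your timezone may differ):

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical start_time epochs for duplicated batch_ids
start_times = [1717200000, 1717286400, 1717804800]  # 2024-06-01 Sat, 06-02 Sun, 06-08 Sat

# strftime("%A").lower() mimics Splunk's lowercase date_wday values
by_wday = Counter(
    datetime.fromtimestamp(t, tz=timezone.utc).strftime("%A").lower()
    for t in start_times
)
print(by_wday)  # Counter({'saturday': 2, 'sunday': 1})
```

Both Saturdays land in the same column, which is the point: the 30 days collapse onto a seven-value x-axis.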
You probably just want timechart, though. The span=1d option and all the other goodies are documented here:
http://docs.splunk.com/Documentation/Splunk/6.1.3/SearchReference/Timechart
Thank you!