Hi
I'm trying to join data from two sourcetypes and make some simple statistics based on joined data. But when I'm using stats I have some incorrect results. Let me explain in details:
In sourcetype "ATM" I have field DEVICE_ID and in sourcetype "Zabbix" I have fields JNAME and AVAILABLE. Fields DEVICE_ID and JNAME contains the same sort of data from two different applications and fields AVAILABLE contain information for which I need some statistics. In sourcetype "ATM" I have events for more DEVICE_ID's than in sourcetype "Zabbix"
Basically my task is to count for how much DEVICE_ID's I don't have information in "Zabbix" every day.
I used following query to get the result:
sourcetype=ATM | eval date=strftime(_time,"%d-%m-%Y") | dedup DEVICE_ID, date
| join type=left DEVICE_ID, data [search sourcetype=Zabbix | eval date=strftime(_time,"%d-%m-%Y") | dedup JNAME, date | rename JNAME AS DEVICE_ID]
| fillnull value="No data" AVAILABLE
| search AVAILABLE="No data"
| stats count(DEVICE_ID) by date
When I'm executing this search for 7 days, I get correct information for latest 5 and fully incorrect for two first. When I executed this search for last 30 days I got correct result for latest 11 days and incorrect for all other. Like this:
date | count
11-09-2014 | 2994
12-09-2014 | 2989
13-09-2014 | 347
14-09-2014 | 328
15-09-2014 | 341
16-09-2014 | 349
Please help me to make this search working properly. Thank you in advance.
P.S. I'm using Splunk Free if it matter
A couple of things:
You might be able to refactor the search such that a join is not needed, which could greatly increase the efficiency of the search. Would a search like this accomplish what you need?
sourcetype=ATM OR sourcetype=Zabbix
| eval date=strftime(_time,"%d-%m-%Y")
| rename JNAME AS DEVICE_ID
| fillnull value="No data" AVAILABLE
| fields DEVICE_ID sourcetype date
| search AVAILABLE="No data"
| stats count(Device_ID) as DeviceCount dc(sourcetype) as SourceTypeCount by date
| where SourceTypeCount>1
Thanks for your help, but it's still not working. I tried to analyze search results without stats
and where
and found following:
1) after | search AVAILABLE="No data"
I have events only of one sourcetype - ATM
2) I see DEVICE_ID field in fields list but when I'm trying to do simple stats | stats count(Device_ID) by date
search provides zero results each day
Any ideas why this can happen and what to do?
Ah, that's because I didn't include it in my stats
command, sorry about that. Here's the edited search:
sourcetype=ATM OR sourcetype=Zabbix
| eval date=strftime(_time,"%d-%m-%Y")
| rename JNAME AS DEVICE_ID
| fillnull value="No data" AVAILABLE
| fields DEVICE_ID sourcetype date
| search AVAILABLE="No data"
| stats count(Device_ID) as DeviceCount dc(sourcetype) as SourceTypeCount by date DEVICE_ID
| where SourceTypeCount>1
jrodman
Thank you for your comment. But |where DEVICE_ID="*" produce no results. Also I'm sure that numbers provided by my search are correct for some dates, so I think this doesn't affect my case
HiroshiSatoh
Yes, just and error. It doesn't affect my case
I also found that subsearch in join was limited by time. When I tried to change limits (in limits.conf) it starts to produce correct results for more dates. So, I think this is the case and I need to understand how I can speedup search and avoid incorrect results in general.
Any ideas?
Clerical error?
join type=left DEVICE_ID, data
-> join type=left DEVICE_ID, date
I don't know if this is your problem, but please note that count(DEVICE_ID) does not count the number of results which have the field DEVICE_ID. Instead it counts the number of occurrences (or values, if you prefer) of the field DEVICE_ID, which can disagree in the case of multivalue fields. You may wish to check if you get the same number for |where DEVICE_ID="*" | stats count by date