Hello,
Lets say we have time period from Jan 2011 till Jan 2012. Is it possible to build a timechart that count events for each time span (e.g 1month) from the very beginning (Jan 2011)? Normaly timechart counts evants by a span. Also simple sum of previous months results doesn't cover the specific issues of dynamics needed.
At the moment I built up rather bulky search that uses append
function for each span and different endtime
in each appended search. I'm wondering is there more elegant decision of this issue?
This is a letter to myself in the past:
Dear Kate!
You should use Summary Indexing together with backfill mechanism! Here is a beginning of your road for convenient and fast work with historical data:
http://docs.splunk.com/Documentation/Splunk/4.3.3/Knowledge/Usesummaryindexing
http://docs.splunk.com/Documentation/Splunk/4.3.3/Knowledge/Managesummaryindexgapsandoverlaps
What a pity that you'll stumble upon this information that late. But still it's so good you'll got to know it.
This is a letter to myself in the past:
Dear Kate!
You should use Summary Indexing together with backfill mechanism! Here is a beginning of your road for convenient and fast work with historical data:
http://docs.splunk.com/Documentation/Splunk/4.3.3/Knowledge/Usesummaryindexing
http://docs.splunk.com/Documentation/Splunk/4.3.3/Knowledge/Managesummaryindexgapsandoverlaps
What a pity that you'll stumble upon this information that late. But still it's so good you'll got to know it.
If this is too simple, please let us know more what you want to achieve.
sourcetype=your_sourcetype earliest=-1y@y latest=@y| timechart span=1mon count
UPDATE:
So what you want is a timechart with a one month granularity, over userIDs who registered but never bought anything (i.e. only occur once in your logs)? Assuming that there are 12 new inactive users registered per month, it would be something like ;
Month Accumulated_Onetime_users
Jan 2011 12
Feb 2011 24
Mar 2011 36
sourcetype=your_sourcetype earliest=-1y@y latest=@y
| dedup 2 userID
| transaction userID maxevents=2
| search eventcount=1
| bucket _time span=1mon
| stats count AS Onetime_users by _time
| accum Onetime_users AS Accumulated_onetime_users
| eval Month = strftime(_time, "%b %Y")
| table Month Accumulated_onetime_users
I'm thinking this should work 🙂 , but I'm doing it all in my head, since I don't have any relevant logs to play with. There probably more efficient ways of doing it, even if it actually does work.
UPDATE:
Ok now I have tested the search with some real data. The first three lines of the search above (after the sourcetype and time constraints) will find those events that only have a single unique event for the field value.
dedup 2
limits the initial set to two of each, since there is no point in knowing more than that.
transaction userID maxevents=2
groups events together based on the userID. Thus a transaction can have 1 or 2 events.
search eventcount=1
will keep the transactions that only have one event, i.e. the ones we want to find.
Then we treat all events from the same month as having the same timestamp. (bucket _time span=1mon
), so we can make counts of events by month. (stats count
)
accum
keeps a running total of the events.
the eval Month
is for presentation in a nicer way, and the table
is just showing the results.
You could add the Onetime_users
to the table if you wish, to show which were new for each month.
Let me know if this gives you anything meaningful.
Hope this helps,
Kristian
Kristian, some time ago finally I found the the core solving of the question that I felt should surely exist. The answer is below)
Sorry if I misunderstand what you want to achieve - but the search I provided does work the way I said. See update above.
/k
At the moment I explicitly state the time for each month that is not so well.
source="A" OR source="B" starttime=01/01/2011:00:00:00 latest=now | stats count as activity by user_id | where activity = 1 | eval t=case(activity = 1, "this month") | chart count by t |
append [ search source="A" OR source="B" starttime=01/01/2011:00:00:00 endtime=02/01/2011:00:00:00 | stats count as activity by user_id | where activity = 1 | eval t=case((activity = 1) AND (1296518400 < time()), "2011.01") | chart count by t ] |
append [...]
...
chart max(count) by t
10 digits: converted to epochtime 'endtime':)
Kristian, this is a beautiful search! Thanks for it a lot, it also helped me to learn several new search functions:) I was sure it should work the way I need, but it shows just two or one first months while it can't be so. Moreover span=1mon option doesn't let us see filtered userIDs for each month from the needed time range: [earliest -
Kristian, thanks for the reply! But your suggestion can not work for this issue as building of my timechart is preceeded with counting events by user id(for all the time) and selecting only entries with one occurance.
(In my case these are users that only registered and have purchased nothing)
So timechart that you suggest will take one month statistics and count events by users in this month then for another and so on till the 'earliest' time. And it is needed to count each month in the range [earliest -