I have 3 sourcetypes representing learners, courses and course completion details
learners - the learner details
catalog - the total course catalog for each learner
completions - the course completions for the learer
Amongst other fields,
"learners" contains Username, Name
"catalog" contains Username, Asset_Title
"completions" contains Username, Asset_Title
I want to produce a report showing each learner, the total number of courses in their catalog and the total number of courses they have completed along with a % completion. The following search almost gets there.
| set intersect [ search sourcetype=catalog* | fields Asset_Title, Username | fields - _* ] [ search sourcetype=completions* | fields Asset_Title, Username | fields - _* ] | eventstats dc(Asset_Title) as Count by Username | join Username [ search sourcetype=learners* ] | stats Count by Name
but I can't see how to get the total courses from the catalog for each learner to be retained as a field after the set operation, so I can add that as a Total field and calculate the %.
Why not this? It seems much more simple
Username=*
| stats count(eval(match(sourcetype,"catalog.*"))) as Catalog count(eval(match(sourcetype,"completion.*"))) as Completed first(Name) as Name by Username
Although it looks like you expect to see duplicate Asset_Titles, so maybe this would work better
sourcetype=learners
| append [ search sourcetype=catalog | stats dc(Asset_Title) as CatalogCount by Username ]
| append [ search sourcetype=completions | stats dc(Asset_Title) as CompletionCount by Username ]
| stats sum(CatalogCount) as Total sum(CompletionCount) as Completed first(Name) as Name by Username
| eval PercentCompleted=round(Completed * 100 / Total, 2)
| table Name Username Completed Total PercentCompleted
| sort Name
I think either of these will run faster and be less complex than all the joins...
Why not this? It seems much more simple
Username=*
| stats count(eval(match(sourcetype,"catalog.*"))) as Catalog count(eval(match(sourcetype,"completion.*"))) as Completed first(Name) as Name by Username
Although it looks like you expect to see duplicate Asset_Titles, so maybe this would work better
sourcetype=learners
| append [ search sourcetype=catalog | stats dc(Asset_Title) as CatalogCount by Username ]
| append [ search sourcetype=completions | stats dc(Asset_Title) as CompletionCount by Username ]
| stats sum(CatalogCount) as Total sum(CompletionCount) as Completed first(Name) as Name by Username
| eval PercentCompleted=round(Completed * 100 / Total, 2)
| table Name Username Completed Total PercentCompleted
| sort Name
I think either of these will run faster and be less complex than all the joins...
Thanks, the 'big table' concept got it for me. I had tried the separate searches, but didn't understand the blank entries in the results after the appends, but when putting that into a table, I worked it out.
You've been a great help, you've got me over a mental hurdle. Thanks!
Hopefully that helps. Or just run the first line of the search. Then run the first two lines of the search and see what happens. Then run the first 3 lines of the search... etc. etc.
Try running each of these searches separately:
sourcetype=catalog | stats dc(Asset_Title) as CatalogCount by Username
sourcetype=completions | stats dc(Asset_Title) as CompletionCount by Username
sourcetype=learners | table Username Name
Then think about how you would combine these into one big table - maintaining a row for each event (some columns would be empty).
Now, with your table, imagine doing the following stats
command:
stats sum(CatalogCount) as Total sum(CompletionCount) as Completed first(Name) as Name by Username
which creates a single row for each Username
Hi Iguinn, one thing I don't understand about Splunk is how the append works to append the CatalogCount and CompletionCount values to the right learner result. How is it that the CatalogCount by Username will get correctly associated with the relevant Username record from the sourcetype=learners.
I'm still missing something fundamental in how the search pipeline works I think. Can you clarify how this works.
Thanks!
Bah - I just figured out another problem with my first search...
Sorry about the typo in the append
command - I corrected my answer. Thanks!
Thanks iguinn! The first didn't work - I actually have a couple of variants of sourcetype, i.e. I have completions_standard and completions_script, so I normally use completions* but the stats command doesn't like count(sourcetype=completions*).
However, the second seems to work - although it needed the 'search' in the append subsearch, i.e.
append [ search sourcetype...