I am running the following search and it is taking a long time to complete. Is there a way to save the results of part of a search, so that when I modify the tail end I don't have to rerun the whole search? I.e., can I save the results of user=* | dedup _raw
and then run those saved results through subsequent searches?
user=* | dedup _raw | transaction user date_minute date_second
To save an intermediate result, you could also use
some search | outputlookup temp.csv
and from here on start a new search with
| inputlookup temp.csv | continue search
If some search is complex (time-consuming) and you just want to experiment with different ways of writing continue search, this method lets you do so without any hassle. The only thing to watch out for is whether the intermediate results are too numerous for a .csv file (say, some hundred thousand result rows).
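Applied to the search in question, the pattern would look like this (temp.csv is just an example lookup file name):

user=* | dedup _raw | outputlookup temp.csv

and then, in a new search:

| inputlookup temp.csv | transaction user date_minute date_second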
Use | outputcsv to write results to disk, and then use | inputcsv to pull them back in. You can also use Tableau, which has a Splunk connector, so you can pull in your raw data, save it to disk, and then do all of the "stuff" to it from the disk image.
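A minimal sketch of that approach (deduped_raw is an example file name):

user=* | dedup _raw | outputcsv deduped_raw

and then, in a later search:

| inputcsv deduped_raw | transaction user date_minute date_second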
Thanks for this interesting suggestion.
I have tried applying this, but I'm getting strange results. Consecutive identical searches return different results. My suspicion is that different parts of the search are performed asynchronously, causing an earlier version of temp.csv to be read before the new version of temp.csv is written.
Could this be possible?
Note: I'm using "| inputlookup temp.csv" inside a subsearch. Maybe the subsearch is executed asynchronously with the main search?
UPDATE: after looking at the Splunk documentation on subsearches, I read this: "The subsearch is in square brackets and is run first." This explains the strange behaviour.
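To illustrate, a hypothetical combined search like

user=* | dedup _raw | outputlookup temp.csv | append [| inputlookup temp.csv]

would read temp.csv via the bracketed subsearch before the outer pipeline has written the new version, because the subsearch runs first. Running the outputlookup search to completion, and only then starting a separate search beginning with | inputlookup temp.csv, avoids the race.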
Apply filtering as soon as possible and do not use transaction unless you have to.
Specify your index name and sourcetype because it will speed things up.
Also restrict your search by time using earliest and latest.
If you post the whole query I can try to be more specific:
index=foo sourcetype=bar user=*
| fields user date_minute date_second
| stats list(user) by date_minute, date_second
Let me know if that helps
If I only have one index and one sourcetype, will this speed things up? I want to look at all events, and not just within a time window.
Is there a way to reuse the results of a search?
Even if there's only one index and one sourcetype it's always better to be as specific as possible and apply that filter as early as possible in your query.
You can reuse the results of a search in different ways, but it all depends on what you are trying to achieve; if you give us more details we might be able to help.
For instance, you can use subsearches, output and inputcsv, collect, etc.
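As a sketch of the collect option: the collect command writes events into a summary index for later reuse (mysummary is an example index name that must already exist on your instance):

user=* | dedup _raw | collect index=mysummary

and then in subsequent searches:

index=mysummary | transaction user date_minute date_second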
The dedup _raw takes so long that I am hoping to store its result to pipe into subsequent searches. I need this step because I have many duplicate events for some reason.
But why do you need to dedup the whole RAW event if you are then only using the following three fields: user date_minute date_second?
Doesn't the following query work for you?
index=foo sourcetype=bar user=*
| fields user date_minute date_second
| stats list(user) by date_minute, date_second
Or the alternative that uses values instead of list to remove duplicates:
index=foo sourcetype=bar user=*
| fields user date_minute date_second
| stats values(user) by date_minute, date_second
You'd probably achieve the same result by using just the stats command, which will be much faster. What is the search requirement here?
I am looking to group events by transaction. Will the stats command do this for me?
I have a lot of events. By searching user=*, I narrow them down to login events, since those have a user field. I end up with duplicate events, which I remove with dedup. Finally I am left with events, some of which group together (e.g. password accepted and session opened). This is why I want to group them as transactions: I want to preserve the individual events, but also know the number of independent transactions.
It would be nice to know if there is a way to re-use the results of previous searches. Is there a way to do this?
Which fields are you interested in? All the fields, or just _raw?
As @javiergn mentioned, restrict your base search by specifying index/sourcetype/source etc. To remove duplicates and group events by user, date_minute, and date_second, try this stats option.
index=blah sourcetype=blah user=*
| stats latest(user) as user latest(date_minute) as date_minute latest(date_second) as date_second by _raw
| stats list(_raw) as _raw by user date_minute date_second
If you want to preserve more fields, add them to both stats commands in the same way.
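For example, to also carry a hypothetical src_ip field through, it would be added to both stats commands like this:

index=blah sourcetype=blah user=*
| stats latest(user) as user latest(date_minute) as date_minute latest(date_second) as date_second latest(src_ip) as src_ip by _raw
| stats list(_raw) as _raw by user date_minute date_second src_ip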