Hi
I have many configuration text files that basically look like this:
Owner Name: AAAAA AAAAA
Product Name: AAAA AAAA
Product ID: NNNNN-NN Serial ID: NN-NN-NN-NNNNN
Sometimes the product ID or serial ID changes, and I want to index the new version without keeping the old events. Basically, I want the new configuration file to replace the old one in the index.
I tried the inputs.conf below because some files were not getting indexed due to their similarity to each other. Everything was fine until I found out that every time the configuration text file changes, the file is indexed again but does not replace the old events. So now I have multiple copies of the same configuration file under the same source, which is a problem.
[monitor://Some directory]
index = my_index
sourcetype = my_sourcetype
crcSalt = <SOURCE>
(1) Right now I need to delete all events that already have a newer version, based on _indextime.
(2) I need a new inputs.conf setup that will prevent this behavior.
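For (2), one setting worth trying instead of crcSalt is initCrcLength, which lengthens the header checksum Splunk uses to tell near-identical files apart. The stanza below is only a sketch: the directory path and the length value are placeholders to adjust for your files.

```
[monitor:///path/to/config/files]
index = my_index
sourcetype = my_sourcetype
# Instead of crcSalt = <SOURCE>, read more than the default 256 bytes
# when fingerprinting each file, so files with similar headers are
# still distinguished by their differing content further in.
initCrcLength = 1024
```

Note that no inputs.conf setting makes Splunk replace events that are already indexed; at best this avoids spurious re-indexing of files it has already seen.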
Have you tried not deleting old results but instead just searching for the latest results? Something like:
... | eval _time=(_indextime) | stats latest(*) by source
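Written out against placeholder index and sourcetype names, and with an AS * rename so the stats output keeps the original field names, that search might look like this (an untested sketch):

```
index=my_index sourcetype=my_sourcetype
| eval _time=_indextime
| stats latest(*) AS * by source
```

This sidesteps deletion entirely: the old events stay in the index, but every search built this way only reports the most recently indexed copy per source.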
This solution comes basically from woodcock's idea. I eliminated the
| sort - indextime
because it wasn't running correctly for me:
... | eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S") | search index=* NOT [... | eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S") | dedup source | fields field1, field2, indextime]
OK, try this then:
... NOT [... | sort - _indextime | dedup source | fields _raw]
If this looks correct (has only the bad stuff), then it should be safe to pipe it to | delete.
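Assembled end to end with placeholder names, that suggestion would look something like this (a sketch only; note that fields _raw turns each raw event into a subsearch term, so subsearch result limits apply with many events):

```
index=my_index sourcetype=my_sourcetype
    NOT [ search index=my_index sourcetype=my_sourcetype
          | sort - _indextime
          | dedup source
          | fields _raw ]
| delete
```

Run it without the final | delete first and confirm only stale events come back before deleting anything.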
Your idea is very interesting, but it didn't work and I am not sure why.
The subsearch on the right provides all the good data, so the Boolean operator NOT should eliminate the good data from the total data, leaving only the bad data.
What is the result of this search?
... | sort - _indextime | dedup source | fields _raw | format
It should have one field called search that contains a list of OR conditions on _raw.
This search provided all the right data. If I look in Statistics I have a table with one row:
_raw search
NOT ()
That's all I've got. Play around and see if you can make it work and update this Q&A with what you find.
The first part of the query can be as simple as index=foo sourcetype=my_sourcetype.
Try this instead.
index=foo sourcetype=my_sourcetype | eval oldest=relative_time(now(),"-1d@d") | where _indextime<oldest
Adjust the arguments to relative_time as needed.
I don't see how this solves my problem. Could you elaborate on your solution?
now() is the time when the search started.
oldest is the start of the day before the search started ("-1d@d" goes back one day, then snaps to the beginning of that day).
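A quick way to see what those two values evaluate to is a throwaway makeresults search (the readable field names here are just for illustration):

```
| makeresults
| eval started=now(), oldest=relative_time(now(), "-1d@d")
| eval started_readable=strftime(started, "%Y-%m-%d %H:%M:%S"),
       oldest_readable=strftime(oldest, "%Y-%m-%d %H:%M:%S")
```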
Basically I need a solution that provides the same results as woodcock's solution but without using eventstats.
This should be the opposite of dedup:
... | eventstats max(_indextime) AS latestIndexTime by source | where _indextime<latestIndexTime
Then you just pipe that to delete by adding this:
... | delete
Your command is perfect for selecting the events, but I encountered the following error when I added the delete command.
Error in 'delete' command: This command cannot be invoked after the non-streaming command 'eventstats'.
The search job has failed due to an error. You may be able to view the job in the Job Inspector.
I am going to retry running the command, but for some reason it takes a long time to run.
I got the same error. My roles are can_delete, user, power.
My solution will not work then, because evidently the use of eventstats precludes the use of delete (which, IMHO, is definitely a bug).
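One workaround that avoids eventstats entirely, assuming the number of affected sources is small enough to iterate: first find each source's newest _indextime, with no delete attached:

```
index=my_index sourcetype=my_sourcetype
| stats max(_indextime) AS latestIndexTime by source
```

Then, for each source in that table, delete everything older than its cutoff with a purely streaming filter (the source path and epoch value below are placeholders taken from the first search's output):

```
index=my_index sourcetype=my_sourcetype source="/path/to/one/file"
| where _indextime < 1555555555
| delete
```

This is manual, but where is a streaming command, so delete is allowed after it.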
Splunk does not have an "index this only if it's not already indexed" feature; the performance of such a feature would probably be poor. Nor will it replace or update anything already indexed.
You can remove duplicate data (or any data) by piping a search to the delete command.
OK. That's too bad. But how do I make Splunk delete events that have a newer version? I know about the delete command, but I haven't been able to select the appropriate data. With the command below I've been able to see the indextime and which sources have more than one copy indexed.
... | eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S") | stats count by source | where count>1
The only way I have found is deleting one file at a time, which is very inefficient.
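A small extension of that check (same placeholder names) also shows the indextime spread per source, which helps confirm which copy is newest before deleting anything:

```
index=my_index sourcetype=my_sourcetype
| stats count
        min(_indextime) AS firstIndexed
        max(_indextime) AS lastIndexed
        by source
| where count > 1
| eval firstIndexed=strftime(firstIndexed, "%Y-%m-%d %H:%M:%S"),
       lastIndexed=strftime(lastIndexed, "%Y-%m-%d %H:%M:%S")
```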
You need to have permission to use the delete command. That's the best way to remove events from Splunk.
I have permission to use the delete command, the problem is that I don't know how to select the appropriate data for deleting.
See my Answer!