Can a search that populates a summary index add a field to the raw data?

vpao

Hello,
I am populating a summary index with a search:
index=index1 | addinfo | collect index=summary

I want to schedule this search to run several times a day, but because of the nature of the data, that will introduce duplicate events into the summary index. Is there a way for the populating search to add a field called isProcessed="true" to the events in index1, so that later runs can filter on isnull(isProcessed) and skip events that have already been added to the summary index?

somesoni2

Data, once indexed, can't be changed, so the answer is no. What you can do instead is modify your summary-indexing search so that it excludes events from index1 that are already in the summary index.
For example, if the index1 events have a unique primary-key field, your search would look like this:

index=index1 NOT [search index=summary | stats count by primaryKeyField ] | addinfo | collect index=summary
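To check what a subsearch like the one in square brackets actually contributes, you can run it on its own and append the format command, which renders the results as the search string that Splunk substitutes into the outer search. A sketch, reusing the placeholder field name primaryKeyField from the search above:

```
index=summary
| stats count by primaryKeyField
| format
```

Each result row becomes a group of field=value terms joined with AND, and the rows are joined with OR, so every field the subsearch returns becomes part of the filter applied to index1.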

Also, I would look more closely at why there are duplicates in the first place. Do your scheduled runs have overlapping time ranges?
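One common way to avoid overlap is to snap the search window to the schedule, so consecutive runs cover adjacent, non-overlapping intervals. A minimal savedsearches.conf sketch, assuming a hypothetical stanza name and a four-hour schedule (adjust both to your environment):

```
[populate_summary_from_index1]
search = index=index1 | addinfo | collect index=summary
enableSched = 1
# Run at minute 0 of every fourth hour.
cron_schedule = 0 */4 * * *
# Cover exactly the previous four whole hours, so runs do not overlap.
dispatch.earliest_time = -4h@h
dispatch.latest_time = @h
```

The same window can also be written into the search itself as earliest=-4h@h latest=@h. Keep in mind that the window applies to event time (_time), so an event that is indexed after the run covering its timestamp has finished would be missed; whether that matters depends on how late your data arrives.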

vpao

@somesoni2
That's awesome. I couldn't get the exclusion to work with the stats command, so I switched to the table command and verified that the search works. Thanks!
index=index1 NOT [search index=summary | table primaryKeyField ] | addinfo | collect index=summary
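The likely reason the stats version excluded nothing: a subsearch passes every field in its results to the outer search, ANDed together per row. stats count by primaryKeyField returns both primaryKeyField and count, so the generated filter only matches index1 events that also carry a count field with the same value, which they presumably do not, and the NOT therefore removes nothing. Keeping only the key field should behave like the table version; a sketch with the same placeholder field name:

```
index=index1
    NOT [ search index=summary
          | stats count by primaryKeyField
          | fields primaryKeyField ]
| addinfo
| collect index=summary
```

Subsearches are also capped by default (the [subsearch] stanza in limits.conf sets maxout, 10,000 results by default, and maxtime for runtime), so once the summary index holds more keys than that, the exclusion list is truncated and duplicates can reappear. Bounding the subsearch's time range, for example with a hypothetical earliest=-7d inside the brackets, helps keep it under the limit.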