Splunk Search

Update Summary Index Events

sanjay_shrestha
Contributor

There are two data sources, A and B, with a common field, common_field.


Source A

Common_Field   A1-Field   A2-Field
C1             A1         A2

Source B

Common_Field   B1-Field   B2-Field
C1             B1         B2

In the resulting summary index, I would like to have data as:

 

Common_Field   A1-Field   A2-Field   B1-Field   B2-Field
C1             A1         A2         B1         B2

I created a saved search as follows:

SourceType="A" OR SourceType="B" | stats values(A1), values(A2), values(B1), values(B2) by common_field

It is scheduled to run once every 5 minutes.
If there are correlated events from both sources at the time of execution, it works fine:

 

Common_Field   A1-Field   A2-Field   B1-Field   B2-Field
C1             A1         A2         B1         B2

However, if there are only events from source A at the time of execution, then we get

 

Common_Field   A1-Field   A2-Field   B1-Field   B2-Field
C1             A1         A2

This is OK until the subsequent execution of the saved search.

A few minutes later (in another execution of the search), the data from source B shows up, but this row does not get updated; those fields are still blank.

 

Common_Field   A1-Field   A2-Field   B1-Field         B2-Field
C1             A1         A2         Need to update   Need to update

Is there a way to update events?

Thanks,
Sanjay


somesoni2
SplunkTrust

You may try the following, provided:
1. The common field has a unique value for each transaction (between source A and source B).
2. All transactions complete in less than 5 minutes (considering your summary index search covers a 5-minute window).

| set union
    [ search index=myindex sourcetype=sourcea
      | stats values(A1_Field) as A1_Field, values(A2_Field) as A2_Field by Common_Field
      | join max=0 Common_Field
          [ search index=myindex sourcetype=sourceb earliest=-10m@m
            | stats values(B1_Field) as B1_Field, values(B2_Field) as B2_Field by Common_Field ] ]
    [ search index=myindex sourcetype=sourceb
      | stats values(B1_Field) as B1_Field, values(B2_Field) as B2_Field by Common_Field
      | join max=0 Common_Field
          [ search index=myindex sourcetype=sourcea earliest=-10m@m
            | stats values(A1_Field) as A1_Field, values(A2_Field) as A2_Field by Common_Field ] ]

What I am doing here is taking data from source A for, say, 10:05 AM to 10:10 AM and joining those events with events from 10:00 AM to 10:10 AM from source B. This gives all transactions completed within 10:05 to 10:10, plus any transaction that started between 10:00 and 10:05 and completed between 10:05 and 10:10. The same is then done the other way around. Finally, a union operation produces the list of transactions that were both initiated and completed during this 5-minute window, as well as those initiated in the previous 5-minute window and completed in this one.
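For reference, if the 5-minute window lives at the saved-search level rather than inline, the schedule and summary-index settings would look roughly like this in savedsearches.conf (the stanza name and the summary index name are placeholders, not anything defined in this thread):

[populate_ab_summary]
cron_schedule = */5 * * * *
dispatch.earliest_time = -5m@m
dispatch.latest_time = @m
enableSched = 1
action.summary_index = 1
action.summary_index._name = ab_summary

With that, the outer searches above inherit the 5-minute window while the earliest=-10m@m subsearches reach back one extra cycle.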


fk319
Builder

Another option you may consider:

SourceType="A" | stats values(A1) as A1, values(A2) as A2 by common_field
SourceType="B" | stats values(B1) as B1, values(B2) as B2 by common_field

and build the table later:

transaction common_field | table common_field, A1, A2, B1, B2
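As a rough end-to-end sketch of that idea (assuming both scheduled searches write into the default summary index, index=summary), the later reporting search could be:

index=summary
| transaction common_field
| table common_field, A1, A2, B1, B2

Because the two source searches write their rows at different times, the transaction (or an equivalent stats values(...) by common_field) stitches them back together whenever the report runs, so late-arriving B rows are picked up automatically.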

My question: is it quicker to use summary indexes than the raw logs? Another option is to use two summary indexes, one holding the A fields and the second holding both the A and B fields.

I hope something here helps...


msarro
Builder

No, there is not. I had a similar mental stumbling block when I first started using Splunk: don't think of events as rows in a database that can be updated; they are literally the event as it occurred in time. The same goes for data in a summary index. It becomes a concrete record of the results at a given point in time and can't be easily modified.

For what you are trying to do, it may help to do the correlation at search time. If the dataset is too large for that to be feasible and you MUST update records after correlation, look at using the DB connector to write the records out to a database instead of a summary index, and then update them there. You may have to be a little creative.


msarro
Builder

Some voodoo can be done with carefully crafted searches: create a field indicating whether or not the correlation is complete, and store the incomplete transactions in a CSV as well. If you have a DB background, you may find using a DB server in tandem with Splunk to be the easiest approach.
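As a rough sketch of that CSV pattern (the lookup file name pending_transactions.csv is a placeholder, and the field names just follow the example in the question), the summary-populating search could roll the pending rows back in on every run and keep only complete transactions:

SourceType="A" OR SourceType="B"
| stats values(A1) as A1, values(A2) as A2, values(B1) as B1, values(B2) as B2 by common_field
| inputlookup append=t pending_transactions.csv
| stats values(A1) as A1, values(A2) as A2, values(B1) as B1, values(B2) as B2 by common_field
| where isnotnull(A1) AND isnotnull(B1)

A companion search with the inverted where clause, ending in | outputlookup pending_transactions.csv, would then rewrite the pending file with whatever is still incomplete.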


sowings
Splunk Employee

Deleting data doesn't really delete data, so, no, this is probably not the best approach.

I've had colleagues who dealt with this by being clever about their search: don't write the summary event while the transaction is still open (i.e., while "| where isnull(B1) OR isnull(B2)" would still match it). You can then run (slightly) overlapping searches to ensure that you get all of the results.
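A minimal sketch of that approach, assuming the search is scheduled every 5 minutes but dispatched over a 10-minute window (the overlap size is an assumption, not something specified above), keeping only closed transactions:

SourceType="A" OR SourceType="B" earliest=-10m@m latest=@m
| stats values(A1) as A1, values(A2) as A2, values(B1) as B1, values(B2) as B2 by common_field
| where isnotnull(B1) AND isnotnull(B2)

Because the windows overlap, the same transaction can be written twice; deduplicating by common_field (dedup or another stats) in the report that reads the summary index takes care of that.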


sanjay_shrestha
Contributor

Would it be better practice to delete the data every time before running the saved search, and to run the search without a time range?
