Splunk Search

Can I add a field to an event based on data in a separate event?

aneaston
New Member

I have one sourcetype that contains an event for each request to my site. One of the fields (we'll call it 'api') in the event is (more or less) what API that request hit. The site also has one specific API that gets hit (in addition to the original API) for any request that is not a bot. Events to this API also have a field containing the original_request_id, though they have their own request_id as well.

Basically, what I'm trying to do is add a field to every event that indicates whether the log also contains an event to the IS_NOT_BOT API whose original_request_id matches that event's request_id.

Because the two events share a request ID (albeit in different fields), I could potentially use a transaction. The dataset I'm working with is quite large and searches over it can already be slow, so I'd rather not add transactions unless I have to. Anyone have any ideas?


aneaston
New Member

I came up with this solution, but it's super slow. Anyone know how to make it faster?

sourcetype=my_log
| eval related_request_id = if(isnotnull(original_request_id), original_request_id, request_id)
| eventstats count as request_id_count by related_request_id
| eval not_bot_validated=if(request_id_count > 1, "true", "false")
| fields - related_request_id, request_id_count

lguinn2
Legend

If you have multiple indexes in your environment (you should), add index=therightindex to your search. This should speed it up. Right now, Splunk has to examine every index to determine whether it contains the sourcetype you named.
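As a rough sketch, this is the search from the question with the index restriction added in front. Here `therightindex` is only a placeholder for whichever index actually holds the `my_log` sourcetype; the rest of the pipeline is unchanged.

```spl
index=therightindex sourcetype=my_log
| eval related_request_id = if(isnotnull(original_request_id), original_request_id, request_id)
| eventstats count as request_id_count by related_request_id
| eval not_bot_validated=if(request_id_count > 1, "true", "false")
| fields - related_request_id, request_id_count
```

Narrowing the time range of the search should help in the same way, since it also reduces how much data has to be scanned before `eventstats` runs.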


lguinn2
Legend

You can't add the "additional field" at index time, but it is easy enough to do at search time.

yoursearchhere
| eventstats count as num_requests first(special_field) as special_field_found  by request_id 

Since you didn't give many concrete details, I can't make very concrete suggestions.
Above, I used eventstats to calculate a few new fields, which are added to all events. A count can often be useful, as you can use it to filter out request_ids that don't have enough events to qualify. You can also capture a field value using functions such as first, last, earliest, latest or values. In the example above, I captured the first value of some field and added it to all the events.
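Applied to the question, the same idea might look like the sketch below. It assumes that events to the bot-check API carry the literal value `IS_NOT_BOT` in the `api` field, which the thread does not confirm, and it uses `coalesce` to pick `original_request_id` when present and `request_id` otherwise. The field names `related_request_id` and `not_bot_hits` are made up for illustration.

```spl
sourcetype=my_log
| eval related_request_id = coalesce(original_request_id, request_id)
| eventstats count(eval(api="IS_NOT_BOT")) as not_bot_hits by related_request_id
| eval not_bot_validated = if(not_bot_hits > 0, "true", "false")
| fields - related_request_id, not_bot_hits
```

Counting only the IS_NOT_BOT events, rather than all events that share an ID, avoids marking a request as validated just because its ID happens to appear more than once for some other reason. Note that the IS_NOT_BOT events themselves also end up with `not_bot_validated` set to "true".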

Hope this gives you some ideas to get started. I agree that you should avoid the transaction command if possible. The eventstats command can also be slow over large datasets. You might be able to avoid it - but the community would need to know more about the output you want to see from your search.
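If one summary row per request is enough, rather than a new field on every raw event, `stats` may be a cheaper substitute for `eventstats`. The following is only a sketch under that assumption; it reuses the field names from the question, and `apis` is a hypothetical output field listing the APIs seen for each request.

```spl
sourcetype=my_log
| eval related_request_id = coalesce(original_request_id, request_id)
| stats count as request_id_count values(api) as apis by related_request_id
| eval not_bot_validated = if(request_id_count > 1, "true", "false")
```

The trade-off is that the original events are gone after `stats`, so any other fields you need in the output have to be carried through explicitly with functions such as `values` or `first`.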


aneaston
New Member

I'm digging into eventstats now (haven't used it before) and will let you know how it goes. In the meantime, I'm adding detail to the original question as well. Thanks for your response!
