Splunk Search

How to group events to recreate the detail of a session

mikeydee77
Path Finder

I have some events representing a customer’s interaction with one of my company’s applications. The typical flow is that the customer starts a session, browses offers, selects a product and, hopefully, activates it. This is shown in the attached diagram.

[diagram: typical session flow]

Notes:
The events are in the same index and sourcetype. Each event is similar in format but varies, because the application writes fields (including the session ID) in a pipe-delimited manner.
A session has a maximum duration of 5 minutes.

What I would like to do is use Splunk to recreate the session as best it can by grouping events on the session ID across multiple events. Then I can work out things like:
• how many customers dropped out of a session for some reason,
• how much time a customer spent in the session,
• which shops generate the most activations,
• etc.

I am relatively new to Splunk (I have completed training and am confident with search, stats, timechart, eval, etc.), so I am just looking to see if a) this is possible and b) where I should be looking in terms of commands to achieve it.

I took a look at transaction, but I don’t think it is sophisticated enough, so I started looking at subsearches and got confused by the syntax.

0 Karma

nickhills
Ultra Champion

I would have suggested | transaction sessionid as a starting point - it's a very powerful command and it sounds quite suited to your needs.
What sort of things were you struggling to get from it?
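As a rough sketch (the index, sourcetype and sessionid field name here are placeholders; maxspan=5m matches the 5-minute session limit you mentioned):

index=your_index sourcetype=your_sourcetype
| transaction sessionid maxspan=5m
| table sessionid duration eventcount

transaction adds the duration and eventcount fields automatically, which covers your time-in-session question directly.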

Are you able to share any logs?

If my comment helps, please give it a thumbs up!
0 Karma

mikeydee77
Path Finder

Hi, I will have to obfuscate some logs for sharing as they contain sensitive info, but it is possible... My experience is that transaction is simple enough to use and should work if each event exposes a common field like the session ID. However, in my case each event does not currently expose the session ID, so I have to either work on the field extractions to get the common field (which is complex, since the field formats vary an awful lot, e.g. depending on the result code) or somehow handle this within the query. I will try it out and follow up.

0 Karma

mikeydee77
Path Finder

I couldn't edit the question, so here is the correct diagram.

[diagram: corrected session flow]

0 Karma

elliotproebstel
Champion

I know you said above that you were hesitant to pursue full field extraction for these events, but I think it's your best (and most scalable) option in the long term. Putting in the time now to build those extractions will allow you to quickly navigate the logs now and in the future. Lots of folks here will be happy to help you build field extractions if you post some sample events (preferably with as little obfuscation as necessary to mask sensitive data).

If you are able to get the field extractions done - and if the values of x, y, and z are globally unique, then this should work for getting the value of the field "session ID" applied across all events in the diagram above:

your base search 
| eventstats max('session ID') AS "session ID" BY SHOPID
| eventstats max('session ID') AS "session ID" BY transactionID

After that, you can more easily do analysis on the grouped events.
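For instance, something like this (just a sketch; the service_name field is an assumption about what your extractions will eventually provide):

your base search
| eventstats max('session ID') AS "session ID" BY SHOPID
| eventstats max('session ID') AS "session ID" BY transactionID
| stats min(_time) AS session_start max(_time) AS session_end values(service_name) AS steps BY "session ID"
| eval session_duration = session_end - session_start

That would give you one row per session, with its duration and the set of steps the customer reached - which maps onto your drop-out and time-in-session questions.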

0 Karma

mikeydee77
Path Finder

Hi Elliot,
You have convinced me...

I did ask for suggestions on approaches, and I now have two to consider; both suggest I need to roll my sleeves up. To be honest, we do have the skills to do the field extractions, so I will pursue that. We have a lot going on, though, so I was looking for a shortcut, or just confirmation that this is in fact possible, which from what I am hearing it certainly is. I have also used transaction successfully on some other logs, but those were very simple.

Unfortunately it will be difficult to share a log file, as they contain many details that I should not share. But to give you a flavor of the tedium that awaits me: the general event format is like the simple example below, and the bold bit is being extracted as fields...

16.01.2018 15:21:20 G - [105100] 13_0_0 fetchOffers request : 59722336|46239067|phone_number|S

The bold bit means: date, [transaction correlation] version service_name request/response.

... however, the pipe-delimited bit that follows is not extracted, and that is because it is not always consistent, even within a service name. The application is just logging a database record and writing it to the log file, and some of the DB entries logged may contain repeating records, which spews out loads of pipe-delimited fields.

However, I think the session ID is in a consistent position for each request type (where it appears), so it will not be that difficult to do; just time-consuming.

By the way, we would like to change the application's behaviour; really, we would.

0 Karma

elliotproebstel
Champion

Hah, yes of course! I've definitely wanted to change the logging behavior of many applications, I assure you.

In a bind, I've used a series of rex commands to extract desired fields - at least for testing while I wait for other folks to implement more permanent, globally accessible field extractions. Good luck with the extractions, and don't hesitate to post them as a new question (making it easier for folks with regex experience to find) if you want some support.
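For instance, against the sample event you posted, something like this might get you started. I'm guessing that one of the first two pipe-delimited numbers is the session ID, so treat the field names as placeholders:

your base search "fetchOffers request"
| rex "request : (?<sessionID>\d+)\|(?<transactionID>\d+)\|"

You can chain a separate rex per request type, since you said the session ID position is consistent within each type.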

0 Karma

mikeydee77
Path Finder

Yep - rex can be your friend. Many thanks for taking time to respond today. Much appreciated.

0 Karma