Splunk Search

How to search transactions across different hosts with the same uuid?

kritho
Explorer

Hi all,

I have a few sources that report a GUID/UUID across different hosts (basically load balancers, intermediate proxies, auth, and different app servers, all writing the uuid to their logs).
I'm interested in help with displaying 3 things:

  1. Length of the entire transaction across hosts. One host has several steps before passing the request on, all processing the same request.
  2. Time between sources (input to be charted).
  3. Missing sources (let's say I know for sure there are supposed to be 4 different servers with the GUID, and one of the servers has several steps).

An example dataset would be (notice the same uuid, but different hosts):

2016.1.1 12:00:00.0125 host=10.1.1.30 host=www.mydomain.com src_ip=99.226.12.39 method=GET uri="/my/cool/application" uuid=28745996-dda7-eaba-8148-1615b51314a3
2016.1.1 12:00:01.0125 server=10.2.1.30  src_ip=10.1.1.30  uri="/my/cool/application" uuid=28745996-dda7-eaba-8148-1615b51314a3
2016.1.1 12:00:02.0125 hostname=10.2.1.30 step=incoming srcip=10.1.1.30 uuid=28745996-dda7-eaba-8148-1615b51314a3 
2016.1.1 12:00:04.0125 hostname=10.2.1.30 step=ok srcip=10.1.1.30 uuid=28745996-dda7-eaba-8148-1615b51314a3 
2016.1.1 12:00:07.0125 hostname=10.2.1.10 srcip=10.2.1.30  uuid=28745996-dda7-eaba-8148-1615b51314a3 status=completed

Any takers would be appreciated
brgds
kristen

1 Solution

lguinn2
Legend

Updated answer

If you just want an average then

 yoursearchhere
 | stats range(_time) as total_time by uuid
 | stats  avg(total_time) as "Average Transaction Time"

Average time between sources/steps for each uuid:

 yoursearchhere
 | streamstats range(_time) as total_time earliest(step) as completed_step current=t window=2 global=f by uuid
 | stats avg(total_time) as average_time by completed_step
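
If the gap should be labelled by the pair of nodes rather than by step, one possible sketch follows. It assumes the node name can be taken from whichever of hostname, server, or host is present (field names taken from the sample events in the question). The names node, prev_node, prev_time, gap, and average_gap are illustrative and not part of the original answer.

```spl
yoursearchhere
| eval node=coalesce(hostname, server, host)
| sort 0 uuid _time
| streamstats current=f window=1 global=f last(_time) as prev_time last(node) as prev_node by uuid
| eval gap=_time-prev_time
| stats avg(gap) as average_gap by prev_node node
```

The sort puts each uuid's events in ascending time order, streamstats with current=f window=1 copies the previous event's time and node onto each event within the same uuid, and the final stats averages the gap per hop. The first event of each uuid has no previous event, so it drops out of the final stats.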

For the potentially "missing steps", try this

 yoursearchhere
 | stats count by uuid
 | eventstats  avg(count) as avg
 | where count < avg
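
When the expected number of servers is known in advance (4 in the question), a fixed threshold may be simpler than comparing against the average. This is only a sketch: it assumes the server name lives in hostname, server, or host as in the sample events, and node, node_count, nodes, and events are illustrative names.

```spl
yoursearchhere
| eval node=coalesce(hostname, server, host)
| stats count as events dc(node) as node_count values(node) as nodes by uuid
| where node_count < 4
```

Note that in the sample events two of the sources report the same address (10.2.1.30), so counting distinct values of the default source field instead of node may match the intent better.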

Even fancier, show all the steps for UUIDs that may have missing steps

yoursearchhere [ search yoursearchhereagain
     | stats count by uuid
     | eventstats  avg(count) as avg
     | where count < avg | fields uuid ]
| sort uuid _time

The last search will not work reliably if there are thousands of UUIDs with missing steps, because subsearches are limited in how many results they can return.
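
For the chart requested in the thread (time on the x-axis, transaction duration on the y-axis), something along these lines might be a starting point. The one-minute span and the field name avg_transaction_time are assumptions; each transaction is placed on the time axis at its first event.

```spl
yoursearchhere
| stats min(_time) as _time range(_time) as total_time by uuid
| timechart span=1m avg(total_time) as avg_transaction_time
```

Writing min(_time) back into _time gives timechart a timestamp to bucket on after stats has collapsed the events to one row per uuid.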

kritho
Explorer

Excellent. This works! tx

lguinn2
Legend

This search will list the transaction time, in seconds, for each uuid:

yoursearchhere
| stats range(_time) as total_time by uuid
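
As an alternative sketch, the transaction command can group the events by uuid; it adds a duration field (seconds between the first and last event) and an eventcount field to each group. On large data sets the stats version above usually scales better, so treat this as an option rather than a recommendation.

```spl
yoursearchhere
| transaction uuid
| table uuid duration eventcount
```

The eventcount column also gives the number of events per transaction, which helps when looking for UUIDs with fewer events than expected.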

Time between sources/steps for each uuid:

yoursearchhere
| streamstats range(_time) as total_time earliest(step) as completed_step current=t window=2 global=f by uuid

I don't know what you mean by "charted." You can make a chart that lists all the uuids along with the time, but that doesn't seem terribly useful. Exactly what do you want to see on the x-axis and the y-axis of your chart?

Finally, it is hard to search for something "missing" - how would Splunk know something was missing unless it understood the overall pattern?
It would be helpful to have details about how you would figure out the missing steps manually. Then perhaps the community can help you find a way to identify the missing steps.

kritho
Explorer

Thanks lguinn,
Maybe I wasn't clear enough.
The load balancer stamps each incoming request with a uuid and passes it along to the next service. So it would be nice to show the average transaction time for the entire service path (not particularly for any specific uuid), as well as the time between each service as you suggested, which works.
Regarding the "missing" parts: for simplicity, just the number of events in each transaction would be nice, and any deviation would be even nicer (I know for sure any service path should have at least "hits" with the same uuid statement).
The x-axis could be time, and the y-axis could be a bar chart split by the duration of each transaction (i.e. time between events/hosts).

brgds
kristen
