Splunk Search

How can I create a scatter plot of data points distributed over time?

mtanadsk
Explorer

Hi,

I'm having trouble finding information on how to create a scatter plot (over time) of a data set that I currently report on.

Sample log entry:

%<---snip---
2010-04-19 20:10:04,658 ************/****/********/pc/mform_proc INFO InetMailSession.sendMessage() took 67ms

2010-04-19 20:10:06,952 ************/****/********/pc/mform_proc INFO InetMailSession.sendMessage() took 83ms

2010-04-19 20:10:18,562 ************/****/********/pc/mform_proc INFO InetMailSession.sendMessage() took 76ms

2010-04-19 20:10:22,864 ************/****/********/pc/mform_proc INFO InetMailSession.sendMessage() took 200ms

2010-04-19 20:10:24,792 ************/****/********/pc/mform_proc INFO InetMailSession.sendMessage() took 74ms

2010-04-19 20:10:26,460 ************/****/********/pc/mform_proc INFO InetMailSession.sendMessage() took 80ms

%<---snip---

The data I'm particularly interested in is the last field, a response time in ms. Right now I have a timechart plot with averages, etc., and I would now like to include a scatter plot of the distinct values over time, with "Response time" in milliseconds on the y-axis and Date/Time on the x-axis.

Make sense?

Any information is greatly appreciated!

thanks, -mt

gkanapathy
Splunk Employee

You can use

... | timechart values(x)

Simeon
Splunk Employee

You should use the timechart command:

http://www.splunk.com/base/Documentation/latest/SearchReference/Timechart

If your field is called numberfield:

my search query | timechart max(numberfield)

You can use max() or min(); when there is only one event per time bucket, either one simply returns that event's value.
As for a specific use case, let's assume you have a network device that logs thruput to a field called thruput:

host=network_device | timechart max(thruput)

If you have multiple network devices, you can search by sourcetype and group by network device:

sourcetype=network_log_file | timechart max(thruput) by network_device

sideview
SplunkTrust

If you can, I recommend keeping it simple, as in the following:

| timechart max(response_time) min(response_time) avg(response_time) 

The reason is that the FlashChart module in the UI truncates results past a certain number of rows, and having Flash pull down that much data at all can make for a clunky experience.

The other reason is that we changed some things that made it possible to do scatter charts where time was NOT the x-axis, and in so doing made it quite difficult to do the cases where time IS the x-axis. (timechart is your friend.)
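For the log lines in the question, a minimal sketch could look like the following. It assumes the response time is not already extracted as a field, so it pulls it out with rex. The sourcetype name mydata and the field name response_time are placeholders, and span=1m is just an example bucket size.

```spl
sourcetype=mydata
| rex "took (?<response_time>\d+)ms"
| timechart span=1m max(response_time) min(response_time) avg(response_time)
```

That returns three series per one-minute bucket rather than one point per event, which keeps the row count small.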

Go to the "advanced charting" view, and run a search like:

index=_internal source=*metrics.log group=per_sourcetype_thruput series=splunkd
| rename _time as time | fields time eps

over the last 60 minutes.

(the renaming of _time to time is to dodge a bug where 'scatter' charts with time series data are always blank)

Then change 'chart type' to 'scatter'. That will show you an honest-to-god scatter chart where time is the x-axis and eps is the y-axis.

Unfortunately the values on the time axis are now seconds since 1970.

(If on the other hand you ever wanted to do a scatter chart where the x-axis is numeric, this actually works quite well.)

index=_internal source=*metrics.log group=per_sourcetype_thruput series=splunkd 
| fields kbps eps

nonaronald
Explorer

I think the "advanced charting" view that sideview mentions is deprecated? Please correct me if I'm wrong.

manus
Communicator

| eval time=_time | table time latency

And you need to select scatter in graph options.

That looks like a real time scatter, except that the times are written in epoch time.
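Applied to the log lines in the question, that might look like the sketch below. The sourcetype mydata and the field name response_time are assumptions, and the rex step is only needed if the field is not already extracted.

```spl
sourcetype=mydata
| rex "took (?<response_time>\d+)ms"
| eval time=_time
| table time response_time
```

The time range still comes from the time picker or from earliest/latest in the search (for example earliest=-24h), so the x-axis only covers the window you searched, even though the labels are epoch seconds.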

dmcguerty
Explorer

Interesting.
Assume you want to track events that are supposed to run every minute of the day. You could sum them by hour (by event type) and should get about 60 events/hr per event type. Then you could represent each hour as a colored scatter point as either green (58-66), yellow (50-58), red (<50) or purple (>66).

If so, you could monitor 20 different events, setting a specific y-axis value for each type so that they appear as parallel horizontal rows. Using a scatter time basis that goes back to 1970 isn't very realistic. With scatter, can you specify earliest as '-24h' or 'today'?

gkanapathy
Splunk Employee

This will do it:

sourcetype=mydata | timechart bins=4000 list(response_time) as response_time | mvexpand response_time

Using bins=4000 collapses the time range on the x-axis into up to 4000 equal, discrete intervals, so note that each point is plotted at its interval's timestamp rather than at the event's exact time. For most charting purposes, though, 4000 bins will look right and be close enough. You can use up to 50,000 bins if needed.

manus
Communicator

This is not really a time scatter. The abscissa is not time; it is just the chronological order of the points.

So for these values:

00:01

00:02

00:10

You would have abscissas 1, 2, 3.

The graph doesn't show that 00:02 is closer to 00:01 than it is to 00:10.

sideview
SplunkTrust

NOTE: if you set bins this high, and you're in 'line' chart, make sure that either 'x-axis' > 'display markers' is 'yes', OR that 'Null Values' is set to 'zero' or 'connect'. Otherwise your chart will often look empty and you'll be confused.
