I am trying to find outliers on a graph using the median absolute deviation (MAD). I know the Splunk Machine Learning Toolkit can do this, but I'm not using it right now. We track how many events each IP address generates, so we have a count per IP address plotted over time. I need to calculate the median absolute deviation, but the search does not seem to compute the median correctly: it picks one of the larger count values as the median, even when that value appears only once, although other values are clearly better candidates. (The first column in the table is the count values. There are more rows than shown, but it is clear that 1209 is an outlier and should not be the median.)
index=data sourcetype=json seen.indicator=X.X.X.X
| bin _time span=30m
| eventstats count(seen.indicator) as "Count" by _time
| eventstats values(Count) as valu
| eventstats count(valu) as help by _time
| eventstats median(Count) as med
| eval newValue = abs(Count-med)
| eventstats median(newValue) as medianAbsDev by seen.indicator
| eval upper = med+(medianAbsDev*1.2)
| eval lower = med-(medianAbsDev*1.2)
| eval isOutlier=if(Count < lower OR Count > upper, 1, 0)
| timechart span=30m count(seen.indicator) as CountOfIndicator, eval(values(upper)) as upperl, eval(values(lower)) as lowerl, eval(values(isOutlier)) as Outliers by seen.indicator usenull=f useother=f
| filldown
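One possible explanation, sketched outside Splunk: `eventstats ... by _time` writes the bin's count onto every event in that bin, so a later `eventstats median(Count)` takes the median over events rather than over bins, and a bin with 1209 events contributes 1209 copies of its count. The Python sketch below uses made-up bin counts (only 1209 comes from the post) to compare the two medians, then applies MAD thresholds with the 1.2 multiplier from the search. It illustrates the arithmetic only and has not been run against the real data.

```python
from statistics import median

# Hypothetical per-bin event counts; only 1209 is taken from the post.
bin_counts = [4, 5, 5, 6, 1209]

# Median over events: each bin's count is repeated once per event in the bin,
# which is what happens when the count is attached to every raw event.
per_event_rows = [c for c in bin_counts for _ in range(c)]
median_over_events = median(per_event_rows)

# Median over bins: one row per 30-minute bin.
median_over_bins = median(bin_counts)

# Median absolute deviation (MAD) computed on the per-bin rows.
abs_dev = [abs(c - median_over_bins) for c in bin_counts]
mad = median(abs_dev)

k = 1.2  # multiplier used in the search above
lower = median_over_bins - k * mad
upper = median_over_bins + k * mad
outliers = [c for c in bin_counts if c < lower or c > upper]

print(median_over_events, median_over_bins, mad, outliers)
```

If this is what is happening, reducing the data to one row per bin (for example with `stats count by _time` instead of `eventstats`) before taking the median may help; that is a suggestion, not a verified fix.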
Tags: splunk-enterprise, calculate, outlier, median
Posted: Fri, 21 Jun 2019 13:14:02 GMT by cxr5971

Average of a field
I have a log trace like: ...........................wages: 50
I have written a Splunk query that skips everything before "wages:" and returns only the values (50, 30, and so on).
sourcetype=mysource host=myhost* "myClassName" | rex field=_raw "(?<ac>(?<=wages:).*?$)" | stats count by ac
However, I am not able to compute the median or average of the values in ac.
E.g. (50+50)/2
Can you please help me obtain this value?
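To make the goal concrete, here is a small Python sketch of the calculation being asked for: extract the number after "wages:", treat it as a number, and aggregate. The log lines are hypothetical and follow the shape described above. `stats count by ac` only counts how often each distinct value occurs, whereas the average and median need the values themselves.

```python
import re
from statistics import mean, median

# Hypothetical log lines in the shape described above ("...wages: <number>").
lines = [
    "...........................wages: 50",
    "...........................wages: 30",
    "...........................wages: 40",
]

# Capture the digits after "wages:" and convert them to numbers, so they can
# be averaged rather than counted as distinct strings.
pattern = re.compile(r"wages:\s*(\d+(?:\.\d+)?)\s*$")
wages = [float(m.group(1)) for line in lines if (m := pattern.search(line))]

average_wage = mean(wages)
median_wage = median(wages)
print(average_wage, median_wage)
```

In SPL the analogous aggregations would presumably be `avg(ac)` and `median(ac)` in place of `count by ac`, assuming `ac` is extracted as a clean number.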
Tags: average, median
Posted: Thu, 11 Oct 2018 15:23:04 GMT by saranyaa21

How do I calculate median values for the column for 7 weeks?
There are two columns with data: `time` (time scale in steps of 10 minutes) and `val` (amount of transactions).
I need to calculate median values (med_val) of the `val` column over 7 weeks. For example, for the point 12/04/2018 15:00:00, med_val = `median` of `val` at 7 points: 05/04/2018 15:00:00, 29/03/2018 15:00:00, 22/03/2018 15:00:00, 15/03/2018 15:00:00, 08/03/2018 15:00:00, 01/03/2018 15:00:00 and 22/02/2018 15:00:00, i.e. the median over the same time of day on the same weekday in each of the 7 preceding weeks. If there is no data for a point, we assume that 0 transactions were performed.
The best that I could come up with is:
| timechart span=10m median(val) | timewrap 1w series=exact
Are there any good solutions?
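For reference, the calculation described above can be written down in a few lines of Python. This is only a sketch of the intended arithmetic with a hypothetical series (the `val` dictionary and the `med_val` helper are made up for illustration); it is not a Splunk solution.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical 10-minute series: timestamp -> number of transactions.
# 15/03/2018 15:00 is deliberately missing and is treated as 0.
val = {
    datetime(2018, 4, 5, 15, 0): 10,
    datetime(2018, 3, 29, 15, 0): 20,
    datetime(2018, 3, 22, 15, 0): 30,
    datetime(2018, 3, 8, 15, 0): 40,
    datetime(2018, 3, 1, 15, 0): 50,
    datetime(2018, 2, 22, 15, 0): 60,
}

def med_val(series, point, weeks=7):
    """Median of the values at the same time of day in each of the previous `weeks` weeks."""
    history = [series.get(point - timedelta(weeks=w), 0) for w in range(1, weeks + 1)]
    return median(history)

result = med_val(val, datetime(2018, 4, 12, 15, 0))
print(result)
```

The point is that each output value depends on exactly 7 inputs spaced one week apart, with missing points filled with 0 before the median is taken.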
Tags: splunk-enterprise, calculations, median
Posted: Mon, 27 Aug 2018 08:44:17 GMT by belts

charts: How can I calculate median for each type on the hourly aggregation?
There are three columns with data: time (time scale in steps of 10 minutes), val (amount of transactions) and type (type of automated system, 3 different types only).
I need to aggregate the data for each type at the hour level and calculate median(val) for each type per hour. The result should be 3 time series of the same length.
What I did:
source="data.txt" | chart median(val) by type, date_hour
But the X-axis does not contain all hours; some of them are aggregated into an "OTHER" column.
Tags: splunk-enterprise, charts, median
Posted: Sun, 26 Aug 2018 20:45:00 GMT by belts

How to take median of number of users logged in past 1 hour
index=XXX sourcetype="XXX-log" opName="LoginUser" earliest=-60m latest=now() | bucket _time span=10m | timechart count
Tags: splunk-enterprise, median
Posted: Mon, 23 Apr 2018 15:48:50 GMT by Rocky31

Stats function by multiple fields
I have a table of data like this
Time1 Time2 Time3 Total
36.650000 16.050000 0.133333 74
44.866667 40.016667 0.366667 107.366667
54.966667 17.483333 0.366667 90.716667
2.083333 57.950000 22.483333 98.550000
41.733333 14.150000 0.150000 80.116667
3.283333 28.083333 0.400000 54.516667
44.783333 27.733333 0.466667 88.933333
Four times are produced for each event. I want to compute the stats median, p25 and p75 for each of them, resulting in a table like:
Process Median p25 p75
Time1 # # #
Time2 # # #
Time3 # # #
Total # # #
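As a sanity check on what the target table should contain, the sketch below computes the three statistics per column in Python from the seven rows shown above. The `method="inclusive"` percentile convention is an assumption; tools differ in how they interpolate percentiles, so p25 and p75 may not match Splunk's output exactly.

```python
from statistics import quantiles

# The seven rows from the table above, one list per column.
columns = {
    "Time1": [36.650000, 44.866667, 54.966667, 2.083333, 41.733333, 3.283333, 44.783333],
    "Time2": [16.050000, 40.016667, 17.483333, 57.950000, 14.150000, 28.083333, 27.733333],
    "Time3": [0.133333, 0.366667, 0.366667, 22.483333, 0.150000, 0.400000, 0.466667],
    "Total": [74, 107.366667, 90.716667, 98.550000, 80.116667, 54.516667, 88.933333],
}

# One output row per process: (p25, median, p75).
# method="inclusive" is an assumption; percentile conventions differ between tools.
summary = {}
for process, values in columns.items():
    p25, med, p75 = quantiles(values, n=4, method="inclusive")
    summary[process] = {"Median": med, "p25": p25, "p75": p75}

for process, row in summary.items():
    print(process, row)
```

The shape is what matters here: the column names become row labels, which is the same "unpivot, then aggregate per label" idea the desired table implies.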
Tags: stats, table, transpose, median
Posted: Mon, 11 Sep 2017 21:04:02 GMT by byu168168

How to get _time at median value?
(index=geniachip AND (geniaComplete.flag OR "DVT ready" OR "transfer complete for all banks" OR "lz4.complete*" OR "On-station compression complete.")) OR (index=fbu_sizes)
| eventstats values(fbuLZ4Size) AS fbuLZ4Size by run_name
| eval run_id_date_size = run_id."##".date."@@".fbuLZ4Size
| eval message=if(message LIKE "geniaComplete.flag%", "geniaComplete.flag", message)
| eval message=if(message LIKE "lz4.complete%", "lz4.complete", message)
| eval start_time=case(message="geniaComplete.flag", timestamp,
message="lz4.complete", timestamp,
message="transfer complete for all banks.", unix_time,
message="DVT ready", unix_time,
message="On-station compression complete.", unix_time)
| chart values(start_time) over run_id_date_size by message
| eval fbuLZ4Size=mvindex(split(run_id_date_size,"@@"),1)
| eval fbuLZ4SizeGB = fbuLZ4Size/1000000
| search geniaComplete.flag = *
| eval "Xfer Time" = ('transfer complete for all banks.' - 'On-station compression complete.')/60
| eval "ACAP Time" = ('DVT ready' - 'transfer complete for all banks.')/60
| eval "ubf_compress" = ('lz4.complete' - 'geniaComplete.flag')/60
| eval "Total Time" = ('DVT ready' - 'geniaComplete.flag')/60
| eval count=if('ubf_compress' > 30, 1, 0)
| eval count1=if('Xfer Time' > 60, 1, 0)
| eval count2=if('ACAP Time' > 60, 1, 0)
| eval count3=if('Total Time' > 150, 1, 0)
| stats dc(run_id_date_size) AS "Total Runs", sum(count) AS "ubfcompress", sum(count1) AS "Xfer Time", sum(count2) AS "ACAP Time", sum(count3) AS "Total", median("Xfer Time") AS med_xfer_time, median("ACAP Time") AS med_ACAP_time, median("ubf_compress") AS med_ubf_compress, median("Total Time") AS med_tot_time
| eval "UBFcompress lag" = ('ubfcompress'/'Total Runs') * 100
| eval "Transfer Time" = ('Xfer Time'/'Total Runs') * 100
| eval "ACAP Processing" = ('ACAP Time'/'Total Runs') * 100
| eval "Total TAT" = ('Total'/'Total Runs') * 100
| fields - "Total Runs" "ubfcompress" "ACAP Time" "Xfer Time" "Total" "Xfer Rate"
| transpose
| eval Threshold = case(column=="UBFcompress lag", "30 Minutes",
column=="Total TAT", "150 Minutes",
column=="Transfer Time", "60 Minutes",
column=="ACAP Processing", "60 Minutes")
| eval sort_field = case(column=="UBFcompress lag", 2,
column=="Total TAT", 1,
column=="Transfer Time", 3,
column=="ACAP Processing", 5)
| sort sort_field
| fields - sort_field
| rename column AS "Turnaround Time Process"
| eval "row 1" = round('row 1', 1)
| rename "row 1" AS "Percent Runs over Threshold"
| table "Turnaround Time Process" Threshold "Percent Runs over Threshold" Median
This search pulls timestamps for checkpoints in our pipeline. I utilize these checkpoints to determine the length of time the process takes. I then need to compare the time for each individual "run" to a threshold in order to get the percentage of runs that took longer than that threshold on that specific process. I was able to do all that, however, I got a separate request to also display the median for each process which complicated things with the use of the transpose command.
Currently the end result from the above query looks like this
Turnaround Time Process   Threshold     Percent Runs over Threshold   Median
Total TAT                 150 Minutes   22.8
UBFcompress lag           30 Minutes    39.0
Transfer Time             60 Minutes    3.8
ACAP Processing           60 Minutes    8.2
med_xfer_time                           4.3
med_ACAP_time                           34.8
med_ubf_compress                        12.0
med_tot_time                            106.3
I'd like to save each of the med_* values to a median field matched to the respective process. So the final table should look like
Turnaround Time Process   Threshold     Percent Runs over Threshold   Median
Total TAT                 150 Minutes   22.8                          106.3
UBFcompress lag           30 Minutes    39.0                          12.0
Transfer Time             60 Minutes    3.8                           4.3
ACAP Processing           60 Minutes    8.2                           34.8
I'm having trouble using the stats command (to get the median values) in conjunction with the transpose command as I can't save the field values of the med_* to a new field (Median).
Any help/tips would be much appreciated!
Tags: stats, table, transpose, median
Posted: Mon, 11 Sep 2017 21:04:02 GMT by byu168168

How to get _time at median value?
I have run into a problem that I have not been able to solve for a long time. I also searched Google for many hours without result. The problem is as follows:
I want to get the median of a value in a search. I got that value with:
source=check_request app="test1" | rename url as "URL" | where URL="/ShippingOrder/Import" | stats median(el) as abc by URL
And I received the following result:
URL abc
/ShippingOrder/Import 29250
So the median value is `29250`. But I also want a column showing the `_time` of the event where `el` equals that median (`abc=29250`). I tried the following:
source=check_request app="test1" | rename url as "URL" | where URL="/ShippingOrder/Import" | stats median(el) as abc by URL | table URL abc _time
But the _time column is blank.
I found the time with:
source=check_request app="test1" | rename url as "URL" | where URL="/ShippingOrder/Import" | table URL _time el
Result:
URL el _time
/ShippingOrder/Import 29016 2017-09-10 18:08:58
/ShippingOrder/Import 6657 2017-09-10 16:47:58
/ShippingOrder/Import 11656 2017-09-10 16:11:35
/ShippingOrder/Import 23906 2017-09-10 14:46:58
/ShippingOrder/Import 46719 2017-09-10 11:03:56
/ShippingOrder/Import 15016 2017-09-10 16:54:22
/ShippingOrder/Import 29250 2017-09-10 16:46:22
/ShippingOrder/Import 51188 2017-09-10 14:58:22
/ShippingOrder/Import 44000 2017-09-10 14:51:22
/ShippingOrder/Import 12046 2017-09-10 14:42:22
/ShippingOrder/Import 50984 2017-09-10 14:41:22
/ShippingOrder/Import 39735 2017-09-10 14:25:22
The event whose value equals the median `29250` occurred at `2017-09-10 16:46:22`.
So how do I get the following result from a single search?
URL abc Time
/ShippingOrder/Import 29250 2017-09-10 16:46:22
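The lookup being asked for can be sketched in Python using the rows from the table above. `median_high` is used here because it reproduces the 29250 shown above and always returns an actual data point; the textbook median of the two middle values (29016 and 29250) would be 29133, which matches no event. Whether Splunk's `median` always behaves this way is an assumption.

```python
from statistics import median_high

# (el, _time) pairs copied from the result table above.
events = [
    (29016, "2017-09-10 18:08:58"),
    (6657, "2017-09-10 16:47:58"),
    (11656, "2017-09-10 16:11:35"),
    (23906, "2017-09-10 14:46:58"),
    (46719, "2017-09-10 11:03:56"),
    (15016, "2017-09-10 16:54:22"),
    (29250, "2017-09-10 16:46:22"),
    (51188, "2017-09-10 14:58:22"),
    (44000, "2017-09-10 14:51:22"),
    (12046, "2017-09-10 14:42:22"),
    (50984, "2017-09-10 14:41:22"),
    (39735, "2017-09-10 14:25:22"),
]

# median_high returns an actual data point (the upper of the two middle values
# for an even count), so it can be matched back to an event.
abc = median_high(el for el, _ in events)
times = [t for el, t in events if el == abc]
print(abc, times)
```

The key step is that the median is first attached to every event and only then used as a filter, so `_time` is still available when the matching event is selected.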
Tags: time, median
Posted: Mon, 11 Sep 2017 08:31:23 GMT by luanvn

how to do search median
I have a question.
This is the input:
date item field_1 field_2 field_3
2016/01/01 x 1 2 3
2016/01/01 y 4 5 6
This is the output I want:
date item median_field
2016/01/01 x 2
2016/01/01 y 5
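Read this way, the request is for a median across the `field_*` columns of each row rather than across rows. A minimal Python sketch of that reading, using the two rows above:

```python
from statistics import median

# The two input rows from the example above.
rows = [
    {"date": "2016/01/01", "item": "x", "field_1": 1, "field_2": 2, "field_3": 3},
    {"date": "2016/01/01", "item": "y", "field_1": 4, "field_2": 5, "field_3": 6},
]

# Median across the field_* columns of each row (not across rows).
output = [
    {
        "date": r["date"],
        "item": r["item"],
        "median_field": median(v for k, v in r.items() if k.startswith("field_")),
    }
    for r in rows
]
print(output)
```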
Tags: median
Posted: Wed, 21 Jun 2017 16:42:22 GMT by thomas22966710

How to search the average and median of the number of events per second for each unique username?
I need help with a search.
Let's imagine we have Windows logs. These logs contain the field **Username**.
I want to calculate the average and median number of events per second for each unique username.
The strategic goal is to form something like a baseline ("the average user generates this many events per hour/day/week") and to detect anomalies based on it.
Tags: splunk-enterprise, search, time, average, median
Posted: Tue, 26 Apr 2016 08:11:45 GMT by ibondarets

How to write a search to calculate the average and median for a field in my sample data and produce a time chart?
Hi Team,
I am using Splunk for the first time.
I need to calculate the average and median of the field **rate**, which is shown below.
Here's the sample output from my Splunk log:
Thu Dec 17 02:48:52 GMT+00:00 2015 [STATS] bucket-> 6 , 3795 , 25322 , 318 , 240 , 0
Thu Dec 17 02:48:52 GMT+00:00 2015 [STATS] rate-> 7123440
In the search box, I specify:
index=<index_name> source=<source_name>
Since the pattern above is not in key=value form, I am unable to calculate the average and median for it, and I cannot change the pattern because it already exists in the logs.
How can I calculate the average and median of this field? Please kindly help.
Your timely help would be much appreciated.
Tags: timechart, average, median
Posted: Mon, 28 Dec 2015 23:45:28 GMT by nsrao1983

Different median results: fast-mode vs verbose-mode
I'm calculating a median. The result is not the same when I change from fast to verbose mode... Is this expected behaviour?
BR
Tags: verbose, median
Posted: Fri, 08 May 2015 12:37:10 GMT by HeinzWaescher

How to filter my search and only return results if response time is greater than median time?
I would like to get results only if the response time is greater than the median time. I used the query below, but for some reason it shows all the values. Also, can you let me know whether this is an efficient way of doing it?
Tags: search, time, filtering, median
Posted: Sat, 15 Nov 2014 14:13:55 GMT by xvxt006

What is the easiest way to get the percentage increase difference between two median values?
This works wonderfully to give me the count and median per server farm, per URL:
<pre>
index=wtf earliest=10/13/2014:10:00:00 latest=10/13/2014:11:00:00 | chart count, exactperc50(time_taken) as median over base_uri by farm
</pre>
Tags: percentage, difference, median
Posted: Tue, 14 Oct 2014 13:20:27 GMT by jundai

Why is the result of the median function incorrect when the total number of values is an even number?
I think the median calculation is incorrect when the total number of values is even.
An example:
359
282
224
150
60
52
8
1
0
0
Splunk tells me the median is 60, but it should be 56.
In this case there is a pair of middle numbers, and the median should be calculated as median=(60+52)/2.
Am I correct?
BR
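For comparison, the same ten values can be checked in Python, whose standard library exposes both conventions. The reported 60 is what you get when the upper of the two middle values is used instead of their mean; that this is the convention behind the Splunk result is an assumption based only on the numbers above.

```python
from statistics import median, median_high, median_low

values = [359, 282, 224, 150, 60, 52, 8, 1, 0, 0]

textbook = median(values)     # mean of the two middle values, 52 and 60
upper = median_high(values)   # upper of the two middle values
lower = median_low(values)    # lower of the two middle values
print(textbook, upper, lower)
```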
Tags: median
Posted: Thu, 18 Sep 2014 13:14:04 GMT by HeinzWaescher

summary indexing, "si-" commands
I am building up summary indexing for my reports, and while everything is working fine, I have some questions:
1°) How does sistats median/dc(field) work? I can't find the algorithm used anywhere, and it clearly doesn't store all the distinct values of the field. I can't find precise documentation on how these measures are computed (e.g. per day) and aggregated (e.g. per month). (I have checked the doc: [use summary indexing][1], but it only gives a rough description of the algorithm used.)
2°) How does the overlap command work? I understand that it finds redundant or missing events in an index, but what does that mean exactly (I have read the doc: [configure summary indexing][2])? My difficulty is understanding how Splunk knows whether events are missing (how can it tell that events haven't been indexed?).
E.g. I have a search that runs every 5 minutes and uses sistats to sum up everything in a summary index. Is there a chance that I run into overlapping or missing events (other than when splunkd goes down and/or the search takes longer than the scheduled time range, 5 minutes here)?
------------------------------------------------------------------------------------------------
EDIT:
Does anyone have info on this? I am currently seeing weird behavior with sistats dc(). When I compare it with a dc() that does not use the summary, I get discrepancies. I investigated, and when I run values(field), some values are clearly missing from the summary index. I really don't know how that is possible (I have run the fill_summary_index.py script, so this shouldn't come from a lack of summarizing).
Guilhem
[1]: http://docs.splunk.com/Documentation/Splunk/5.0.2/Knowledge/Usesummaryindexing
Tags: summary-index, sistats, median, overlapping
Posted: Wed, 27 Feb 2013 14:05:59 GMT by guilhem

streamstats clarification
I'm using streamstats to calculate the median of a field and timechart to see the count of events where the field has a value less than the median.
... | streamstats median(bytes) as meby|eval snap=if(bytes>=meby, bytes, "smaller") |timechart count by snap
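To show what a running median does to this kind of comparison, here is a small Python sketch with hypothetical byte counts. It assumes the streaming statistic is cumulative (each event is compared with the median of everything seen so far, itself included) and uses the textbook median; both are assumptions for illustration, and the label is simplified to "bytes"/"smaller".

```python
from statistics import median

# Hypothetical byte counts in event order.
bytes_seen = [10, 50, 30, 70, 20]

# Running median: for each event, the median of all values seen so far
# (including the current one), the way a cumulative streaming statistic works.
running = [median(bytes_seen[: i + 1]) for i in range(len(bytes_seen))]

# Label each event against its own running median, mirroring the eval above
# (simplified to a fixed label instead of the byte value).
snap_running = ["bytes" if b >= m else "smaller" for b, m in zip(bytes_seen, running)]

# For comparison: label against the single overall median.
overall = median(bytes_seen)
snap_overall = ["bytes" if b >= overall else "smaller" for b in bytes_seen]

print(running, snap_running.count("smaller"), snap_overall.count("smaller"))
```

With these numbers the running comparison flags one event as "smaller" while the overall median flags two, so the two approaches are not interchangeable.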
Tags: streamstats, median
Posted: Mon, 03 Sep 2012 09:52:11 GMT by echalex