Hi,
We are pulling in data from a number of db's, via db connect. The data spans many different tables, requiring us to do a number of joins. The response hasn't been acceptable, and we are looking at other options, including creating views within the db's. Our query is below - is there anything obvious that can be improved?
index="perfstats" source="dbmon-tail://TQ/CPUBYVIRTCPU" System=*$Host$*
| eval Actual_Interval_A = Actual_Interval
| eval Virtual_Machine_A=Virtual_Machine
| JOIN type=left System [SEARCH index="perfstats" source="dbmon-tail://TQ/CPUVMSUM" System=*$Host$*]
| JOIN type=left System [SEARCH index="perfstats" source="dbmon-tail://TQ/VMVMS" System=*$Host$*]
| JOIN type=left System [SEARCH index="perfstats" source="dbmon-tail://TQ/MEMBYVM" System=*$Host$*]
| JOIN type=left System [ SEARCH index="perfstats" source="*BLKDEVBYVM" System=*$Host$*
| eval Actual_Interval_B=Actual_Interval ]
| JOIN type=left System [SEARCH index="perfstats" source="dbmon-tail://TQ/HINVSUM" earliest=-1d System=*$Host$*| eval System=System]
| rename System as host
| JOIN type=left host [SEARCH index="perfstats" source="*VMHOSTCONF" earliest=-7d Cluster!=P* Cluster!=N*
| eval host=lower(Host)]
| stats sum(reads) AS "io_reads" , sum(writes) AS "io_writes",sum(eval(KB_read_s/Actual_Interval_B)) as "kb_reads", sum(eval(KB_write_s/Actual_Interval_B)) as "kb_writes",avg(avgReadLatency) AS "diskresp_reads", avg(avgWriteLatency) AS "diskresp_writes",avg(avgQueueReadLatency) AS "diskqueue_reads", avg(avgQueueWriteLatency) AS "diskqueue_writes", sum(Actual_Interval_B ) as "interval" distinct_count(Virtual_Machine_A) AS "#VMs" sum(ready) as "r" sum(Actual_Interval_A) as "t" sparkline(sum(eval(ready/Actual_Interval))) as "CPU Trend" avg(VCPU_Count) as "vcpu" avg(online_cpus) as "c" avg(online_cpus_physical) as "cp" avg(busy) as "busy" avg(consumed) AS "memCons", avg(size) AS "memGran", avg(totalCapacity) AS "memInst" by Cluster,host
| eval "CPU Ready/Wait"=round((r/t)*100,2)
| eval "# of Cores"=ceil((c+cp))
| eval "#VCPUs"=ceil(vcpu)
| eval "% busy"=round(busy,2)
| eval "Memory Consumed (GB)"=round(memCons/1024,2)
| eval "Memory Granted (GB)"=round(memGran/1024,2)
| eval "Memory Installed (GB)"=round(memInst/1024,2)
| eval "Response Time Reads"=round(diskresp_reads,2)
| eval "Response Time Writes"=round(diskresp_writes,2)
| eval "Queue Reads"=round(diskqueue_reads,2)
| eval "Queue Writes"=round(diskqueue_writes,2)
| eval "IO Reads"= round((io_reads/interval),2)
| eval "IO Writes"=round((io_writes/interval),2)
| eval "IO Total" = ($IO Reads$)+($IO Writes$)
| eval "Throughput Reads (MBs/second)"=round((kb_reads/(interval*1024)),4)
| eval "Throughput Writes (MBs/second)"=round((kb_writes/(interval*1024)),4)
| eval "Throughput Total (MBs/second)" = ($Throughput Reads (MBs/second)$)+($Throughput Writes (MBs/second)$)
| eval "Response Time Total"= ($Response Time Reads$)+($Response Time Writes$)
| eval "Queue Total"= ($Queue Reads$)+($Queue Writes$)
| table host,"CPU Ready/Wait","CPU Trend", "#VCPUs","VMotions","# of Cores", "% busy", "Memory Consumed (GB)", "Memory Granted (GB)", "Memory Installed (GB)", "#VMs" "IO Reads" "IO Writes" "IO Total" "Throughput Reads (MBs/second)" "Throughput Writes (MBs/second)" "Throughput Total (MBs/second)" "Response Time Reads" "Response Time Writes" "Response Time Total" "Queue Reads" "Queue Writes" "Queue Total"
| sort - "CPU Ready/Wait"
Great answers and comments so far, but one thing I would say, which probably will have no effect on search performance, is you should seriously consider moving some of those evals from the search to calculated fields. This is more about knowledge management than performance.
Maybe some of those fields are specific only to the report you are running, but I promise you one day your colleagues will want to use the same evals you have come up with in that search. Give them a head start and don't make them figure out "IO Total" or "CPU/Ready Wait".
http://docs.splunk.com/Documentation/Splunk/5.0.3/Knowledge/definecalcfields
the join
command should be used only as a last resort, and you can eliminate most of these joins by writing the search as a disjunction, and by applying a little cleverness with stats
and with conditional eval
commands.
Try playing around with this, and with a stats command on the end to zip up all the bits and pieces by the distinct values of the system field.
index="perfstats"
(source="dbmon-tail://TQ/CPUBYVIRTCPU" OR source="dbmon-tail://TQ/CPUVMSUM" OR
source="dbmon-tail://TQ/VMVMS" OR source="dbmon-tail://TQ/MEMBYVM" OR source="*BLKDEVBYVM" )
System=*$Host$*
... some stats command to roll up the "last(someField) as someField" that you need, ending in "by System"...
| JOIN type=left System [SEARCH index="perfstats" source="dbmon-tail://TQ/HINVSUM" earliest=-1d System=*$Host$*| eval System=System]
| rename System as host
| JOIN type=left host [SEARCH index="perfstats" source="*VMHOSTCONF" earliest=-7d Cluster!=P* Cluster!=N*
| eval host=lower(Host)]
You'll also need a little conditional eval to get the VirtualMachine fields set correctly. Anyway, even leaving those last two joins in there, your search performance will be greatly improved... It's far faster to get data off disk in one fell swoop with a big disjunction and then massage things as necessary with eval and group with stats, than to rely heavily on join.
Also, depending on the timerange of the overall search, it may well be a net positive to roll the last two joins into the pipeline as well, and just do the time filtering on the fly. Look at the if()
, case()
and match()
functions inside the eval
command
quick examples of what I'm talking about:
| eval VirtualMachine=if(case(source="dbmon-tail://TQ/CPUVMSUM"),VirtualMachine_A,1==1,VirtualMachine_B)
and
| where _time>relative_time(now(),"-7d")
http://docs.splunk.com/Documentation/Splunk/5.0.3/SearchReference/CommonEvalFunctions
Thanks. I'll take this and give it a try.
Since you're using many subsearches, have read this answer about subsearch performance?
http://splunk-base.splunk.com/answers/5871/subsearch-performance-optimization