Dashboards & Visualizations

Performance very slow for index join on 200MB data only

jonu4u
New Member

Hi I've json data in one index containing student_id,grade_id,class_id . and another static dataset(csv) in another index which contains a the names based on ids. Say student_name,student_id,grade_name,grade_id.
I need to show a dashboard in realtime which will update the count of students per grade per class , however the filter are on the names and not the ids. So I need to join these two datasets. The grade_id,class_id are inside nested json and are extracted at search time before they are joined with the static index. This query is taking 20seconds!!! too slow. What techniques can be done to make it under 2-3seconds since data volume is very small.

0 Karma

woodcock
Esteemed Legend

Even better, I would use a scheduled search to turn the 2nd CSV dataset back into a lookup like this:

(index=<student data> AND sourcetype=<student data>)
| dedup student_name student_id grade_name grade_id
| table student_name student_id grade_name grade_id
| outputlookup MyStudentLookup.csv

Then do this:

 index=<json data> AND sourcetype=<json data>
| lookup MyStudentLookup.csv student_id grade_id
0 Karma

woodcock
Esteemed Legend

This is one of the main reasons you keep hearing us say never use join; instead use stats like this:

(index=<json data> AND sourcetype=<json data> OR (index=<student data> AND sourcetype=<student data>)
| stats values(*) AS * BY student_id grade_id

Then do your stuff from here.

0 Karma

richgalloway
SplunkTrust
SplunkTrust

Please share your search.
Also, please explain why a dashboard needs to be real-time. Is the number of students in a class changing so fast a 20-second query can't keep up? If a human is processing the results of the query then real-time is not necessary.

Perhaps by "realtime" you mean "fast". That's different and by seeing your query we may be able to suggest improvements.
What is the size of the "very small" data volume?

---
If this reply helps you, Karma would be appreciated.
0 Karma

jonu4u
New Member

So we need to monitor student activity. So any instant if a student is inactive for less than 5 seconds we see that. So we want it realtime.

0 Karma
Get Updates on the Splunk Community!

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...