I have write the below query , Can someone rewrite the query in more efficient way.
Basically I am trying to see breakup of the request as it flow from UI to 16 different micro service layer and than reaches to elastic search. So within the single graph I am plotting the time chart . The query is working fine except it is too slow and long. the source filed is the different log locations, All GET request are action names logged in the logs. I am also appending 2 queries since the format of the logs slightly different.
(index=test1 OR index=test2) AND (COMP_Environment="**prod**" OR COMP_Stack_Name="**prod**") source="/var/log/*comp-ui*" OR source="/var/log/*utp-svc*" ( request_url="*utp-svc*" ) OR ((event_type="API" OR EventType="API" OR event_type="REPOSITORY") AND (GET_CATEGORIES OR GET_CATEGORIES_FROM_CAT_SERVICE OR GET_CAT_TEMPLATE OR GET_CAT_SILO_MAIN_FROM_CONTENT_SERVICE OR GET_BREADCRUMB_FROM_NAV_SERVICE OR GET_CAT_SILO_GRAPHIC_HEADER_FROM_CONT_SERVICE OR GET_DESIG_INDEX_FROM_REDIS_CACHE OR GET_CAT_DOCUMENTS_FROM_EC OR GET_FAVORITE_DESIGS_FROM_PROFILE_SERVICE OR GET_CAT_BANNER_FROM_CONTENT_SERVICE OR GET_CAT_COLLECTIONS_FROM_CONTENT_SERVICE OR GET_PRODUCTS_BY_CAT_ID_API ))
| eval action_event=event_type+"_"+action
| timechart span="1m" avg(responseTime) as "Server" avg(duration_millis) by action_event
| fields + _time "Server*" "*GET_CATEGORIES" "*GET_CATEGORIES_FROM_CAT_SERVICE" "*GET_CAT_TEMPLATE" "*GET_CAT_SILO_MAIN_FROM_CONTENT_SERVICE" "*GET_BREADCRUMB_FROM_NAV_SERVICE" "*GET_CAT_SILO_GRAPHIC_HEADER_FROM_CONT_SERVICE" "*GET_DESIG_INDEX_FROM_REDIS_CACHE" "*GET_CAT_DOCUMENTS_FROM_EC" "*GET_FAVORITE_DESIGS_FROM_PROFILE_SERVICE" "*GET_CAT_BANNER_FROM_CONTENT_SERVICE" "*GET_CAT_COLLECTIONS_FROM_CONTENT_SERVICE" "*GET_PRODUCTS_BY_CAT_ID_API"
| rename "avg(duration_millis): API_GET_CAT_TEMPLATE" as "(3.1) - CAT API ", "Server: NULL" as "(1) - utp-svc UI", "avg(duration_millis): REPOSITORY_GET_CATEGORIES_FROM_CAT_SERVICE" as "(3.1.1) - CAT REPOSITORY ", "avg(duration_millis): REPOSITORY_GET_CAT_BANNER_FROM_CONTENT_SERVICE" as "(4) - CAT_BANNER", "avg(duration_millis): REPOSITORY_GET_CAT_SILO_MAIN_FROM_CONTENT_SERVICE" as "(5) - CAT_SILO","avg(duration_millis): REPOSITORY_GET_BREADCRUMB_FROM_NAV_SERVICE" as "(6) - BREADCRUMB","avg(duration_millis): REPOSITORY_GET_CAT_SILO_GRAPHIC_HEADER_FROM_CONT_SERVICE" as "(7) - SILO_GRAPHIC_HEADER","avg(duration_millis): REPOSITORY_GET_DESIG_INDEX_FROM_REDIS_CACHE" as "(8) - REDIS_CACHE","avg(duration_millis): REPOSITORY_GET_FAVORITE_DESIGS_FROM_PROFILE_SERVICE" as "(3) - PROFILE_SERVICE","avg(duration_millis): REPOSITORY_GET_CAT_COLLECTIONS_FROM_CONTENT_SERVICE" as "(9) - CONTENT_SERVICE","avg(duration_millis): API_GET_CATEGORIES" as "(3.1.2) - GET_CATEGORIES","avg(duration_millis): REPOSITORY_GET_CAT_DOCUMENTS_FROM_EC" as "(3.1.2) - GET_CAT_DOCUMENTS_FROM_EC","avg(duration_millis): API_GET_PRODUCTS_BY_CAT_ID_API" as "(3.4) - GET_PRODUCTS_BY_CAT_ID_API"
| append
[ search (index=test1 OR index=test2) source="/var/log/*comp-ui*" requestOrigin=external (stack_name="*prod*" OR COMP_Stack_Name="*prod*") (requestUrl="/search.jsp*" OR requestUrl="*/c/*" OR requestUrl="*/p/*" OR requestUrl="*c.cat*") NOT userAgent="*HealthChecker*"
| dedup _raw
| eval rt=responseTime
| rex field=requestUrl "((?<file>\/search\.jsp|p\.prod|\/p\/|\/c\/|c\.cat|index\.jsp|addToCart|mini-cart|^\/$))"
| timechart avg(rt) by file
| sort 0 - "avg(rt)"]
| append
[ search (index=test1 OR index=test2) source="*product-service*" COMP_Environment="prod" AND (stack_name=*prod* OR COMP_Stack_Name="*prod*") (GET_PROD_QUERY_TOOK_TIME_MS )
| eval duration_millis=GET_PRODS_QUERY_TOOK_TIME_MS
| timechart span="1m" avg(duration_millis) as "(7) -Elastic Search"]
| append
[ search (index=test1 OR index=test2) source="/var/log/*comp-ui*" (stack_name="*prod*" OR COMP_Stack_Name="*prod*") (request_url="*/utp-svc/*" OR request_url="*/PL-SVC/*" OR request_url="*/leftnav/*" OR request_url="*/siteticker*" OR request_url="*/banner*" OR request_url="*/promot*" OR request_url="*/brandlinks*" OR request_url="*/breadcrumbs*" OR request_url="*/silos/US/desktop*" OR request_url="*/silos/US/initial*" OR request_url="*/abtest/assignments?mboxIds*" OR request_url="*/r_main_drawer_promo.html*" OR request_url="*/r_cusp_drawer_promo.html" OR request_url="*endeca/r_no_results.html" OR request_url="*/p_navaux.html*" ) NOT userAgent="*HealthChecker*"
| dedup _raw
| eval rt=responseTime
| rex field=request_url "((?<file>\/siteticker|\/banner|\/promot|\/leftnavt|\/PL-SVC|\/silos/US/desktop|\/utp-svc|\/breadcrumbs|\/abtest|cusp_drawer_promo|main_drawer_promo|endeca|p_navaux\.html|\/brandlinks|^\/$))"
| timechart avg(rt) by file
| sort 0 - "avg(rt)"]
In the first place, you are mixing AND
and OR
clauses without out fully-parenthesizing them. This isn't just a matter of efficiency, but a matter of correctness.
I took the liberty of structuring the search code in your question a bit to make it more readable.
Couple of questions:
- what is the purpose of that fields + ...
and rename
part on lines 4 and 5?
- what is the purpose of the dedup(_raw)
on line 19?
what is the purpose of that fields + ... and rename part on lines 4 and 5?
-Since I have used eval action_event=event_type+"_"+action , the output of eval will be something like this action_event = REPOSITORY_GET_CAT_BANNER_FROM_CONTENT_SERVICE and then when I am using fields + it will give me output something like this "avg(duration_millis): REPOSITORY_GET_CAT_BANNER_FROM_CONTENT_SERVICE " which I am renaming to "avg(duration_millis): REPOSITORY_GET_CAT_BANNER_FROM_CONTENT_SERVICE" as "(1) -CAT_BANNER"
what is the purpose of the dedup(_raw) on line 19?
Dedup I have removed
There are dedup _raw
scattered everywhere, and this whole thing is a pile of appends from a single basic source. Which makes me believe
1) that it might be refactored into a "no append" version that works better.
2) that there's a problem with duplicated logs, which if we could clean that up might help a ton.
Can you paste in some of the output this provides, so we can see what the end result is like?
Also, I find it amusing to ask - why aren't you doing this in Elastic Search? 🙂
Oh, also,
(index=test1 OR index=test2) source="/var/log/*comp-ui*" requestOrigin=external (stack_name="*prod*" OR COMP_Stack_Name="*prod*") (requestUrl="/search.jsp*" OR requestUrl="*/c/*" OR requestUrl="*/p/*" OR requestUrl="*c.cat*") NOT userAgent="*HealthChecker*"
| stats count by _raw | sort -
What does that output? What's the top few numbers, and are they ALL like that or just a few that are duplicated?