Splunk Search

Search across nested JSON to get employee details for different dates

raunakp
Engager

I have a response from one of the client applications, like this:

{ "employees": { "2023-03-16": { "1": { "id": 1, "name": "Michael Scott", "email": "demo@desktime.com", "groupId": 1, "group": "Accounting", "profileUrl": "url.com", "isOnline": false, "arrived": false, "left": false, "late": false, "onlineTime": 0, "offlineTime": 0, "desktimeTime": 0, "atWorkTime": 0, "afterWorkTime": 0, "beforeWorkTime": 0, "productiveTime": 0, "productivity": 0, "efficiency": 0, "work_starts": "23:59:59", "work_ends": "00:00:00", "notes": { "Skype": "Find.me", "Slack": "MichielS" }, "activeProject": [] }, "2": { "id": 2, "name": "Andy Bernard", "email": "demo3@desktime.com", "groupId": 106345, "group": "Marketing", "profileUrl": "url.com", "isOnline": true, "arrived": "2023-03-16 09:17:00", "left": "2023-03-16 10:58:00", "late": true, "onlineTime": 6027, "offlineTime": 0, "desktimeTime": 6027, "atWorkTime": 6060, "afterWorkTime": 0, "beforeWorkTime": 0, "productiveTime": 4213, "productivity": 69.9, "efficiency": 14.75, "work_starts": "09:00:00", "work_ends": "18:00:00", "notes": { "Background": "Law and accounting" }, "activeProject": { "project_id": 67973, "project_title": "Blue Book", "task_id": 42282, "task_title": "Blue Book task", "duration": 6027 } }..... } "__request_time": "1678957028" }

I am facing a problem with the date field "2023-03-16", as this key changes every day. I want to create statistics based on all employee IDs, late employees, emails, etc. for the last 7 days. I have used spath, but I cannot use a wildcard search to find all late employees across all days. Thanks

1 Solution

yuanliu
SplunkTrust

Maybe you can explain the significance of that top-level key "2023-03-16"? There ought to be some semantic significance if it changes every day. Also, is that the only top-level key? If so, I'd say the developers made a poor design choice. The same can be said about the second-level keys ("1", "2", ...), which seem semantically redundant with the third-level key "id". If you have any influence over the developers, maybe suggest that they get rid of the second-level keys and just put the third-level objects in an array.
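
For illustration, the array shape that suggestion implies might look like this (a hypothetical payload, reusing the sample values):

{ "employees": { "2023-03-16": [ { "id": 1, "name": "Michael Scott", ... }, { "id": 2, "name": "Andy Bernard", ... } ] } }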

Anyway, I am not going to assume any semantic significance of the top-level key(s?) for now. I also assume that your desire to use a wildcard search is about searching those third-level keys such as "name" and "email". To my understanding, you want a simple search such as name = "michael *" late = "true" without having to confront field names such as 2023-03-16.1.name.

This is something you can try:

| spath path=employees
| eval date = json_array_to_mv(json_keys(employees)) ``` top-level keys are the dates ```
| mvexpand date
| eval day_employees = json_extract(employees, date) ``` the object for one date ```
| eval employee_id = json_array_to_mv(json_keys(day_employees)) ``` second-level keys are the employee ids ```
| mvexpand employee_id
| eval day_employees = json_extract(day_employees, employee_id) ``` the object for one employee ```
| spath input=day_employees ``` flatten that object into ordinary fields ```

Your sample data (single date, two ids) would give these field values. (There are too many fields, so the table below is transposed.)

field name                    1                     2
_mkv_child                    0                     1
activeProject.duration                              6027
activeProject.project_id                            67973
activeProject.project_title                         Blue Book
activeProject.task_id                               42282
activeProject.task_title                            Blue Book task
afterWorkTime                 0                     0
arrived                       false                 2023-03-16 09:17:00
atWorkTime                    0                     6060
beforeWorkTime                0                     0
date                          2023-03-16            2023-03-16
desktimeTime                  0                     6027
efficiency                    0                     14.75
email                         demo@desktime.com     demo3@desktime.com
employee_id                   1                     2
group                         Accounting            Marketing
groupId                       1                     106345
isOnline                      false                 true
late                          false                 true
left                          false                 2023-03-16 10:58:00
name                          Michael Scott         Andy Bernard
notes.Background                                    Law and accounting
notes.Skype                   Find.me
notes.Slack                   MichielS
offlineTime                   0                     0
onlineTime                    0                     6027
productiveTime                0                     4213
productivity                  0                     69.9
profileUrl                    url.com               url.com
work_ends                     00:00:00              18:00:00
work_starts                   23:59:59              09:00:00
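
Once the fields are flattened like this, the wildcard search and the last-7-days statistics become ordinary SPL. A sketch you could append to the pipeline above (the exact breakdown is illustrative; note that spath extracts the JSON boolean late as the string "true"):

| search late = "true"
| stats dc(date) AS late_days values(email) AS email BY employee_id name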

Here is an emulation that you can play with and compare with real data:

| makeresults
| eval _raw = "{ \"employees\": { \"2023-03-16\": { \"1\": { \"id\": 1, \"name\": \"Michael Scott\", \"email\": \"demo@desktime.com\", \"groupId\": 1, \"group\": \"Accounting\", \"profileUrl\": \"url.com\", \"isOnline\": false, \"arrived\": false, \"left\": false, \"late\": false, \"onlineTime\": 0, \"offlineTime\": 0, \"desktimeTime\": 0, \"atWorkTime\": 0, \"afterWorkTime\": 0, \"beforeWorkTime\": 0, \"productiveTime\": 0, \"productivity\": 0, \"efficiency\": 0, \"work_starts\": \"23:59:59\", \"work_ends\": \"00:00:00\", \"notes\": { \"Skype\": \"Find.me\", \"Slack\": \"MichielS\" }, \"activeProject\": [] }, \"2\": { \"id\": 2, \"name\": \"Andy Bernard\", \"email\": \"demo3@desktime.com\", \"groupId\": 106345, \"group\": \"Marketing\", \"profileUrl\": \"url.com\", \"isOnline\": true, \"arrived\": \"2023-03-16 09:17:00\", \"left\": \"2023-03-16 10:58:00\", \"late\": true, \"onlineTime\": 6027, \"offlineTime\": 0, \"desktimeTime\": 6027, \"atWorkTime\": 6060, \"afterWorkTime\": 0, \"beforeWorkTime\": 0, \"productiveTime\": 4213, \"productivity\": 69.9, \"efficiency\": 14.75, \"work_starts\": \"09:00:00\", \"work_ends\": \"18:00:00\", \"notes\": { \"Background\": \"Law and accounting\" }, \"activeProject\": { \"project_id\": 67973, \"project_title\": \"Blue Book\", \"task_id\": 42282, \"task_title\": \"Blue Book task\", \"duration\": 6027 } } } }, \"__request_time\": \"1678957028\" }"
``` data emulation above ```
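
To test it, append the | spath path=employees ... | spath input=day_employees pipeline from above directly after this emulation; once the output looks right, replace the emulation with your real index search.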

 


raunakp
Engager

Thanks for your response. Yes, it is indeed a bad design.

There is only one parent node, "2023-03-16". It changes every day ("2023-03-16", "2023-03-17", "2023-03-18", ...), but the employee ids under each date (1, 2, ...) remain the same. The problem arises when we create analytics/alerts for, say, the last 7 days: the top-level key changes.

Your response worked, but I am getting events from "all time" even though I have selected a time range of 24h.


yuanliu
SplunkTrust

It sounds like each event is a summary report for one day.

Your response worked, but I am getting events from "all time" even though I have selected a time range of 24h.

Do you mean that when your time selector is set to the last 24 hours, Splunk returns multiple daily summaries? If _time and the date key do not agree, and if your intention is to find only the summaries that fall within your search window, you can filter by that key, e.g.,

| spath path=employees
| eval date = json_array_to_mv(json_keys(employees))
| mvexpand date ``` skip this if each employees record has only one top level key ```
| addinfo ``` exposes the search window as info_min_time and info_max_time ```
| eval date_start = strptime(date, "%F")
| where info_min_time <= date_start AND relative_time(date_start, "+1d") < info_max_time ``` keep dates whose full day lies inside the window ```
| eval day_employees = json_extract(employees, date)
| eval employee_id = json_array_to_mv(json_keys(day_employees))
| mvexpand employee_id
| eval day_employees = json_extract(day_employees, employee_id)
| spath input=day_employees
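
From there, the last-7-days statistics can aggregate on the extracted date key instead of _time. A sketch (the breakdown is illustrative; late is extracted as the string "true"):

| stats dc(employee_id) AS employees count(eval(late == "true")) AS late_employees BY date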

 

