Splunk Search

Search across nested JSON to get employee details for different dates

raunakp
Engager

I have a response from one of the client applications, like this:

{ "employees": { "2023-03-16": { "1": { "id": 1, "name": "Michael Scott", "email": "demo@desktime.com", "groupId": 1, "group": "Accounting", "profileUrl": "url.com", "isOnline": false, "arrived": false, "left": false, "late": false, "onlineTime": 0, "offlineTime": 0, "desktimeTime": 0, "atWorkTime": 0, "afterWorkTime": 0, "beforeWorkTime": 0, "productiveTime": 0, "productivity": 0, "efficiency": 0, "work_starts": "23:59:59", "work_ends": "00:00:00", "notes": { "Skype": "Find.me", "Slack": "MichielS" }, "activeProject": [] }, "2": { "id": 2, "name": "Andy Bernard", "email": "demo3@desktime.com", "groupId": 106345, "group": "Marketing", "profileUrl": "url.com", "isOnline": true, "arrived": "2023-03-16 09:17:00", "left": "2023-03-16 10:58:00", "late": true, "onlineTime": 6027, "offlineTime": 0, "desktimeTime": 6027, "atWorkTime": 6060, "afterWorkTime": 0, "beforeWorkTime": 0, "productiveTime": 4213, "productivity": 69.9, "efficiency": 14.75, "work_starts": "09:00:00", "work_ends": "18:00:00", "notes": { "Background": "Law and accounting" }, "activeProject": { "project_id": 67973, "project_title": "Blue Book", "task_id": 42282, "task_title": "Blue Book task", "duration": 6027 } }..... } "__request_time": "1678957028" }

I am facing a problem with the date field "2023-03-16", as this key changes every day. I want to create statistics based on all employee IDs, late employees, emails, etc. for the last 7 days. I have used spath, but I cannot use a wildcard search to find all late employees across all days. Thanks

1 Solution

yuanliu
SplunkTrust

Maybe you can explain the significance of that top-level key "2023-03-16"? There ought to be some semantic significance if it changes every day. Also, is that the only top-level key? If so, I'd say the developers made a poor design choice. The same can be said about the second-level keys ("1", "2", ...), which seem semantically redundant with the third-level key "id". If you have any influence over the developers, maybe suggest that they get rid of the second-level keys and just put the third-level objects in an array.
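
For illustration, the array shape that suggestion implies might look like this (a hypothetical payload, reusing the sample values):

{ "employees": { "2023-03-16": [ { "id": 1, "name": "Michael Scott", ... }, { "id": 2, "name": "Andy Bernard", ... } ] } }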

Anyway, I am not going to assume any semantic significance of the top-level key(s?) for now. I also assume that your desire to use a wildcard search is about searching those third-level keys such as "name" and "email". To my understanding, you want a simple search such as name = "michael *" late = "true" without having to confront field names such as 2023-03-16.1.name.

This is something you can try:

| spath path=employees
| eval date = json_array_to_mv(json_keys(employees)) ``` top-level keys are the dates ```
| mvexpand date
| eval day_employees = json_extract(employees, date) ``` the object for one date ```
| eval employee_id = json_array_to_mv(json_keys(day_employees)) ``` second-level keys are the employee ids ```
| mvexpand employee_id
| eval day_employees = json_extract(day_employees, employee_id) ``` the object for one employee ```
| spath input=day_employees ``` flatten that object into ordinary fields ```

Your sample data (single date, two ids) would give these field values. (There are too many fields, so the table below is transposed.)

field name                    1                     2
_mkv_child                    0                     1
activeProject.duration                              6027
activeProject.project_id                            67973
activeProject.project_title                         Blue Book
activeProject.task_id                               42282
activeProject.task_title                            Blue Book task
afterWorkTime                 0                     0
arrived                       false                 2023-03-16 09:17:00
atWorkTime                    0                     6060
beforeWorkTime                0                     0
date                          2023-03-16            2023-03-16
desktimeTime                  0                     6027
efficiency                    0                     14.75
email                         demo@desktime.com     demo3@desktime.com
employee_id                   1                     2
group                         Accounting            Marketing
groupId                       1                     106345
isOnline                      false                 true
late                          false                 true
left                          false                 2023-03-16 10:58:00
name                          Michael Scott         Andy Bernard
notes.Background                                    Law and accounting
notes.Skype                   Find.me
notes.Slack                   MichielS
offlineTime                   0                     0
onlineTime                    0                     6027
productiveTime                0                     4213
productivity                  0                     69.9
profileUrl                    url.com               url.com
work_ends                     00:00:00              18:00:00
work_starts                   23:59:59              09:00:00
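
Once the fields are flattened like this, the wildcard search and the last-7-days statistics become ordinary SPL. A sketch you could append to the pipeline above (the exact breakdown is illustrative; note that spath extracts the JSON boolean late as the string "true"):

| search late = "true"
| stats dc(date) AS late_days values(email) AS email BY employee_id name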

Here is an emulation that you can play with and compare with real data:

| makeresults
| eval _raw = "{ \"employees\": { \"2023-03-16\": { \"1\": { \"id\": 1, \"name\": \"Michael Scott\", \"email\": \"demo@desktime.com\", \"groupId\": 1, \"group\": \"Accounting\", \"profileUrl\": \"url.com\", \"isOnline\": false, \"arrived\": false, \"left\": false, \"late\": false, \"onlineTime\": 0, \"offlineTime\": 0, \"desktimeTime\": 0, \"atWorkTime\": 0, \"afterWorkTime\": 0, \"beforeWorkTime\": 0, \"productiveTime\": 0, \"productivity\": 0, \"efficiency\": 0, \"work_starts\": \"23:59:59\", \"work_ends\": \"00:00:00\", \"notes\": { \"Skype\": \"Find.me\", \"Slack\": \"MichielS\" }, \"activeProject\": [] }, \"2\": { \"id\": 2, \"name\": \"Andy Bernard\", \"email\": \"demo3@desktime.com\", \"groupId\": 106345, \"group\": \"Marketing\", \"profileUrl\": \"url.com\", \"isOnline\": true, \"arrived\": \"2023-03-16 09:17:00\", \"left\": \"2023-03-16 10:58:00\", \"late\": true, \"onlineTime\": 6027, \"offlineTime\": 0, \"desktimeTime\": 6027, \"atWorkTime\": 6060, \"afterWorkTime\": 0, \"beforeWorkTime\": 0, \"productiveTime\": 4213, \"productivity\": 69.9, \"efficiency\": 14.75, \"work_starts\": \"09:00:00\", \"work_ends\": \"18:00:00\", \"notes\": { \"Background\": \"Law and accounting\" }, \"activeProject\": { \"project_id\": 67973, \"project_title\": \"Blue Book\", \"task_id\": 42282, \"task_title\": \"Blue Book task\", \"duration\": 6027 } } } }, \"__request_time\": \"1678957028\" }"
``` data emulation above ```
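
To test it, append the | spath path=employees ... | spath input=day_employees pipeline from above directly after this emulation; once the output looks right, replace the emulation with your real index search.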

 


raunakp
Engager

Thanks for your response. Yes, it is indeed a bad design.

There is only one parent node, "2023-03-16". It changes every day ("2023-03-16", "2023-03-17", "2023-03-18", ...), but the employee ids under each date (1, 2, ...) remain the same. The problem arises when we create analytics/alerts for, say, the last 7 days: the top-level key changes.

Your response worked, but I am getting events from "all time" even though I have selected a time range of 24h.


yuanliu
SplunkTrust

It sounds like each event is a summary report for one day.

Your response worked, but I am getting events from "all time" even though I have selected a time range of 24h.

Do you mean that when your time selector is set to the last 24 hours, Splunk returns multiple daily summaries? If _time and the date key do not agree, and if your intention is to find only the summaries that fall within your search window, you can filter by that key, e.g.,

| spath path=employees
| eval date = json_array_to_mv(json_keys(employees))
| mvexpand date ``` skip this if each employees record has only one top level key ```
| addinfo ``` exposes the search window as info_min_time and info_max_time ```
| eval date_start = strptime(date, "%F")
| where info_min_time <= date_start AND relative_time(date_start, "+1d") < info_max_time ``` keep dates whose full day lies inside the window ```
| eval day_employees = json_extract(employees, date)
| eval employee_id = json_array_to_mv(json_keys(day_employees))
| mvexpand employee_id
| eval day_employees = json_extract(day_employees, employee_id)
| spath input=day_employees
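
From there, the last-7-days statistics can aggregate on the extracted date key instead of _time. A sketch (the breakdown is illustrative; late is extracted as the string "true"):

| stats dc(employee_id) AS employees count(eval(late == "true")) AS late_employees BY date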

 

