Hi Splunkers,
I have a JSON event which is spewed out of an API endpoint like so (note, I cannot manipulate the request to return anything else):
{
  "name": "top-pages-realtime",
  "query": {
    "dimensions": "rt:pagePath,rt:pageTitle",
    "metrics": [
      "rt:activeUsers"
    ],
    "sort": [
      "-rt:activeUsers"
    ],
    "max-results": 20
  },
  "meta": {
    "name": "Top Pages (Live)",
    "description": "The top 20 pages, measured by active onsite users, for all sites."
  },
  "data": [
    {
      "page": "site1.com",
      "page_title": "site1",
      "active_visitors": "1474"
    },
    {
      "page": "site2.com",
      "page_title": "site2",
      "active_visitors": "1171"
    }
  ],
  "totals": {},
  "taken_at": "2015-07-17T15:13:04.657Z"
}
One thing to note: timestamp extraction for events is not essential. I am happy to use index-time timestamping.
As you'll probably have already noticed, using default JSON extractions, Splunk will create fields for data.page, data.page_title, data.active_visitors.
But this creates a problem...
An example of why: I want to chart data.page by data.active_visitors. Using default extractions, Splunk lumps all values for data.page and data.active_visitors into multivalue fields, but because they are all in the same event it is impossible to associate the correct data.page value with its data.active_visitors value.
Leading into my question....
What is the best way to handle this event, to achieve my example above? Should I be trying to break the events before they get indexed? Or can I manipulate the search to handle it?
Thanks!
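Playing with it in my test Splunk... one option would be to keep feeding in the raw event as it is, then use spath at search time to pull out each element of the data array as a multivalue field and expand it with mvexpand, so that every page becomes its own result with its own active_visitors value.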
... | fields + _time _raw | eval data=spath(_raw,"data{}") | mvexpand data | rename data as _raw | spath | table _time *
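The search above effectively turns one event into one row per element of the data array. If it turns out the response can be reshaped before it reaches Splunk, the same split could be done up front. Here is a minimal sketch in Python, not a tested ingestion script: the function name split_events and the added report field are made up for illustration, and how the output gets fed to Splunk (scripted input, file monitor, etc.) is left open because the thread does not say how the data is collected.

```python
import json
from typing import List


def split_events(payload: str) -> List[str]:
    """Return one JSON string per element of the top-level "data" array.

    Hypothetical helper: each output line carries a single page, so page and
    active_visitors stay paired once each line is indexed as its own event.
    """
    doc = json.loads(payload)
    events = []
    for row in doc.get("data", []):
        event = dict(row)                         # page, page_title, active_visitors
        event["report"] = doc.get("name")         # illustrative extra field
        event["taken_at"] = doc.get("taken_at")   # keep the API timestamp for reference
        events.append(json.dumps(event))
    return events


# Trimmed copy of the sample response from the question.
SAMPLE = """
{
  "name": "top-pages-realtime",
  "data": [
    {"page": "site1.com", "page_title": "site1", "active_visitors": "1474"},
    {"page": "site2.com", "page_title": "site2", "active_visitors": "1171"}
  ],
  "totals": {},
  "taken_at": "2015-07-17T15:13:04.657Z"
}
"""

if __name__ == "__main__":
    # Print one single-line JSON event per page.
    for line in split_events(SAMPLE):
        print(line)
```

With one page per event, the default JSON extractions give single-valued page and active_visitors fields, so charting one by the other needs no mvexpand at search time.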
"Spewed out of an API endpoint"... Could you describe more of how this data is being retrieved and fed to Splunk? You might not be able to change the request, but I wonder whether manipulating the response before ingestion is an option.