We feed JSON data into our Splunk index. It is not flat JSON; it has a couple of levels of nesting. For instance:
{
  "id": "some_id",
  "main_error_string": "something bad happened",
  "details": {
    "other_severe_error_strings": ["verybad1", "verybad2", "verybad3"],
    "other_mild_error_strings": ["bad1", "bad2", "bad3"]
  }
}
We now want to collect all the error messages from these different fields, such as
'main_error_string', 'details.other_severe_error_strings', and 'details.other_mild_error_strings',
into one large list of strings.
Our ultimate goal is to feed this list of strings into an unsupervised text classifier to see what clusters emerge from the various error strings.
So the question is: what Splunk query flattens these JSON fields (of different types, string and list of strings, and at different levels of the hierarchy) into one list of strings?
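If the nested fields are not already extracted at search time, spath can pull them out first. This is only a sketch using the field names from the example JSON above; note that array paths end in {}, which makes the extracted field multivalue:

| spath path=main_error_string
| spath path=details.other_severe_error_strings{} output=other_severe_error_strings
| spath path=details.other_mild_error_strings{} output=other_mild_error_strings

After this, every error field matches the *error_string* wildcard used in the flattening search below.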
Like this:
| makeresults
| eval this_error_string_1="1 2 3 4", another_error_strings_etc="a b c d e f g"
| makemv this_error_string_1
| makemv another_error_strings_etc
| rename COMMENT AS "Everything above generates sample event data; everything below is your solution"
| eval junk="ThisFieldValueDoesNotMatter"
| stats values(*error_string*) AS *error_string* BY junk
| untable junk key values
| makemv values
| stats values(values) AS error_strings
Thank you! That was quite helpful.