How do I split this JSON into multiple events?

blzaxe
New Member

Hello

I'm trying to split a JSON file from the Facebook Graph API into multiple events using props.conf.

Here is a sample of the JSON:

{
  "about": "http://www.appledaily.com.tw",
  "posts": {
    "data": [
      {
        "message": "first post message",
        "created_time": "2016-11-01T11:20:01+0000",
        "id": "232633627068_10155237456442069",
        "likes": {
          "data": [
            {
              "id": "125823837756509",
              "name": "XXX"
            },
            {
              "id": "125547431150532",
              "name": "OOO"
            }
          ],
          "paging": {
            "cursors": {
              "before": "MTI1ODIzODM3NzU2NTA5",
              "after": "Nzk0NDQzNDAzOTEyNjc3"
            }
          }
        }
      },
      {
        "message": "other messages",
        "created_time": "2016-11-01T11:10:00+0000",
        "id": "232633627068_10155237171047069",
        "likes": {
          "data": [
            {
              "id": "434788333331603",
              "name": "AA"
            },
            {
              "id": "1485443865001594",
              "name": "BB"
            }
          ],
          "paging": {
            "cursors": {
              "before": "NDM0Nzg4MzMzMzMxNjAz",
              "after": "NjA4NDc4NTY5MjU5ODQ1"
            }
          }
        }
      }
    ],
    "paging": {
      "previous": "https://graph.facebook.com/v2.8/232633627068/posts?limit=10&fields=likes.limit%2810000%29,message,cr...",
      "next": "https://graph.facebook.com/v2.8/232633627068/posts?limit=10&fields=likes.limit%2810000%29,message,cr..."
    }
  },
  "id": "232633627068"
}

This is my props.conf setting:

[_json]
INDEXED_EXTRACTIONS = json
KV_MODE = JSON
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE = ^{
TIMESTAMP_FIELDS = created_time
TIME_FORMAT = %FT%T%z
TRUNCATE = 100000000
pulldown_type = true
disabled = false
TZ = UTC

What should props.conf look like to split such a file into multiple events?
Or should I ingest the file as is and then use spath to split it into events?
Thank you for your suggestions.


bmacias84
Champion

Hello @blzaxe,

The best way would be to preprocess the data with a modular input or some kind of script. If that's not an option, you will need to use index-time transforms with some additional props. I am guessing the data you want to split into multiple events is everything contained within:

{
"about": "http://www.appledaily.com.tw",
"posts": {
"data": [

I don't know whether the file arrives as a single-line event or pretty-printed. I'll let you figure that out, but for this example I am going to assume your event is a single line like this: {"about": "http://www.appledaily.com.tw","posts": {"data": [

Step one: create transforms to strip out the outer JSON body.

[removeOuterBody1]
# Regex captures the beginning of the outer envelope/message container.
# The lazy +? stops at the first "data": [ (the posts array) rather than a nested likes array.
REGEX = ^({[^\n]+?data\":\s\[)([^\n]+)
FORMAT = $2
DEST_KEY = _raw

[removeOuterBody2]
# Regex captures the end of the outer envelope/message container.
# Adjust the trailing pattern if your payload has more content after the posts array (for example, a paging block).
REGEX = ([^\n]+)(\}\}\])$
FORMAT = $1
DEST_KEY = _raw
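
One thing worth checking with a header-stripping pattern like this is whether the quantifier in front of `data": [` is greedy or lazy, because the payload contains a nested `"data": [` inside every `likes` object. Here is a rough sketch of that check using Python's `re` module as a stand-in for Splunk's regex engine (an assumption; the two engines agree on these simple constructs as far as I know). The sample string is a shortened, hypothetical single-line version of the JSON from the question.

```python
import re

# Hypothetical single-line payload, shortened from the sample in the question.
raw = (
    '{"about": "http://www.appledaily.com.tw","posts": {"data": ['
    '{"message": "first post message","likes": {"data": [{"id": "1"}]}},'
    '{"message": "other messages","likes": {"data": [{"id": "2"}]}}'
    ']}}'
)

# Greedy: [^\n]+ runs to the end of the line and backtracks to the LAST "data": [
greedy = re.compile(r'^(\{[^\n]+data":\s\[)([^\n]+)')

# Lazy: [^\n]+? stops at the FIRST "data": [, which is the posts array.
lazy = re.compile(r'^(\{[^\n]+?data":\s\[)([^\n]+)')

greedy_body = greedy.match(raw).group(2)
lazy_body = lazy.match(raw).group(2)

print("greedy keeps:", greedy_body)
print("lazy keeps:  ", lazy_body)
```

With the greedy form, everything up to the last nested `likes` array is thrown away, so only the lazy form leaves the full list of posts in the remaining text.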

Now you need to apply these to your props.

[CustomSourcetype]
TRANSFORMS-cleanMsg = removeOuterBody1, removeOuterBody2
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE =  ,\{"message":
TIMESTAMP_FIELDS = created_time
TIME_FORMAT = %FT%T%z
TRUNCATE = 100000000
pulldown_type = true 
disabled = false
TZ = UTC

Unfortunately, each of the resulting events will still begin with a comma, which makes it invalid JSON. You could clean this up by doing all of this pre-parsing on a heavy forwarder (HF) and then using another transform on the indexers to strip the comma at the beginning of each event.
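
If preprocessing with a script is an option after all, a minimal sketch might look like the following. It assumes the whole Graph API response fits in memory and that you want one event per entry in `posts.data`. The function name `split_posts` and the added `page_id` field are hypothetical, chosen only for illustration, and the sample is a trimmed version of the JSON in the question.

```python
import json
from typing import List


def split_posts(payload: str) -> List[str]:
    """Return one compact JSON string per post found under posts.data."""
    doc = json.loads(payload)
    posts = doc.get("posts", {}).get("data", [])
    events = []
    for post in posts:
        event = dict(post)
        # Hypothetical extra field: keep the page id with every post.
        event["page_id"] = doc.get("id")
        events.append(json.dumps(event, ensure_ascii=False))
    return events


# Trimmed version of the sample from the question.
sample = """
{
  "about": "http://www.appledaily.com.tw",
  "posts": {
    "data": [
      {"message": "first post message",
       "created_time": "2016-11-01T11:20:01+0000",
       "id": "232633627068_10155237456442069"},
      {"message": "other messages",
       "created_time": "2016-11-01T11:10:00+0000",
       "id": "232633627068_10155237171047069"}
    ]
  },
  "id": "232633627068"
}
"""

events = split_posts(sample)
for line in events:
    print(line)  # one self-contained JSON object per line
```

Each output line is a complete JSON object on its own, so there is no outer envelope to strip and no leading comma to clean up. Writing these lines to a file that Splunk monitors should then give one event per post, with created_time available as the timestamp.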


blzaxe
New Member

Excuse me! I put transforms.conf in \etc\apps\app_names\local.
Why doesn't it work?
