Hello,
I'm trying to split a JSON file from the Facebook Graph API into multiple events using props.conf.
Here is a JSON sample:
{
"about": "http://www.appledaily.com.tw",
"posts": {
"data": [
{
"message": "first post message",
"created_time": "2016-11-01T11:20:01+0000",
"id": "232633627068_10155237456442069",
"likes": {
"data": [
{
"id": "125823837756509",
"name": "XXX"
},
{
"id": "125547431150532",
"name": "OOO"
}
],
"paging": {
"cursors": {
"before": "MTI1ODIzODM3NzU2NTA5",
"after": "Nzk0NDQzNDAzOTEyNjc3"
}
}
}
},
{
"message": "other messages",
"created_time": "2016-11-01T11:10:00+0000",
"id": "232633627068_10155237171047069",
"likes": {
"data": [
{
"id": "434788333331603",
"name": "AA"
},
{
"id": "1485443865001594",
"name": "BB"
}
],
"paging": {
"cursors": {
"before": "NDM0Nzg4MzMzMzMxNjAz",
"after": "NjA4NDc4NTY5MjU5ODQ1"
}
}
}
}
],
"paging": {
"previous": "https://graph.facebook.com/v2.8/232633627068/posts?limit=10&fields=likes.limit%2810000%29,message,cr...",
"next": "https://graph.facebook.com/v2.8/232633627068/posts?limit=10&fields=likes.limit%2810000%29,message,cr..."
}
},
"id": "232633627068"
}
This is my props.conf setting:
[_json]
INDEXED_EXTRACTIONS = json
KV_MODE = JSON
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE = ^{
TIMESTAMP_FIELDS = created_time
TIME_FORMAT = %FT%T%z
TRUNCATE = 100000000
pulldown_type = true
disabled = false
TZ = UTC
What should props.conf look like to split such a file into multiple events?
Or should I index the file as-is and then use spath to split the events?
Thank you for your suggestions.
Hello @blzaxe,
The best way would be to preprocess the file with a modular input or some kind of script. If that's not an option, you will need to use index-time transforms with some additional props. I am guessing the data you want to split into multiple events is everything contained within:
{
"about": "http://www.appledaily.com.tw",
"posts": {
"data": [
I am also assuming it is a single-line event rather than pretty-printed. I'll let you figure that out, but for this example I am going to assume your event is a single line like this: {"about": "http://www.appledaily.com.tw","posts": {"data": [
Step one: create transforms to strip out the outer JSON body.
[removeOuterBody1]
# regex captures the beginning of the outer envelope/message container
REGEX = ^({[^\n]+data\":\s\[)([^\n]+)
FORMAT = $2
DEST_KEY = _raw
[removeOuterBody2]
# regex captures the end of the envelope/message container
REGEX = ([^\n]+)(\}\}\])$
FORMAT = $1
DEST_KEY = _raw
Now you need to apply these in your props.conf.
[CustomSourcetype]
TRANSFORMS-cleanMsg = removeOuterBody1, removeOuterBody2
DATETIME_CONFIG = CURRENT
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE = ,\{"message":
TIMESTAMP_FIELDS = created_time
TIME_FORMAT = %FT%T%z
TRUNCATE = 100000000
pulldown_type = true
disabled = false
TZ = UTC
The unfortunate problem is that each broken-out event will still carry a leading comma, which makes it invalid JSON. You could clean this up by doing all of this pre-parsing on a heavy forwarder (HF) and then using another transform on the indexers to strip the comma at the beginning of each event.
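For comparison, the preprocessing option mentioned at the start of this answer could look roughly like the sketch below. It assumes the whole Graph API response fits in memory and that each entry under posts.data should become one event. The function names split_posts and write_events and the file paths are made up for illustration; they are not part of Splunk or the Graph API.

```python
import json


def split_posts(document):
    """Return one compact, single-line JSON string per entry in posts.data.

    `document` is the already-parsed Graph API response (a dict).
    A response without posts.data yields an empty list.
    """
    posts = document.get("posts", {}).get("data", [])
    # Compact separators keep each post on a single line, so a plain
    # newline-based line breaker can separate the events.
    return [
        json.dumps(post, ensure_ascii=False, separators=(",", ":"))
        for post in posts
    ]


def write_events(in_path, out_path):
    """Read the raw response from in_path and write one post per line to out_path.

    Both paths are hypothetical; point them at wherever the response is
    saved and wherever the monitored output file should live.
    """
    with open(in_path, encoding="utf-8") as src:
        document = json.load(src)
    with open(out_path, "w", encoding="utf-8") as dst:
        for line in split_posts(document):
            dst.write(line + "\n")
    return out_path
```

Each line of the output file is then a complete, valid JSON object that still contains created_time, so the file could be monitored with a sourcetype that breaks on newlines (for example SHOULD_LINEMERGE = false) instead of relying on transforms to strip the envelope.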
Excuse me, I put transforms.conf in \etc\apps\app_names\local.
Why doesn't it work?