Solved: LINE_BREAKER trouble

renems · 05-26-2016

I'm struggling to get Splunk to break some JSON events properly, because my input has no newlines. Here is my input:

{"id":"40CC75B0DA1A8AEE3A5A884D7007D0D9","id_old":null,"favorited":null,"authorinfo":{"rank":1.3,"followercount":1475},"sentiment":"neu","link":"https:\/\/twitter.com\/innijverdal\/status\/735051205836148736","fulltext":"Zojuist is er een tas [gestolen] bij de Primera van een vrouw. De dader heeft vervolgens gepind bij de [Rabobank]. De... https:\/\/t.co\/U1ULiaE2yt","timestamp_link":"1464084838","timestamp_show":"1464084838","subsite":null,"author":"innijverdal","postid":"4a85:2c69:f51b:42cb","dataproviderid":null,"authortype":"user","label":"post","snippet":"Zojuist is er een tas [gestolen] bij de Primera van een vrouw. De dader heeft vervolgens gepind bij de [Rabobank]. De... https:\/\/t.co\/U1ULiaE2yt","numposts":"","pagerank":1,"title":"","sourcetype":"twitter","followercount":1475,"authorid":"1371834230","authorrealname":"Leven in Nijverdal","likescount":0,"authorrank":1.3,"fbid":null,"ytid":null,"replytoid":null,"avatar":"https:\/\/pbs.twimg.com\/profile_images\/435686221109395456\/FCz3PoOo_normal.png","coordinates":null,"media":[],"links":["http:\/\/www.leveninnijverdal.nl\/nieuws\/27205\/tas-[gestolen]-bij-primera-en-dader-pint-bij-[rabobank]"],"message_id":"735051205836148736","found_conversation":false,"postid_orig":"735051205836148736","mentioned":[],"translated_sourcetype":"twitter"},{"id":"579771EFE5829B94F17B3F03E7AB1177","id_old":null,"favorited":null,"authorinfo":{"rank":10.8,"followercount":830},"sentiment":"pos","link":"https:\/\/twitter.com\/Paul_0110\/status\/735036812033396736","fulltext":"Potverdomme [@Rabobank], het programma voor [internetbankieren] hebben jullie toch wel retestrak en klantvriendelijk voor mekaar!","timestamp_link":"1464081406","timestamp_show":"1464081406","subsite":null,"author":"Paul_0110","postid":"97f9:1675:4c33:5489","dataproviderid":null,"authortype":"user","label":"post","snippet":"Potverdomme [@Rabobank], het programma voor [internetbankieren] hebben jullie toch wel retestrak en klantvriendelijk voor 
mekaar!","numposts":"","pagerank":1,"title":"","sourcetype":"twitter","followercount":830,"authorid":"507333420","authorrealname":"Paul Netten \u00a9","likescount":0,"authorrank":10.8,"fbid":null,"ytid":null,"replytoid":null,"avatar":"https:\/\/pbs.twimg.com\/profile_images\/678477714735685632\/_SmvdMWf_normal.jpg","coordinates":null,"media":[],"links":[],"message_id":"735036812033396736","found_conversation":false,"postid_orig":"735036812033396736","mentioned":[{"authortype":"user","authorid":7385462,"authorrealname":"Rabobank","author":"Rabobank"}],"translated_sourcetype":"twitter"}

I'd like a line break at every },{"id"
The old line should end with },
The new line should start with {"id"

Any help would be greatly appreciated.

jkat54 · 05-26-2016

I'd use something like this maybe...

[sourceTypeName]
INDEXED_EXTRACTIONS=json
SHOULD_LINEMERGE=true
BREAK_ONLY_BEFORE = ',{"id":'
SEDCMD-RemoveComma = 's/^\,//g'

I'm not sure whether the SEDCMD will be needed, or whether anything beyond INDEXED_EXTRACTIONS is needed at all.

ryanoconnor · 05-26-2016

I downvoted this post because I would steer away from the BREAK_ONLY_BEFORE setting for performance reasons. You'll actually get better performance using SHOULD_LINEMERGE = false together with a LINE_BREAKER.

See a similar question asked here:

https://answers.splunk.com/answers/227121/what-is-the-difference-between-line-breaker-and-br.html

jkat54 · 05-26-2016

Downvotes are for when something is going to damage someone's system, something like "hey try running sudo rm -Rf /" or "format c:". Please see this before downvoting: https://answers.splunk.com/answers/244111/proper-etiquette-and-timing-for-voting-here-on-ans.html

ryanoconnor · 05-26-2016

Apologies. The only reason I downvoted it is that we want to get people in the habit of not using SHOULD_LINEMERGE=true where possible. You'll see very significant performance improvements if you set SHOULD_LINEMERGE to false and use a regex for your LINE_BREAKER.

When you set SHOULD_LINEMERGE to false, you're essentially skipping a step in the data pipeline (http://wiki.splunk.com/Community:HowIndexingWorks), and according to the Consultant II class, the gains are very significant.

jkat54 · 02-21-2017

If you remove code lines 3, 4, and 5 from my answer and replace them with lines 2 and 3 from Ryan's answer, I think you'll be in a sweet spot for performance and still achieve what you want.
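
As a rough, untested sketch of that combination: the stanza below takes the stanza from jkat54's answer, drops the three line-merging lines, and substitutes the LINE_BREAKER and SHOULD_LINEMERGE lines from ryanoconnor's answer. The stanza name is the placeholder used in the original answer, not a real sourcetype.

```ini
# props.conf: sketch of the merged stanza described above (placeholder name)
[sourceTypeName]
INDEXED_EXTRACTIONS = json
# Taken from ryanoconnor's answer; the text matched by the first capture
# group (the comma) marks the event boundary and is discarded.
LINE_BREAKER = .*}(,){.*
SHOULD_LINEMERGE = false
```

The SEDCMD is left out because LINE_BREAKER discards whatever its first capture group matches, so the comma between events should never reach the event text.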

Indexed extractions could be a concern too, because they use more disk on the indexers. KV_MODE = json on the search heads, on the other hand, moves the JSON parsing to search time, which is less performant at search time in many cases, while indexed extractions are less performant at index time. It's a trade-off, and most people want to guarantee indexing over search, which means Ryan's answer is better for most.

ryanoconnor · 05-26-2016

I didn't get around to ensuring the timestamps were correct, which you may want to look into for this data. However, the following props.conf should help you out.

[your_sourcetype_name]
LINE_BREAKER = .*}(,){.*
SHOULD_LINEMERGE = False
KV_MODE = json
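
One possible refinement, offered as an untested sketch rather than a verified configuration: anchoring the break on the literal },{"id" from the question, instead of a greedy .* on either side, keeps the pattern from matching a },{ between elements of a nested array such as "mentioned". The timestamp lines assume that the epoch value in "timestamp_link" is the event time, which the thread does not confirm.

```ini
# props.conf: sketch only; the stanza name is a placeholder
[your_sourcetype_name]
SHOULD_LINEMERGE = false
# The first capture group marks the event boundary and is discarded,
# so the comma between two objects is dropped.
LINE_BREAKER = \}(,)\{"id"
KV_MODE = json
# Assumption: "timestamp_link" holds the event time as epoch seconds.
TIME_PREFIX = "timestamp_link":"
TIME_FORMAT = %s
MAX_TIMESTAMP_LOOKAHEAD = 10
```

With this pattern each event would start with {"id" and end with the closing }, so every event is a complete JSON object on its own.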
