LINE_BREAKER trouble

renems
Communicator

I'm struggling to get Splunk to break some JSON events properly, because my input contains no newlines. Here is my input:

{"id":"40CC75B0DA1A8AEE3A5A884D7007D0D9","id_old":null,"favorited":null,"authorinfo":{"rank":1.3,"followercount":1475},"sentiment":"neu","link":"https:\/\/twitter.com\/innijverdal\/status\/735051205836148736","fulltext":"Zojuist is er een tas [gestolen] bij de Primera van een vrouw. De dader heeft vervolgens gepind bij de [Rabobank]. De... https:\/\/t.co\/U1ULiaE2yt","timestamp_link":"1464084838","timestamp_show":"1464084838","subsite":null,"author":"innijverdal","postid":"4a85:2c69:f51b:42cb","dataproviderid":null,"authortype":"user","label":"post","snippet":"Zojuist is er een tas [gestolen] bij de Primera van een vrouw. De dader heeft vervolgens gepind bij de [Rabobank]. De... https:\/\/t.co\/U1ULiaE2yt","numposts":"","pagerank":1,"title":"","sourcetype":"twitter","followercount":1475,"authorid":"1371834230","authorrealname":"Leven in Nijverdal","likescount":0,"authorrank":1.3,"fbid":null,"ytid":null,"replytoid":null,"avatar":"https:\/\/pbs.twimg.com\/profile_images\/435686221109395456\/FCz3PoOo_normal.png","coordinates":null,"media":[],"links":["http:\/\/www.leveninnijverdal.nl\/nieuws\/27205\/tas-[gestolen]-bij-primera-en-dader-pint-bij-[rabobank]"],"message_id":"735051205836148736","found_conversation":false,"postid_orig":"735051205836148736","mentioned":[],"translated_sourcetype":"twitter"},{"id":"579771EFE5829B94F17B3F03E7AB1177","id_old":null,"favorited":null,"authorinfo":{"rank":10.8,"followercount":830},"sentiment":"pos","link":"https:\/\/twitter.com\/Paul_0110\/status\/735036812033396736","fulltext":"Potverdomme [@Rabobank], het programma voor [internetbankieren] hebben jullie toch wel retestrak en klantvriendelijk voor mekaar!","timestamp_link":"1464081406","timestamp_show":"1464081406","subsite":null,"author":"Paul_0110","postid":"97f9:1675:4c33:5489","dataproviderid":null,"authortype":"user","label":"post","snippet":"Potverdomme [@Rabobank], het programma voor [internetbankieren] hebben jullie toch wel retestrak en klantvriendelijk voor 
mekaar!","numposts":"","pagerank":1,"title":"","sourcetype":"twitter","followercount":830,"authorid":"507333420","authorrealname":"Paul Netten \u00a9","likescount":0,"authorrank":10.8,"fbid":null,"ytid":null,"replytoid":null,"avatar":"https:\/\/pbs.twimg.com\/profile_images\/678477714735685632\/_SmvdMWf_normal.jpg","coordinates":null,"media":[],"links":[],"message_id":"735036812033396736","found_conversation":false,"postid_orig":"735036812033396736","mentioned":[{"authortype":"user","authorid":7385462,"authorrealname":"Rabobank","author":"Rabobank"}],"translated_sourcetype":"twitter"}

I'd like a line break at every },{"id"
The old line should end with },
The new line should start with {"id"

Any help would be greatly appreciated.

1 Solution

jkat54
SplunkTrust

I'd use something like this maybe...

[sourceTypeName]
INDEXED_EXTRACTIONS=json
SHOULD_LINEMERGE=true
BREAK_ONLY_BEFORE = ,{"id":
SEDCMD-RemoveComma = s/^\,//g

I'm not sure whether the SEDCMD will be needed, or whether anything beyond INDEXED_EXTRACTIONS is needed at all.

ryanoconnor
Builder

I downvoted this post because I would steer away from the BREAK_ONLY_BEFORE setting for performance reasons. You'll get better performance using SHOULD_LINEMERGE=false together with a LINE_BREAKER.

See a similar question asked here:

https://answers.splunk.com/answers/227121/what-is-the-difference-between-line-breaker-and-br.html

jkat54
SplunkTrust

Downvotes are for when something is going to damage someone's system, something like "hey, try running sudo rm -Rf /" or "format c:". Please see this before downvoting: https://answers.splunk.com/answers/244111/proper-etiquette-and-timing-for-voting-here-on-ans.html

ryanoconnor
Builder

Apologies. The only reason I downvoted it is that we want to get people in the habit of not using SHOULD_LINEMERGE=true where possible. Setting SHOULD_LINEMERGE to false and using a regex for your LINE_BREAKER instead essentially skips a step in the data pipeline (http://wiki.splunk.com/Community:HowIndexingWorks), and according to the Consultant II class, the performance improvement is very significant.

jkat54
SplunkTrust

If you remove code lines 3, 4, and 5 from my answer and replace them with lines 2 and 3 from Ryan's answer, I think you'll be in a sweet spot for performance and still achieve what you want.

Indexed extractions could be a concern too, because they use more disk on the indexers. KV_MODE = json on the search heads moves the JSON parsing to search time instead, which is less performant at search time in many cases. Indexed extractions, however, are less performant at index time. It's a trade-off, and most people want to guarantee indexing over search, which means Ryan's answer is better for most.
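
Read literally, the line swap described above would give a stanza along these lines. This is only a sketch: the stanza name is the placeholder used in the earlier answer, the LINE_BREAKER is copied unchanged from Ryan's answer, and how INDEXED_EXTRACTIONS and LINE_BREAKER interact on this particular data is an assumption that has not been tested in this thread.

```ini
# props.conf (sketch, untested)
[sourceTypeName]
INDEXED_EXTRACTIONS = json
LINE_BREAKER = .*}(,){.*
SHOULD_LINEMERGE = False
```

If index-time cost matters more than search-time cost, the same stanza with KV_MODE = json in place of INDEXED_EXTRACTIONS = json is the search-time alternative discussed above.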

ryanoconnor
Builder

I didn't get around to ensuring the timestamps are correct, which you may want to look into for this data. However, the following props.conf should help you out.

[your_sourcetype_name]
LINE_BREAKER = .*}(,){.*
SHOULD_LINEMERGE = False
KV_MODE = json
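
As a possible follow-up on the timestamp point, here is a hedged variant of the stanza above. It assumes that every event begins with {"id": as in the sample input, and that timestamp_link holds the event time in epoch seconds; neither is confirmed in this thread. The break pattern is anchored on },{"id": without the surrounding .*, since a greedy leading .* may match only at the last delimiter in the buffer.

```ini
# props.conf (sketch, untested; adjust the stanza name to your sourcetype)
[your_sourcetype_name]
SHOULD_LINEMERGE = false
# Break between } and {"id": the comma in the capture group is discarded,
# so each event is a standalone JSON object.
LINE_BREAKER = \}(,)\{"id":
# Assumption: "timestamp_link" is the event time, in epoch seconds.
TIME_PREFIX = "timestamp_link":"
TIME_FORMAT = %s
MAX_TIMESTAMP_LOOKAHEAD = 10
KV_MODE = json
```

Note that this drops the comma instead of leaving it at the end of the previous event, which differs slightly from what was asked but leaves each event as valid JSON.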