
What is the best way to modify data before ingesting it to Splunk?

AKG1_old1
Builder

Hi,

I am ingesting some data into Splunk, and in my data there is no unique field to separate different rows. So I am thinking we could either add some unique code before every line or remove the rows which are not required before uploading it to Splunk.

I am looking for the best and quickest way to modify the logs before uploading them to Splunk.

My actual problem is described in the post below; that's why I am thinking of modifying the data before ingestion.
Actual Problem

Sample Data:

533.366: [GC [1 CMS-initial-mark: 6057767K(12058624K)] 6153969K(12478080K), 0.0603480 secs] [Times: user=0.03 sys=0.03, real=0.06 secs] 
533.426: [CMS-concurrent-mark-start]
533.771: [GC533.771: [ParNew: 410115K->88369K(419456K), 0.0160606 secs] 6467882K->6146136K(12478080K), 0.0162425 secs] [Times: user=0.27 sys=0.01, real=0.02 secs]
533.844: [GC533.845: [ParNew: 402993K->87616K(419456K), 0.0476447 secs] 6460760K->6191512K(12478080K), 0.0478016 secs] [Times: user=0.35 sys=0.04, real=0.05 secs] 
534.224: [CMS-concurrent-mark: 0.682/0.798 secs] [Times: user=10.97 sys=0.15, real=0.80 secs] 
534.224: [CMS-concurrent-preclean-start]
534.301: [CMS-concurrent-preclean: 0.076/0.077 secs] [Times: user=0.23 sys=0.00, real=0.08 secs] 
534.301: [CMS-concurrent-abortable-preclean-start]
534.410: [GC534.410: [ParNew: 419456K->101719K(419456K), 0.0527827 secs] 6551488K->6283935K(12478080K), 0.0529380 secs] [Times: user=0.42 sys=0.04, real=0.05 secs] 
534.517: [GC534.517: [ParNew: 416343K->67094K(419456K), 0.0300013 secs] 6598559K->6277305K(12478080K), 0.0301352 secs] [Times: user=0.27 sys=0.02, real=0.03 secs]
534.639: [CMS-concurrent-abortable-preclean: 0.239/0.339 secs] [Times: user=5.68 sys=0.09, real=0.34 secs]
534.640: [GC[YG occupancy: 168960 K (419456 K)]534.640: [Rescan (parallel) , 0.0530936 secs]534.693: [weak refs processing, 1.1142144 secs]535.808: [scrub string table, 0.0012464 secs] [1 CMS-remark: 6210211K(12058624K)] 6379172K(12478080K), 1.1687789 secs] [Times: user=2.40 sys=0.01, real=1.17 secs]  
535.809: [CMS-concurrent-sweep-start]

Thanks
Ankit


DalJeanis
Legend

Not sure what you mean.

1) Every row starts with an obvious timestamp formatted like this ... ^\d+\.\d+: \[ ... for example...

533.366: [
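
That prefix could also drive event breaking. A minimal props.conf sketch, assuming a sourcetype named gc_log (the name is a placeholder, and DATETIME_CONFIG = CURRENT is only a guess since these timestamps are seconds since JVM start rather than wall-clock times):

[gc_log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\d+\.\d+:
DATETIME_CONFIG = CURRENT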

2) Splunk has the ability, when you set the conf files correctly, to ignore lines that match certain criteria. You send them to the Null queue. Unneeded rows seem to be ones without detail data, for instance ones containing this...

[CMS-concurrent-mark-start]

Setting the configuration to ignore them requires only that you carefully describe which records need to be ignored. You would just need to make a list of what you wanted to get rid of.
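
A minimal props.conf / transforms.conf sketch of that, assuming a sourcetype named gc_log and that the CMS-concurrent-*-start marker lines are the ones to drop (the stanza names and regex are placeholders; this filtering happens at parse time, so it belongs on the indexer or heavy forwarder):

props.conf:

[gc_log]
TRANSFORMS-null = setnull

transforms.conf:

[setnull]
REGEX = \[CMS-concurrent-[a-z-]+-start\]
DEST_KEY = queue
FORMAT = nullQueue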

3) Alternatively, you could set it to ignore everything that DID NOT match certain criteria. It looks like lines with useful data all end with this...

 [Times: user=[\d\.]+ sys=[\d\.]+, real=[\d\.]+ secs] 

Setting the configuration to keep them requires only that you carefully describe which records need to be kept.
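
A minimal sketch of that variant, with the same assumed gc_log sourcetype (the transforms run in the order listed, so everything is sent to the null queue first and only the matching lines are routed back to the index queue):

props.conf:

[gc_log]
TRANSFORMS-set = setnull, setparsing

transforms.conf:

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = \[Times: user=[\d.]+ sys=[\d.]+, real=[\d.]+ secs\]
DEST_KEY = queue
FORMAT = indexQueue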


mattymo
Splunk Employee

Hi agoyal!

Can you tell us more about the nature of this data? What dimensions would you want to split on?

Host? Process? JobID?

When you say unique field, tell us more about what makes sense.

As for the "best" way, that will all depend on many factors, including the skillset available to you and the time you have to refine your approach.

The "best" way is the way that works in the time you have 😉

- MattyMo

AKG1_old1
Builder

@mmodestino_splunk: I have posted my main problem in another post (link given below).
Main Problem

The problem is that I am not able to figure out any way to display all the data using a data model / accelerated query.

In this post, I just want to know whether there is any way in Splunk to incorporate some function that adds, say, a row number before each row, or removes all lines that match a pattern such as "[CMS-concurrent-mark-start]".

E.g. one solution that came to my mind is to write a bash script that modifies the text file before uploading it to Splunk.
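
A rough sketch of such a script (the file names and the drop pattern are placeholders, not part of my actual setup):

#!/bin/bash
# Drop the marker-only lines, then prefix every remaining line with a
# running row number before the file is uploaded to Splunk.
awk '!/CMS-concurrent-.*-start/ { printf "%d: %s\n", ++n, $0 }' gc.log > gc_clean.log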


kmorris_splunk
Splunk Employee

Can you provide us with a few sample events from your data?


AKG1_old1
Builder

@kmorris: updated post with sample data.
