Splunk Search

Data Import Question

SteveWu
New Member

So I have a log file that has a unique format, similar to the following:

==============================================

==============Summary=========================
Total Memory: 8834798374
Cached: 39399
...

===============up time=========================
19:00:20 up 5 days, 8:53

=================memory========================
USER PID COMMAND MEM%
root 2919 /bash 9
root 2023 top 14

Based on what I've read in the documentation and the posts, it looks like I can either write a very sophisticated sourcetype or just write a separate pre-processing script to properly parse the data and output it in a friendlier format for the engine. My question is: am I missing something, or are these my only realistic options?


lguinn2
Legend

You can pre-process the data with a script, sure. But Splunk takes care of things like restarts and not duplicating data, which can be a PITA to handle in a script. I don't think that your sourcetype needs to be that difficult. There are two main tasks: (1) index the data and (2) set up the fields.

First, create a test index. Use Data Preview to bring in a sample of the data. You will be able to set the event boundaries (line-breaking) and timestamps with Data Preview. Data Preview will create the sourcetype settings in props.conf that you need to index the data. Put the props.conf stanza on the indexer(s). Create the stanza in inputs.conf to start reading/indexing the real data.
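
In case it helps, here is a rough sketch of what those two stanzas might end up looking like. The sourcetype name, the monitor path, the index name, and the line-breaking regex are all placeholders I made up for illustration; Data Preview will generate the real props.conf values for your data.

# props.conf (illustrative only - let Data Preview generate the real values)
[custom_report]
SHOULD_LINEMERGE = false
# Assumption: each "=====section=====" header line starts a new event
LINE_BREAKER = ([\r\n]+)=+[^=\r\n]+=+
# If the events have no usable timestamp, fall back to index time
DATETIME_CONFIG = CURRENT

# inputs.conf (illustrative only - path and index are placeholders)
[monitor:///var/log/custom_report.log]
sourcetype = custom_report
index = test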

Second, create the fields that you need. This will involve regular expressions and may be a bit trickier. On the other hand, you can change the field extractions in production without having to re-index any data. You can use the Interactive Field Extractor to help, although it may not be able to deal with all of the fields.
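
As a starting point, here are a couple of illustrative search-time extractions against the sample you posted. The field names (total_memory, cached) and the sourcetype are placeholders carried over from the sketch above, not anything Splunk generates for you.

# props.conf on the search head (illustrative only)
[custom_report]
# "Total Memory: 8834798374"  ->  total_memory=8834798374
EXTRACT-total_memory = Total\s+Memory:\s+(?<total_memory>\d+)
# "Cached: 39399"  ->  cached=39399
EXTRACT-cached = Cached:\s+(?<cached>\d+)

Note that an inline EXTRACT only captures the first match in an event, so a repeating table like the memory section is probably easier to pull apart at search time with rex.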

Finally, show a couple of sample events on the forum and people will help you write the field extractions.

The biggest reason that people get into difficulties is that they load their data into production before testing it for a day or two in a test index. If you play with your new data for a bit, and even write a few searches, I think you will have a much better idea of what you want - even if you decide to write that pre-processing script in the end.
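
To give one concrete example of the kind of throwaway search I mean, something like this against the test index would pull the per-process rows out of the memory section (again, the sourcetype and field names are just the placeholders from the sketches above):

index=test sourcetype=custom_report
| rex max_match=0 "(?m)^(?<user>\w+)\s+(?<pid>\d+)\s+(?<command>\S+)\s+(?<mem_pct>\d+)$"
| table _time user pid command mem_pct

With max_match=0, rex keeps every matching row as a multivalued field instead of just the first one, which is handy for this sort of embedded table.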
