Solved: How to store lots of metadata that would be the same for each event?

PeteRichardson · 09-07-2014

We measure 50 values every 5 seconds during each hour-long experiment. We run these experiments many times under different conditions (different ambient temp, SW build, HW type, etc.; a dozen or so different metadata parameters). Each run saves its measurements in a CSV file that I want to import into Splunk, but I'd like some advice on where to put the metadata about the experiment. The SW build number, for example, applies to the whole experiment (it doesn't change per row), and I want to be able to search on it.

Initially I thought I would just add a new column in the CSV for each metadata value. That works fine, but the values are all the same in the column, so that seems wasteful. Then I thought of encoding the metadata into a simple string and using that for the source value, but then at some point I have to parse it. Then I thought about a separate lookup table with some foreign key in the CSV. That seems too database-y. (Not that there's anything wrong with that. Some of my best friends are DBAs.)

If my goal is to filter on the metadata values and analyze the measurements for just those experiments (e.g. "For all experiments run at 30C with SW build 1234, plot values for measurement x"), where and how should I store the metadata?
Thanks for any advice.

lguinn2 · 09-07-2014

Keep it simple. Your initial idea is good. Putting the values (or abbreviations) for the metadata in the CSV files is clean and efficient.
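For anyone who wants to stamp the metadata onto existing files before importing them, here is a minimal Python sketch of the "one column per metadata value" idea. The column names (`ambient_temp_c`, `sw_build`) and the sample rows are made up for illustration; it also assumes none of the metadata names collide with an existing measurement column.

```python
import csv
import io

def add_metadata_columns(src, dst, metadata):
    """Copy CSV rows from src to dst, appending one constant column per metadata key."""
    reader = csv.DictReader(src)
    # Original measurement columns first, then the metadata columns.
    fieldnames = list(reader.fieldnames or []) + list(metadata)
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row.update(metadata)  # same metadata values on every row
        writer.writerow(row)

# Hypothetical two-row experiment file, held in memory for the example.
raw = io.StringIO("time,x,y\n0,1.5,2.5\n5,1.6,2.4\n")
out = io.StringIO()
add_metadata_columns(raw, out, {"ambient_temp_c": "30", "sw_build": "1234"})
stamped = out.getvalue()
print(stamped)
```

With real files you would pass open file handles instead of the `io.StringIO` objects.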

Splunk compresses the raw data, so some space will be saved. The fact that some variables exhibit little variety in their values might even mean a smaller than average index size.
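To get a rough feel for why constant columns compress well, here is a small Python sketch that builds one synthetic run (720 rows of 50 values, i.e. one hour at 5-second intervals) with and without three metadata columns, then compares sizes before and after compression. It uses `zlib` as a stand-in compressor and random numbers as stand-in measurements, so the numbers only illustrate the effect; they say nothing precise about Splunk's actual index size. The metadata names and values are invented.

```python
import random
import zlib

def make_csv(n_rows, n_values, metadata):
    """Build a synthetic CSV as bytes; metadata values are repeated on every row."""
    rng = random.Random(0)  # fixed seed so both files share the same measurements
    header = [f"m{i}" for i in range(n_values)] + list(metadata)
    lines = [",".join(header)]
    for _ in range(n_rows):
        values = [f"{rng.uniform(0, 100):.3f}" for _ in range(n_values)]
        lines.append(",".join(values + list(metadata.values())))
    return ("\n".join(lines) + "\n").encode("utf-8")

# Hypothetical metadata for one experiment.
metadata = {"ambient_temp_c": "30", "sw_build": "1234", "hw_type": "rev-B"}

plain = make_csv(720, 50, {})          # measurements only
stamped = make_csv(720, 50, metadata)  # measurements plus constant columns

raw_overhead = len(stamped) - len(plain)
compressed_overhead = len(zlib.compress(stamped)) - len(zlib.compress(plain))
print(f"raw overhead: {raw_overhead} bytes")
print(f"compressed overhead: {compressed_overhead} bytes")
```

The extra bytes added by the metadata columns should shrink noticeably once compressed, because the same strings repeat on every row.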

In the end, if you make it complicated, you will spend lots of your valuable time getting it set up. And you will probably have to ask Splunk to do more complicated searches, which will cost more in CPU and disk I/O. And what did you save? A few gigabytes of disk? I think it is very likely that you would not "save" when you take everything into account!
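As a loose analogy in plain Python (not Splunk search syntax), the sketch below answers the same question, "measurement x for experiments at 30C with SW build 1234", against both layouts. The row data, field names, and experiment IDs are invented for the example. The flat layout needs a single filter; the lookup layout needs an extra step to resolve which experiments match before the rows can be filtered.

```python
# Flat layout: every row carries its own metadata.
flat_rows = [
    {"x": 1.5, "ambient_temp_c": "30", "sw_build": "1234"},
    {"x": 1.7, "ambient_temp_c": "25", "sw_build": "1234"},
    {"x": 1.6, "ambient_temp_c": "30", "sw_build": "1234"},
]
flat_x = [
    r["x"] for r in flat_rows
    if r["ambient_temp_c"] == "30" and r["sw_build"] == "1234"
]

# Lookup layout: rows carry only a key; metadata lives in a second table.
experiments = {
    "exp1": {"ambient_temp_c": "30", "sw_build": "1234"},
    "exp2": {"ambient_temp_c": "25", "sw_build": "1234"},
}
keyed_rows = [
    {"x": 1.5, "experiment_id": "exp1"},
    {"x": 1.7, "experiment_id": "exp2"},
    {"x": 1.6, "experiment_id": "exp1"},
]
# Extra step: find the matching experiments first, then filter rows by key.
wanted = {
    eid for eid, meta in experiments.items()
    if meta["ambient_temp_c"] == "30" and meta["sw_build"] == "1234"
}
keyed_x = [r["x"] for r in keyed_rows if r["experiment_id"] in wanted]

print(flat_x, keyed_x)
```

Both approaches return the same values; the second just has more moving parts to set up and maintain.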

Here is Splunk's advice (not all of which seems applicable in this particular case): Logging Best Practices
