Solved: Best practice for using the Common Information Model?

rturk · ‎07-06-2011

Greetings Splunkers!

So I'm writing my first app for a Content Delivery Network platform, and for ease of use I'm adopting the Common Information Model (as per HERE) when selecting field names (as the vendor in question hasn't). I'm just wondering how best to apply it.

As I see it, my options are:

1. Keep the vendor-supplied field names and write CIM-compliant field aliases where appropriate; or
2. Use props.conf and transforms.conf to rewrite them at index time, completely discarding the vendor-supplied field names; or
3. Log a "feature request" with the vendor from San Fran*Cisco* to get their logfiles in line 😉

Option 1 will allow users who are familiar with the log formats to pick up the application quickly, and has the advantage of aligning with vendor published documentation.
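
To make Option 1 concrete, a rough props.conf sketch is below. The sourcetype `vendor:cdn` and the vendor field names on the left (`clientAddress`, `bytesSent`, `httpStatus`) are made up for illustration, and the names on the right are meant as CIM-style targets, so check them against the CIM documentation before relying on them.

```ini
# props.conf (search-time aliases; the vendor fields stay searchable too)
# NOTE: sourcetype and left-hand field names are hypothetical.
[vendor:cdn]
FIELDALIAS-cim_src    = clientAddress AS src
FIELDALIAS-cim_bytes  = bytesSent AS bytes
FIELDALIAS-cim_status = httpStatus AS status
```

Because aliases are applied at search time, nothing is re-indexed, and a user could search on either `bytesSent=...` or `bytes=...`.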

Option 2 has the advantages of a clean interface & normalisation, but at the cost of losing alignment with published vendor documentation, and a learning curve for existing users. I'd also include a README or something like that indicating field mappings.
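
For comparison, Option 2 might look something like the sketch below, assuming a key=value log line containing `bytesSent=<number>`. The stanza names, sourcetype and regex are hypothetical; the settings themselves (`REGEX`, `FORMAT`, `WRITE_META`, `TRANSFORMS-<class>`, `INDEXED`) are the usual ones for index-time field extraction.

```ini
# transforms.conf
# NOTE: stanza name and regex are hypothetical.
[cdn_bytes_indexed]
REGEX = bytesSent=(\d+)
FORMAT = bytes::$1
WRITE_META = true

# props.conf
[vendor:cdn]
TRANSFORMS-cim = cdn_bytes_indexed

# fields.conf (tells Splunk the field is stored in the index)
[bytes]
INDEXED = true
```

Two things to keep in mind with this approach: the raw event text still contains the vendor's name (`bytesSent`), and index-time extractions only apply to data indexed after the change, so existing events would need re-indexing to pick up the new field.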

Option 3 has about as much chance as a snowball fight in hell of succeeding.

What has been the experience of application developers here? Which path have you gone down and why?

Many thanks!

RT

David · ‎07-12-2011

I have gone with Option 1, myself. The biggest problem with option 2 is that when users are looking at the raw event data (e.g., if they will ever have any sort of raw search capability), they will be totally lost when they try to do any field manipulations.

As it came about in my app, I have ridiculously unusable field names (very long to type). When I wrote my summary indexing, I shortened the names significantly (numberOfUnresolvedCountersInLastHour -> NumUnresolvedCounters). I then added aliases for the raw data, so that users could search either by the field names they put in there, or by the names in my summary indexing.

A valid critique of this is that maintaining two sets of field names can confuse users no matter what -- if they build search terms on the long field names from the raw data, then try to apply that to the summary indexed data, they'll have problems unless I alias my summary data in the other direction, as well. In my particular scenario, using raw data is nigh suicidal for anyone not fully aware of the intricacies of Splunk, so it's a pretty minimal risk. Obviously, the only "real" solution is to alter the _raw field (impossible) or alter the source data.
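
As a sketch of what aliasing in both directions could look like in props.conf: the two field names are the ones mentioned above, but the stanza names are placeholders (a hypothetical raw sourcetype, and a `source::` stanza on the assumption that the summary events carry the saved search name as their source).

```ini
# props.conf
# NOTE: both stanza names are hypothetical placeholders.

# Raw data: expose the short name alongside the long one.
[my_raw_sourcetype]
FIELDALIAS-short_name = numberOfUnresolvedCountersInLastHour AS NumUnresolvedCounters

# Summary data: expose the long name alongside the short one.
[source::my_summary_search]
FIELDALIAS-long_name = NumUnresolvedCounters AS numberOfUnresolvedCountersInLastHour
```

With both in place, a search written against either name should work against either data set, at the cost of having two aliases to keep in sync.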

My two cents.

