categorization based on frequent text

subinj
New Member

Hi. I have an Excel dump of incident tickets generated from the ticketing tool.
Sample incident descriptions from the report:

  1. "Target: CI-xxxx Stateless event alarm Event details: HA recovered from a total cluster failure in cluster"
  2. "Server - CI-aaaa generates Multipath Issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths"
  3. "Servers generate CI-aaaa & CI-bbbbb - Multipath issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths"
  4. "Servers generate CI-aaaa & CI-bbbbb - Multipath issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths"
  5. "[VMware vCenter - Alarm Cluster high availability error] Insufficient resources to satisfy HA failover level on cluster"
  6. "F drive is having less disk space nagios-ebs: CI-xxxx "
  7. "Low disk space alert on CI-yyyyy"
  8. "Failed backup report for 2nd April 2012 : CI-xxxx , CI-aaaa , CI-bbbbb"
  9. "Failed backup report for 3rd April 2012 : CI-xxxx , CI-aaaa , CI-bbbbb"

There is no exclusive "category" field. My end objective is to perform a Trend Analysis to identify top recurring issues.
I could perform a grouping by going through the description fields one by one and identifying the incident type.

Desired output would be :

category ---- count of occurrence

HA ---- 2

Multipath ---- 3

disk space ---- 2

failed backup ---- 2

Manual grouping would not be feasible, though, for a list of 300+ incidents.

I was wondering whether Splunk could identify the common significant text in the description fields and return a similar grouping, without my having to key in search strings?

dmr195
Communicator

I know this question was asked quite a while ago, but in case anyone stumbles across it in a search, I thought I'd mention that Prelert Anomaly Detective for Splunk (http://splunk-base.splunk.com/apps/68765/prelert-anomaly-detective) can categorize events by looking for common words in the raw text.

Ayn
Legend

Do you mean whether Splunk can somehow automatically identify a category for each of these messages and return it? In that case the answer is no. Splunk doesn't know anything about what these logs actually mean; it simply indexes them like any other data. Any further intelligence has to be provided by you (or by someone else who has already packaged it in an app or similar).

If you mean that Splunk could match on individual strings in each message and create fields from that, certainly. You could match on the string "disk space" and put that into a field, same goes for any other string you're interested in.
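To make the string-matching idea concrete, here is a minimal sketch in Python rather than in Splunk itself. The rule list (`RULES`) and the function names are assumptions made up for illustration; only the sample descriptions come from the original post. Note that "HA" needs a word-boundary match, otherwise it would also hit words like "channel".

```python
import re
from collections import Counter

# Sample descriptions copied from the original post.
DESCRIPTIONS = [
    "Target: CI-xxxx Stateless event alarm Event details: HA recovered from a total cluster failure in cluster",
    "Server - CI-aaaa generates Multipath Issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths",
    "Servers generate CI-aaaa & CI-bbbbb - Multipath issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths",
    "Servers generate CI-aaaa & CI-bbbbb - Multipath issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths",
    "[VMware vCenter - Alarm Cluster high availability error] Insufficient resources to satisfy HA failover level on cluster",
    "F drive is having less disk space nagios-ebs: CI-xxxx ",
    "Low disk space alert on CI-yyyyy",
    "Failed backup report for 2nd April 2012 : CI-xxxx , CI-aaaa , CI-bbbbb",
    "Failed backup report for 3rd April 2012 : CI-xxxx , CI-aaaa , CI-bbbbb",
]

# Hypothetical rules (category label, pattern); the first matching rule wins.
RULES = [
    ("Multipath", re.compile(r"multipath", re.IGNORECASE)),
    ("HA", re.compile(r"\bHA\b|high availability", re.IGNORECASE)),
    ("disk space", re.compile(r"disk space", re.IGNORECASE)),
    ("failed backup", re.compile(r"failed backup", re.IGNORECASE)),
]


def categorize(description):
    """Return the label of the first rule whose pattern occurs in the text."""
    for label, pattern in RULES:
        if pattern.search(description):
            return label
    return "uncategorized"


def count_categories(descriptions):
    """Count how many descriptions fall into each category."""
    return Counter(categorize(d) for d in descriptions)


if __name__ == "__main__":
    for label, n in count_categories(DESCRIPTIONS).most_common():
        print(f"{label} ---- {n}")
```

The catch is the one already mentioned: someone still has to supply the strings. Anything that matches no rule ends up as "uncategorized", which at least shows which tickets still need a rule.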

Ayn
Legend

The index is in a proprietary binary format that can't be read in any way like that, so no, the assumption is false.

subinj
New Member

I was referring to the index file that would get generated when I run Splunk on the file containing the incident description.

Based on the example provided, I am assuming the index file would have the following content:
5 aaaa
2 backup
4 bbbbb
3 channel
4 cluster
2 disk
3 fibre
3 luns
3 multipath
where the numbers specify the number of times the string appears in the content.

I was wondering if I could read this index file to obtain the strings and counts, provided my assumption about the index file contents is correct.

Thanks!
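The kind of term count described above can be computed directly from the descriptions, without reading any index file. The following is a rough Python sketch under stated assumptions: the tokenizer (runs of letters, lowercased) and the stop-word list are illustrative choices, not anything Splunk does internally.

```python
import re
from collections import Counter

# Sample descriptions copied from the original post.
DESCRIPTIONS = [
    "Target: CI-xxxx Stateless event alarm Event details: HA recovered from a total cluster failure in cluster",
    "Server - CI-aaaa generates Multipath Issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths",
    "Servers generate CI-aaaa & CI-bbbbb - Multipath issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths",
    "Servers generate CI-aaaa & CI-bbbbb - Multipath issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths",
    "[VMware vCenter - Alarm Cluster high availability error] Insufficient resources to satisfy HA failover level on cluster",
    "F drive is having less disk space nagios-ebs: CI-xxxx ",
    "Low disk space alert on CI-yyyyy",
    "Failed backup report for 2nd April 2012 : CI-xxxx , CI-aaaa , CI-bbbbb",
    "Failed backup report for 3rd April 2012 : CI-xxxx , CI-aaaa , CI-bbbbb",
]

# Assumed stop-word list; extend it to suit the real data.
STOP_WORDS = {"the", "from", "for", "not", "all", "have", "is", "in", "on", "to", "ci"}


def term_counts(descriptions):
    """Count lowercase alphabetic tokens across all descriptions."""
    counts = Counter()
    for description in descriptions:
        tokens = re.findall(r"[a-z]+", description.lower())
        # Drop single letters and stop words.
        counts.update(t for t in tokens if len(t) > 1 and t not in STOP_WORDS)
    return counts


if __name__ == "__main__":
    for term, n in sorted(term_counts(DESCRIPTIONS).items()):
        print(n, term)
```

On the nine sample descriptions this reproduces the counts listed above (for example 5 for "aaaa" and 4 for "cluster"). Terms that recur often but are not host names or dates are reasonable candidates for category keywords.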

Ayn
Legend

Please clarify what you mean - what index file are you referring to, and which various strings?

subinj
New Member

Thanks Ayn!

Yes, the first part is what I am looking for, as I currently do not know what the possible incident categories are, or which strings I should be searching for.

Would it be feasible to read the index file, from which I could identify the various strings and their number of occurrences?

Lamar
Splunk Employee

Can you provide the data? It's still, to me, a little unclear what you're trying to accomplish.

subinj
New Member

You're right about the format: it doesn't have a common template. Thanks Lamar!

Lamar
Splunk Employee

The problem you'll have with this data is that the events don't share a common format.

Some events have their description after a ":", while in others the description starts at the beginning of the event/line.

You could create a hash of your event and key off that with a lookup or something similar to that.
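One possible reading of the hash idea, sketched in Python rather than in Splunk: mask the parts of each event that vary (here the CI names and numbers), hash what is left, and use the hash as a key. The masking rules and function names below are assumptions for illustration; a real lookup table mapping each hash to a category would still have to be filled in by hand.

```python
import hashlib
import re
from collections import defaultdict

# Sample descriptions copied from the original post.
DESCRIPTIONS = [
    "Target: CI-xxxx Stateless event alarm Event details: HA recovered from a total cluster failure in cluster",
    "Server - CI-aaaa generates Multipath Issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths",
    "Servers generate CI-aaaa & CI-bbbbb - Multipath issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths",
    "Servers generate CI-aaaa & CI-bbbbb - Multipath issue Fibre Channel information: Multipathing ERROR, not all luns have 4 paths",
    "[VMware vCenter - Alarm Cluster high availability error] Insufficient resources to satisfy HA failover level on cluster",
    "F drive is having less disk space nagios-ebs: CI-xxxx ",
    "Low disk space alert on CI-yyyyy",
    "Failed backup report for 2nd April 2012 : CI-xxxx , CI-aaaa , CI-bbbbb",
    "Failed backup report for 3rd April 2012 : CI-xxxx , CI-aaaa , CI-bbbbb",
]


def normalize(description):
    """Mask the parts that vary between otherwise identical tickets."""
    text = description.lower()
    text = re.sub(r"ci-\w+", "<ci>", text)               # configuration item names
    text = re.sub(r"\d+(?:st|nd|rd|th)?", "<n>", text)   # dates and other numbers
    return " ".join(text.split())                        # collapse whitespace


def event_hash(description):
    """Hash the normalized text so equal templates share one key."""
    return hashlib.sha1(normalize(description).encode("utf-8")).hexdigest()


def group_by_hash(descriptions):
    """Group descriptions by the hash of their normalized text."""
    groups = defaultdict(list)
    for description in descriptions:
        groups[event_hash(description)].append(description)
    return groups


if __name__ == "__main__":
    for key, members in group_by_hash(DESCRIPTIONS).items():
        print(key[:8], len(members), normalize(members[0]))
```

With these masking rules the two failed-backup tickets share a hash, as do the two identical multipath tickets, but the differently worded multipath ticket does not. Exact hashing only groups events whose wording matches after masking, which is the limitation the inconsistent format creates.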

subinj
New Member

Thanks for your time, Lamar!
I have edited my original post to include samples of my requirement. I trust this brings more clarity.
