Combine the count of multiple fields for a common result

aaronnicoli
Path Finder

Hi all,

I am mainly asking this here as it's a little past my knowledge with Splunk.
Basically, I'm after a way of combining the results from two field extractions into one.

This is my scenario.
I have an index="my_index"
It contains log data, all in the same format, going back over 2 years: roughly 1GB per day for the whole period.
The data comes from our "firewalls" and contains a few "important" fields.

The important stuff, per event.
Datestamp, Username, url_host.

I will explain these for you:
Datestamp is obvious.
Username is the user going through the firewall; it is captured via a custom field extraction.
url_host is also a custom extraction; it grabs just the domain that was hit from the URL string.

Now the difficult part: as far as I know (unless I am wrong), a field extraction needs to be linked to one of these: a source, a host or a sourcetype.

This is a problem for me. My index has multiple hosts and multiple sourcetypes (every day), and the sourcetype was changed several times over the life of the data because of problems we were having with other fields (not related to this).

My problem is I want to run the following search:

index="my_index" username="someuser" | stats count by url_host

In other words, print me a list of all the sites (domains) this user connected to and the number of times they connected to each.

However, since the data has no common host, source or source-type I can only get results for a single host, or single source, or single source-type... which is pretty useless to me.

Is there any way of overcoming this?
First thoughts would be...

Link a field extraction to an index?
Or create a search that combines the counts per url_host from two different url_host based extractions...?

Eg. final_url_host = count of url_host_01 + count of url_host_02???

Help me please my brain is hurting! lol

Aaron.

aaronnicoli
Path Finder

Thanks for expanding.

My issue is not the number of bad sourcetypes (I have a total of 8). Seven of them are custom sourcetypes, named ck_csv_0X, and those are fine to rename.

However, the sourcetype I want is ck_csv_09, and most of the data uses the default Splunk sourcetype csv2.

The issue is that if I rename csv2 to ck_csv_09, it will rename ALL of my data across the multiple indexes where csv2 is used. I also don't know whether csv2 can even be renamed, since it's a default sourcetype.

gkanapathy
Splunk Employee

If you have a manageably small number of sourcetypes, I would recommend the "rename" approach, and then set up the field extractions under the renamed sourcetype. In order to rename, you'd need the following in props.conf:

[badsourcetype1]
rename = goodsourcetype

[not-good-sourcetype2]
rename = goodsourcetype

[undesireable-sourcetype3]
rename = goodsourcetype

[goodsourcetype]
EXTRACT-1 = ...
REPORT-1 = ...

That would mean having a stanza for every bad sourcetype. But if there is a pattern (or a small number of patterns) to the sourcetypes you need to rename, you can use:

[(?:::){0}badsourcetype*]
rename = goodsourcetype

This will rename all sourcetypes matching the pattern badsourcetype*. You can use other Splunk-PCRE-like patterns (the same wildcarding as in host and source stanza patterns), as long as you have (?:::){0} literally in the stanza name.

If you really have no common pattern (or you can't use a few common patterns to rename), then you will have to suffer the performance impact of doing the extraction over all the data. You can limit the impact by scoping your extraction to just one Splunk app though:

[host::*]
EXTRACT-1 = ...
REPORT-1 = ...

aaronnicoli
Path Finder

Hey Ashley,

Firstly many thanks for the reply!

I like the idea of renaming the sourcetype, but I see one issue: doesn't this rename the entire sourcetype? (In this case that is mainly csv2, with some other custom ones I created.)

That's a bit of a problem, as obviously I wouldn't want to rename the csv2 sourcetype.

The "rex" search option definitely sounds like a goer!

I would really like to change the sourcetype for all the previous data, but I'm not sure how to go about this without a total re-import (which is very painful).

herbie
Path Finder

Hi Aaron,

I think the easiest way would be to rename all your previous sourcetypes to match your current one; your field extraction and any other settings should then apply to all of them. When you search, everything will be shown as a single sourcetype.

To rename sourcetypes, add the following stanza to your props.conf file, replacing the 'old-sourcetype-name' and 'new-sourcetype-name' values:

[old-sourcetype-name]  
rename = new-sourcetype-name  

You'll need to add one of these for each sourcetype. You didn't mention how many there are, so I'm not sure how hard this will be.

The other option would be to perform your field extraction within your search, so it would look something like (replace the "regex_statement" with the one from your field extraction):

index="my_index" username="someuser" | rex field=_raw "regex_statement" | stats count by url_host
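As a rough sketch of that inline extraction: the thread does not show the actual regex, so the one below is purely illustrative. It assumes (hypothetically) that each raw event contains a full URL such as http://example.com/path and captures the host part into url_host. Note that username also has to be available at search time for the initial filter to match.

```
index="my_index" username="someuser"
| rex field=_raw "https?://(?<url_host>[^/\s:]+)"
| stats count by url_host
```

Because rex runs inside the search, it is not tied to any host, source or sourcetype, so it applies to every event the base search returns.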

Hope this helps.

Cheers,

Ashley

gkanapathy
Splunk Employee

You can use the rename setting in props.conf, but that would of course require having a stanza in props.conf for each sourcetype you want to rename. I suppose it's only marginally easier than just putting your field extraction under each stanza.
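For the other idea raised in the original question, combining two differently named extractions into one count, a minimal search-time sketch might look like the following. It assumes the two extractions are named url_host_01 and url_host_02 (names borrowed from the question, not from any real configuration) and that username is extracted for every sourcetype involved. The eval function coalesce returns its first non-null argument, so each event contributes whichever of the two fields it has.

```
index="my_index" username="someuser"
| eval final_url_host=coalesce(url_host_01, url_host_02)
| stats count by final_url_host
```

Events where neither field is present have no final_url_host value and so do not appear in the stats output.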
