
How to separate a large number of sources into sourcetypes with regex?

jason_hubbard
Path Finder

Scenario

We are receiving logs from over 700 sources. They are forwarded by a syslog-ng service (the remote source) and collected by a syslog-ng service (the local source) running on our Splunk indexer. The logs received from the remote source are separated by host (in this case, host = IP address): the destination directory on the local source is determined by the $HOST variable.

Problem

I am having trouble with inputs.conf and props.conf, specifically with separating these logs into sourcetypes. The logs span dozens of sourcetypes, and each sourcetype may come in several versions.

I am using PCRE regular expressions to separate sourcetypes by host, since identifying them by IP address seems easiest: I have a list that tells me what each IP is.
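Because the IP list is the source of truth, the classification itself is simple: try each pattern in order and take the first match. A minimal Python sketch of that lookup, assuming hypothetical patterns on the documentation IP ranges (192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24) and made-up sourcetype names. It models the mapping only and is not Splunk configuration.

```python
import re
from typing import Optional

# Hypothetical rules derived from the IP list. First match wins, so the
# most specific patterns should come first.
HOST_RULES = [
    (re.compile(r"192\.0\.2\.\d{1,3}"), "type1"),
    (re.compile(r"198\.51\.100\.(?:1\d|2\d)"), "type2"),  # .10 through .29 only
    (re.compile(r"203\.0\.113\.\d{1,3}"), "type3"),
]


def sourcetype_for_host(host: str, default: Optional[str] = None) -> Optional[str]:
    """Return the sourcetype of the first rule whose regex matches the whole host string."""
    for pattern, sourcetype in HOST_RULES:
        if pattern.fullmatch(host):
            return sourcetype
    return default


if __name__ == "__main__":
    for ip in ("192.0.2.7", "198.51.100.15", "198.51.100.5"):
        print(ip, "->", sourcetype_for_host(ip, default="syslog"))
```

Using fullmatch instead of a search keeps a pattern for 198.51.100.1x from accidentally matching 198.51.100.150.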

Attempted Solution 1

Using inputs.conf to separate sourcetypes based on path, filtered by a whitelist regex:

inputs.conf
[monitor:///var/log/syslog-ng/*/messages]
     host_segment=4
     sourcetype=type1
     queue=parsingQueue
     disabled=0
     followTail=1
     index=index1
     whitelist=(reg_ex_for_type1)

[monitor:///var/log/syslog-ng/*/messages]
     host_segment=4
     sourcetype=type2
     queue=parsingQueue
     disabled=0
     followTail=1
     index=index1
     whitelist=(reg_ex_for_type2)

[monitor:///var/log/syslog-ng/xxx.xxx.xxx.*/messages]
     host_segment=4
     sourcetype=type3
     queue=parsingQueue
     disabled=0
     followTail=1
     index=index1

With this version, everything (all sources from syslog-ng) ends up in sourcetype type3, and the other stanzas are never applied. I verified that no other sourcetypes were created by running the following search:

index=index1 | stats values(host) by sourcetype
I read somewhere on the forums that this cannot be done because the monitor path is technically the same in each stanza, even though the whitelist regex should make the inputs different.
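One way to see why the first two stanzas cannot coexist: .conf stanzas are keyed by their header, so a repeated [monitor:///var/log/syslog-ng/*/messages] header is merged into one stanza rather than creating a second input. Python's configparser with strict=False behaves that way and is used here only as a stand-in; I am assuming Splunk's own conf merging follows the same "later setting wins" rule, which is consistent with the forum explanation. The regex values are the placeholders from the post.

```python
import configparser

# The two identical stanza headers from Attempted Solution 1 (regexes are placeholders).
CONF = """
[monitor:///var/log/syslog-ng/*/messages]
sourcetype = type1
whitelist = (reg_ex_for_type1)

[monitor:///var/log/syslog-ng/*/messages]
sourcetype = type2
whitelist = (reg_ex_for_type2)
"""


def load_merged(text: str) -> configparser.ConfigParser:
    # strict=False makes configparser merge a repeated section into the first one
    # instead of raising DuplicateSectionError; later keys overwrite earlier ones.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(text)
    return parser


if __name__ == "__main__":
    conf = load_merged(CONF)
    for section in conf.sections():
        print(section, dict(conf[section]))
```

Only one section survives, carrying type2's sourcetype and whitelist, so there is never a separate type1 input to pick up matching files.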

Attempted Solution 2

inputs.conf
[monitor:///var/log/syslog-ng/*/messages]
     host_segment=4
     followTail=1
     index=index1
     blacklist=(regex_exclude_certain_hosts)
props.conf (version 1)
[source::/var/log/syslog-ng/(my_specific_type1_regex)/messages]
     sourcetype=type1

[source::/var/log/syslog-ng/(my_specific_type2_regex)/messages]
     sourcetype=type2

[source::/var/log/syslog-ng/xxx.xxx.xxx.*/messages]
     sourcetype=type3
props.conf (version 2)
[source::.../(regex_for_type1)/*]
     sourcetype=type1

[source::.../(regex_for_type2)/*]
     sourcetype=type2

[source::.../xxx.xxx.xxx.*/*]
     sourcetype=type3

In this version I'm using the blacklist in inputs.conf to filter out certain logs, which works fine, but neither props.conf version applies the sourcetypes; the events are being sourcetyped as syslog instead.
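The header of a [source::...] stanza uses the wildcard syntax documented in the props.conf spec: "..." recurses through directories, "*" matches anything except the path separator, and "|" with "( )" expresses alternatives. The Python sketch below is a simplified model of only those rules, treating every other character literally, which can help check whether a stanza header would match a given source path. It is an approximation, not Splunk's matcher, which may accept more than this model does.

```python
import re


def source_stanza_to_regex(pattern: str) -> str:
    """Translate a [source::...] pattern into a Python regex under a simplified model.

    Modeled rules: '...' matches anything including '/', '*' matches anything
    except '/', '(' ')' '|' keep their grouping/alternation meaning, and every
    other character (including '.') is matched literally.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            out.append(".*")
            i += 3
            continue
        ch = pattern[i]
        if ch == "*":
            out.append("[^/]*")
        elif ch in "()|":
            out.append(ch)
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def source_matches(pattern: str, path: str) -> bool:
    """True if the whole source path matches the modeled stanza pattern."""
    return re.fullmatch(source_stanza_to_regex(pattern), path) is not None


if __name__ == "__main__":
    print(source_matches("/var/log/syslog-ng/*/messages",
                         "/var/log/syslog-ng/192.0.2.7/messages"))
    print(source_matches(".../(192.0.2.7|192.0.2.8)/*",
                         "/var/log/syslog-ng/192.0.2.8/messages"))
```

Under this model, PCRE pieces such as \d or [0-9]+ inside a stanza header are matched as literal characters, so a header written as a full PCRE would never match; whether Splunk behaves exactly that way is worth confirming on a test input.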

I have seen another option using props.conf/transforms.conf that inspects every event and determines its type by matching it against a regex template of what the event is supposed to look like. However, with over 700 sources and a range of versions within each sourcetype, building a pattern for every source would be a daunting task.

1 Solution

jason_hubbard
Path Finder

I ended up not using Splunk to separate the sourcetypes. Instead, I wrote a detailed syslog-ng config file that handles all the inputs and drops them into different subdirectories. Splunk can then simply be pointed at those directories, since their monitor paths are different.
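With 700 sources, writing a filter, destination and log block by hand for every device group gets tedious; one option is to generate them from the IP list. A minimal Python sketch, assuming hypothetical group names and host regexes (dots written as [.] so no backslash escaping is needed inside the syslog-ng string). The block layout copies the example config in this post; the per-group directory naming is my own assumption.

```python
from typing import List, Tuple

# Hypothetical (group name, host regex) pairs built from the IP list.
GROUPS: List[Tuple[str, str]] = [
    ("type1", "^192[.]0[.]2[.][0-9]{1,3}$"),
    ("type2", "^198[.]51[.]100[.](1[0-9]|2[0-9])$"),
]


def render_group(name: str, host_regex: str, source: str = "s_remote") -> str:
    """Render one filter/destination/log triple that routes matching hosts to their own directory."""
    return (
        f"filter f_{name} {{\n"
        f'    host("{host_regex}");\n'
        f"}};\n"
        f"destination d_{name} {{\n"
        f'    file("/var/log/syslog-ng/{name}/$HOST/messages");\n'
        f"}};\n"
        f"log {{\n"
        f"    source({source});\n"
        f"    filter(f_{name});\n"
        f"    destination(d_{name});\n"
        f"    flags(final);\n"
        f"}};\n"
    )


def render_all(groups: List[Tuple[str, str]]) -> str:
    """Concatenate the blocks for every group, in list order."""
    return "\n".join(render_group(name, regex) for name, regex in groups)


if __name__ == "__main__":
    print(render_all(GROUPS))
```

The generated log statements need to appear before the fallback log statement, and each group directory then gets its own [monitor://...] stanza and sourcetype in inputs.conf.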

If anyone wants to know how I prepped Syslog-ng, here is an example config:

syslog-ng.conf
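options {
    long_hostnames(off);
    keep_hostname(yes);
    use_dns(no);
    owner("root");
    group("root");
    perm(0640);
    dir_owner("root");
    dir_group("root");
    dir_perm(0750);
    # create the per-host directories on demand
    create_dirs(yes);
};
# listen for remote syslog on TCP and UDP port 514
source s_remote {
    tcp(ip(0.0.0.0) port(514));
    udp(ip(0.0.0.0) port(514));
};
# hosts matching this regex get their own directory tree
filter f_myfilter_1 {
    host("^xxx.xxx.xxx.[0-9]{1,2}$");
};
destination d_myfilter_1 {
    file("/var/log/syslog-ng/filtered_source/$HOST/messages");
};
# everything else keeps the original per-host layout
destination d_fallback {
    file("/var/log/syslog-ng/$HOST/messages");
};
# final: messages matched here are not processed by later log paths
log {
    source(s_remote);
    filter(f_myfilter_1);
    destination(d_myfilter_1);
    flags(final);
};
# fallback: only messages that no other log path processed
log {
    source(s_remote);
    destination(d_fallback);
    flags(fallback);
};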
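The two log statements rely on syslog-ng's log path flags: a path with flags(final) stops later log paths from processing the messages it matched, and a path with flags(fallback) only receives messages that no regular log path processed. A small Python simulation of that routing decision, assuming a hypothetical host regex in place of the elided xxx.xxx.xxx pattern; it models only host filtering, not the rest of syslog-ng.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogPath:
    """Simplified model of one syslog-ng log { } statement."""
    host_regex: Optional[str]  # None means the log path has no filter
    destination: str           # file() template; $HOST is substituted
    final: bool = False
    fallback: bool = False

    def matches(self, host: str) -> bool:
        return self.host_regex is None or re.search(self.host_regex, host) is not None

    def target(self, host: str) -> str:
        return self.destination.replace("$HOST", host)


def route(host: str, paths: List[LogPath]) -> List[str]:
    """Return the files a message from this host would be written to."""
    written: List[str] = []
    for path in paths:
        if path.fallback or not path.matches(host):
            continue
        written.append(path.target(host))
        if path.final:
            break  # flags(final): later log paths never see this message
    if not written:
        # flags(fallback): only messages that no regular log path processed
        written = [p.target(host) for p in paths if p.fallback and p.matches(host)]
    return written


# Hypothetical stand-in for f_myfilter_1 (the real regex is elided as xxx.xxx.xxx).
PATHS = [
    LogPath(r"^192\.0\.2\.[0-9]{1,2}$",
            "/var/log/syslog-ng/filtered_source/$HOST/messages", final=True),
    LogPath(None, "/var/log/syslog-ng/$HOST/messages", fallback=True),
]

if __name__ == "__main__":
    for ip in ("192.0.2.15", "192.0.2.150"):
        print(ip, "->", route(ip, PATHS))
```

Note that [0-9]{1,2} only covers last octets 0 to 99, so a host like 192.0.2.150 falls through to the fallback directory.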

inputs.conf
[monitor:///var/log/syslog-ng/filtered_source/*/messages]
        host_segment=5
        sourcetype=filtered:data
        queue=parsingQueue
        disabled=0
        followTail=1
        index=myindex
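Both host_segment values in this thread count path components from the root, ignoring the leading slash: component 4 of /var/log/syslog-ng/<host>/messages and component 5 of /var/log/syslog-ng/filtered_source/<host>/messages are both the host directory. A small Python helper that applies that counting, assuming Splunk counts the same way (which is consistent with the host_segment values used above):

```python
def host_from_segment(path: str, host_segment: int) -> str:
    """Return the host_segment-th path component (1-based, empty components ignored)."""
    segments = [part for part in path.split("/") if part]
    if not 1 <= host_segment <= len(segments):
        raise ValueError(f"path has only {len(segments)} segments")
    return segments[host_segment - 1]


if __name__ == "__main__":
    print(host_from_segment("/var/log/syslog-ng/filtered_source/192.0.2.15/messages", 5))
    print(host_from_segment("/var/log/syslog-ng/192.0.2.15/messages", 4))
```

Adding a directory level such as filtered_source shifts the host one component deeper, which is why this stanza uses host_segment=5 instead of 4.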
