Solved: How to separate a large number of sources into sourcetypes with regex?

jason_hubbard · 08-08-2011

Scenario

We receive logs from over 700 sources. They are forwarded by a remote syslog-ng service and collected by a local syslog-ng service running on our Splunk indexer. The local service separates the incoming logs by host (here, host = IP address), using the $HOST variable to choose the destination directory.

Problem

I am having trouble getting inputs.conf and props.conf to separate these sources into sourcetypes. The logs come from dozens of sourcetypes, and each sourcetype may include several versions.

I am using PCRE regular expressions to separate sourcetypes by host, because I have a list that maps each IP address to its type, which makes the IP the easiest identifier.

Attempted Solution 1

Using inputs.conf to separate sourcetypes based on path, filtered by a whitelist regex:

inputs.conf
[monitor:///var/log/syslog-ng/*/messages]
     host_segment=4
     sourcetype=type1
     queue=parsingQueue
     disabled=0
     followTail=1
     index=index1
     whitelist=(reg_ex_for_type1)

[monitor:///var/log/syslog-ng/*/messages]
     host_segment=4
     sourcetype=type2
     queue=parsingQueue
     disabled=0
     followTail=1
     index=index1
     whitelist=(reg_ex_for_type2)

[monitor:///var/log/syslog-ng/xxx.xxx.xxx.*/messages]
     host_segment=4
     sourcetype=type3
     queue=parsingQueue
     disabled=0
     followTail=1
     index=index1

In this version, everything (all sources from syslog) ends up in sourcetype type3 and the other stanzas are not processed. I verified that no other sourcetypes were created by running the following search:

index=index1 | stats values(host) by sourcetype

I read on the forums somewhere that this cannot be done because the monitor path is technically the same, even though the regex should make the paths different.
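One way to picture the collision described here: a stanza header acts as a key, so two stanzas with an identical monitor path cannot hold two independent configurations. The sketch below uses Python's configparser purely as a stand-in. That is an assumption for illustration; it is not Splunk's parser, and Splunk's real precedence rules may differ. The stanza text is a shortened copy of the first two stanzas above.

```python
import configparser

# Shortened copy of the first two stanzas from Attempted Solution 1.
INPUTS_CONF = """\
[monitor:///var/log/syslog-ng/*/messages]
sourcetype = type1
whitelist = (reg_ex_for_type1)

[monitor:///var/log/syslog-ng/*/messages]
sourcetype = type2
whitelist = (reg_ex_for_type2)
"""


def merge_stanzas(text):
    """Parse .conf-style text, letting same-named stanzas merge (last value wins)."""
    parser = configparser.ConfigParser(
        strict=False,          # tolerate duplicate stanza names
        interpolation=None,    # treat values as plain strings
        delimiters=("=",),     # only "=" separates key and value
    )
    parser.optionxform = str   # keep key case as written
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


merged = merge_stanzas(INPUTS_CONF)
for name, settings in merged.items():
    print(name, settings)
```

Under this stand-in, only one stanza survives and it carries the type2 settings, so the type1 whitelist is never consulted. If Splunk treats same-named stanzas similarly, the whitelist cannot make the two inputs distinct.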

Attempted Solution 2

inputs.conf
[monitor:///var/log/syslog-ng/*/messages]
     host_segment=4
     followTail=1
     index=index1
     blacklist=(regex_exclude_certain_hosts)

props.conf (version 1)
[source::/var/log/syslog-ng/(my_specific_type1_regex)/messages]
     sourcetype=type1

[source::/var/log/syslog-ng/(my_specific_type2_regex)/messages]
     sourcetype=type2

[source::/var/log/syslog-ng/xxx.xxx.xxx.*/messages]
     sourcetype=type3

props.conf (version 2)
[source::.../(regex_for_type1)/*]
     sourcetype=type1

[source::.../(regex_for_type2)/*]
     sourcetype=type2

[source::.../xxx.xxx.xxx.*/*]
     sourcetype=type3

In this version I am using a blacklist in inputs.conf to filter out certain logs, which works fine. However, neither version of props.conf applies the sourcetypes; the events are now being sourcetyped as syslog.
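When debugging source:: stanzas like these, it can help to test a pattern against a real path outside Splunk. As I read the props.conf spec, source:: patterns are PCRE with three translations: "..." matches any characters, "*" matches anything except the path separator, and "." matches a literal period; the whole source must match. The sketch below approximates those rules in Python. The function names and the sample IP addresses are hypothetical, and backslash escapes are not handled.

```python
import re


def source_pattern_to_regex(pattern):
    """Approximate the source:: translations: '...', '*', and '.'."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            out.append(".*")        # '...' matches any number of any characters
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")     # '*' does not cross a path separator
            i += 1
        elif pattern[i] == ".":
            out.append(r"\.")       # '.' is a literal period, not a wildcard
            i += 1
        else:
            out.append(pattern[i])  # everything else passes through as regex
            i += 1
    return re.compile("".join(out))


def matches(pattern, source):
    """True if the whole source path matches the translated pattern."""
    return source_pattern_to_regex(pattern).fullmatch(source) is not None


source = "/var/log/syslog-ng/192.0.2.15/messages"
print(matches("/var/log/syslog-ng/192.0.2.*/messages", source))
print(matches(".../(192.0.2.*|198.51.100.*)/*", source))
print(matches("/var/log/syslog-ng/192.0.2.+/messages", source))
```

The last pattern is the one to watch for: a regex habit such as ".+" stops meaning "one or more of any character" once "." is read as a literal period, so the pattern no longer matches the path.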

I have seen that there is another option using props/transforms, which involves examining every event and determining its type by matching it against a regex template of what the event should look like. However, with over 700 sources and a range of versions within each sourcetype, building a pattern for every source would be a daunting task.

jason_hubbard · 08-12-2011

I ended up not using Splunk to separate the sourcetypes. Instead, I created a detailed syslog-ng config file that handles all the inputs and writes them into different subdirectories. Splunk can then simply be pointed at those directories, since their paths differ.

If anyone wants to know how I prepped syslog-ng, here is an example config:

syslog-ng.conf

options {
        long_hostnames(off);
        keep_hostname(yes);
        use_dns(no);
        owner("root");
        group("root");
        perm(0640);
        dir_owner("root");
        dir_group("root");
        dir_perm(0750);
        create_dirs(yes);
};

# Listen for remote syslog traffic on TCP and UDP port 514.
source s_remote {
        tcp(ip(0.0.0.0) port(514));
        udp(ip(0.0.0.0) port(514));
};

# Match the hosts that belong to one sourcetype by IP address.
filter f_myfilter_1 {
        host("^xxx.xxx.xxx.[0-9]{1,2}$");
};

destination d_myfilter_1 {
        file("/var/log/syslog-ng/filtered_source/$HOST/messages");
};

destination d_fallback {
        file("/var/log/syslog-ng/$HOST/messages");
};

# Matching hosts go to their own directory tree; final stops further processing.
log {
        source(s_remote);
        filter(f_myfilter_1);
        destination(d_myfilter_1);
        flags(final);
};

# Anything not matched by another log path lands in the fallback destination.
log {
        source(s_remote);
        destination(d_fallback);
        flags(fallback);
};

inputs.conf
[monitor:///var/log/syslog-ng/filtered_source/*/messages]
        host_segment=5
        sourcetype=filtered:data
        queue=parsingQueue
        disabled=0
        followTail=1
        index=myindex
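With dozens of sourcetypes, writing one filter, destination, and log block per group by hand gets repetitive. A minimal sketch of generating them from a list follows, assuming a mapping from a group name to a host regex. The group names, the IP ranges, the per-group directory layout, and the use of the group name as the sourcetype are all hypothetical; adapt them to the real host list.

```python
# Hypothetical host groups: name -> regex matching the sending hosts' IPs.
# "[.]" is used for a literal dot to avoid backslash escaping inside quotes.
HOST_GROUPS = {
    "type1": "^192[.]0[.]2[.][0-9]{1,2}$",
    "type2": "^198[.]51[.]100[.][0-9]{1,3}$",
}

BASE_DIR = "/var/log/syslog-ng"


def syslog_ng_blocks(name, host_regex):
    """Return filter, destination, and log statements for one host group."""
    return "\n".join([
        f"filter f_{name} {{",
        f'        host("{host_regex}");',
        "};",
        f"destination d_{name} {{",
        f'        file("{BASE_DIR}/{name}/$HOST/messages");',
        "};",
        "log {",
        "        source(s_remote);",
        f"        filter(f_{name});",
        f"        destination(d_{name});",
        "        flags(final);",
        "};",
    ])


def splunk_monitor_stanza(name, index="myindex"):
    """Return an inputs.conf stanza that monitors one group's directory tree."""
    return "\n".join([
        f"[monitor://{BASE_DIR}/{name}/*/messages]",
        "host_segment=5",       # /var/log/syslog-ng/<group>/<host>/messages
        f"sourcetype={name}",
        "disabled=0",
        f"index={index}",
    ])


for group, regex in HOST_GROUPS.items():
    print(syslog_ng_blocks(group, regex))
    print()
    print(splunk_monitor_stanza(group))
    print()
```

The generated syslog-ng blocks still rely on the options, the s_remote source, and the fallback log path from the config above; only the per-group parts are produced here.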
