Getting Data In

Filtering Syslog Inputs before Indexing (to optimize license usage)

yh
Explorer

Hello, I am referring to the following documentation: Route and filter data - Splunk Documentation.

I would like to discard some syslog data (in my case, coming from a firewall) before it is indexed.

For instance, in props.conf under system I have this:

[source::udp:514]
TRANSFORMS-null= setnull

[source::tcp:514]
TRANSFORMS-null= setnull

And in transforms.conf, to filter out traffic going to Google DNS:

[setnull]
REGEX = dstip=8\.8\.8\.8
DEST_KEY = queue
FORMAT = nullQueue

I have tried renaming the transforms and duplicating the setnull stanza under different names; however, the event filtering only works on the UDP source and not on the TCP source.

Did I miss anything? It feels really weird that the event discarding does not work on the TCP syslog source. Any ideas, or alternatives for discarding events on an AIO Splunk setup?

Thanks in advance

 


yh
Explorer

Hi @gcusello @deepakc 

Thanks for all the inputs.
Checking in again today after the weekend: the TCP filtering is working fine!

The TCP events from the firewall stopped 1-2 hours after the TCP input was disabled, and I suspect this might be due to a TCP backlog?

I am not sure how Splunk handles TCP backlogs, but it seems that backlogged TCP data is not processed by the event filtering rules. Maybe the backlogged events were already "past" the filtering stage and were slowly being ingested?


gcusello
SplunkTrust

Hi @yh 

if your forwarder is overloaded (especially if you have many events and many transformations to apply), you risk losing events. For this reason, it's better to use rsyslog to write the events to files on disk and have Splunk read those files.

Then, if you have a fast disk on the Heavy Forwarder (at least 800 IOPS), you could enable parallel pipeline sets (https://docs.splunk.com/Documentation/Splunk/9.2.1/Indexer/Pipelinesets).

At a minimum, you could add more CPUs to your HF.

Ciao.

Giuseppe

yh
Explorer

It is an AIO (search head + indexer). I think the disk performance is also lacking, hence the issue.


yh
Explorer

Hello @deepakc @gcusello 

I have checked. Netstat shows that the only process listening on 514 is splunkd. Normally on Windows I use Kiwi Syslog.

Just a while ago I changed the regex again to "." so that all events are sent to the null queue.

I injected syslog messages with Python over TCP (SOCK_STREAM) and, surprisingly, none of my test messages appear. But the firewall messages with "source = tcp:514" are still relentlessly appearing.
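For anyone who wants to repeat this kind of test, here is a minimal sketch of a TCP syslog injector. It is not the exact script used above; the function names, the marker text, and the address in the example call are hypothetical placeholders.

```python
import socket


def build_syslog_line(pri: int, body: str) -> bytes:
    """Frame one event as '<PRI>body' plus a newline so the receiver can split events."""
    return f"<{pri}>{body}\n".encode("utf-8")


def send_tcp_syslog(host: str, port: int, payload: bytes, timeout: float = 5.0) -> None:
    """Open a TCP (SOCK_STREAM) connection, send the payload, then close.

    If nothing is listening on the port, the connect step typically fails
    with ConnectionRefusedError (a subclass of OSError).
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


# Example call (hypothetical address; replace with the Splunk server's):
# line = build_syslog_line(189, 'type="traffic" dstip=8.8.8.8 marker="filter-test"')
# send_tcp_syslog("192.0.2.10", 514, line)
```

Putting a unique marker string in the body makes it easy to search for the test event afterwards and tell it apart from the firewall traffic.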

Can a source appear as tcp:514 when that is not actually the case? I used btool to check for TCP modifications but can't find any. Oh my god, it's so weird. They seem like ghost TCP 514 messages. Next I will block that port with the firewall and see if they still come in.

Update once more:

We removed all TCP inputs from the data inputs and did a restart. Funnily enough, events with the tcp:514 source still come into the SIEM. I tried my Python syslog message generator and got an error because the Splunk server rejects the connection. It feels like some kind of corruption somewhere, where the messages think they are tcp:514?


deepakc
Builder

Maybe look at installing Wireshark if you can and check that way as well. 


gcusello
SplunkTrust

Hi @yh ,

the REGEX in a transforms.conf stanza called by TRANSFORMS is applied to the raw event data; it doesn't work on extracted fields.

Are you sure that the TCP raw data contains the string you configured?

Could you share some samples of the events that you want to filter (both UDP and TCP)?

Ciao.

Giuseppe


yh
Explorer

Hello Giuseppe,

Blanking out some of the details with XXXX for anonymity. I remember we did try with Regex = . for the TCP once too.

Sample event from TCP:
<189>logver=700140601 timestamp=1714717074 devname="XXXXX" devid="XXXXX" vd="root" date=2024-05-03 time=06:17:54 eventtime=1714688275070439553 tz="XXX" logid="0001000014" type="traffic" subtype="local" level="notice" srcip=XX.XX.XX.XX srcname="XXXXXX" srcport=31745 srcintf="port1" srcintfrole="undefined" dstip=XXX.XXX.XXX.XXX dstname="XXX" dstport=443 dstintf="root" dstintfrole="undefined" srccountry="XXXX" dstcountry="XXX" sessionid=68756048 proto=6 action="deny" policyid=1 policytype="local-in-policy" poluuid="7575f13c-5066-51ed-1e15-40b0e5867f81" service="HTTPS" trandisp="noop" app="HTTPS" duration=0 sentbyte=0 rcvdbyte=0 sentpkt=0 rcvdpkt=0 appcat="unscanned" crscore=5 craction=262144 crlevel="low"

Sample event from UDP:
May 3 14:21:57 10.XX.XX.XX logver=700140601 timestamp=1714746117 devname="XXX" devid="XXX" vd="XXX" date=2024-05-03 time=14:21:57 eventtime=1714717317787162683 tz="XXX" logid="0000000013" type="traffic" subtype="forward" level="notice" srcip=XX.XX.X.X srcport=38915 srcintf="port5" srcintfrole="lan" dstip=XX.XX.XX.XX dstport=443 dstintf="port9" dstintfrole="wan" srccountry="Reserved" dstcountry="XXXX" sessionid=759555888 proto=17 action="accept" policyid=1 policytype="policy" poluuid="7ade8e92-454b-51e9-5c91-4feddb630366" policyname="XXXXX" service="udp/443" trandisp="snat" transip=XX.XX.XX.XXX transport=38915 appid=40169 app="QUIC" appcat="Network.Service" apprisk="low" applist="XXX" duration=49 sentbyte=1228 rcvdbyte=0 sentpkt=1 rcvdpkt=0 utmaction="block" countapp=1

It seems a bit weird to me that the TCP input doesn't have the date and time in front but starts with <189>. I wonder if that is normal.

 


gcusello
SplunkTrust

Hi @yh ,

the <189> string isn't relevant.

Only the regex you are using is relevant.
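For context, the leading <189> is the standard syslog priority (PRI) value, which encodes facility * 8 + severity. A small sketch of decoding it follows; the helper name decode_pri is made up for illustration.

```python
import re

# Syslog severity names, indexed by severity code 0-7.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_pri(raw: str):
    """Return (facility_code, severity_name) from a leading '<PRI>', or None if absent."""
    match = re.match(r"<(\d{1,3})>", raw)
    if match is None:
        return None
    facility, severity = divmod(int(match.group(1)), 8)
    if severity >= len(SEVERITIES):
        return None
    return facility, SEVERITIES[severity]


print(decode_pri("<189>logver=700140601 timestamp=1714717074"))  # (23, 'notice')
print(decode_pri("May 3 14:21:57 10.0.0.1 logver=700140601"))    # None
```

Facility 23 is local7, and severity "notice" agrees with the level="notice" field in the TCP sample. So the prefix is ordinary syslog framing and has no effect on whether the dstip regex matches.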

Anyway, given these logs, you should be able to filter events from both sources that contain "dstip\=8\.8\.8\.8".

Try adding a backslash before the "=", even though that shouldn't be the issue.
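A quick way to sanity-check the regex outside Splunk is to run it against raw events. The sketch below uses Python's re module as a stand-in for Splunk's PCRE engine (an assumption, though the two agree for a pattern this simple). The events are shortened, synthetic versions of the samples with the destination filled in.

```python
import re

PLAIN = re.compile(r"dstip=8\.8\.8\.8")      # regex as originally configured
ESCAPED = re.compile(r"dstip\=8\.8\.8\.8")   # variant with the backslash before "="
STRICT = re.compile(r"dstip=8\.8\.8\.8\b")   # optional: do not match 8.8.8.80 etc.

# Synthetic events modelled on the TCP and UDP samples in this thread.
tcp_event = '<189>logver=700140601 type="traffic" srcip=10.0.0.5 dstip=8.8.8.8 dstport=53'
udp_event = 'May 3 14:21:57 10.0.0.1 logver=700140601 srcip=10.0.0.5 dstip=8.8.8.8 dstport=53'
near_miss = '<189>logver=700140601 type="traffic" srcip=10.0.0.5 dstip=8.8.8.80 dstport=443'


def would_drop(raw: str, pattern: re.Pattern) -> bool:
    """True if the pattern matches anywhere in the raw event text."""
    return pattern.search(raw) is not None


for name, event in [("tcp", tcp_event), ("udp", udp_event), ("near_miss", near_miss)]:
    print(name, would_drop(event, PLAIN), would_drop(event, ESCAPED), would_drop(event, STRICT))
```

Both the plain and the escaped pattern match the TCP and the UDP event, so the backslash makes no difference. The unanchored pattern also matches an address such as 8.8.8.80; a trailing \b avoids that if it matters.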

Are you sure that the unfiltered events arrive directly at the HF and that there isn't another HF in the middle?

Ciao.

Giuseppe


yh
Explorer

Hi,

In my case, this is not a heavy forwarder; it is an indexer and search head.

The UDP and TCP flows go directly to the AIO indexer / search head.

I understand from the documentation that an HF is more flexible, since we can filter by sourcetype, host, etc., but that filtering on the indexer is still possible, although the example given is on the source. We did try adding a backslash before the equals sign, and even tried discarding all events with "." as the regex, but somehow the TCP source can't perform any filtering. Is there something unique about filtering directly on indexers?

The events clearly show "source = tcp:514". I don't suppose it has been renamed. Or should I try renaming this port? I am not sure that would help.

In inputs.conf, I think there are some TCP settings that assign the sourcetype based on the host address, but I don't suppose that will have any impact?




gcusello
SplunkTrust

Hi @yh ,

I asked because filtering should be applied on the first full Splunk instance that the data passes through.

If you don't have an intermediate HF (or another full Splunk instance) and your data arrives directly at your standalone Splunk server, the conf files are correctly located.

Ciao.

Giuseppe


gcusello
SplunkTrust

Hi @yh ,

as @deepakc said, it's preferable to use rsyslog to receive syslog instead of Splunk; that way you're sure to save the logs even if your Splunk is down or overloaded.

Are you receiving many logs?

Ciao.

Giuseppe


yh
Explorer

Hi @gcusello @deepakc 

I will check if rsyslog is installed. As far as I am aware it shouldn't be, and also the server is running on Windows.
We use 20+ GB of license per day.

Is it recommended to have rsyslog / Kiwi Syslog running on the server and then use a file monitor input to read in the events?


deepakc
Builder

So if it's Splunk on Windows, then rsyslog shouldn't be there (I assumed you were running on Linux).


gcusello
SplunkTrust

Hi  @yh ,

if the receiver is on Windows, you can use neither rsyslog nor SC4S; you should find another syslog receiver.

Ciao.

Giuseppe


deepakc
Builder

Yes, it's better to send the data to files and use a UF to pick the logs up that way. We tend to use Splunk network ports for general testing, POCs, etc., but not really for production. You can also look at SC4S, which is the better option for syslog network data collection in general and supports most common syslog formats.

SC4S: from the link, go to the documentation.

https://splunkbase.splunk.com/app/4740   

deepakc
Builder

It is better to use rsyslog, but in this case I think the issue may be related to rsyslog running and using TCP port 514; that may be the cause of the current issue. Longer term it's better to use rsyslog or syslog-ng, or even better SC4S, but that's another topic.


deepakc
Builder

I'm wondering if you have rsyslog running on the AIO. See if you can turn it off if it's running.

Check whether it's running; if so, stop it, as it defaults to TCP 514.
sudo systemctl status rsyslog



yh
Explorer

Thanks for the reply. We did do a netstat check, and there is a TCP connection between the source host and Splunk. We tried something similar, for example:

[source::udp:514]
TRANSFORMS-null1= udp_setnull

[source::tcp:514]
TRANSFORMS-null2= tcp_setnull

However, weirdly, the TCP part is not working. Once we even tried removing UDP and keeping just the TCP portion, but it still doesn't work. Very weird.

TRANSFORMS-null= setnull_tcp_traffic


deepakc
Builder

Check that you are actually receiving traffic over TCP:

sudo tcpdump -i <my_interface> tcp port <splunk_port>
sudo tcpdump -i <my_interface> udp port <splunk_port>


Try the below and see if that corrects it.

[source::udp:514]
TRANSFORMS-null= setnull_udp_traffic

[source::tcp:514]
TRANSFORMS-null= setnull_tcp_traffic


[setnull_udp_traffic]
REGEX = dstip=8\.8\.8\.8
DEST_KEY = queue
FORMAT = nullQueue


[setnull_tcp_traffic]
REGEX = dstip=8\.8\.8\.8
DEST_KEY = queue
FORMAT = nullQueue
