Avoid overriding the host for a TCP input

bruceclarke
Contributor

Hi all,

I've discovered that, by default, Splunk wants to override any TCP input's host with the IP of the remote server. However, I have a TCP input whose host field I don't want reset. I've been looking into how to do this, but I can't find anything that works.

Here's a description of the problem:

  1. I have TCP logs coming from a device with the IP 192.168.1.239. Splunk appears to be receiving these logs as expected. An example log:

     2019-08-05 19:44:06 6 192.168.1.245 - - 192.168.1.239 192.168.1.239 Unavailable - policy_denied DENIED "Technology/Internet" 403 TCP_DENIED GET - https settings-win.data.microsoft.com 443 - "MSDW" 192.168.1.239 1534 355 - none - - high settings-win.data.microsoft.com "Technology/Internet" unavailable unavailable

  2. The log above (even though it comes from the device with IP 192.168.1.239) is assigned the host value 192.168.1.245 in Splunk. This is NOT what I want. I want the host to stay 192.168.1.239 (the device it is coming from).

I have looked at other Splunk Answers questions, and tried the following changes to my indexers' inputs.conf files:

  1. This stanza, which Splunk suggests will rewrite the host value:

     [tcp://514]
     index = proxy
     sourcetype = syslog
     connection_host = ip

  2. This stanza, which I thought should fix the issue, but I'm still seeing the host get overridden:

     [tcp://514]
     index = proxy
     sourcetype = syslog
     connection_host = none
     host = $decideOnStartup

Neither of these options works. I thought the second option would fix the issue, but the host is still being overridden with 192.168.1.245 instead of the IP of the device sending the log (192.168.1.239).

Can someone help me change this setting to use the original device's IP?

1 Solution

acharlieh
Influencer

The point where you're getting bitten isn't in inputs.conf, but deeper in the ingestion pipeline. For full details of every step of ingesting data into Splunk, and most of the configuration options that apply along the way, check out: https://wiki.splunk.com/Community:HowIndexingWorks

You're using the syslog sourcetype, which out of the box ($SPLUNK_HOME/etc/system/default/props.conf) has these settings:

[syslog]
pulldown_type = true
maxDist = 3                        
TIME_FORMAT = %b %d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 32 
TRANSFORMS = syslog-host          
REPORT-syslog = syslog-extractions
SHOULD_LINEMERGE = False
category = Operating System                              
description = Output produced by many syslog daemons, as described in RFC3164 by the IETF

The line of particular importance here is TRANSFORMS. These are index-time manipulations applied to any data labeled with this sourcetype. The syslog-host transform it references is also defined out of the box, in $SPLUNK_HOME/etc/system/default/transforms.conf:

[syslog-host]
DEST_KEY = MetaData:Host
REGEX = :\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(\w[\w\.\-]{2,})\]?\s
FORMAT = host::$1 

(I'll note I pulled these from a 7.3.0 box, but this sourcetype has been in Splunk for quite some time)

After event breaking, this regex is run against each event, and if it matches, the value of the first capturing group is assigned to the host field. You can probably guess what value is being extracted here (192.168.1.245, the first token after the timestamp): https://regex101.com/r/bdcLOx/1
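To make the match concrete, here is a small Python sketch that applies the same REGEX from the [syslog-host] stanza to the example event from the question. It assumes Python's re module treats this pattern the same way Splunk's PCRE engine does (the pattern uses only basic constructs, so that should hold here). The function name apply_syslog_host and the RFC3164-style comparison line (with its made-up hostname proxy01) are illustrative only, not part of Splunk.

```python
import re

# REGEX copied from the default [syslog-host] stanza in transforms.conf.
SYSLOG_HOST_REGEX = re.compile(
    r":\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(\w[\w\.\-]{2,})\]?\s"
)


def apply_syslog_host(event, current_host):
    """Roughly mimic DEST_KEY = MetaData:Host / FORMAT = host::$1 for one event.

    If the regex matches anywhere in the event, return its first capturing
    group (Splunk stores it as host::<value>; only the value is returned here).
    Otherwise keep the host already assigned, e.g. by connection_host.
    """
    match = SYSLOG_HOST_REGEX.search(event)
    if match is None:
        return current_host
    return match.group(1)


# Example event from the question, truncated for brevity.
proxy_event = (
    '2019-08-05 19:44:06 6 192.168.1.245 - - 192.168.1.239 192.168.1.239 '
    'Unavailable - policy_denied DENIED "Technology/Internet" 403 TCP_DENIED GET'
)
# Hypothetical RFC3164-style line: the hostname directly follows the timestamp.
rfc3164_event = "<134>Aug  5 19:44:06 proxy01 access: GET https://example.com/"

print(apply_syslog_host(proxy_event, "192.168.1.239"))    # 192.168.1.245
print(apply_syslog_host(rfc3164_event, "192.168.1.239"))  # proxy01
```

The regex expects a syslog header, where the hostname follows the time. On this proxy format, the ":06 " of the timestamp is followed by "6 " (swallowed by the optional \d+\s+ group) and then the client IP, so the client IP ends up as the host.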

So, options for solving it:
1) Have the device actually emit syslog-formatted messages, so each one starts with something like <PRI>Timestamp Hostname
2) Send the device's data to another process, such as an rsyslog server, which can be configured to reformat the data and send it on (or write it to disk and have a Splunk forwarder pick it up).
3) (my favorite) Change inputs.conf to use a different sourcetype name, ideally your own, following the standard vendor:product:technology:format convention as described at https://docs.splunk.com/Documentation/AddOns/released/Overview/Sourcetypes#Source_type_naming_conven.... Splunk's default parsing tends to get things mostly right, but to make Splunk do less guessing about timestamp formats and event breaking, and to make it more deterministic, you could build your own props.conf for this sourcetype (or look on Splunkbase for a TA for the source system). By avoiding the syslog sourcetype (and thus the syslog-host TRANSFORM), your host field is no longer rewritten at index time.
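Why option 3 works can be shown with a deliberately simplified Python model: index-time TRANSFORMS are looked up by sourcetype, so an event whose sourcetype has no host-rewriting transform keeps the host the input assigned. This is a toy, not Splunk's actual pipeline. The PROPS_TRANSFORMS mapping, the function index_time_host, and the sourcetype name acme:proxy:access are all hypothetical stand-ins; the model also assumes connection_host = ip sets the host to the sending device's IP.

```python
import re

# Same REGEX as the default [syslog-host] transform.
SYSLOG_HOST_REGEX = re.compile(
    r":\d\d\s+(?:\d+\s+|(?:user|daemon|local.?)\.\w+\s+)*\[?(\w[\w\.\-]{2,})\]?\s"
)

# Hypothetical, heavily simplified stand-in for props.conf:
# sourcetype -> list of index-time host-rewriting regexes.
PROPS_TRANSFORMS = {
    "syslog": [SYSLOG_HOST_REGEX],  # TRANSFORMS = syslog-host
    "acme:proxy:access": [],        # custom sourcetype, no host transform
}


def index_time_host(event, sourcetype, sender_ip):
    """Toy model: connection_host = ip sets host to the sender's IP, then
    each host transform configured for the sourcetype may overwrite it."""
    host = sender_ip
    for regex in PROPS_TRANSFORMS.get(sourcetype, []):
        match = regex.search(event)
        if match:
            host = match.group(1)
    return host


event = "2019-08-05 19:44:06 6 192.168.1.245 - - 192.168.1.239 192.168.1.239 Unavailable"

print(index_time_host(event, "syslog", "192.168.1.239"))             # 192.168.1.245
print(index_time_host(event, "acme:proxy:access", "192.168.1.239"))  # 192.168.1.239
```

Same event, same input, different sourcetype: only the sourcetype decides whether the host gets rewritten.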

bruceclarke
Contributor

Thanks for the detailed response. I really appreciate it!

This is enough information for me to fix it. I tested and confirmed that these changes work. I agree that option 3 makes the most sense for us.
