Extracting hostname from filename - inputs.conf on UF - host_regex issue

dewald13
Path Finder

Having an issue with bluecoat logs that are dropped on a server with a UF. Attempting to extract the hostname with the following:

host_regex = /logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz

Checked this regex in regexr and it works perfectly.


Sample file names - Host format (ABC-G-PXYW-XXX)

/logs/rsyslog/bclogs/ABC-G-PXYW-002-032016.log.gz
/logs/rsyslog/bclogs/AEC-G-PXYW-001-032016.log.gz
/logs/rsyslog/bclogs/ABC-G-PXYW-002-032014.log.gz
/logs/rsyslog/bclogs/DEF-G-PXYW-003-032016.log.gz

The host is coming in set to the name of the log server, rather than the hostname extracted from the file name.

Thoughts?

1 Solution

bwooden
Splunk Employee

If you've restarted your forwarder and don't have any host overrides on your parser/indexer, your regex should work. As should something like this:

host_regex=/logs/rsyslog/bclogs/([\w-]+)(?=-\d{6}\.log\.gz)
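
As a quick sanity check outside Splunk, both patterns can be run against the sample file names from the question. This is only a sketch: Splunk uses PCRE, and Python's `re` module is used here as a stand-in on the assumption that it behaves the same for these particular patterns. The helper name `extract_host` is made up for illustration.

```python
import re

# Sample paths copied from the question.
PATHS = [
    "/logs/rsyslog/bclogs/ABC-G-PXYW-002-032016.log.gz",
    "/logs/rsyslog/bclogs/AEC-G-PXYW-001-032016.log.gz",
    "/logs/rsyslog/bclogs/ABC-G-PXYW-002-032014.log.gz",
    "/logs/rsyslog/bclogs/DEF-G-PXYW-003-032016.log.gz",
]

# The regex from the question and the lookahead variant from the answer.
ORIGINAL = re.compile(r"/logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz")
LOOKAHEAD = re.compile(r"/logs/rsyslog/bclogs/([\w-]+)(?=-\d{6}\.log\.gz)")


def extract_host(pattern, path):
    """Return the first capture group, or None if the pattern does not match."""
    match = pattern.search(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    for path in PATHS:
        # Print what each pattern captures for the same path.
        print(path, extract_host(ORIGINAL, path), extract_host(LOOKAHEAD, path))
```

If both columns show the hostname without the six-digit date suffix, the regex itself is not the problem and the restart or an override further down the pipeline is the more likely cause.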

dewald13
Path Finder

That worked with the "/"

Thanks!

dshpritz
SplunkTrust

There may also be some metadata rewrites happening, depending on the sourcetype (for example, the syslog sourcetype has built-in host rewrites).

dshpritz
SplunkTrust

Just for a sanity check, has the UF been restarted? The regex looks correct. The other thought is that the system doing the parsing (Heavy Forwarder or Indexer) is overwriting it.

dewald13
Path Finder

Try this one more time.
"^\/logs\/rsyslog\/bclogs\/(.*)-d{6}[.]log[.]gz"

dshpritz
SplunkTrust

You need two backslashes for it to display correctly on Splunkbase:
host_regex = ^/logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz

(bitten me tons of times)

dewald13
Path Finder

the site is ripping out the backslashes...

"^\/logs\/rsyslog\/bclogs\/(.*)-\d{6}[.]log[.]gz"
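
To illustrate why the stripped backslashes matter, here is a small sketch comparing the intended pattern with the version the site displayed. It assumes Python's `re` module as a stand-in for Splunk's PCRE engine, and the helper name `first_group` is made up for illustration.

```python
import re

# One of the sample paths from the question.
PATH = "/logs/rsyslog/bclogs/ABC-G-PXYW-002-032016.log.gz"

# Pattern as intended, with escaped slashes and \d.
INTENDED = r"^\/logs\/rsyslog\/bclogs\/(.*)-\d{6}[.]log[.]gz"
# Pattern as displayed when the backslash before "d" is lost:
# "d{6}" now means six literal "d" characters.
STRIPPED = r"^\/logs\/rsyslog\/bclogs\/(.*)-d{6}[.]log[.]gz"
# Escaping "/" is optional here; this form is equivalent to INTENDED.
UNESCAPED_SLASHES = r"^/logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz"


def first_group(pattern, text):
    """Return the first capture group, or None if there is no match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name, pattern in [
        ("intended", INTENDED),
        ("stripped", STRIPPED),
        ("unescaped slashes", UNESCAPED_SLASHES),
    ]:
        print(name, first_group(pattern, PATH))
```

The stripped form never matches the sample path, so a regex copied back from a rendered post is worth re-checking before it goes into inputs.conf.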

dewald13
Path Finder

This is the current inputs.conf on the Universal Forwarder

index = proxysg
sourcetype = squid
ignoreOlderThan = 60m
disabled = false
host_regex = /logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz

kristian_kolb
Ultra Champion

You're not changing the source, are you? See the host_regex description below.

host_regex = <regular expression>
* If specified, <regular expression> extracts host from the path to the file for each input file. 
    * Detail: This feature examines the source key, so if source is set
      explicitly in the stanza, that string will be matched, not the original filename.
* Specifically, the first group of the regex is used as the host. 
* If the regex fails to match, the default "host =" attribute is used.
* If host_regex and host_segment are both set, host_regex will be ignored.
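
The rules quoted above can be sketched as a small Python function. This is an illustration only, not Splunk's implementation: the function `resolve_host`, the default host `logserver01`, and the overridden source value `bluecoat` are all hypothetical, and the way `host_segment` counts "/"-separated segments (1-based, skipping the leading empty segment) is an assumption.

```python
import re

PATH = "/logs/rsyslog/bclogs/ABC-G-PXYW-002-032016.log.gz"
REGEX = r"/logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz"


def resolve_host(source, default_host, host_regex=None, host_segment=None):
    """Mimic the quoted precedence rules for choosing the host value."""
    if host_segment is not None:
        # host_regex is ignored when host_segment is also set.
        # Assumption: segments are counted from 1, skipping empty parts.
        segments = [part for part in source.split("/") if part]
        if 1 <= host_segment <= len(segments):
            return segments[host_segment - 1]
        return default_host
    if host_regex is not None:
        # The regex is applied to the source key, which is the file path
        # unless "source" was set explicitly in the stanza.
        match = re.search(host_regex, source)
        if match and match.re.groups >= 1 and match.group(1) is not None:
            return match.group(1)  # the first group becomes the host
    # No match (or nothing configured): fall back to the default host.
    return default_host


if __name__ == "__main__":
    print(resolve_host(PATH, "logserver01", host_regex=REGEX))
    # An explicitly set source is what gets matched, so the regex fails.
    print(resolve_host("bluecoat", "logserver01", host_regex=REGEX))
    print(resolve_host(PATH, "logserver01", host_regex=REGEX, host_segment=3))
```

The second call shows the case being asked about: if the stanza overrides source, the regex is tested against that string instead of the file path, and the host falls back to the forwarder's default.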

Please post the full inputs.conf stanza for the bc logs.

/k
