Extracting hostname from filename - inputs.conf on UF - host_regex issue

dewald13
Path Finder

I'm having an issue with Blue Coat logs that are dropped on a server running a Universal Forwarder (UF). I'm attempting to extract the hostname from the file name with the following:

host_regex = /logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz

Checked this regex in regexr and it works perfectly.


Sample file names - Host format (ABC-G-PXYW-XXX)

/logs/rsyslog/bclogs/ABC-G-PXYW-002-032016.log.gz
/logs/rsyslog/bclogs/AEC-G-PXYW-001-032016.log.gz
/logs/rsyslog/bclogs/ABC-G-PXYW-002-032014.log.gz
/logs/rsyslog/bclogs/DEF-G-PXYW-003-032016.log.gz

The host is coming in set to the name of the log server, rather than the hostname extracted from the file name.

Thoughts?

1 Solution

bwooden
Splunk Employee

If you've restarted your forwarder and don't have any host overrides on your parser/indexer, your regex should work. As should something like this:

host_regex=/logs/rsyslog/bclogs/([\w-]+)(?=-\d{6}\.log\.gz)
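One quick way to sanity-check both patterns away from the forwarder is to run them against the sample paths from the question. This is only a rough sketch: it uses Python's `re` module rather than the regex engine Splunk uses, it assumes an unanchored search, and the names `extract_host`, `original` and `lookahead` are made up for illustration.

```python
import re

# Sample paths copied from the question.
paths = [
    "/logs/rsyslog/bclogs/ABC-G-PXYW-002-032016.log.gz",
    "/logs/rsyslog/bclogs/AEC-G-PXYW-001-032016.log.gz",
    "/logs/rsyslog/bclogs/ABC-G-PXYW-002-032014.log.gz",
    "/logs/rsyslog/bclogs/DEF-G-PXYW-003-032016.log.gz",
]

# The pattern from the question and the lookahead variant from this answer.
original = re.compile(r"/logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz")
lookahead = re.compile(r"/logs/rsyslog/bclogs/([\w-]+)(?=-\d{6}\.log\.gz)")


def extract_host(pattern, path):
    """Return the first capture group, or None if the pattern does not match."""
    match = pattern.search(path)
    return match.group(1) if match else None


hosts_original = [extract_host(original, p) for p in paths]
hosts_lookahead = [extract_host(lookahead, p) for p in paths]

for path, a, b in zip(paths, hosts_original, hosts_lookahead):
    print(path, "->", a, "|", b)
```

If both columns show the expected host for every path, the pattern itself is probably not the problem, and the restart or a host override further down the pipeline is the more likely cause.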

dewald13
Path Finder

That worked with the "/"

Thanks!

dshpritz
SplunkTrust

There may also be some metadata rewrites happening, depending on the sourcetype (for example, the syslog sourcetype has built in rewrites).

dshpritz
SplunkTrust

Just for a sanity check, has the UF been restarted? The regex looks correct. The other thought is that the system doing the parsing (Heavy Forwarder or Indexer) is overwriting it.

dewald13
Path Finder

Try this one more time.
"^\/logs\/rsyslog\/bclogs\/(.*)-d{6}[.]log[.]gz"

dshpritz
SplunkTrust

You need two backslashes for it to display correctly on Splunkbase:
host_regex = ^/logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz

(bitten me tons of times)

dewald13
Path Finder

the site is ripping out the backslashes...

"^\/logs\/rsyslog\/bclogs\/(.*)-\d{6}[.]log[.]gz"

dewald13
Path Finder

This is the current inputs.conf on the Universal Forwarder

index = proxysg
sourcetype = squid
ignoreOlderThan = 60m
disabled = false
host_regex = /logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz

kristian_kolb
Ultra Champion

You're not changing the source, are you? See the inputs.conf spec excerpt below.

host_regex = <regular expression>
* If specified, <regular expression> extracts host from the path to the file for each input file. 
    * Detail: This feature examines the source key, so if source is set
      explicitly in the stanza, that string will be matched, not the original filename.
* Specifically, the first group of the regex is used as the host. 
* If the regex fails to match, the default "host =" attribute is used.
* If host_regex and host_segment are both set, host_regex will be ignored.
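To make those rules concrete, here is a toy model of the decision order in Python. It is not Splunk's code, just a sketch of the excerpt above. The function name `resolve_host`, the default host `logserver` and the overridden source `bluecoat` are hypothetical. The way `host_segment` counts path segments (1-based, leading slash ignored) is an assumption that the excerpt does not spell out.

```python
import re


def resolve_host(source, default_host, host_regex=None, host_segment=None):
    """Toy model of the quoted rules; 'source' is whatever the source key holds."""
    # Rule: if both are set, host_segment wins and host_regex is ignored.
    if host_segment is not None:
        # Assumption: segments are counted from 1, skipping the leading "/".
        segments = [s for s in source.split("/") if s]
        if 1 <= host_segment <= len(segments):
            return segments[host_segment - 1]
        return default_host
    if host_regex is not None:
        # Rule: the regex is applied to the source key, not the file on disk.
        match = re.search(host_regex, source)
        if match and match.groups():
            # Rule: the first capture group becomes the host.
            return match.group(1)
    # Rule: if the regex fails to match, the default "host =" value is used.
    return default_host


path = "/logs/rsyslog/bclogs/ABC-G-PXYW-002-032016.log.gz"
regex = r"/logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz"

print(resolve_host(path, "logserver", host_regex=regex))        # ABC-G-PXYW-002
print(resolve_host("bluecoat", "logserver", host_regex=regex))  # logserver
print(resolve_host(path, "logserver", host_regex=regex, host_segment=3))  # bclogs
```

The second call mirrors the symptom in the question: once the source key no longer contains the file path, the regex cannot match and the host falls back to the log server's name.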

Please post the full inputs.conf stanza for the bc logs.

/k
