Extracting hostname from filename - inputs.conf on UF - host_regex issue

dewald13
Path Finder

Having an issue with bluecoat logs that are dropped on a server with a UF. Attempting to extract the hostname with the following:

host_regex = /logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz

Checked this regex in regexr and it works perfectly.


Sample file names - Host format (ABC-G-PXYW-XXX)

/logs/rsyslog/bclogs/ABC-G-PXYW-002-032016.log.gz
/logs/rsyslog/bclogs/AEC-G-PXYW-001-032016.log.gz
/logs/rsyslog/bclogs/ABC-G-PXYW-002-032014.log.gz
/logs/rsyslog/bclogs/DEF-G-PXYW-003-032016.log.gz

The host is being set to the name of the log server rather than the name extracted from the file name.

Thoughts?

1 Solution

bwooden
Splunk Employee

If you've restarted your forwarder and don't have any host overrides on your parser/indexer, your regex should work. As should something like this:

host_regex=/logs/rsyslog/bclogs/([\w-]+)(?=-\d{6}\.log\.gz)
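
Both patterns can be sanity-checked outside Splunk. The sketch below is only an approximation: it uses Python's `re` module rather than the PCRE engine Splunk uses, and the helper name `extract_host` is made up for illustration. For these particular patterns the two engines should behave the same way, and both expressions capture the host portion of each sample path from the question.

```python
import re

# Sample paths copied from the question.
SAMPLE_PATHS = [
    "/logs/rsyslog/bclogs/ABC-G-PXYW-002-032016.log.gz",
    "/logs/rsyslog/bclogs/AEC-G-PXYW-001-032016.log.gz",
    "/logs/rsyslog/bclogs/ABC-G-PXYW-002-032014.log.gz",
    "/logs/rsyslog/bclogs/DEF-G-PXYW-003-032016.log.gz",
]

# The two patterns discussed in this thread.
ORIGINAL = r"/logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz"
LOOKAHEAD = r"/logs/rsyslog/bclogs/([\w-]+)(?=-\d{6}\.log\.gz)"


def extract_host(pattern, path):
    """Return the first capture group, or None if the pattern does not match.

    Hypothetical helper: it mimics "first group becomes the host" only.
    """
    match = re.search(pattern, path)
    return match.group(1) if match else None


for path in SAMPLE_PATHS:
    # Print the path followed by what each pattern would use as the host.
    print(path, extract_host(ORIGINAL, path), extract_host(LOOKAHEAD, path))
```

If both columns show the expected host here but Splunk still reports the log server's name, the regex itself is unlikely to be the problem, which points back at a missing forwarder restart or an override further down the pipeline.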

dewald13
Path Finder

That worked with the "/"

Thanks!

dshpritz
SplunkTrust

There may also be some metadata rewrites happening, depending on the sourcetype (for example, the syslog sourcetype has built in rewrites).

dshpritz
SplunkTrust

Just for a sanity check, has the UF been restarted? The regex looks correct. The other thought is that the system doing the parsing (Heavy Forwarder or Indexer) is overwriting it.

dewald13
Path Finder

Trying this one more time.
"^\/logs\/rsyslog\/bclogs\/(.*)-d{6}[.]log[.]gz"

dshpritz
SplunkTrust

You need two backslashes for it to display correctly on Splunkbase:
host_regex = ^/logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz

(bitten me tons of times)

dewald13
Path Finder

the site is ripping out the backslashes...

"^\/logs\/rsyslog\/bclogs\/(.*)-\d{6}[.]log[.]gz"

dewald13
Path Finder

This is the current inputs.conf on the Universal Forwarder

index = proxysg
sourcetype = squid
ignoreOlderThan = 60m
disabled = false
host_regex = /logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz

kristian_kolb
Ultra Champion

You're not changing the source are you? See below.

host_regex = <regular expression>
* If specified, <regular expression> extracts host from the path to the file for each input file. 
    * Detail: This feature examines the source key, so if source is set
      explicitly in the stanza, that string will be matched, not the original filename.
* Specifically, the first group of the regex is used as the host. 
* If the regex fails to match, the default "host =" attribute is used.
* If host_regex and host_segment are both set, host_regex will be ignored.
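
The precedence described in that excerpt can be sketched as a small function. This is a toy model, not Splunk's implementation: the function name `resolve_host`, its parameters, and the default host `logserver01` are all hypothetical, and the `host_segment` handling assumes 1-based counting of the "/"-separated path segments.

```python
import re


def resolve_host(path, default_host, host_regex=None, host_segment=None, source=None):
    """Toy model of the documented host_regex rules (hypothetical helper)."""
    if host_segment is not None:
        # When both settings are present, host_regex is ignored.
        # Assumption: segments are counted from 1, skipping empty parts.
        segments = [part for part in path.split("/") if part]
        if 1 <= host_segment <= len(segments):
            return segments[host_segment - 1]
        return default_host

    if host_regex is not None:
        # host_regex examines the source key, so an explicit source in the
        # stanza replaces the original file name as the string being matched.
        subject = source if source is not None else path
        match = re.search(host_regex, subject)
        if match and match.groups() and match.group(1):
            # The first capture group becomes the host.
            return match.group(1)

    # No match (or nothing configured): fall back to the default "host =" value.
    return default_host


PATH = "/logs/rsyslog/bclogs/ABC-G-PXYW-002-032016.log.gz"
REGEX = r"/logs/rsyslog/bclogs/(.*)-\d{6}[.]log[.]gz"

# Regex matches the file path, so the captured name is used.
print(resolve_host(PATH, "logserver01", host_regex=REGEX))
# An explicit source (here a made-up value) no longer matches, so the default wins.
print(resolve_host(PATH, "logserver01", host_regex=REGEX, source="bluecoat"))
```

The second call illustrates the symptom reported in this thread: when the regex is evaluated against something other than the file path, the host falls back to the forwarder's default.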

Please post the full inputs.conf stanza for the bc logs.

/k
