I need to extract "hostname" from the path in data input on directory monitoring.
Path: /export/var/path/host1.log -> Host: host1
Path: /export/var/path/host-02.ac.lp.our.domain.log -> Host: host-02
Path: /export/var/path/host3.ac.lp.our.domain.log -> Host: host3
I tried 3 different regexes. They all work on regex101.com, but when I use them as host_regex in inputs.conf, the extraction only works for host1.log.
For the other 2 files, "host" is set to "host-02.ac.lp.our.domain" and "host3.ac.lp.our.domain" after the data is ingested, instead of host-02 and host3.
1) \/export\/var\/path\/(.*?[^.]+)
https://regex101.com/r/hu4Wax/1
inputs.conf:
[monitor:///export/var/path/*.log]
disabled = false
host_regex =/export\/var/path/(.*?[^\.]+)
index = default
sourcetype = default
Splunk sets host to host1, host-02.ac.lp.our.domain and host3.ac.lp.our.domain. Objective was host1,host-02,host3.
2) ^\/\w+\/\w+\/\w+\/?([^.]+)
https://regex101.com/r/jTeVML/1
Inputs.conf:
[monitor:///export/var/path/*.log]
disabled = false
host_regex = ^\/\w+\/\w+\/\w+\/?([^\.]+)
index = default
sourcetype = default
Splunk sets host to host1 & host-02.ac.lp.our.domain & host3.ac.lp.our.domain
3) \/export\/var\/path\/(.+?)\..*log
https://regex101.com/r/yUJY9j/1/
Inputs.conf:
[monitor:///export/var/path/*.log]
disabled = false
host_regex = \/export\/var\/path\/(.+?)\..*log
index = default
sourcetype = default
Splunk sets host to host1 & host-02.ac.lp.our.domain & host3.ac.lp.our.domain
Will appreciate any advice!
You can check if the regex works with:
| makeresults
| eval Path="/export/var/path/host-04.ac.lp.our.domain.log"
| rex field=Path ".+\/(?<host>\-?[^.]+).*"
| table Path host
Just adjust the host in the "| eval Path=........" to check what is hitting with this regex.
This is your working host_regex (remove the naming of the capture group, "?<host>"):
host_regex = .+\/(\-?[^.]+).*
The issue was never with the regex. The host field was being overwritten by a transform defined in one of the config files under etc/system. See https://answers.splunk.com/comments/710989/view.html
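A quick check outside Splunk is consistent with this. The sketch below runs the three patterns from the question against the three example paths using Python's `re` module. Python's engine is not the one Splunk uses, so this assumes these simple patterns behave the same in both; `extract_host` is a hypothetical helper that returns the first capture group, which is the group host_regex takes the host from.

```python
import re

# The three host_regex patterns from the question
# (pattern 3 as it appears in the inputs.conf stanza).
PATTERNS = [
    r"\/export\/var\/path\/(.*?[^.]+)",
    r"^\/\w+\/\w+\/\w+\/?([^.]+)",
    r"\/export\/var\/path\/(.+?)\..*log",
]

# Example paths from the question, mapped to the host the poster wants.
PATHS = {
    "/export/var/path/host1.log": "host1",
    "/export/var/path/host-02.ac.lp.our.domain.log": "host-02",
    "/export/var/path/host3.ac.lp.our.domain.log": "host3",
}


def extract_host(pattern, path):
    """Hypothetical helper: return capture group 1, or None if no match."""
    match = re.search(pattern, path)
    return match.group(1) if match else None


for pattern in PATTERNS:
    for path, wanted in PATHS.items():
        got = extract_host(pattern, path)
        status = "ok" if got == wanted else "MISMATCH"
        print(f"{status}: {pattern} on {path} -> {got}")
```

If every line reports "ok", the unexpected host values are more likely to come from something that runs after the input stage, such as a host-overriding transform, than from the patterns themselves.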
Your #1 regex should be fine, but use this:
host_regex = ^(?:\/\w+){3}\/([^\.]+)
The problem is that you are not evaluating your changes correctly. Are you restarting the Splunk forwarder instance after each change? If so, then you are probably not timestamping your events correctly and are throwing events into the future. When you think you are evaluating the effect of your most recent change, you are actually looking at events that were processed under a previous change and have only just trickled from the future into the present. Put in this change and run your search with the All time timepicker and with these arguments added to the base search, to make sure that you are really seeing events that were indexed recently:
... _index_earliest=-5m _index_latest=@m
@woodcock, host_regex in the data input works, but only for logs like host1.log.
When I search the data after ingesting host1.log, host is set to "host1".
But if the file name of a log has a domain name in it, like host-02.ac.lp.our.domain.log,
then host is set to "host-02.ac.lp.our.domain" instead of "host-02".
I don't think this issue is related to timestamping.
Do you really understand what I am saying? The RegEx is fine. It must be that your evaluation of its efficacy is improper. I stand by this statement. Re-read what I said, and use the search parameters that I gave you. The problem is NOT the host_regex line.
@woodcock, sorry for the delay. Hopefully, you will see my reply.
I understood what you said, I don't have to re-read it.
As I was testing it in our development env first, I made sure I wasn't looking at previously ingested data.
I deleted the index, created a new one, and used web GUI -> Add Data -> Index Once -> used current time as timestamp -> used Regular Expression on path.
Still, host was extracted as expected only for the "host.log" format, not for the "host.ac.lp.domain.name.log" format.
There is nothing more that I can do. Something is not as it seems. You should open a support case and report back what you eventually find.
@woodcock,
The sourcetype I was using (I selected the existing sourcetype "syslog", made some modifications to it and saved it under a different name) had [syslog-host] in transforms.conf, which was overriding my host_regex in the data input.
Now I have a challenge: how to extract host using my host_regex without making any changes to the sourcetype (for a number of reasons).
You only get one pass through the parsing queue, and if you are using the syslog sourcetype (which I highly discourage for exactly this reason) then that is the problem. Copy the syslog stuff that you need into your own sourcetype and work from there.
Did you try the adjusted regex woodcock suggested? That .*? part in your original regex is not needed and might cause some funky behavior (some regex libraries are more equal than others).
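To illustrate that the lazy `.*?` prefix adds nothing here, the sketch below compares regex #1 with and without it on the three example paths. It uses Python's `re` module, so it only shows what one engine does and says nothing about how other regex libraries treat the construct; `first_group` is a hypothetical helper name.

```python
import re

# Regex #1 from the question, and the same pattern with the lazy prefix removed.
WITH_LAZY = r"\/export\/var\/path\/(.*?[^.]+)"
WITHOUT_LAZY = r"\/export\/var\/path\/([^.]+)"

PATHS = [
    "/export/var/path/host1.log",
    "/export/var/path/host-02.ac.lp.our.domain.log",
    "/export/var/path/host3.ac.lp.our.domain.log",
]


def first_group(pattern, path):
    """Hypothetical helper: return capture group 1, or None if no match."""
    match = re.search(pattern, path)
    return match.group(1) if match else None


# One (with, without) pair of captures per path.
results = [(first_group(WITH_LAZY, p), first_group(WITHOUT_LAZY, p)) for p in PATHS]

for path, (with_lazy, without_lazy) in zip(PATHS, results):
    print(f"{path}: with '.*?' -> {with_lazy}, without -> {without_lazy}")
```

In this engine the lazy quantifier matches zero characters and `[^.]+` does all the work, so both patterns capture everything up to the first dot.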
Another possibility is that some hostname override is happening. Is this syslog-like data, with the hostname also near the start of the log message? By using a default sourcetype you may very well get some syslog-host extraction for free, defined in system/default/props.conf, which bluntly overwrites whatever you do in inputs.conf.
@FrankVl, hopefully you will see my comment.
You were right in your suggestion!
The sourcetype I was using (I selected the existing sourcetype "syslog", made some modifications to it and saved it under a different name) had [syslog-host] in transforms.conf, which was overriding my host_regex in the data input.
Now I have a challenge: how to extract host using my host_regex without making any changes to the sourcetype.
Try this:
| makeresults
| eval Path="/export/var/path/host3.ac.lp.our.domain.log"
| rex field=Path ".+\/(?<host>[^.]+).*"
| table Path host
@saurabhkharkar, I'm trying to extract the host from the log file name into the "host" field at input time, via host_regex in the data input monitoring, not at search time.