Hello,
Since I often search for a specific expression in a large set of events, I would like to index it.
The name of every instance I am running has the following format:
instance-name.generic-name.subdomain.domain.com
In this expression, only domain.com is static and will never change.
I would like to extract generic-name for all of my events.
props.conf
[generic-name]
TRANSFORMS-generic-name = generic-name
transforms.conf
[generic-name]
REGEX = (?<instancename>[^\.]+)\.(?<gname>[^\.]+)\.(?<subdomain>[^\.]+)\.(?<domain>[^\.]+)\.
fields.conf
[gname]
INDEXED = True
I am not receiving anything in my Splunk dashboard. Is the problem in my configuration files or in my regular expression?
Thank you in advance for your help.
Update: I have tried all of the regexes suggested in the answers and there is still no result. I don't receive any data in my sourcetype.
I've decided to add a totally separate answer here because, if I'm right, your regex is fine (it was just the markup bug we're dealing with right now that confused everyone) but your transforms.conf syntax is off:
Create a search-time field (referenced from props.conf with REPORT-):
[extracted-gname]
REGEX = whatevercomesbeforeit [^\.]+\.(?<gname>[^\.]+)\.[^\.]+\..+
FORMAT = gname::$1
Extract from the host instead of the raw event:
[extracting-from-host]
SOURCE_KEY = MetaData:Host
REGEX = [^\.]+\.(?<gname>[^\.]+)\.[^\.]+\..+
FORMAT = gname::$1
Create an indexed field (referenced from props.conf with TRANSFORMS-):
[indexed-gname]
REGEX = whatevercomesbeforeit [^\.]+\.(?<gname>[^\.]+)\.[^\.]+\..+
FORMAT = gname::$1
WRITE_META = true
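For a transform like [indexed-gname] to run at index time, it has to be referenced from a TRANSFORMS- class under the events' sourcetype in props.conf, and the field has to be declared as indexed in fields.conf. The sketch below assumes the sourcetype is generic-name (the one used in the searches in this thread); the class name gname is arbitrary. Keep in mind that index-time transforms only apply to data parsed after the change, on the instance that does the parsing (indexer or heavy forwarder), so events that are already indexed will not gain the field.

```ini
# props.conf (sketch): the stanza name must match the events' sourcetype
[generic-name]
# TRANSFORMS-<class> = <transforms.conf stanza name>
TRANSFORMS-gname = indexed-gname

# fields.conf (sketch): tells search that gname is an indexed field
[gname]
INDEXED = true
```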
Thank you for your answer.
Your regex looks good and is easy to understand, but it may be slower because of the multiple extractions.
Anyway, I still receive no data when I try to use yours. Am I missing something else somewhere?
You need to swap the forward slashes for backslashes (stinking broken markdown). It does work; I tested it. It is important to include the other portions (though you don't necessarily have to capture them into fields), because otherwise your single capture will grab things you do not intend.
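To illustrate why the surrounding portions matter, here are two hypothetical transforms.conf stanzas side by side (the stanza names are made up for comparison, and domain.com stands in for the real static domain). The first has no context, so it captures the first run of non-dot characters in the event, whatever that happens to be; the second still captures only the second label, but only where the whole instance.generic.subdomain.domain.com shape matches.

```ini
# Too loose: nothing around the capture, so $1 is simply the first
# run of non-dot characters found in the event.
[gname-too-loose]
REGEX = ([^\.]+)\.
FORMAT = gname::$1
WRITE_META = true

# Tighter: the first label, the subdomain and the literal domain must
# also match, but only the second label is captured into gname.
[gname-with-context]
REGEX = [^\.]+\.([^\.]+)\.[^\.]+\.domain\.com
FORMAT = gname::$1
WRITE_META = true
```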
Okay thank you, both of your regexp woocock and rsenett_splunk are matching what I want, which is perfect.
However, I still don't receive anything in the dashboard. The sourcetype looks fine in the license usage. I have updated my first post with your regex, so it is all up to date.
Post your dashboard XML.
I am just using the search "sourcetype=generic-name gname=foo" in my Splunk app.
This is probably the problem:
http://blogs.splunk.com/2011/10/07/cannot-search-based-on-an-extracted-field/
Try this search instead:
sourcetype=generic-name gname=* | search gname="foo"
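If the linked problem is what's happening (Splunk first looks for the value as a term in the index, and the extracted value isn't there as-is), the configuration-side counterpart to piping through | search is usually to declare in fields.conf that the field's value does not appear in the raw text. This is a sketch assuming gname is a search-time field; INDEXED_VALUE is a fields.conf setting, but check the spec for your Splunk version before relying on it.

```ini
# fields.conf (sketch): gname is extracted at search time and its value
# should not be expected as a term in the indexed raw text
[gname]
INDEXED = false
INDEXED_VALUE = false
```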
My problem is different: in the scenario from the link you gave me, sourcetype=generic-name gname=* should still return results, which is not my case. I literally get nothing.
Your statement is incorrect. Did you run the exact search that I gave you (even though you think it is silly)? Did it give results? The problem in the link causes searches to give 0 results. It is a very nuanced thing, trust me. Just run this and tell me what you get:
sourcetype=generic-name gname=* | search gname="foo"
I'm pretty sure that's only a problem if he's running Splunk 4.2 and earlier...
This "problem" (it is not actually a problem, it is a deliberate design compromise) exists for all versions of Splunk.
I think there is some confusion about exactly what your problem is.
Your question says... you often search for an expression like:
instance-name.generic-name.subdomain.domain.com
I think some folks here have assumed that this is found in the host field. I didn't get that from what you've said.
Also, you're giving us the text and we're giving you legitimate working regexes and still you're getting nothing. So it would be a good idea if you posted a couple of events that contain the values you're looking for so we can see what might be going wrong.
Also... see my edited answer that addresses your transforms.conf syntax.
OP's first comment under the question suggests the value is in the host field, in which case running the transforms.conf REGEX against _raw is pointless.
...I'd still first go with search-time extractions and see if there's any performance hit left over to be addressed with indexed stuff...
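A search-time extraction straight from the host field can be done in props.conf with the "in <field>" form of EXTRACT-. This is a sketch, assuming the sourcetype is generic-name and with domain.com standing in for the real static domain; EXTRACT- requires a named capture group, which becomes the field name.

```ini
# props.conf (sketch): search-time extraction applied to the host field
# rather than to _raw; the anchors pin the whole host name
[generic-name]
EXTRACT-gname = ^[^\.]+\.(?<gname>[^\.]+)\.[^\.]+\.domain\.com$ in host
```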
Ah, I missed that. And I totally agree. I amended my second answer in case it's just a matter of the missing SOURCE_KEY... which would only bring back the value if that full structure was available and, without an anchor, would be horribly non-performant...
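An anchored, index-time version that reads the host metadata could look like the sketch below. The stanza name is hypothetical, domain.com stands in for the real domain, and it assumes the value read through MetaData:Host carries a "host::" prefix (a common gotcha; worth confirming against the transforms.conf spec for your version). It would still need the props.conf TRANSFORMS- reference and the fields.conf INDEXED = true declaration.

```ini
# transforms.conf (sketch): index-time extraction from the host metadata
[indexed-gname-from-host]
SOURCE_KEY = MetaData:Host
# Anchored at the start, accounting for the assumed "host::" prefix,
# so the regex cannot wander through the value looking for a match.
REGEX = ^host::[^\.]+\.([^\.]+)\.[^\.]+\.domain\.com$
FORMAT = gname::$1
WRITE_META = true
```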
See my answer. You were missing an actual extraction: your capturing group surrounded only the field name... so nothing was being captured. You were also matching only a single character of "anything that is not a dot", because you were missing the +, which says "one or more characters that are not a dot, up to the next dot". Whether you capture all the fields or put literals in for the domain and subdomain doesn't matter, as long as you are actually capturing something. As for "slower": as long as the regex only moves forward (no lookbehinds), speed isn't an issue.
Are you sure you need indexed extractions here?
What happens when you run this search:
index=foo sourcetype=generic-name gname=some-gname
Is the scanCount in the job inspector higher than the resultCount?
Thank you for your answer. Yes, I am pretty sure I need indexed extractions here, since I run the equivalent of gname=foo in every single search I do. Anyway, I will compare the performance before and after my change.
When I run this:
index=foo sourcetype=generic-name gname=some-gname
I got "No Results Found", even with only sourcetype=generic-name or only gname=some-gname.
scanCount=0 resultCount=0.
I am wondering whether the host is part of the data. Is the host something I can extract from? Or maybe it is just my regex.