Index Time field extraction

daveowens
Engager

I have a custom log file with entries like the one below. I want to pull 8 fields out at index time so I can graph and chart them.

wdSiteData.busy: false wdSiteData.needUpdate: false wdSiteData.requestType: -1 wdSiteData.state: UT wdSiteData.country: USA wdSiteData.district: SOME DISTRICT wdSiteData.availableUpdates: [SP_Update_4_2_1_107_from_96.jar, SP_Update_4_2_1_108.jar, SP_Update_4_2_1_95.jar, SP_Update_4_2_1_96.jar, SP_Update_4_3_0_77_from_4_2_1_108.jar, SP_Update_4_3_0_78.jar, SP_Update_4_3_0_84_from_78.jar, SP_Update_4_4_0_64_from_4_3_0_84.jar] wdSiteData.peerList: null wdSiteData.checksumJar: null wdSiteData.checksumInstall: null wdSiteData.partialDownloadBytes: 0 wdSiteData.filesize: 0 wdSiteData.siteVersion: 7.8.9.10 wdSiteData.versionFrom: null wdSiteData.versionTo: null wdSiteData.timestamp: null wdSiteData.downloadUrl: null wdSiteData.school: -1 wdSiteData.filename: null wdSiteData.updateAvailable: false wdSiteData.clientAddress: 10.10.10.10 wdSiteData.guid: {4445454b1e-805a-11de-8896-fdfdfdfd743c1a} wdSiteData.maximumPeerConnections: 0

I have added the following to my transforms.conf, /opt/splunk/etc/system/default/transforms.conf (REGEX and FORMAT are each a single line).
I have tested the regex and it does find the fields I want correctly.

[WSM-CONNTECTIONS-SiteData]
REGEX = wdSiteData.(state|country|district|siteVersion|timestamp|school|clientAddress|maximumPeerConnections):
FORMAT = WSM-timestamp::"$5" district::"$3" school::"$6" state::"$1" country::"$2" version::"$4" ipaddress::"$7" peerconnections::"$8"
WRITE_META = [true]

I have added in my props.conf /opt/splunk/etc/system/default/props.conf
[host::$IP_OF_HOST]
TRANSFORMS-WSM = WSM-CONNTECTIONS-SiteData

I have added in my fields.conf /opt/splunk/etc/system/default/fields.conf

[WSM-timestamp]
INDEXED = True

[district]
INDEXED = True

[school]
INDEXED = True

[state]
INDEXED = True

[country]
INDEXED = True

[version]
INDEXED = True

[ipaddress]
INDEXED = True

[peerconnections]
INDEXED = True

1 Solution

jonuwz
Influencer

As Ayn says, there's no need to make these fields part of your index. Search-time extraction is the right way to go 99% of the time.

Also, putting customisations in default/transforms.conf, default/props.conf and default/fields.conf is a bad idea: these files will get overwritten when you patch or upgrade.

You should create props.conf and transforms.conf in etc/system/local and put any customisations you've made in there.
You should also remove the customisations you made to default/fields.conf; you don't need them for search-time extraction.

This is what you need for search-time extraction of all the fields in your Site Data events.

props.conf:

[host::$IP_OF_HOST]
REPORT-WSM = WSM_CONNTECTIONS_SiteData

transforms.conf:

[WSM_CONNTECTIONS_SiteData]
REGEX = wdSiteData\.([^:]+):\s+(.*?)(?=(?:\s+wdSiteData|$))
FORMAT = $1::$2

In the search bar, run:

| extract reload=t

then

wdSiteData

You should see a bunch of interesting fields in the sidebar.
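
If you want to sanity-check the regex outside Splunk first, here is a rough sketch in Python. It assumes Python's re module treats this particular pattern the same way Splunk's PCRE engine does (it uses nothing beyond a lazy quantifier and a lookahead), and it only imitates what FORMAT = $1::$2 does with repeated matches. The event string is a shortened copy of the sample from the question, and the helper name extract_fields is made up for this illustration.

```python
import re

# Same pattern as the REGEX line in the transforms.conf stanza above.
PATTERN = re.compile(r"wdSiteData\.([^:]+):\s+(.*?)(?=(?:\s+wdSiteData|$))")

# Shortened copy of the sample event from the question.
EVENT = (
    "wdSiteData.busy: false wdSiteData.state: UT wdSiteData.country: USA "
    "wdSiteData.district: SOME DISTRICT "
    "wdSiteData.availableUpdates: [SP_Update_4_2_1_95.jar, SP_Update_4_2_1_96.jar] "
    "wdSiteData.siteVersion: 7.8.9.10 wdSiteData.clientAddress: 10.10.10.10 "
    "wdSiteData.maximumPeerConnections: 0"
)


def extract_fields(event: str) -> dict:
    """Imitate FORMAT = $1::$2: each match contributes one name/value pair."""
    return {name: value for name, value in PATTERN.findall(event)}


fields = extract_fields(EVENT)
for name, value in fields.items():
    print(f"{name} = {value}")
```

The lookahead is what lets values with spaces or commas (district, availableUpdates) come through whole: each value runs until the next " wdSiteData" or the end of the event.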

gcusello
SplunkTrust

I have a similar problem: I need to extract fields at index time to speed up my searches (I have millions of events with 72 fields in each one), and someone from Splunk suggested that I extract the fields at index time to get quicker searches.
When you say "...a negative impact on performance...", do you mean indexing performance or search performance?
Thank you.
Giuseppe

Drainy
Champion

And if you're running 4.3+, you don't even need to run extract reload=t; search-time extractions should be reloaded each time splunkd forks off a new process for a search.

Ayn
Legend

Generally, always use search-time field extractions. The docs have plenty of information on this that should get you going. Here's a good place to start: http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Addfieldsatsearchtime

daveowens
Engager

The problem is that I cannot see the fields in the manager. I am just learning and reading, and I just want the fields to always be available for stats and charts. Other than that, please show me a better way!

Dave

Ayn
Legend

By the way, you don't actually say what problem you're having...?

Ayn
Legend

Why use index-time field extraction? Is there a specific reason for doing so? Index-time field extraction should only be done if there's a really good reason for it, and only if you really know what you're doing. It has a negative impact on performance and often causes increased complexity.
