Splunk Search

How does one use REGEX, FORMAT, or something else at index time to replace "-" with "_", e.g. application_special-source-type => application_special_source_type?

mdwecht
Path Finder

Extracting "_" delimited fields from source file name (regex101.com)

([^\/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_application_([^_]+)_([^_]+)_([^_]+)\.[c][s][v]

/data/input/account_network_system_interface_host_application_special-source-type_timestamp_seqnum.csv

Group 1. 12-19 account
Group 2. 20-26 network
Group 3. 27-33 system
Group 4. 34-43 interface
Group 5. 44-48 host
Group 6. 53-64 special-source-type
Group 7. 65-74 timestamp
Group 8. 75-81 seqnum
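
The capture groups listed above can also be checked outside regex101. The following is a minimal Python sketch, not Splunk configuration. It assumes the pattern has "_" between the capture groups and the literal "application" before group 6, as in the sample file name; the names PATTERN, SAMPLE and capture_groups are hypothetical.

```python
import re

# Assumed form of the pattern: "_" between groups, literal "application" before group 6.
PATTERN = re.compile(
    r"([^/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_application_([^_]+)_([^_]+)_([^_]+)\.csv"
)

SAMPLE = (
    "/data/input/account_network_system_interface_host_"
    "application_special-source-type_timestamp_seqnum.csv"
)


def capture_groups(path):
    """Return the eight captured fields, or an empty list if the path does not match."""
    match = PATTERN.search(path)
    return list(match.groups()) if match else []


if __name__ == "__main__":
    for number, value in enumerate(capture_groups(SAMPLE), start=1):
        print(f"Group {number}: {value}")
```

Group 6 comes back as special-source-type, dashes included, which is the value the rest of this thread tries to rewrite.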


Splunk ingest:

#
# inputs.conf
#
[batch:///data/input/*_*_*_*_*_application_*_*_*.csv]
sourcetype = application
disabled = 0
move_policy = sinkhole
crcSalt = <SOURCE>

# END inputs.conf

#
# props.conf
#

[application]
...
TRANSFORMS-application-auto-type = application-auto-type
...

# END props.conf

#
# transforms.conf
#

...

[application-auto-type]
SOURCE_KEY = MetaData:Source
DEST_KEY = MetaData:Sourcetype
REGEX = ([^\/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_application_([^_]+)_([^_]+)_([^_]+)\.[c][s][v]
FORMAT = sourcetype::application_$6
WRITE_META = true

...

# END transforms.conf

Result:

sourcetype = application_special-source-type (sourcetype field may have 0, 1, 2, or more "-"s)

Question:

How does one replace the "-" with "_" at index-time so sourcetype = application_special_source_type?

1 Solution

mdwecht
Path Finder

I figured out a way to do what I was trying to do. I use a REGEX to grab the analyst-specified sourcetype field from the source file name. Because underscores separate the fields in the source file name, we had to use dashes instead of underscores as separators inside the sourcetype field. To replace the dashes with underscores in the sourcetype at index time, I used props and transforms to iterate over the sourcetype value captured from the source file name, replacing one dash per pass. There may be a better way; if anyone has a suggestion, please chime in. This method currently supports sourcetypes specified with up to eight dashes. I would love to see something in transforms like "REPLACE = s/-/_/g".

inputs.conf - Ingest any CSV file generated by an analyst that follows the naming convention

[batch:///opt/splunk_input/input/*_*_*_*_*_analyst_*_*_*.csv]
sourcetype = analyst
move_policy = sinkhole
crcSalt = <SOURCE>
disabled = 0

props.conf - Parse the analyst-generated file using the required timestamp field, extract the sourcetype from the part of the source file name that follows "analyst", change up to eight (8) dashes in it to underscores, and add the prefix "analyst_". The dash-replacement pair of transforms always runs eight (8) times. It just works out that when a match is not found, the keys I need are not overwritten.

[analyst]
TRUNCATE = 0
SHOULD_LINEMERGE = false
DATETIME_CONFIG = 
MAX_TIMESTAMP_LOOKAHEAD = 4096
INDEXED_EXTRACTIONS = CSV
TIMESTAMP_FIELDS = ts, _time, time
NO_BINARY_CHECK = false
category = Structured
pulldown_type = 1
TRANSFORMS-auto_analyst_set_fields = set_analyst_fields
TRANSFORMS-auto_analyst_set_host = set_analyst_host_to_sensor
TRANSFORMS-auto_analyst_set_index = set_index_for_analyst_sensor 
TRANSFORMS-auto_analyst_set_sourcetype = set_var01_to_type, \
                                         var01_dash_to_var02_underscore, \
                                         var02_to_var01, \
                                         var01_dash_to_var02_underscore, \
                                         var02_to_var01, \
                                         var01_dash_to_var02_underscore, \
                                         var02_to_var01, \
                                         var01_dash_to_var02_underscore, \
                                         var02_to_var01, \
                                         var01_dash_to_var02_underscore, \
                                         var02_to_var01, \
                                         var01_dash_to_var02_underscore, \
                                         var02_to_var01, \
                                         var01_dash_to_var02_underscore, \
                                         var02_to_var01, \
                                         var01_dash_to_var02_underscore, \
                                         var02_to_var01, \
                                         var01_to_sourcetype

transforms.conf

#
# Analyst 
#
# File Name Fields:      client_collection_system_tag_sensor_analyst_type_timestamp_seqnum.csv
#
# REGEX:                 ([^\/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_analyst_([^_]+)_([^_]+)_([^_]+)\.csv 
#
# Match Groups:          < $1> _<$2 > _< $3 >_< $4  >_< $5 >_analyst_< $6 >_<$7 >_< $8 >.csv
#
#
[accepted_keys]
var01_key = _var01
var02_key = _var02

#
# Referenced in props.conf [analyst]
#
[set_analyst_fields]
SOURCE_KEY = MetaData:Source
REGEX = ([^\/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_analyst_([^_]+)_([^_]+)_([^_]+)\.[c][s][v] 
FORMAT = analyst_client::$1 analyst_collection::$2 analyst_system::$3 analyst_tag::$4
WRITE_META = true

[set_analyst_host_to_sensor]
SOURCE_KEY = MetaData:Source
DEST_KEY = MetaData:Host
REGEX = ([^\/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_analyst_([^_]+)_([^_]+)_([^_]+)\.[c][s][v]
FORMAT = host::$5
DEFAULT_VALUE = unknown_analyst_host

[set_index_for_analyst_sensor]
SOURCE_KEY = MetaData:Source
DEST_KEY = _MetaData:Index
REGEX = ([^\/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_analyst_([^_]+)_([^_]+)_([^_]+)\.[c][s][v]
FORMAT = idx_$5
DEFAULT_VALUE = unknown_analyst_index

[set_var01_to_type]
SOURCE_KEY = MetaData:Source
DEST_KEY = _var01
REGEX = ([^\/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_analyst_([^_]+)_([^_]+)_([^_]+)\.[c][s][v] 
FORMAT = _$6

[var01_dash_to_var02_underscore]
SOURCE_KEY = _var01
DEST_KEY = _var02
REGEX = _([^-]+)-([^.]+)
FORMAT = _$1_$2

[var02_to_var01]
SOURCE_KEY = _var02
DEST_KEY = _var01
REGEX = ([^.]+)
FORMAT = $1

[var01_to_sourcetype]
SOURCE_KEY = _var01
DEST_KEY = MetaData:Sourcetype
REGEX = _([^.]+)
FORMAT = sourcetype::analyst_$1
DEFAULT_VALUE = unknown_analyst_sourcetype
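
To see why the repeated pair of transforms converges, the chain above can be traced in a few lines of Python. This is only an approximation of Splunk's behavior, not Splunk code: it assumes a transform leaves its DEST_KEY untouched when REGEX does not match (as described in this post) and applies DEFAULT_VALUE only on the final step. The function name derive_sourcetype and the passes parameter are hypothetical.

```python
import re

# Same patterns as the transforms.conf stanzas above.
SOURCE_RE = re.compile(
    r"([^/]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_analyst_([^_]+)_([^_]+)_([^_]+)\.csv"
)
DASH_RE = re.compile(r"_([^-]+)-([^.]+)")   # [var01_dash_to_var02_underscore]
COPY_RE = re.compile(r"([^.]+)")            # [var02_to_var01]
FINAL_RE = re.compile(r"_([^.]+)")          # [var01_to_sourcetype]


def derive_sourcetype(source, passes=8):
    """Trace the transform chain for one source path (assumed semantics)."""
    keys = {"_var01": "", "_var02": ""}

    # [set_var01_to_type]: FORMAT = _$6
    match = SOURCE_RE.search(source)
    if match:
        keys["_var01"] = "_" + match.group(6)

    for _ in range(passes):
        # Rewrite the first remaining dash; on no match, _var02 is left alone.
        match = DASH_RE.search(keys["_var01"])
        if match:
            keys["_var02"] = "_{}_{}".format(match.group(1), match.group(2))
        # Copy _var02 back; an empty _var02 does not match, so _var01 survives.
        match = COPY_RE.search(keys["_var02"])
        if match:
            keys["_var01"] = match.group(1)

    # FORMAT = sourcetype::analyst_$1, with DEFAULT_VALUE as the fallback.
    match = FINAL_RE.search(keys["_var01"])
    return "analyst_" + match.group(1) if match else "unknown_analyst_sourcetype"


if __name__ == "__main__":
    print(derive_sourcetype(
        "/opt/splunk_input/input/client_collection_system_tag_sensor_"
        "analyst_special-source-type_timestamp_seqnum.csv"
    ))
```

Under these assumptions the example prints analyst_special_source_type. Each pass rewrites exactly one dash, which is why the number of repetitions in props.conf caps the number of dashes that get replaced.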

fields.conf

[analyst_client]
INDEXED = false

[analyst_collection]
INDEXED = false

[analyst_system]
INDEXED = false

[analyst_tag]
INDEXED = true


mdwecht
Path Finder

I realized after posting this question that the forum's text formatting stripped the underscores "_" from the REGEX examples in my original post. The intended patterns have "_" between the capture groups, and the REGEX works. What I cannot figure out is how to modify the fields I have captured in REGEX groups to change dashes "-" to underscores "_". I feel that I need a "replace" function, used in combination with the REGEX, that works at index time. Alternatively, I need a way for an additional REGEX to capture the variable number of groups within an extracted field and then let me concatenate the dash-separated groups with underscores.
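
For reference, the two behaviors asked for above look like this outside Splunk. This is a plain Python sketch of the desired result, not something transforms.conf accepts; the function names are hypothetical.

```python
import re


def dashes_to_underscores(value):
    """Global replace, the equivalent of s/-/_/g."""
    return re.sub(r"-", "_", value)


def join_dash_groups(value):
    """Split on dashes into a variable number of groups, then rejoin with underscores."""
    return "_".join(value.split("-"))


if __name__ == "__main__":
    print(dashes_to_underscores("special-source-type"))
    print(join_dash_groups("special-source-type"))
```

Both calls give special_source_type, and both handle zero, one, or many dashes without a fixed upper limit.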


mdwecht
Path Finder

For the record, the FORMAT line in [application-auto-type] is meant to read:
...
FORMAT = sourcetype::application_$6
...
