Remove non-alphanumeric characters at the very start in props.conf

robertlynch2020
Motivator

Is it possible to remove all non-alphanumeric characters in props.conf when taking in data?

I have tried with regex but I can't seem to get it.

This is the data

20151029|12:31:00|MUREXFO   |     1 |SessionCreate                 |MXDIS..&PATCHER                  |   0.21s|   0.22s|100%|  -0.01s|   0% |                                      |1065.44Mb
20151029|12:31:00|MUREXFO   |     2 |RequestDocument3              |MXD~'##ISPATCHER                  |   0.01s|   0.03s|100%|  -0.02s|   0% |                                      |1065.65Mb
20151029|12:31:00|MUREXFO   |     3 |RequestDocument3              |MXDISP..??ATCHER                  |   0.01s|   0.01s|100%|   0.00s|   0% |         

Regex i have - specifically the Command field

^(?:[^\|\n]*\|){5}(?P<Command>\w+)| *-*(?P<Elapsed2>\d+\.\d+)\w+\|

This is what I have initially

MXDIS..&PATCHER
MXD~'##ISPATCHER
MXDISP..??ATCHER

This is what i get

MXDIS
MXD
MXDISP

This is what i want

MXDISPATCHER
MXDISPATCHER
MXDISPATCHER

Cheers for any help on this 🙂
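
The truncated values come from `\w+`, which stops at the first character that is not a letter, digit, or underscore. A minimal Python sketch of that behavior, assuming Python's `re` module treats this pattern the same way Splunk's regex engine does (the shortened `samples` lines are illustrative only):

```python
import re

# First alternative of the extraction regex from the question.
COMMAND_RE = re.compile(r"^(?:[^\|\n]*\|){5}(?P<Command>\w+)")

samples = [
    "20151029|12:31:00|MUREXFO   |     1 |SessionCreate                 |MXDIS..&PATCHER                  |   0.21s|",
    "20151029|12:31:00|MUREXFO   |     2 |RequestDocument3              |MXD~'##ISPATCHER                  |   0.01s|",
    "20151029|12:31:00|MUREXFO   |     3 |RequestDocument3              |MXDISP..??ATCHER                  |   0.01s|",
]

# \w+ stops at the first '.', '~' or other non-word character.
captured = [COMMAND_RE.match(line).group("Command") for line in samples]

# Taking the whole sixth field and deleting everything that is not a
# letter or digit gives the wanted values instead.
cleaned = [re.sub(r"[^A-Za-z0-9]", "", line.split("|")[5]) for line in samples]

print(captured)  # ['MXDIS', 'MXD', 'MXDISP']
print(cleaned)   # ['MXDISPATCHER', 'MXDISPATCHER', 'MXDISPATCHER']
```

So a search-time capture group cannot skip characters inside the value; the junk has to be deleted by a substitution, either on _raw or on the extracted field.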

woodcock
Esteemed Legend

Try this, assuming that your sourcetype is MX_TIMING:

In props.conf:

[MX_TIMING]
SEDCMD-removejunk = s/(?:^|[\r\n])(([^\|]+\|){5})([A-Za-z]+)[^A-Z]*([A-Za-z]+\s+\|)/\1\3\4/g
LINE_BREAKER = ([\r\n]+\s*)(?=\d+\|\d+\:\d+)
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %Y%m%d|%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 17
REPORT-MX_TIMING = MX_TIMING_SearchTimeFieldExtractions

In transforms.conf:

[MX_TIMING_SearchTimeFieldExtractions]
DELIMS = "|"
FIELDS = "Date","Time","UserName","ID","Context","Command"

This will need to be deployed to your Heavy Forwarders, Indexers, and Search Heads. Then all Splunk instances must be restarted on those servers. These changes will only affect events that get indexed after the restarts; older events will stay broken.
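
To see what this SEDCMD does to _raw, the substitution can be tried outside Splunk. A rough Python sketch, assuming `re.sub` behaves like Splunk's sed-style replacement for this pattern; the letter class is written as `[A-Za-z]`, the line-start alternative is reduced to `^`, and the `mixed` line is a made-up event:

```python
import re

# Adapted from SEDCMD-removejunk: group 1 = first five fields,
# group 3 = letters before the junk, group 4 = letters after it plus padding.
PATTERN = re.compile(r"^(([^|]+\|){5})([A-Za-z]+)[^A-Z]*([A-Za-z]+\s+\|)")

def remove_junk(raw: str) -> str:
    """Apply the substitution to a single event."""
    return PATTERN.sub(r"\1\3\4", raw)

raw = "20151029|12:31:00|MUREXFO   |     1 |SessionCreate                 |MXDIS..&PATCHER                  |   0.21s|"
print(remove_junk(raw).split("|")[5].strip())  # MXDISPATCHER

# Caveat: [^A-Z]* also swallows digits and lowercase letters that sit
# between the two runs of letters, so this value loses its 123.
mixed = "a|b|c|d|e|ABC.$%$123....///ABC   |rest"
print(remove_junk(mixed).split("|")[5].strip())  # ABCABC
```

The pattern handles one run of junk between two runs of letters, which matches the three sample events; values with digits or several junk runs would need a more general rule.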

somesoni2
SplunkTrust

Try this for props.conf on your indexer/heavy forwarder.
20151029|12:31:00|MUREXFO | 1 |SessionCreate |MXDIS..&PATCHER | 0.21s| 0.22s|100%| -0.01s| 0% |

[MX_TIMING]
SHOULD_LINEMERGE =false
LINE_BREAKER = ([\r\n]+)(?=\d+\|\d+\:\d+)
TIME_PREFIX = ^
TIME_FORMAT = %Y%m%d|%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 17
SEDCMD-removejunk = s/^(([^\|]+\|){5})([A-Za-z]+)[^A-Z]*([A-Za-z]+\s+\|)/\1\3\4/
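
The line-breaking and timestamp settings can be sanity-checked outside Splunk. A small Python sketch, assuming `re.split` on the LINE_BREAKER pattern approximates how Splunk discards the first capture group between events, and that `strptime` reads the TIME_FORMAT string the same way (the two-event `data` string is shortened and illustrative):

```python
import re
from datetime import datetime

LINE_BREAKER = r"[\r\n]+(?=\d+\|\d+:\d+)"  # break only before "date|time"
TIME_FORMAT = "%Y%m%d|%H:%M:%S"
MAX_TIMESTAMP_LOOKAHEAD = 17               # len("20151029|12:31:00")

data = (
    "20151029|12:31:00|MUREXFO   |     1 |SessionCreate |MXDIS..&PATCHER |\n"
    "20151029|12:31:01|MUREXFO   |     2 |RequestDocument3 |MXD~'##ISPATCHER |\n"
)

# One event per chunk; the newline that separates events is dropped.
events = [e for e in re.split(LINE_BREAKER, data) if e.strip()]

# The timestamp is read from the first 17 characters of each event.
stamps = [datetime.strptime(e[:MAX_TIMESTAMP_LOOKAHEAD], TIME_FORMAT) for e in events]

for stamp in stamps:
    print(stamp.isoformat())
```

The 17-character lookahead covers exactly the date, the pipe, and the time, so nothing after the second pipe is considered for the timestamp.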

robertlynch2020
Motivator

Hi

Thanks for the answer.
Sorry to say, I am still seeing these non-alphanumeric characters.

I also had to add some lines, as I need to extract fields.

1st try

[MX_TIMING]
SHOULD_LINEMERGE =false
LINE_BREAKER = ([\r\n]+)(?=\d+\|\d+\:\d+)
TIME_PREFIX = ^
TIME_FORMAT = %Y%m%d|%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 17
REPORT-MX-TIMING = REPORT-MX-TIMING2
EXTRACT-MX-TIMING = ^(?:[^\|\n]*\|){6} *-*(?P<Elapsed>\d+\.\d+)\w+\| *-*(?P<CPU>\d+\.\d+)s\| *-*(?P<CPU_PER>\d+)%\| *-*(?P<RDB_COM>\d+\.\d+)s\| *-*(?P<RDB_COM_PER>\d+)%\s+\|
EXTRACT-MX-TIMING-Memory = \| *(?P<Memory>\d+\.\d+)Mb*$
SEDCMD-removejunk = s/^(([^\|]+\|){5})([A-Za-z]+)[^A-Z]*([A-Za-z]+\s+\|)/\1\3\4/

2nd try

[MX_TIMING]
SHOULD_LINEMERGE =false
LINE_BREAKER = ([\r\n]+)(?=\d+\|\d+\:\d+)
TIME_PREFIX = ^
TIME_FORMAT = %Y%m%d|%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 17
SEDCMD-removejunk = s/^(([^\|]+\|){5})([A-Za-z]+)[^A-Z]*([A-Za-z]+\s+\|)/\1\3\4/
REPORT-MX-TIMING = REPORT-MX-TIMING2
EXTRACT-MX-TIMING = ^(?:[^\|\n]*\|){6} *-*(?P<Elapsed>\d+\.\d+)\w+\| *-*(?P<CPU>\d+\.\d+)s\| *-*(?P<CPU_PER>\d+)%\| *-*(?P<RDB_COM>\d+\.\d+)s\| *-*(?P<RDB_COM_PER>\d+)%\s+\|
EXTRACT-MX-TIMING-Memory = \| *(?P<Memory>\d+\.\d+)Mb*$

This is the transform

[REPORT-MX-TIMING2]
DELIMS = "|"
FIELDS = "Date","Time","UserName","ID","Context","Command"
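
A delimiter-based transform like this one assigns values purely by position and does not alter them. A minimal Python sketch of that idea, assuming a plain split on the pipe is a fair stand-in for DELIMS/FIELDS (how Splunk treats the padding around each value is not modelled here):

```python
FIELDS = ["Date", "Time", "UserName", "ID", "Context", "Command"]

def delimited_extract(raw: str, delim: str = "|") -> dict:
    """Pair each field name with the value found at the same position."""
    return dict(zip(FIELDS, raw.split(delim)))

raw = "20151029|12:31:00|MUREXFO   |     2 |RequestDocument3              |MXD~'##ISPATCHER                  |   0.01s|"
fields = delimited_extract(raw)

# The junk characters are still there: the transform only names the value.
print(repr(fields["Command"].strip()))
```

If _raw still contains the junk when the search runs, the Command field will contain it too, regardless of where the REPORT line sits in the stanza.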

woodcock
Esteemed Legend

Do you need to strip these characters out of the input BEFORE the data is indexed OR do you need to strip these out of each event AS each event is indexed OR do you need to strip these out of the fields at search time (after the event is indexed)?

robertlynch2020
Motivator

Hi

Ideally I want to do it AS it is being indexed. This way I can also view _raw if I need to.

Cheers

robertlynch2020
Motivator

However, if AS is not possible, BEFORE would work as well.

Cheers again

gcusello
SplunkTrust

Hi robertlynch2020,
If you want to extract fields from a structured file such as a CSV, see http://docs.splunk.com/Documentation/Splunk/6.5.3/Data/Extractfieldsfromfileswithstructureddata
In other words, use something like this in props.conf (on both the forwarder and the indexer):

[your_sourcetype]
INDEXED_EXTRACTIONS = CSV
FIELD_DELIMITER = |
FIELD_NAMES = field1,field2,fieldn

The easiest way to find the correct sourcetype definition is to download a sample of your file and load it into a test index through the web interface, creating the sourcetype from an existing one.

Beware that this props.conf must be on both the indexers and the forwarders!
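
The structured-data approach treats each event as a row with a fixed delimiter and a list of column names. A short Python sketch of the same idea using the standard `csv` module, assuming the six field names used elsewhere in this thread (they are stand-ins for `field1,field2,fieldn`) and a shortened sample row:

```python
import csv
import io

# Hypothetical column names, borrowed from the transforms.conf in this thread.
FIELD_NAMES = ["Date", "Time", "UserName", "ID", "Context", "Command"]

data = "20151029|12:31:00|MUREXFO   |     1 |SessionCreate |MXDIS..&PATCHER |\n"

# Columns beyond the named ones are collected under the "rest" key.
reader = csv.DictReader(
    io.StringIO(data), fieldnames=FIELD_NAMES, delimiter="|", restkey="rest"
)
row = next(reader)

print(row["Context"].strip(), row["Command"].strip())
```

As with any delimiter-based extraction, the values come through unchanged, so the non-alphanumeric characters in Command still have to be removed in a separate step.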

Bye.
Giuseppe

robertlynch2020
Motivator

Hi

I have the following. What I am looking for is a way to improve my regex, if that is possible, to take out the non-alphanumeric characters up to a pipe.

[MX_TIMING]
DATETIME_CONFIG = 
NO_BINARY_CHECK = true
category = Custom
description = MX_TIMING
disabled = false
pulldown_type = true
REPORT-MX-TIMING = REPORT-MX-TIMING2
EXTRACT-MX-TIMING = ^(?:[^\|\n]*\|){6} *-*(?P<Elapsed>\d+\.\d+)\w+\| *-*(?P<CPU>\d+\.\d+)s\| *-*(?P<CPU_PER>\d+)%\| *-*(?P<RDB_COM>\d+\.\d+)s\| *-*(?P<RDB_COM_PER>\d+)%\s+\|
EXTRACT-MX-TIMING-Memory = \| *(?P<Memory>\d+\.\d+)Mb*$
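
The two EXTRACT patterns can be exercised against the first sample event outside Splunk. A minimal Python sketch, assuming Python's `re` handles these patterns the same way as Splunk's search-time extraction:

```python
import re

TIMING_RE = re.compile(
    r"^(?:[^\|\n]*\|){6} *-*(?P<Elapsed>\d+\.\d+)\w+\| *-*(?P<CPU>\d+\.\d+)s\|"
    r" *-*(?P<CPU_PER>\d+)%\| *-*(?P<RDB_COM>\d+\.\d+)s\| *-*(?P<RDB_COM_PER>\d+)%\s+\|"
)
MEMORY_RE = re.compile(r"\| *(?P<Memory>\d+\.\d+)Mb*$")

raw = (
    "20151029|12:31:00|MUREXFO   |     1 |SessionCreate                 "
    "|MXDIS..&PATCHER                  |   0.21s|   0.22s|100%|  -0.01s|   0% "
    "|                                      |1065.44Mb"
)

timing = TIMING_RE.match(raw).groupdict()
memory = MEMORY_RE.search(raw).group("Memory")

# Note: the "-*" sits outside the capture group, so -0.01s is captured as 0.01.
print(timing)
print(memory)
```

These patterns skip over the first six fields with `[^\|\n]*`, so they keep working whether or not the Command field still contains junk characters.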

gcusello
SplunkTrust

Hi robertlynch2020,
Sorry, I'm not sure I understood your need:
do you want to remove only the pipes (|) and take the fields they delimit, or do you want to delete all the non-alphanumeric chars like | or & or #?
Do you want to remove them or substitute them with another char?

Either way, in both cases I suggest using a delimited extraction, so you get your fields without using regex.
Afterwards you can delete the non-alphanumeric chars.

To delete a char you can use SEDCMD in props.conf:

SEDCMD-remove_not_alpha = s/&|#\\|//g

Bye.
Giuseppe

robertlynch2020
Motivator

Hi

Thanks for your help on this.

Below is a sample set of the data i have

20151029|12:31:00|MUREXFO   |     1 |SessionCreate                 |MXDIS..&PATCHER                  |   0.21s|   0.22s|100%|  -0.01s|   0% |                                      |1065.44Mb
20151029|12:31:00|MUREXFO   |     2 |RequestDocument3              |MXD~'##ISPATCHER                  |   0.01s|   0.03s|100%|  -0.02s|   0% |                                      |1065.65Mb
20151029|12:31:00|MUREXFO   |     3 |RequestDocument3              |MXDISP..??ATCHER                  |   0.01s|   0.01s|100%|   0.00s|   0% |

I am able to pull out the data between the pipes, this is fine 🙂 . I have created a transform.

The issue is that fields 5 (SessionCreate) and 6 (MXDIS..&PATCHER) can contain characters other than A-Z or 0-9.

So where I have MXDIS..&PATCHER, I want MXDISPATCHER.

I am looking to remove these characters.

This is what i have initially
MXDIS..&PATCHER
MXD~'##ISPATCHER
MXDISP..??ATCHER

This is what i want
MXDISPATCHER
MXDISPATCHER
MXDISPATCHER

I have tried your suggestion, but I am still getting characters other than A-Z and 0-9

[MX_TIMING]
DATETIME_CONFIG = 
NO_BINARY_CHECK = true
category = Custom
description = MX_TIMING
disabled = false
pulldown_type = true
REPORT-MX-TIMING = REPORT-MX-TIMING2
SEDCMD-remove_not_alpha = s/&|#\\|//g

The transform i have is
[REPORT-MX-TIMING2]
DELIMS = "|"
FIELDS = "Date","Time","UserName","ID","Context","Command"

Cheers

gcusello
SplunkTrust

Hi robertlynch2020,
Ok try this SEDCMD:

SEDCMD-remove_not_alpha = s/MXDIS\.\.\&PATCHER|MXD\~\'\#\#ISPATCHER|MXDISP\.\.\?\?ATCHER/MXDISPATCHER/g

The regex should work; if it does not, create three SEDCMDs, one for each string:

SEDCMD-remove_not_alpha1 = s/MXDIS\.\.\&PATCHER/MXDISPATCHER/g
SEDCMD-remove_not_alpha2 = s/MXD\~\'\#\#ISPATCHER/MXDISPATCHER/g
SEDCMD-remove_not_alpha3 = s/MXDISP\.\.\?\?ATCHER/MXDISPATCHER/g

Bye.
Giuseppe

robertlynch2020
Motivator

Hi

Thanks again 🙂

The issue is that the above is only a sample set. I will have millions of lines of data, all with different lines and different patterns.

I need a generic way to strip non-alphanumeric characters,

for example
Input
ABC.$%$123....///...///ABC
Output
ABC123ABC
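
The generic rule being asked for is "delete every character that is not A-Z, a-z, or 0-9, but only inside chosen fields". A minimal Python sketch of that rule, assuming the data is cleaned BEFORE it reaches Splunk; the function names and the default field positions (fields 5 and 6, zero-based 4 and 5) are illustrative choices, not part of any Splunk configuration:

```python
import re

NON_ALNUM = re.compile(r"[^A-Za-z0-9]")

def strip_non_alnum(value: str) -> str:
    """Delete every character that is not a letter or a digit."""
    return NON_ALNUM.sub("", value)

def clean_fields(raw: str, positions=(4, 5), delim: str = "|") -> str:
    """Clean only the chosen zero-based fields; other fields stay untouched."""
    parts = raw.split(delim)
    for i in positions:
        if i < len(parts):
            parts[i] = strip_non_alnum(parts[i])
    return delim.join(parts)

print(strip_non_alnum("ABC.$%$123....///...///ABC"))  # ABC123ABC
print(clean_fields("a|b|c|d|Session Create|MXD~'##ISPATCHER   |0.01s"))
# a|b|c|d|SessionCreate|MXDISPATCHER|0.01s
```

Note that this also removes the padding spaces inside the cleaned fields. If cleaning at search time is acceptable instead, the same character class can be used with Splunk's eval `replace` function, for example `| eval Command=replace(Command, "[^A-Za-z0-9]", "")`, which leaves _raw unchanged.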

gcusello
SplunkTrust

If you can define some rules, you can perform all the transformations; otherwise you can only remove the non-alphabetic chars.
If there are many transformations to perform but they are always the same, you could create a lookup and transform the values at search time.

Bye.
Giuseppe
