Is it possible to remove all non-alphanumeric characters when taking in data via props.conf?
I have tried with regex but I can't seem to get it.
This is the data:
20151029|12:31:00|MUREXFO | 1 |SessionCreate |MXDIS..&PATCHER | 0.21s| 0.22s|100%| -0.01s| 0% | |1065.44Mb
20151029|12:31:00|MUREXFO | 2 |RequestDocument3 |MXD~'##ISPATCHER | 0.01s| 0.03s|100%| -0.02s| 0% | |1065.65Mb
20151029|12:31:00|MUREXFO | 3 |RequestDocument3 |MXDISP..??ATCHER | 0.01s| 0.01s|100%| 0.00s| 0% |
The regex I have, specifically for the Command field:
^(?:[^\|\n]*\|){5}(?P<Command>\w+)| *-*(?P<Elapsed2>\d+\.\d+)\w+\|
This is what I have initially:
MXDIS..&PATCHER
MXD~'##ISPATCHER
MXDISP..??ATCHER
This is what i get
MXDIS
MXD
MXDISP
This is what i want
MXDISPATCHER
MXDISPATCHER
MXDISPATCHER
Cheers for any help on this 🙂
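A note on why the Command capture comes back truncated: `\w+` matches only letters, digits and underscore, so it stops at the first `.`, `~` or `&`. A capture group is always one contiguous slice of the event, so the junk has to be removed by a substitution rather than by the extraction itself. The Python sketch below only illustrates that behaviour outside Splunk, using the first alternative of the regex above. Python's `re` is a stand-in for Splunk's regex engine (an assumption), and the names `COMMAND_RE`, `SAMPLE` and `extract_command` are made up for the illustration.

```python
import re

# First alternative of the extraction regex quoted in the question.
COMMAND_RE = re.compile(r"^(?:[^\|\n]*\|){5}(?P<Command>\w+)")

# First sample event from the question, shortened after the 7th field.
SAMPLE = "20151029|12:31:00|MUREXFO | 1 |SessionCreate |MXDIS..&PATCHER | 0.21s|"


def extract_command(line):
    """Return what the Command group captures, or None if there is no match."""
    match = COMMAND_RE.search(line)
    return match.group("Command") if match else None


print(extract_command(SAMPLE))  # MXDIS
```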
Try this, assuming that your sourcetype is MX_TIMING:
In props.conf:
[MX_TIMING]
SEDCMD-removejunk = s/(?:^|[\r\n])(([^\|]+\|){5})([A-z]+)[^A-Z]*([A-z]+\s+\|)/\1\3\4/g
LINE_BREAKER = ([\r\n]+\s*)(?=\d+\|\d+\:\d+)
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %Y%m%d|%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 17
REPORT-MX_TIMING = MX_TIMING_SearchTimeFieldExtractions
In transforms.conf:
[MX_TIMING_SearchTimeFieldExtractions]
DELIMS = "|"
FIELDS = "Date","Time","UserName","ID","Context","Command"
This will need to be deployed to your Heavy Forwarders, Indexers, and Search Heads, and all Splunk instances on those servers must then be restarted. These changes will only affect events that get indexed after the restarts; older events will stay broken.
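To see what the SEDCMD-removejunk expression above does and does not cover, here is a small Python sketch that applies the same pattern and replacement to single events. It is only an approximation: Python's `re` stands in for Splunk's regex engine, and `apply_sedcmd` plus the second and third sample values are invented for the illustration.

```python
import re

# Pattern and replacement copied from the SEDCMD-removejunk line above.
PATTERN = re.compile(r"(?:^|[\r\n])(([^\|]+\|){5})([A-z]+)[^A-Z]*([A-z]+\s+\|)")
REPLACEMENT = r"\1\3\4"

PREFIX = "20151029|12:31:00|MUREXFO | 1 |SessionCreate |"


def apply_sedcmd(event):
    """Apply the substitution to one event, as the 'g' flag would."""
    return PATTERN.sub(REPLACEMENT, event)


# One run of junk between two runs of letters: cleaned as intended.
print(apply_sedcmd(PREFIX + "MXDIS..&PATCHER | 0.21s|"))
# Digits inside the junk are swallowed by [^A-Z]*, so 123 is lost.
print(apply_sedcmd(PREFIX + "ABC.$%$123....///...///ABC | 0.21s|"))
# Two separate runs of junk: the pattern does not match, event unchanged.
print(apply_sedcmd(PREFIX + "MX..DIS..PATCHER | 0.21s|"))
```

In other words, the expression handles exactly one run of junk in the Command field. It is not a general "strip everything that is not alphanumeric" rule.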
Try this for props.conf on your indexer/heavy forwarder.
20151029|12:31:00|MUREXFO | 1 |SessionCreate |MXDIS..&PATCHER | 0.21s| 0.22s|100%| -0.01s| 0% |
[MX_TIMING]
SHOULD_LINEMERGE =false
LINE_BREAKER = ([\r\n]+)(?=\d+\|\d+\:\d+)
TIME_PREFIX = ^
TIME_FORMAT = %Y%m%d|%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 17
SEDCMD-removejunk = s/^(([^\|]+\|){5})([A-z]+)[^A-Z]*([A-z]+\s+\|)/\1\3\4/
Hi
Thanks for the answer.
Sorry to say, I am still seeing these non-alphanumeric characters.
I also had to add in some lines, as I need to extract fields.
1st try
[MX_TIMING]
SHOULD_LINEMERGE =false
LINE_BREAKER = ([\r\n]+)(?=\d+\|\d+\:\d+)
TIME_PREFIX = ^
TIME_FORMAT = %Y%m%d|%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 17
REPORT-MX-TIMING = REPORT-MX-TIMING2
EXTRACT-MX-TIMING = ^(?:[^\|\n]*\|){6} *-*(?P<Elapsed>\d+\.\d+)\w+\| *-*(?P<CPU>\d+\.\d+)s\| *-*(?P<CPU_PER>\d+)%\| *-*(?P<RDB_COM>\d+\.\d+)s\| *-*(?P<RDB_COM_PER>\d+)%\s+\|
EXTRACT-MX-TIMING-Memory = \| *(?P<Memory>\d+\.\d+)Mb*$
SEDCMD-removejunk = s/^(([^\|]+\|){5})([A-z]+)[^A-Z]*([A-z]+\s+\|)/\1\3\4/
2nd try
[MX_TIMING]
SHOULD_LINEMERGE =false
LINE_BREAKER = ([\r\n]+)(?=\d+\|\d+\:\d+)
TIME_PREFIX = ^
TIME_FORMAT = %Y%m%d|%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 17
SEDCMD-removejunk = s/^(([^\|]+\|){5})([A-z]+)[^A-Z]*([A-z]+\s+\|)/\1\3\4/
REPORT-MX-TIMING = REPORT-MX-TIMING2
EXTRACT-MX-TIMING = ^(?:[^\|\n]*\|){6} *-*(?P<Elapsed>\d+\.\d+)\w+\| *-*(?P<CPU>\d+\.\d+)s\| *-*(?P<CPU_PER>\d+)%\| *-*(?P<RDB_COM>\d+\.\d+)s\| *-*(?P<RDB_COM_PER>\d+)%\s+\|
EXTRACT-MX-TIMING-Memory = \| *(?P<Memory>\d+\.\d+)Mb*$
This is the transform
[REPORT-MX-TIMING2]
DELIMS = "|"
FIELDS = "Date","Time","UserName","ID","Context","Command"
Do you need to strip these characters out of the input BEFORE the data is indexed OR do you need to strip these out of each event AS each event is indexed OR do you need to strip these out of the fields at search time (after the event is indexed)?
Hi
Ideally I want to do it AS it is being indexed. This way I can also view _raw if I need to.
Cheers
However, if AS is not possible, BEFORE would work as well.
Cheers again
Hi robertlynch2020,
If you want to extract fields from a delimited file such as a CSV, see http://docs.splunk.com/Documentation/Splunk/6.5.3/Data/Extractfieldsfromfileswithstructureddata
In other words, something like this
in props.conf (on both forwarder and indexer):
[your_sourcetype]
INDEXED_EXTRACTIONS = CSV
FIELD_DELIMITER = |
FIELD_NAMES = field1,field2,fieldn
The easiest way to find the correct sourcetype definition is to take a sample of your file and load it into a test index through the web interface, creating the sourcetype from an existing one.
Beware that this props.conf must be both on Indexers and forwarders!
Bye.
Giuseppe
Hi
I have the following. What I am looking for is a way to improve my regex, if possible, so that it takes out the non-alphanumeric characters up to a pipe.
[MX_TIMING]
DATETIME_CONFIG =
NO_BINARY_CHECK = true
category = Custom
description = MX_TIMING
disabled = false
pulldown_type = true
REPORT-MX-TIMING = REPORT-MX-TIMING2
EXTRACT-MX-TIMING = ^(?:[^\|\n]*\|){6} *-*(?P<Elapsed>\d+\.\d+)\w+\| *-*(?P<CPU>\d+\.\d+)s\| *-*(?P<CPU_PER>\d+)%\| *-*(?P<RDB_COM>\d+\.\d+)s\| *-*(?P<RDB_COM_PER>\d+)%\s+\|
EXTRACT-MX-TIMING-Memory = \| *(?P<Memory>\d+\.\d+)Mb*$
Hi robertlynch2020,
Sorry, I'm not sure I understood your need:
do you want to remove only the pipes (|) and take the fields delimited by them, or do you want to delete all the non-alphanumeric chars like | or & or #?
Do you want to remove them or substitute them with another char?
Either way, I suggest using a delimited extraction so you get your fields without using regex.
Afterwards you can delete the non-alphanumeric chars.
To delete a char you can use SEDCMD in props.conf:
SEDCMD-remove_not_alpha = s/&|#\\|//g
Bye.
Giuseppe
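A quick check of what the pattern in `s/&|#\\|//g` can actually match may help here. Read as a regex, `&|#\\|` has three alternatives: a literal `&`, a `#` followed by a backslash, and the empty string. It never touches `.`, `~`, `'`, `?` or a `#` on its own. The Python sketch below shows this on the three Command values from the thread; it assumes Python's `re` treats the pattern the way Splunk's engine does, and `apply_sedcmd` is a made-up helper name.

```python
import re

# Regex part of: SEDCMD-remove_not_alpha = s/&|#\\|//g
PATTERN = re.compile(r"&|#\\|")


def apply_sedcmd(value):
    """Delete every match of the pattern, as the 'g' flag would."""
    return PATTERN.sub("", value)


print(apply_sedcmd("MXDIS..&PATCHER"))   # MXDIS..PATCHER  (only & removed)
print(apply_sedcmd("MXD~'##ISPATCHER"))  # MXD~'##ISPATCHER (unchanged)
print(apply_sedcmd("MXDISP..??ATCHER"))  # MXDISP..??ATCHER (unchanged)
```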
Hi
Thanks for your help on this.
Below is a sample set of the data I have:
20151029|12:31:00|MUREXFO | 1 |**SessionCreate** |**MXDIS..&PATCHER** | 0.21s| 0.22s|100%| -0.01s| 0% | |1065.44Mb
20151029|12:31:00|MUREXFO | 2 |**RequestDocument3** |**MXD~'##ISPATCHER** | 0.01s| 0.03s|100%| -0.02s| 0% | |1065.65Mb
20151029|12:31:00|MUREXFO | 3 |**RequestDocument3** |**MXDISP..??ATCHER** | 0.01s| 0.01s|100%| 0.00s| 0% |
I am able to pull out the data between the pipes, this is fine 🙂. I have created a transform.
The issue is that fields 5 (SessionCreate) and 6 (MXDIS..&PATCHER) can contain characters that are not A-Z or 0-9.
So I have MXDIS..&PATCHER and I want MXDISPATCHER.
I am looking to remove these characters.
This is what I have initially:
MXDIS..&PATCHER
MXD~'##ISPATCHER
MXDISP..??ATCHER
This is what i want
MXDISPATCHER
MXDISPATCHER
MXDISPATCHER
I have tried your suggestion, but I am still getting characters that are not A-Z or 0-9:
[MX_TIMING]
DATETIME_CONFIG =
NO_BINARY_CHECK = true
category = Custom
description = MX_TIMING
disabled = false
pulldown_type = true
REPORT-MX-TIMING = REPORT-MX-TIMING2
SEDCMD-remove_not_alpha = s/&|#\\|//g
The transform I have is:
[REPORT-MX-TIMING2]
DELIMS = "|"
FIELDS = "Date","Time","UserName","ID","Context","Command"
Cheers
Hi robertlynch2020,
OK, try this SEDCMD:
SEDCMD-remove_not_alpha = s/MXDIS\.\.\&PATCHER|MXD\~\'\#\#ISPATCHER|MXDISP\.\.\?\?ATCHER/MXDISPATCHER/g
This regex should work; if it does not, create three SEDCMDs, one for each string:
SEDCMD-remove_not_alpha1 = s/MXDIS\.\.\&PATCHER/MXDISPATCHER/g
SEDCMD-remove_not_alpha2 = s/MXD\~\'\#\#ISPATCHER/MXDISPATCHER/g
SEDCMD-remove_not_alpha3 = s/MXDISP\.\.\?\?ATCHER/MXDISPATCHER/g
Bye.
Giuseppe
Hi
Thanks again 🙂
The issue is that the above is only a sample set. I will have millions of lines of data, all with different values and different patterns.
I need a generic way to strip non-alphanumeric characters,
for example:
Input
ABC.$%$123....///...///ABC
Output
ABC123ABC
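Since doing this BEFORE indexing was said to be acceptable, one generic option is to clean the file with a small script before Splunk reads it. The Python sketch below is only an illustration under stated assumptions: the events are pipe-delimited as in the samples, the fields to clean are the 5th and 6th (zero-based positions 4 and 5), and the names `strip_non_alnum` and `clean_fields` are hypothetical. Note that it also removes the padding spaces inside those two fields.

```python
import re

NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def strip_non_alnum(value):
    """Remove every character that is not A-Z, a-z or 0-9."""
    return NON_ALNUM.sub("", value)


def clean_fields(line, positions=(4, 5)):
    """Clean only the given pipe-delimited fields and leave the rest as-is."""
    parts = line.split("|")
    for pos in positions:
        if pos < len(parts):
            parts[pos] = strip_non_alnum(parts[pos])
    return "|".join(parts)


print(strip_non_alnum("ABC.$%$123....///...///ABC"))  # ABC123ABC
print(clean_fields(
    "20151029|12:31:00|MUREXFO | 1 |SessionCreate |MXDIS..&PATCHER | 0.21s|"
))
# 20151029|12:31:00|MUREXFO | 1 |SessionCreate|MXDISPATCHER| 0.21s|
```

Unlike a single pattern with a fixed number of letter runs, this handles any number of junk characters anywhere in the field and keeps digits.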
If you can define some rules, you can perform all the transformations; otherwise you can only remove the non-alphanumeric chars.
If there are many transformations to perform but they are always the same, you could create a lookup and then transform the values at search time.
Bye.
Giuseppe