Escape JSON data at index time

darinmoon
Explorer

I'm trying to escape backslashes in JSON data at index time because I can't do it from within the application that generates the log. Although the log syntax is JSON, some of the values contain unescaped backslashes. I'm very new to Splunk, so I'm kind of floundering here. Is there a way to transform the data at index time to escape the backslashes? Here is a sample (notice the fourth field, UserName):

{
    "Timestamp": "2014-05-10 10:11:38.768",
    "TimeZone": "EDT",
    "Machine": "xyz",
    "UserName": "someDomain\someUser",
    "CorrelatorID": "DistributedWorkController",
    "TimerDepth": "0",
    "Message": "Executing the GetNextAvailableWorkItem method.",
    "ApplicationName": "DistributedWorkController",
    "Context": "GetNextAvailableWorkItem",
    "TimerMilliseconds": "650",
    "TimerType": "Method",
}

Thanks!
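
Editor's note: as a rough illustration of why the sample is a problem, the Python sketch below feeds a one-field version of the event to a strict JSON parser. Python and the helper name `try_parse` are assumptions made for this example and are not part of the thread; Splunk's own JSON handling may behave differently.

```python
import json

# One field from the sample, with a single unescaped backslash.
# A raw string keeps the backslash exactly as it would appear in the log file.
raw_event = r'{"UserName": "someDomain\someUser"}'

# The same field with the backslash escaped, as valid JSON requires.
escaped_event = r'{"UserName": "someDomain\\someUser"}'


def try_parse(text):
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


parsed, error = try_parse(raw_event)
print(parsed, error)        # parsing fails: "\s" is not a valid JSON escape

parsed_ok, error_ok = try_parse(escaped_event)
print(parsed_ok, error_ok)  # parsing succeeds; the value holds one backslash
```

In JSON, a backslash inside a string must be followed by one of a small set of escape characters, so `\s` is rejected while `\\` is read as a single literal backslash.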

1 Solution

darinmoon
Explorer

I got it to work with the following props.conf:

[odrtest2]
KV_MODE = json
pulldown_type = 1
SEDCMD-backslash=s/\\/\\\\/g
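
Editor's note: the substitution in SEDCMD-backslash doubles every backslash. The sketch below reproduces that effect in Python so it can be tried outside Splunk. The function names `escape_backslashes` and `is_valid_json` are hypothetical, and Python's `re` module stands in for Splunk's sed-style replacement, so treat this as an approximation rather than a description of Splunk's behavior.

```python
import json
import re


def escape_backslashes(raw):
    r"""Double every backslash, like the sed expression s/\\/\\\\/g."""
    return re.sub(r"\\", r"\\\\", raw)


def is_valid_json(text):
    """Return True if a strict JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


raw_event = r'{"UserName": "someDomain\someUser"}'
fixed = escape_backslashes(raw_event)
print(fixed)                          # {"UserName": "someDomain\\someUser"}
print(json.loads(fixed)["UserName"])  # someDomain\someUser

# Caveat: a blanket replacement also doubles backslashes that were already
# part of a valid escape such as \" and that can break the event.
already_valid = r'{"Message": "say \"hi\""}'
print(is_valid_json(already_valid))                      # True
print(is_valid_json(escape_backslashes(already_valid)))  # False
```

The caveat at the end is worth checking against real data: the blanket replacement is only safe when none of the backslashes in the events are already part of a valid JSON escape.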

No changes to inputs.conf were necessary. For some reason this configuration didn't work originally, but when I created a new index and source from scratch, it worked. Perhaps there was a lingering setting that I wasn't aware of. In addition, I needed to remove carriage returns and line feeds contained within events. Adding the following SEDCMD lines to the props.conf above took care of that:

SEDCMD-cr=s/\x0D//g
SEDCMD-lf=s/\x0A//g

Combining these into one SEDCMD (e.g. matching \r\n) didn't work, but removing them separately did.
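
Editor's note: the two removals can be mimicked in Python as two separate passes, which is a convenient way to check their effect on a sample event before changing props.conf. The function name `strip_cr_lf` and the sample event are made up for this sketch, and it does not attempt to explain why the combined expression failed in Splunk.

```python
import re


def strip_cr_lf(raw):
    """Remove carriage returns, then line feeds, as two separate passes."""
    without_cr = re.sub(r"\x0D", "", raw)   # mirrors SEDCMD-cr=s/\x0D//g
    return re.sub(r"\x0A", "", without_cr)  # mirrors SEDCMD-lf=s/\x0A//g


# Hypothetical event with a literal CR/LF pair inside a string value.
event = '{"Message": "line one\r\nline two"}'
print(strip_cr_lf(event))  # {"Message": "line oneline two"}
```

Note that the characters are deleted rather than replaced, so text on either side of a line break is joined without a space.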

Thanks to @dmaislin_splunk for the guidance and perseverance!

dmaislin_splunk
Splunk Employee

I just tried this with the following log example and it works perfectly with no SEDCMD.

{"Timestamp":"2014-05-10 10:11:38.768","TimeZone":"EDT","Machine":"xyz","UserName":"Splunk\\asmith","CorrelatorID":"DistributedWorkController","TimerDepth":"0","Message":"AExecuting the GetNextAvailableWorkItem method.","ApplicationName":"DistributedWorkController","Context":"GetNextAvailableWorkItem","TimerMilliseconds":"650","TimerType":"Method"} 
{"Timestamp":"2014-05-10 10:10:38.768","TimeZone":"EDT","Machine":"xyz","UserName":"Splunk\\bsmith","CorrelatorID":"DistributedWorkController","TimerDepth":"0","Message":"AExecuting the GetNextAvailableWorkItem method.","ApplicationName":"DistributedWorkController","Context":"GetNextAvailableWorkItem","TimerMilliseconds":"650","TimerType":"Method"} 
{"Timestamp":"2014-05-10 10:09:38.768","TimeZone":"EDT","Machine":"xyz","UserName":"Splunk\\csmith","CorrelatorID":"DistributedWorkController","TimerDepth":"0","Message":"AExecuting the GetNextAvailableWorkItem method.","ApplicationName":"DistributedWorkController","Context":"GetNextAvailableWorkItem","TimerMilliseconds":"650","TimerType":"Method"}

props.conf

[ODR]
INDEXED_EXTRACTIONS = json
KV_MODE = none
NO_BINARY_CHECK = 1
pulldown_type = 1

inputs.conf

[monitor:///Users/dmaislin/Desktop/jsons.txt]
disabled = false
followTail = 0
sourcetype = ODR

dmaislin_splunk
Splunk Employee

Please share your final props.conf as we have certainly talked about many options here.

darinmoon
Explorer

I got it to work with what I think was your second suggestion plus the SEDCMD that I was originally using. I've added a new answer post with the details. I don't know why it didn't work the first time I tried it, but I created a new index and source from scratch and it worked this time.

jrodman
Splunk Employee

SEDCMD should be effective, but it modifies _raw. I can't really tell whether the problem is the value of _raw or the value of other fields. SEDCMD runs in the regexreplacement phase of data handling, and indexed extractions happen before that, so indexed extractions would see the data before SEDCMD has modified it.
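
Editor's note: the ordering point can be pictured with a toy model. The Python sketch below is not Splunk's pipeline. The functions `sed_backslash` and `extract_fields` are hypothetical stand-ins for the replacement step and a JSON field extraction, used only to show how the order of the two steps changes the outcome.

```python
import json


def sed_backslash(raw):
    """Stand-in for the replacement step: double every backslash."""
    return raw.replace("\\", "\\\\")


def extract_fields(raw):
    """Stand-in for JSON field extraction: return {} if the text is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {}


raw = r'{"UserName": "someDomain\someUser"}'

# Order 1: extract fields first, rewrite the raw text afterwards.
fields_extract_first = extract_fields(raw)
raw_after_sed = sed_backslash(raw)

# Order 2: rewrite the raw text first, then extract fields.
fields_sed_first = extract_fields(sed_backslash(raw))

print(fields_extract_first)  # {}  (extraction saw the unescaped backslash)
print(fields_sed_first)      # extraction saw the repaired text
```

In this model, extracting before the rewrite yields no fields even though the raw text ends up repaired, while rewriting first lets the extraction succeed.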

darinmoon
Explorer

Right. Did you try it with my original events that contain single backslashes instead of double backslashes? The props.conf here is the same one that I originally got from the Splunk data import wizard. It works great if the backslashes are already escaped, as in your example. The problem is that my data arrives with unescaped (single) backslashes, which is why I need to transform the data and was trying (unsuccessfully) to use SEDCMD to accomplish that.

leonjxtan
Path Finder

Your samples all have double backslashes ("\\"), which don't cause any problems by themselves. I think the original post's problem occurs when there is a single backslash ("\") in the JSON message.
