Getting Data In

SEDCMD to strip HTTP headers from raw TCP input (JSON submitted from HTTP POST)

beaunewcomb
Communicator

I'm trying to strip the header info out of the event below, leaving only the JSON. I've tried "|extract reload=true", but neither that nor restarting Splunk seems to work. It must be something with my syntax. This example tries to remove only the first 2 lines (for the sake of simplicity in getting it to work).

props.conf:

[akamai_post_json]
SEDCMD-httpheader = s/(?mg)^POST.*$\n|^User-Agent.*$\n|//g

The event:

POST / HTTP/1.1
User-Agent: curl/7.26.0
Host: localhost
Accept: */*
Content-Length: 2552
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------0b1c32056fc5
------------------------------0b1c32056fc5
Content-Disposition: form-data; name="fileupload"; filename="default_connector_schema_1.0.json"
Content-Type: application/octet-stream
{
  "apiType" : "String",
  "apiFormat" : "String",
  "apiVersion" : 0,
  "ID" : "String",
  "startTime" : "String",
  "eventType" : "String",
  "cpCode" : 0,
  "message" : {
    "protocol" : "0",
    "protoVersion" : 0,
    "clientIP" : "String",
    "reqPort" : 0,
    "reqHost" : "String",
    "reqMethod" : "String",
    "reqPath" : "String",
    "reqQuery" : "String",
    "reqContType" : "String",
    "reqContLen" : 0,
    "sslProtocol" : "String",
    "sslVersion" : 0,
    "respStatus" : 0,
    "respRedirURL" : "String",
    "respContType" : "String",
    "respContLen" : 0,
    "respBytesServed" : 0,
    "userAgent" : "String",
    "originHostname" : "String"
  },
  "httpHeaders" : {
    "reqHeader" : {
      "accEnc" : "String",
      "accLang" : "String",
      "auth" : "String",
      "cacheCtl" : "String",
      "connection" : "String",
      "contMD5" : "String",
      "cookie" : "String",
      "DNT" : "String",
      "ifMatch" : "String",
      "ifMod" : "String",
      "ifNoMatch" : "String",
      "pragma" : "String",
      "range" : "String",
      "referer" : "String",
      "TE" : "String",
      "upgrade" : "String",
      "via" : "String",
      "xFrwdFor" : "String",
      "xReqWith" : "String"
    },
    "respHeader" : {
      "cacheCtl" : "String",
      "connection" : "String",
      "contEnc" : "String",
      "contLang" : "String",
      "contLen" : "String",
      "contMD5" : "String",
      "contDisp" : "String",
      "contRange" : "String",
      "date" : "String",
      "eTag" : "String",
      "expires" : "String",
      "lastMod" : "String",
      "p3p" : "String",
      "pragma" : "String",
      "server" : "String",
      "setCookie" : "String",
      "trailer" : "String",
      "transEnc" : "String",
      "vary" : "String",
      "warning" : "String",
      "wwwAuth" : "String"
    }
  },
  "performance" : {
    "reqHeadSize" : 0,
    "reqBodySize" : 0,
    "respHeadSize" : 0,
    "respBodySize" : "String",
    "downloadTime" : "String",
    "originName" : "String",
    "originIP" : "String",
    "originInitIP" : "String",
    "originRetry" : 0,
    "lastMileRTT" : 0,
    "lastMileBW" : 0,
    "netOriginRTT" : 0,
    "cacheStatus" : "String",
    "lastByte" : true,
    "cliCountry" : "String",
    "edgeIP" : "String",
    "reqID" : "String"
  }
}
------------------------------0b1c32056fc5--
1 Solution

ziegfried
Influencer

To strip the whole HTTP header, the following regex should work:

SEDCMD-stripheader = s/^(?ms)POST.+?(\r?\n){2}//g

And you have to restart splunkd, since that setting affects indexing behavior.
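
For anyone who wants to try the pattern outside Splunk first, here is a minimal sketch in Python. It assumes Python's re module behaves like Splunk's PCRE for this particular pattern; the flags are passed separately because Python expects inline flags at the very start of the expression. The names strip_http_header and sample are made up for illustration, and the sample is a shortened request rather than the full event from the question.

```python
import re

# Same pattern as the SEDCMD above; (?ms) becomes MULTILINE | DOTALL.
HEADER_RE = re.compile(r"^POST.+?(\r?\n){2}", re.MULTILINE | re.DOTALL)


def strip_http_header(event: str) -> str:
    """Remove everything from POST up to and including the first blank line."""
    return HEADER_RE.sub("", event)


# Shortened, hypothetical request: header lines, a blank line, then the body.
sample = (
    "POST / HTTP/1.1\r\n"
    "User-Agent: curl/7.26.0\r\n"
    "Host: localhost\r\n"
    "\r\n"
    '{"apiType": "String"}'
)

print(strip_http_header(sample))
```

Only the text after the first blank line is left, which is why a multipart body still shows its boundary and part headers afterwards.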

beaunewcomb
Communicator

This worked. You rule.

Thanks


chandrasekharko
Path Finder

{"log":"{\"serviceName\":\"xxxxx\",\"ipAddress\":\"\",\"timestamp\":\"2019-02-08T16:06:02.766+0000\",\"traceId\":\"\",\"level\":\"INFO\",\"logger\":\"yyyyyyyApplication\",\"message\":\"Started yyyyyApplication in 23.332 seconds (JVM running for 24.707)\",\"stack\":\"\",\"timeTaken\":\"\"}\n","stream":"stdout","time":"2019-02-08T16:06:02.767236274Z"}

What SEDCMD could be used for this? The problem with this one is that the nested log value isn't being recognized as JSON. I believe the reason is the \n in the log.

Please correct me if I am wrong and help me on this.
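
One way to check what the "log" field actually holds is to parse it outside Splunk. The sketch below is plain Python, not a SEDCMD, and uses a shortened copy of the event above (several fields dropped for brevity). It assumes nothing about how Splunk itself treats the field; it only shows that the value is a string containing escaped JSON, and that, at least for Python's json module, the trailing \n does not prevent a second parse.

```python
import json

# Shortened copy of the event above; the raw string keeps the backslashes as-is.
line = (
    r'{"log":"{\"serviceName\":\"xxxxx\",\"level\":\"INFO\",'
    r'\"message\":\"Started\"}\n",'
    r'"stream":"stdout","time":"2019-02-08T16:06:02.767236274Z"}'
)

outer = json.loads(line)         # first pass: the wrapper object
inner = json.loads(outer["log"])  # second pass: the escaped JSON inside "log"

print(type(outer["log"]).__name__)
print(inner["level"])
```

So the inner object is only reachable after the outer string has been unescaped, which is a separate problem from the trailing newline.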


ziegfried
Influencer

So the rest of the event you're seeing is the actual (multipart-encoded) HTTP body.

I'd suggest using another substitution in the SEDCMD to eliminate the multipart boundaries.

e.g.

SEDCMD-stripheader = s/^(?ms)POST.+?(\r?\n){2}//g s/-{30}\V+//g
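
A rough Python sketch of the two substitutions applied in sequence, for testing outside Splunk. Assumptions: Python's re stands in for PCRE, and since Python has no \V (any character that is not vertical whitespace), it is approximated with [^\r\n]. The function name and the shortened sample request are hypothetical.

```python
import re

HEADER_RE = re.compile(r"^POST.+?(\r?\n){2}", re.MULTILINE | re.DOTALL)
# Approximation of -{30}\V+ : thirty dashes, then the rest of that line.
BOUNDARY_RE = re.compile(r"-{30}[^\r\n]+")


def strip_header_and_boundaries(event: str) -> str:
    """Apply the header substitution first, then the boundary substitution."""
    event = HEADER_RE.sub("", event)
    return BOUNDARY_RE.sub("", event)


# Shortened, hypothetical multipart request built with CRLF line endings.
delimiter = "-" * 30 + "0b1c32056fc5"
sample = "\r\n".join([
    "POST / HTTP/1.1",
    "Host: localhost",
    "Content-Type: multipart/form-data; boundary=" + "-" * 28 + "0b1c32056fc5",
    "",
    delimiter,
    'Content-Disposition: form-data; name="fileupload"',
    "Content-Type: application/octet-stream",
    "",
    '{"apiType": "String"}',
    delimiter + "--",
])

cleaned = strip_header_and_boundaries(sample)
print(repr(cleaned))
```

In this sketch the part's own Content-Disposition and Content-Type lines are still present afterwards, because neither substitution targets them.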

ziegfried
Influencer

The regex matches "POST" at the beginning of the event up until two CRLF (newlines) are found. \r\n\r\n is the termination of the header in the HTTP protocol.
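
The same breakdown, written out piece by piece as a commented Python pattern. This is only an annotated sketch: it assumes Python's re treats these tokens the same way Splunk's PCRE does, and the short request string is invented for the demonstration.

```python
import re

# VERBOSE lets each piece of the pattern carry its own comment.
HEADER_RE = re.compile(
    r"""
    ^POST         # the literal text POST at the start of a line
    .+?           # lazily consume characters; DOTALL lets the dot cross line endings
    (\r?\n){2}    # two line endings in a row (CRLF or bare LF), i.e. a blank line
    """,
    re.MULTILINE | re.DOTALL | re.VERBOSE,
)

# Invented request: three header lines, a blank line, then the body.
request = 'POST / HTTP/1.1\r\nHost: localhost\r\nAccept: */*\r\n\r\n{"ID": "String"}'

match = HEADER_RE.search(request)
header = match.group(0)        # everything the substitution would delete
body = request[match.end():]   # everything that would remain

print(repr(header[-4:]))
print(body)
```

Because .+? is lazy, the match ends at the first blank line it finds rather than the last one.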


beaunewcomb
Communicator

Maybe you could break down what this is doing and I can figure it out from there:

SEDCMD-stripheader = s/^(?ms)POST.+?(\r?\n){2}//g


beaunewcomb
Communicator

and the line at the end....

------------------------------0a7d7d9180f4

At least it's progress lol


beaunewcomb
Communicator

Wow. Close!

Now I just have to get rid of:
------------------------------0a7d7d9180f4
Content-Disposition: form-data; name="fileupload"; filename="default_connector_schema_1.0.json"
Content-Type: application/octet-stream


ziegfried
Influencer

My last guess is to additionally adjust the line-merging settings (also props.conf):

SHOULD_LINEMERGE=false
LINE_BREAKER=------------------------------\w+--([\r\n]+)
SEDCMD-stripheader = s/(?ms)^POST.+?(\r?\n){2}//g
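
To illustrate what these settings are meant to do together, here is a small Python simulation. It rests on my understanding of LINE_BREAKER from the props.conf documentation: the stream is broken where the first capturing group matches, and that group's text is discarded. The helpers break_events and make_post are hypothetical, -{30} stands in for the thirty literal dashes, and none of this is Splunk's actual implementation.

```python
import re

# Assumption: -{30} is equivalent to the thirty literal dashes in LINE_BREAKER.
LINE_BREAKER = re.compile(r"-{30}\w+--([\r\n]+)")
HEADER_RE = re.compile(r"^POST.+?(\r?\n){2}", re.MULTILINE | re.DOTALL)


def break_events(stream: str) -> list:
    """Split where group 1 matches; the group itself is dropped."""
    events, start = [], 0
    for m in LINE_BREAKER.finditer(stream):
        events.append(stream[start:m.start(1)])  # keep text before the group
        start = m.end(1)                          # resume after the group
    if start < len(stream):
        events.append(stream[start:])
    return events


def make_post(payload: str, token: str = "0b1c32056fc5") -> str:
    """Build a shortened, hypothetical multipart POST ending in CRLF."""
    delimiter = "-" * 30 + token
    return "\r\n".join([
        "POST / HTTP/1.1", "Host: localhost", "",
        delimiter, "Content-Type: application/octet-stream", "",
        payload, delimiter + "--", "",
    ])


stream = make_post('{"ID": "a"}') + make_post('{"ID": "b"}')
cleaned = [HEADER_RE.sub("", event) for event in break_events(stream)]
print(len(cleaned))
```

Each request becomes its own event ending at the closing boundary, and the header substitution then runs once per event.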

beaunewcomb
Communicator

Nope. I'm commenting out lines that don't work with #

only one line is active at once... freaking weird.


ziegfried
Influencer

You didn't use both of them at the same time, did you?


beaunewcomb
Communicator

Thanks. No luck. Doesn't seem to affect the event at all.

As noted above, I'm editing the conf, restarting Splunk from the web interface, then logging in and reloading my real-time search.

I know the props file is in the right spot because I can change POST to TEST with

SEDCMD-httpheader = s/(?gism)(POST)/TEST/g

kallu
Communicator

Just guessing, but could it be those "\n"s? You have put sed into line-by-line mode with (?m), so "$" now matches at end-of-line, and I doubt you need that extra newline in your sed command.

SEDCMD-httpheader = s/(?mg)^POST.*$|^User-Agent.*$//g
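
A quick way to compare the two variants is to run them in Python, assuming its multiline semantics for "^" and "$" match PCRE's here. The inline g flag and the trailing empty alternative from the original are left out, since Python's re does not accept g as an inline flag. The sample uses LF-only line endings to keep the comparison simple and is made up for this sketch.

```python
import re

# Made-up sample with LF line endings.
sample = "POST / HTTP/1.1\nUser-Agent: curl/7.26.0\nHost: localhost\n"

# Variant without \n: "$" matches just before the newline, so the newline stays.
without_newline = re.sub(
    r"^POST.*$|^User-Agent.*$", "", sample, flags=re.MULTILINE
)

# Variant with \n: the newline is part of the match and is removed as well.
with_newline = re.sub(
    r"^POST.*$\n|^User-Agent.*$\n", "", sample, flags=re.MULTILINE
)

print(repr(without_newline))
print(repr(with_newline))
```

In this test the version without "\n" empties the two lines but leaves their line breaks behind, while the version with "\n" removes the lines entirely.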

beaunewcomb
Communicator

Tried it and did nothing noticeable. I'm running real-time searches so make sure I'm getting the latest data.

I edit props.conf, restart splunk, and reload my real-time search page.

I was able to get POST to change to TEST by simply doing this:
SEDCMD-httpheader = s/(?gism)(POST)/TEST/g


dwaddle
SplunkTrust

Also, SEDCMD props entries only fire at index time, so they won't affect any previously indexed data.
