Split Indexing from JSON

mtaylor78 · ‎04-07-2017

So we have are pulling host logs on an EC2 instance and dropping them in an S3 Bucket. Our Splunk Heavy Forwarder is grabbing the logs and pushing them to Splunk Cloud. As we can easily pull the JSON log files into Splunk, there is one specific field inside the JSON document that we want to extract and push to another index.

{"message":"type=NETFILTER_CFG msg=audit(1491557469.510:18655): table=nat family=2 entries=146","@version":"1","@timestamp":"2017-04-07T09:31:09.880Z","path":"/systemlogs/audit/audit.log","host":"generichost","tags":["auditd"],"aws_region":"us-east-1","prod_status":"lalaland","aws_az":"us-east-1a","kluster_name":"klustername","kluster_prefix":"klust-pre","service_name":"audit","service_log":"audit"}

What I am trying to do is to extract only the "message" and ship it another index called aws_auditd

"message":"type=NETFILTER_CFG msg=audit(1491557469.510:18655): table=nat family=2 entries=146"

The regex that I have found for it is:

(\"message\":)\"([^\,}]*)\"

but I have no idea how to perform the function. Can someone advise me?

woodcock · ‎04-08-2017

You need 2 pieces. First you need to strip out everything but the message field. Do that like this:

In props.conf on your HF:

[your sourcetype here]
SEDCMD-fix_message_prefix = s/^{\"message\":\"/message=/
SEDCMD-strip_non_message_suffix = s/\",\".*$//"
TRANSFORMS-set= setnull,setparsing

Then you need to conditionally keep only this part of the JSON stream; in transforms.conf on your HF:

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = ^message=
DEST_KEY = queue
FORMAT = indexQueue

somesoni2 · ‎04-07-2017

So want complete json to be in one index and just the message into another index? I don't believe there is an option at index time to send same event or portion of a single event to two indexes. My suggestion would to have the data fully ingested in one index and then setup a summary indexing search to filter only the relevant portion and index it to other index. Generally summary indexing is used for optimal searching/reporting but it's design helps in this kind of use-cases as well (save processed data to a different index). See below link for more details on summary indexing.
http://docs.splunk.com/Documentation/Splunk/6.5.2/Knowledge/Usesummaryindexing

Split Indexing from JSON

Announcing Scheduled Export GA for Dashboard Studio

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!