Splunk Enterprise

How to handle search query when JSON data has host field?

evbtbw92
New Member

I'm working on a corporate Splunk instance where we do not have access to rename fields when indexing, or make any similar modifications due to security and compliance requirements.
I'm trying to create a timechart based on the number of events per hour by host. My issue is that the JSON data has a host field in addition to the Splunk built in host field.
IE a sample event looks like:
{"time":"2019-04-05T21:50:09.925Z","severity":"INFO","duration":25.02,"db":10.23,"view":14.79,"status":200,"method":"GET","path":"/api/v4/project/1","params":[],"host":"my_server_1","ip":"1.2.3.4, 4.5.6.7","ua":null,"route":"/api/:version/projects/:id","user_id":12,"username":"smithers","queue_duration":4.35,"magic_calls":0}

My search looks like:
index="my_index" host="prd-srv-00*" source="/var/log/my_program/http_json*" | timechart span=1h count by host

When I do this it combines the hosts the logs came from (built in host field) and hosts listed in the data (host field in the json).

If I try to filter out the hosts from the data, it removes the events from the built in host field as well. IE:
index="my_index" host="prd-srv-00*" AND host !="0.0.0.0" source="/var/log/my_program/http_json*" | timechart span=1h count by host
I have also tried to use ...| where host !="0.0.0.0" | ... but this has the same result.

Any advice on a solution or workaround to handle this at search time? IE, can I rename to column when searching, etc?

Thanks in advance for any help.

Labels (1)
0 Karma

evbtbw92
New Member

After fighting with this of a day and a half I finally found a solution that will work. It has a bit more hard coding with the server names than I wanted, but good enough for now:
index="my_index" host::prd-srv-001 OR host::prd-srv-002 OR host::prd-srv-003 OR host::prd-srv-004 source="/var/log/my_program/http_json*" | timechart usenull=f useother=f cont=true span=1m count by host | fillnull value=0 | table _time, prd-srv-001, prd-srv-002, prd-srv-003, prd-srv-004

As noted in my previous comments, the rex mode=sed stuff didn't work when I included a timechart.

0 Karma

woodcock
Esteemed Legend

Be sure to click Accept on your answer and UpVote any other answers or comments that helped you get to this.

0 Karma

woodcock
Esteemed Legend

Here are some options.

1: To ensure searching by the indexed host, you can use :: syntax like index=foo host::bar.
2: To ensure that the automatic KV_MODE extraction does not happen, polluting your host value, run your search in Fast mode.
3: To keep the KV_MODE extractions, but not the host one, add this to your search: ... | rex mode=sed "s/\"host\"/\"json_host\"/g"

I think #3 is your ticket.

0 Karma

evbtbw92
New Member

Thanks for the response, this seems close, but isn't exactly what I need.

When I try this I have:
index="my_index" host=prd-srv-00* | rex mode=sed "s/\"host\"/\"json_host\"/g" | timechart span=1m count by host | fillnull value=0
This returns the correct number of events under the Events tab, it still isn't right, see the data below, you can see the event shows it is updated with json_host instead of host, but if you look at the bottom, it appears the rex mode change doesn't take affect until after the search:
{"time":"2019-04-05T21:50:09.925Z","severity":"INFO","duration":25.02,"db":10.23,"view":14.79,"status":200,"method":"GET","path":"/api/v4/project/1","params":[],"host":"my_server_1","ip":"1.2.3.4, 4.5.6.7","ua":null,"route":"/api/:version/projects/:id","user_id":12,"username":"smithers","queue_duration":4.35,"magic_calls":0}
Show syntax highlighted
host = prd-srv-008 host = my_server_1 method = GET

So I guess at this point I"m looking for one of two things:
1. Do the rex mode before the search, like I said now it appears to happen after
2. In the Visualization hide a data series (my_server_1 in this case)
I prefer number 1 as it will require fewer manual updates in the future.

I'm continuing to play with this and do some searches, but so far I haven't had any luck.

Thanks again for any help!

0 Karma

evbtbw92
New Member

Something interesting I found, if you are not trying to visualize the data it works as woodcock expliaed in number three, IE: index="my_index" host=prd-srv-00* | rex mode=sed "s/\"host\"/\"json_host\"/g"

But as soon as you add the timechart, either in the query or by right clicking on host it adds the host field from the data in. Very odd.

Any help is much appreciated!

0 Karma

niketn
Legend

@evbtbw92 could you explain the corporate requirement for not changing field name when there can be two two different values for host field where both are valid values? I think this is incorrect requirement. Is there any one host value that you need or both?

If you need only the host value from JSON data, you should correct host metadata while indexing, so that searches work faster.

If you need both the values, you should have one of the fields as host and other as something else, maybe Host, since Splunk field names are case-sensitive (using transforms.conf).

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

harsmarvania57
Ultra Champion

Hi,

You can use regex to extract host from actual raw data in new field, try below query which will extract hostname from raw data into new field called ext_host.

Based on sample event you have provided, below query will extract my_server_1 in ext_host field.

<yourBaseSearcg>
| rex field=_raw "\"host\"\:\"(?<ext_host>[^\"]*)"

So your query will be like this

index="my_index" host="prd-srv-00*"  source="/var/log/my_program/http_json*"
| rex field=_raw "\"host\"\:\"(?<ext_host>[^\"]*)"
| search ext_host!="my_server_1"

ddrillic
Ultra Champion
0 Karma

evbtbw92
New Member

Thank you very much for the reply! I put more details in the other answer (I had to pick one), but neither seems to work for me.

When I tried your approach:
index="my_index" host=prd-srv-00* AND ext_host!="my_server_1" | rex field=_raw "\"host\"\:\"(?<ext_host>[^\"]*)" | timechart span=1m count by host | fillnull value=0
It didn't return any values. This seems to be due to the rex field update not happening until after the search.
Any other thoughts?

Thanks again for any help!

0 Karma

harsmarvania57
Ultra Champion

Ah yes totally forgot that, you need to search after rex.

 index="my_index" host="prd-srv-00*" source="/var/log/my_program/http_json*"
| rex field=_raw "\"host\"\:\"(?<ext_host>[^\"]*)"
| search ext_host!="my_server_1" 
0 Karma

evbtbw92
New Member

That still returns no results. My guess is because ext_host is still set for each event, therefore when the search happens it is excluding all events.

0 Karma

harsmarvania57
Ultra Champion

So how many events do you have which does not contain my_server_1 in your raw data ? If you are playing with only 1 sample event which you have provided then it will not return any result because you are searching for ext_host does not equal to my_server_1

0 Karma

evbtbw92
New Member

I'm looking at a small number of events for testing, about 400. Some have my_server_1 and some have my_server_2, so it would return values.

The problem I have is that all the events come from the host (Splunk side) of prd-srv-008, but also have the other field set.
In my graph, for the first minute it is showing:
prd-srv-008 --COUNT: 432
my_server_1 --COUNT: 320
my_server_2 --COUNT: 112

if you notice my_server_1 and my_server_2 always adds up to the total from prd-srv-008

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...