Splunk Search

Help with rex mode= query- doesn't show error but doesn't show anything at all?

secphilomath1
Explorer

I am trying to eventually get to the point where I can add this to props.conf, but I am trying out the searches in Splunk first to make sure they work.  I was following this example, but it wasn't working for me, so I backed up a bit and simplified it.

If I run this search, it works and converts all instances of abc to def:

| rex field=query mode=sed "s/abc/def/"

However, when I do this, it doesn't throw an error but doesn't convert anything; all the abc's are still present in the fields:

| rex mode=sed "s/abc/def/"

Been driving me nuts trying to figure out why.

 

 

 

 

 

secphilomath1
Explorer

What I am trying to do is convert MS DNS Logs to readable text.  I understand that there is probably an app for this but want to do it manually

 

The input data is (3)www(6)google(3)com(0) and I want to change it to www.google.com

I had this working fine - 

| rex field=query mode=sed "s/\(.*?\)/./g s/^\.+(\s+)?// s/\.$//"

It takes all the (#) and converts it to a . and then goes through and removes the first and last .'s 

So I am trying to convert this to a sed command that runs at index time but can't get it to work.  I simplified what I was doing into examples that show the same behavior.

yuanliu
SplunkTrust

OK, now this makes sense.  Your actual regex is not simply s/abc/def/, but something like s/^abc/def/.  In regex, "^" and "$" are anchors that do not correspond to actual characters.  Whereas "abc" is anchored at the beginning of the field "query", it may not be, and often is not, anchored at the beginning of _raw.

Suppose your raw event is

blah blahsomething query="(3)www(6)google(3)com(0)" morestuff

Splunk will give you

_raw: blah blahsomething query="(3)www(6)google(3)com(0)" morestuff
query: (3)www(6)google(3)com(0)

In this case,

| rex field=query mode=sed "s/\(.*?\)/./g s/^\.+(\s+)?// s/\.$//"

will give you

_raw: blah blahsomething query="(3)www(6)google(3)com(0)" morestuff
query: www.google.com

but

| rex mode=sed "s/\(.*?\)/./g s/^\.+(\s+)?// s/\.$//"

gives

_raw: blah blahsomething query=".www.google.com." morestuff
query: (3)www(6)google(3)com(0)

Does this sound right?
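
To see the anchor effect outside Splunk, here is a rough sketch in Python rather than SPL. It assumes Python's re module treats these simple patterns the same way rex mode=sed does, and the helper name apply_sed is made up for illustration. It runs the same three substitutions once against the field value and once against the sample raw event above.

```python
import re

# The three sed steps from the thread, as (pattern, replacement, global?) tuples.
STEPS = [
    (r"\(.*?\)", ".", True),     # s/\(.*?\)/./g  : every "(n)" becomes "."
    (r"^\.+(\s+)?", "", False),  # s/^\.+(\s+)?// : strip dots at the very start
    (r"\.$", "", False),         # s/\.$//        : strip one dot at the very end
]

def apply_sed(text, steps):
    """Hypothetical helper: apply each substitution in order.

    count=1 mimics a sed expression without the g flag, count=0 replaces all.
    """
    for pattern, replacement, is_global in steps:
        text = re.sub(pattern, replacement, text, count=0 if is_global else 1)
    return text

query = "(3)www(6)google(3)com(0)"
raw = 'blah blahsomething query="(3)www(6)google(3)com(0)" morestuff'

field_result = apply_sed(query, STEPS)
raw_result = apply_sed(raw, STEPS)

print(field_result)  # www.google.com
print(raw_result)    # blah blahsomething query=".www.google.com." morestuff
```

Against the whole raw event, the "^" and "$" steps find nothing to strip, because the event neither starts nor ends with a dot.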

In such cases, you will need to find other ways to anchor your replacements in regex.  In the above example, "query" in the raw event is bounded by quotation marks.  So, you can use quotation marks as anchors, i.e.,

| rex mode=sed "s/\(.*?\)/./g s/\"\.+(\s+)?/\"/ s/\.\"/\"/"

 Of course, depending on actual raw events, /\(.*?\)/ could be way too broad, and quotation marks could be used in other fields that may legitimately begin or end with a dot.  So, this might be a safer choice:

| rex mode=sed "s/\"\(\d+\){1,}(\s+)?/\"/ s/\(\d+\)\"/\"/ s/\(\d+\)/./g"
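
As a rough comparison of the two expressions, here is a Python sketch (not SPL). It assumes Python's re module behaves like Splunk's sed mode for these patterns. The raw event, including the "(cached)" token, is invented to show the risk. The \" escapes needed inside the SPL string become plain quotation marks here.

```python
import re

def apply_sed(text, steps):
    """Hypothetical helper: run (pattern, replacement, global?) steps in order."""
    for pattern, replacement, is_global in steps:
        text = re.sub(pattern, replacement, text, count=0 if is_global else 1)
    return text

# Broad version: replace anything in parentheses, then trim dots next to quotes.
BROAD = [
    (r"\(.*?\)", ".", True),
    (r'"\.+(\s+)?', '"', False),
    (r'\."', '"', False),
]

# Stricter version: only touch "(digits)" tokens.
STRICT = [
    (r'"\(\d+\){1,}(\s+)?', '"', False),
    (r'\(\d+\)"', '"', False),
    (r"\(\d+\)", ".", True),
]

# Invented event: "(cached)" stands in for unrelated parenthesized text.
raw = 'status=ok (cached) query="(3)www(6)google(3)com(0)" morestuff'

broad_result = apply_sed(raw, BROAD)
strict_result = apply_sed(raw, STRICT)

print(broad_result)   # status=ok . query="www.google.com" morestuff
print(strict_result)  # status=ok (cached) query="www.google.com" morestuff
```

The broad version also rewrites "(cached)" to a dot, which is the kind of collateral change the stricter expression avoids.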

 

secphilomath1
Explorer

When I try the two samples provided;

| rex mode=sed "s/\(.*?\)/./g s/\"\.+(\s+)?/\"/ s/\.\"/\"/"

and 

| rex mode=sed "s/\"\(\d+\){1,}(\s+)?/\"/ s/\(\d+\)\"/\"/ s/\(\d+\)/./g"

 They run without error but don't actually modify the output.  Similar to what I was seeing earlier.  

I really appreciate your help with this

 

yuanliu
SplunkTrust

Can you share more of raw data than just (3)www(6)google(3)com(0)?

secphilomath1
Explorer

Here are a few more examples;

 

(3)www(6)google(2)ca(0)

(7)outlook(9)office365(3)com(0)

(7)updates(4)asdf(3)com(0)

(4)test(4)test(3)com(0)

 

 

 

yuanliu
SplunkTrust

@secphilomath1 wrote:

Here are a few more examples;

(3)www(6)google(2)ca(0)

(7)outlook(9)office365(3)com(0)

(7)updates(4)asdf(3)com(0)

(4)test(4)test(3)com(0)


This is not what I meant by more details of raw events, because all of these can pass the original regex.  I want to see what surrounds the query value in the raw events, not just the query field.  In other words, it is critical to know the boundary before the first "." and after the last ".".  Without knowing that, volunteers are just wasting time speculating.

It is impossible that an entire raw event only contains a single string "(7)updates(4)asdf(3)com(0)". (Otherwise your original regex should have succeeded.)  Is this correct?

 

 

PickleRick
SplunkTrust

Also, you said somewhere earlier that you want to do this "on indexing". So what's the real issue here?

secphilomath1
Explorer

OK, I am an idiot and I apologize; I am still building my experience in Splunk.  I was outputting the results to a table, but when I went to look at the raw data I saw that the following is actually working!

 

index=wineventlog eventtype="msad-dns-debuglog"

| rex mode=sed "s/\(.*?\)/./g s/^\.+(\s+)?// s/\.$//"

I am getting .www.google.com in the raw data which is a lot closer than I thought I was.  I am unsure why I am still getting that leading dot, but this is something.  

You are right, I want to do this at index time, but I wanted to verify that my sed logic was correct before doing that.

 

yuanliu
SplunkTrust

@secphilomath1 wrote:

index=wineventlog eventtype="msad-dns-debuglog"

| rex mode=sed "s/\(.*?\)/./g s/^\.+(\s+)?// s/\.$//"

I am getting .www.google.com in the raw data which is a lot closer than I thought I was.  I am unsure why I am still getting that leading dot


You are still not illustrating what is in the raw event.  This result only suggests that

  1. the targeted string (e.g., "(3)www(6)google(3)com(0)") is at the end of the line in the raw event (thus positive on s/\.$//);
  2. there is some other string before the target string in the raw event (thus negative for s/^\.+(\s+)?//); and
  3. the character immediately before the target string is not a quotation mark as I used to illustrate my point about anchor in regex.

If there is some guarantee that 1 is always true in eventtype msad-dns-debuglog, it would be fine to anchor your regex against $.  But you have to show us what that leading anchor can possibly be.  By the way, eliminating \. AFTER substitution, whether leading or trailing, is a very risky strategy because you could easily alter parts of the raw string you don't want to alter.  It is much safer to be explicit about those "(3)", etc.

If you want to be as generic as possible but minimize the risk of undesirable alterations, this is perhaps the best approach:

| rex mode=sed "s/(\W+)\(\d+\)/\1/ s/\(\d+\)$// s/\(\d+\)(\W)/\1/ s/(\w)\(\d+\)(\w)/\1.\2/g"
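
Here is a rough Python sketch (not SPL) of how those four steps behave. It assumes Python's re module matches Splunk's sed mode for these patterns. The second sample event, with its "some prefix" text, is invented because the real log layout has not been shown in this thread.

```python
import re

# The four steps of the generic expression, as (pattern, replacement, global?).
STEPS = [
    (r"(\W+)\(\d+\)", r"\1", False),         # drop a leading "(n)" that follows non-word chars
    (r"\(\d+\)$", "", False),                # drop a trailing "(n)" at end of line
    (r"\(\d+\)(\W)", r"\1", False),          # drop a trailing "(n)" before a non-word char
    (r"(\w)\(\d+\)(\w)", r"\1.\2", True),    # "(n)" between word chars becomes "."
]

def apply_sed(text, steps):
    """Hypothetical helper: run the substitutions in order, like chained sed expressions."""
    for pattern, replacement, is_global in steps:
        text = re.sub(pattern, replacement, text, count=0 if is_global else 1)
    return text

# Sample from earlier in the thread: value bounded by quotation marks.
quoted = 'blah blahsomething query="(3)www(6)google(3)com(0)" morestuff'
# Invented sample: value at the end of the line, preceded by a space.
end_of_line = "some prefix (3)www(6)google(3)com(0)"

quoted_result = apply_sed(quoted, STEPS)
end_of_line_result = apply_sed(end_of_line, STEPS)

print(quoted_result)       # blah blahsomething query="www.google.com" morestuff
print(end_of_line_result)  # some prefix www.google.com
```

Only "(digits)" tokens are touched in both cases, so no leading or trailing dot needs to be cleaned up afterwards.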

 

PickleRick
SplunkTrust

If you expect the rex command to substitute one string for another in raw event and thus make Splunk extract all the field values from an event modified that way - it won't work. Why should it?

Splunk extracts fields automatically as needed at the beginning of the pipeline. When you modify the _raw field it's just a field - yes, it's a default field for many commands but it's just a field. So you might modify _raw with rex or any other command but it won't change the extracted fields.

Per analogiam - if you do

index=whatever
| fields *
| eval _raw=""

You should expect to see all your original fields extracted even though at some point you've overwritten the _raw field with empty string.

yuanliu
SplunkTrust

This means that the data that populates the field "query" at search time is absent from _raw events.  For example, "query" could come from an automatic lookup.  Or it could be a calculated field.  And so on.

This test can help you diagnose:

| where match(_raw, "abc")

If this returns any event, and the rex mode=sed command still doesn't take effect, you have discovered a bug.

Another useful test would be

| rex field=query mode=sed "s/abc/def/" ``` you indicate that this successfully changes abc in query to def ```
| where match(_raw, "abc")

The expectation is the same: if "abc" is absent from _raw, you should get no events, because the sed on field=query does not change the _raw field.

secphilomath1
Explorer

Would this count as a calculated field?  This is all I see in props.conf currently for this particular field.

 

FIELDALIAS-query = questionname AS query

yuanliu
SplunkTrust

That is a field alias, not a calculated field.  Based on this information, I assume that questionname is in raw events.  Do you see any event with questionname and "abc"?  I understand the need to anonymize data.  But you need to describe your data characteristics accurately.  What is the data format?  Key-value pair?  JSON?  XML?  Freehand?  Given a snippet of raw event, how is Splunk supposed to know how to populate questionname?

Also, does the test query return any events?

yeahnah
Motivator

Hi @secphilomath1 

Without seeing the original event it is hard to know for certain, but I suspect that you simply need to add the global (g) flag to the sed command.  Without it, only the first match will be replaced.

For example...

| makeresults
| eval _raw="dummy event: abc query=abc"
| rex mode=sed "s/abc/def/"

Result: "dummy event: def query=abc"

| makeresults
| eval _raw="dummy event: abc query=abc"
| rex mode=sed "s/abc/def/g"

Result: "dummy event: def query=def"
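
The same first-match versus all-matches difference can be checked outside Splunk. This is a small Python sketch (not SPL), where count=1 stands in for a sed expression without the g flag:

```python
import re

raw = "dummy event: abc query=abc"

# Without g: only the first occurrence is replaced.
first_only = re.sub(r"abc", "def", raw, count=1)
# With g: every occurrence is replaced.
all_matches = re.sub(r"abc", "def", raw)

print(first_only)   # dummy event: def query=abc
print(all_matches)  # dummy event: def query=def
```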

secphilomath1
Explorer

Ok, using the original data, here is a result that works.....

| makeresults
| eval _raw="(3)www(6)google(2)ca(0)"

| rex mode=sed "s/\(.*?\)/./g s/^\.+(\s+)?// s/\.$//"

 

I get 
www.google.ca

 
