Splunk Search

rex and transforms.conf not giving the same result

jvl_netic
Engager

I am trying to parse MySQL slowlogs and get the query extract from the log.

I have the following format in the log:


# Time: 120316 9:27:20
# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
# Query_time: 6.848644 Lock_time: 0.000022 Rows_sent: 0 Rows_examined: 1
SET timestamp=xxx;
SELECT * FROM DUAL;
# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
# Query_time: 12.537060 Lock_time: 0.000035 Rows_sent: 0 Rows_examined: 0
SET timestamp=xxx;
SELECT * FROM DUAL;
# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
# Query_time: 16.935841 Lock_time: 0.000046 Rows_sent: 0 Rows_examined: 0
SET timestamp=xxx;
SELECT * FROM DUAL;

And when I try to do a search time extract of a field I use

sourcetype=mysql-slow | rex "(?<aaa>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*(;$))" | top 50 aaa

When I would get this output:


1: SELECT * FROM DUAL;
2: SELECT * FROM DUAL;
3: SELECT * FROM DUAL;

This is what I want, but then I put this in my transforms.conf


[query_for_mysql_slow]
REGEX = (?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*(;$))

Now when I try the search

sourcetype=mysql-slow | top 50 query

I get


SELECT * FROM DUAL; # User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx] # Query_time: 12.537060 Lock_time: 0.000035 Rows_sent: 0 Rows_examined: 0 SET timestamp=xxx; SELECT * FROM DUAL; # User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx] # Query_time: 16.935841 Lock_time: 0.000046 Rows_sent: 0 Rows_examined: 0 SET timestamp=xxx; SELECT * FROM DUAL;

I do not understand why it does not work when put into transforms.conf

Any ideas?

Tags (2)
0 Karma
1 Solution

kristian_kolb
Ultra Champion

Well there are few possible explanations:

If you just copied the transforms stuff from the actual file, there is a typo. It's REGEX, not REXEX. If you just want to make a search time extraction of the field, you should probably try something like;

in props.conf

[mysql-slow]
EXTRACT-slow_query = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*;)$

There should be no configuration in transforms.conf.

There are other things that are a bit unclear regarding what you want to achieve, and how you try to do that, but try this first.


UPDATE:

Fixed the factual error regarding regex in transforms - sorry about that.

Does Splunk parse the sample log as one or three events? Do you really want three events? I'm not 100% sure how splunk would handle events #2 and #3 correctly, since there is no timestamp. Or would it do to have a single event with a multivalued field for the query?

On a side note, I believe the reason for the bad results is that the regex might be too greedy. Try adding a '?' after .*

[mysql-slow]
EXTRACT-slow_query = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*?;)$

This however will only produce one result, even if there are more than one occurence.

So either break the log into separate events and hope that timestamping will work, and use the regex above. Or keep the log as one event and use transforms.conf like this;

in props.conf

REPORT-slowq = slow_queries

in transforms.conf

[slow_queries]
REGEX = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*?;)$
MV_ADD = true

Hope this helps,

Kristian

EDIT: factual errors, my bad

View solution in original post

kristian_kolb
Ultra Champion

Well there are few possible explanations:

If you just copied the transforms stuff from the actual file, there is a typo. It's REGEX, not REXEX. If you just want to make a search time extraction of the field, you should probably try something like;

in props.conf

[mysql-slow]
EXTRACT-slow_query = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*;)$

There should be no configuration in transforms.conf.

There are other things that are a bit unclear regarding what you want to achieve, and how you try to do that, but try this first.


UPDATE:

Fixed the factual error regarding regex in transforms - sorry about that.

Does Splunk parse the sample log as one or three events? Do you really want three events? I'm not 100% sure how splunk would handle events #2 and #3 correctly, since there is no timestamp. Or would it do to have a single event with a multivalued field for the query?

On a side note, I believe the reason for the bad results is that the regex might be too greedy. Try adding a '?' after .*

[mysql-slow]
EXTRACT-slow_query = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*?;)$

This however will only produce one result, even if there are more than one occurence.

So either break the log into separate events and hope that timestamping will work, and use the regex above. Or keep the log as one event and use transforms.conf like this;

in props.conf

REPORT-slowq = slow_queries

in transforms.conf

[slow_queries]
REGEX = ^(?<query>(DELETE|SELECT|INSERT|ALTER|GRANT|REPLACE).*?;)$
MV_ADD = true

Hope this helps,

Kristian

EDIT: factual errors, my bad

jvl_netic
Engager

Thanks Kristian,

The REXEX is a typo...

From the documentation http://docs.splunk.com/Documentation/Splunk/latest/admin/transformsconf the REGEX can can be specified like that in trancforms.conf

* REGEX and the FORMAT attribute:
    * Name-capturing groups in the REGEX are extracted directly to fields. This means that you
      do not need to specify the FORMAT attribute for simple field extraction cases (see the 
      description of FORMAT, below).
    * If the REGEX extracts both the field name and its corresponding field value, you can use 
      the following special capturing groups if you want to skip specifying the mapping in 
      FORMAT: 
      _KEY_<string>, _VAL_<string>. 
    * For example, the following are equivalent:
        * Using FORMAT:
            * REGEX  = ([a-z]+)=([a-z]+)
            * FORMAT = $1::$2
        * Without using FORMAT
            * REGEX  = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
    * When using either of the above formats, in a search-time extraction, the
      regex will continue to match against the source text, extracting as many
      fields as can be identified in the source text.

I tried to put in the EXTRACT in props.conf, but it gives the same result as putting it in transforms.conf.

I want to achieve to be able to extract each query as a field and also I want to use this regex to split after this line with the query in.

So in the sample I make I want the following result:

Extract the field query from each line

1: SELECT * FROM DUAL;
2: SELECT * FROM DUAL;
3: SELECT * FROM DUAL;

and I want to use MUST_NOT_BREAK_AFTER = regex to make sure the sample is brokken up in 3 events:

Event 1:

\# Time: 120316  9:27:20
\# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
\# Query_time: 6.848644  Lock_time: 0.000022 Rows_sent: 0  Rows_examined: 1
SET timestamp=xxx;
SELECT * FROM DUAL;

Event 2:

\# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
\# Query_time: 12.537060  Lock_time: 0.000035 Rows_sent: 0  Rows_examined: 0
SET timestamp=xxx;
SELECT * FROM DUAL;

Event 3:

\# User@Host: xxx[xxx] @ xxx.xxx.xxx.dk [xxx.xxx.xxx.xxx]
\# Query_time: 16.935841  Lock_time: 0.000046 Rows_sent: 0  Rows_examined: 0
SET timestamp=xxx;
SELECT * FROM DUAL;

0 Karma

kristian_kolb
Ultra Champion

see updated answer /k

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...