Splunk Search

Why is the "diff" search command not reliable for large events containing several hundred lines?

hexx
Splunk Employee

When I use the "diff" search command to compare events that contain several hundred lines, I notice that differences located at "the bottom" of the event (after about 500 lines of event content) are not picked up.

Why is that?

Is there a way to circumvent this limitation?

1 Solution

hexx
Splunk Employee

When used from the Search app, the diff search command calls a Python script located in $SPLUNK_HOME/etc/apps/search/bin/diff.py which uses the difflib Python library.

As it ships with Splunk, this script truncates its input at 9,000 characters:


# less $SPLUNK_HOME/etc/apps/search/bin/diff.py

# Copyright (C) 2005-2010 Splunk Inc.  All Rights Reserved.  Version 4.0
import sys,splunk.Intersplunk
import difflib,time
import splunk.mining.dcutils as dcu

logger = dcu.getLogger()

##  COMPARE TWO RESULTS
##  ARGS [pos1 pos2] [attribute to compare]
##
##  DEFAULTS = 1 2 _raw
##

(...)

maxlen = 9000

(...)

  if len(val1) > maxlen or len(val2) > maxlen:
      # cut text off at maxlen
      val1 = val1[:maxlen]
      val2 = val2[:maxlen]

(...)

This limitation is described in the Search Reference Manual as being roughly equivalent to 500 lines:

http://www.splunk.com/base/Documentation/latest/SearchReference/Diff#Description
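To see why a difference near the bottom of a long event goes unnoticed, here is a minimal standalone sketch. It is not the actual diff.py code: it only reproduces the slice-at-maxlen step from the excerpt above and then calls difflib. The function name truncated_diff and the sample events are made up for illustration.

```python
import difflib

def truncated_diff(val1, val2, maxlen=9000):
    """Diff two strings after cutting both at maxlen characters,
    mirroring the truncation step shown in the diff.py excerpt."""
    if len(val1) > maxlen or len(val2) > maxlen:
        # cut text off at maxlen
        val1 = val1[:maxlen]
        val2 = val2[:maxlen]
    return list(difflib.unified_diff(val1.splitlines(),
                                     val2.splitlines(),
                                     lineterm=""))

# Two hypothetical 1,000-line events that differ only on the last line.
base = ["line %04d of the event" % i for i in range(1000)]
event1 = "\n".join(base)
event2 = "\n".join(base[:-1] + ["line 0999 CHANGED"])

# With the default cap, both events are cut long before the changed line.
hidden = truncated_diff(event1, event2)

# With a cap large enough for both events, nothing is cut.
found = truncated_diff(event1, event2,
                       maxlen=max(len(event1), len(event2)))
```

With the default cap, hidden is empty because both events are identical over their first 9,000 characters. With the larger cap, found contains the removed and added versions of the last line.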

You can change the value of "maxlen" to enable diff.py to compare larger events. The best method to do this would be to make a copy of diff.py (so as to make sure your version isn't overwritten during a Splunk upgrade), increase the value of "maxlen" according to your needs and declare your new search command in commands.conf to replace "diff".

Example:

  • cp $SPLUNK_HOME/etc/apps/search/bin/diff.py $SPLUNK_HOME/etc/apps/search/bin/mydiff.py
  • vi $SPLUNK_HOME/etc/apps/search/bin/mydiff.py
  • change the value of "maxlen" to 12000 (for example)
  • vi $SPLUNK_HOME/etc/apps/search/local/commands.conf
  • add the following stanza:

[diff]
filename = mydiff.py
supports_getinfo = true
enableheader = false
retainsevents = true
changes_colorder = false
overrides_timeorder = true

When invoking "diff" in the Search app, your comparisons will now be limited to 12,000 characters per event instead of 9,000.

CAUTION: This cap was set to prevent the "diff" command from consuming excessive amounts of memory, for example if you feed it tens of thousands of very long events. Raising the limit puts your system's resources at risk, so increase it with care!


Marinus
Communicator

I'd make a copy of diff.py and add a new "maxlen" option, so that you can tweak the limit as you need to. You can add the option as follows.

# poor man's option parsing
for a in sys.argv[1:]:

    if a.startswith("maxlen="):
        # everything after '=' is the new limit; convert it to an
        # integer so that it can be used as a slice bound
        maxlen = int(a.split("=", 1)[1])
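A slightly more defensive variant of this option parsing, written as a standalone sketch: it converts the value to an integer (the slice val1[:maxlen] needs one) and keeps the default when the value is malformed or not positive. The function name parse_maxlen is hypothetical and does not exist in diff.py. The rest of the script's argument handling would presumably also need to skip the new argument, which is not shown here.

```python
def parse_maxlen(args, default=9000):
    """Return the integer from the last 'maxlen=<n>' argument,
    or default if no usable value is present."""
    maxlen = default
    for a in args:
        if a.startswith("maxlen="):
            try:
                value = int(a.split("=", 1)[1])
            except ValueError:
                continue  # ignore malformed values such as 'maxlen=abc'
            if value > 0:
                maxlen = value
    return maxlen
```

Inside a copy of diff.py this could be called as maxlen = parse_maxlen(sys.argv[1:]).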

