When I use the "diff" search command to compare events that contain several hundred lines, I notice that differences located at "the bottom" of the event (after about 500 lines of event content) are not picked up.
Why is that?
Is there a way to circumvent this limitation?
When used from the Search app, the diff search command calls a Python script located in $SPLUNK_HOME/etc/apps/search/bin/diff.py which uses the difflib Python library.
As it ships with Splunk, this script truncates its input at 9,000 characters:
# less $SPLUNK_HOME/etc/apps/search/bin/diff.py
# Copyright (C) 2005-2010 Splunk Inc. All Rights Reserved. Version 4.0
import sys,splunk.Intersplunk
import difflib,time
import splunk.mining.dcutils as dcu
logger = dcu.getLogger()
## COMPARE TWO RESULTS
## ARGS [pos1 pos2] [attribute to compare]
##
## DEFAULTS = 1 2 _raw
##
(...)
maxlen = 9000
(...)
if len(val1) > maxlen or len(val2) > maxlen:
    # cut text off at maxlen
    val1 = val1[:maxlen]
    val2 = val2[:maxlen]
(...)
The Search Reference Manual documents this limit and describes it as roughly equivalent to 500 lines:
http://www.splunk.com/base/Documentation/latest/SearchReference/Diff#Description
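To see why a difference near the end of a long event goes unnoticed, here is a small standalone sketch of the same truncation logic. It does not import anything from Splunk. The function name truncated_diff, the sample events and the use of difflib.unified_diff are my own assumptions for illustration; only the 9000 cap and the slicing come from the script above.

```python
import difflib

MAXLEN = 9000  # same cap as in the shipped diff.py


def truncated_diff(val1, val2, maxlen=MAXLEN):
    """Diff two strings after cutting both off at maxlen characters.

    Hypothetical helper: mirrors the truncation shown above, but the
    diff call itself is an assumption, not a copy of diff.py.
    """
    if len(val1) > maxlen or len(val2) > maxlen:
        # cut text off at maxlen, exactly as the original snippet does
        val1 = val1[:maxlen]
        val2 = val2[:maxlen]
    return list(difflib.unified_diff(val1.splitlines(),
                                     val2.splitlines(),
                                     lineterm=""))


# Two made-up 600-line events that differ only on their last line.
common = ["line %03d of a long event" % i for i in range(599)]
event_a = "\n".join(common + ["status=OK"])
event_b = "\n".join(common + ["status=FAILED"])

# With the default cap, both events are cut before the differing line.
print(len(truncated_diff(event_a, event_b)))

# With a cap larger than the events, the last line shows up in the diff.
print(len(truncated_diff(event_a, event_b, maxlen=20000)))
```

Both events are about 15,000 characters long, so with the default cap the differing last line is cut away before the comparison happens and the diff comes back empty.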
You can change the value of "maxlen" to let diff.py compare larger events. The safest way to do this is to make a copy of diff.py (so that your version isn't overwritten during a Splunk upgrade), increase the value of "maxlen" in the copy according to your needs, and declare the copy in commands.conf so that it replaces "diff".
Example:
[diff]
filename = mydiff.py
supports_getinfo = true
enableheader = false
retainsevents = true
changes_colorder = false
overrides_timeorder = true
When invoking "diff" in the Search app, your comparisons will now be limited to the new value of "maxlen": for example, 12,000 characters per event instead of 9,000 if you set maxlen = 12000 in mydiff.py.
CAUTION: This cap was set to prevent the "diff" command from consuming excessive amounts of memory if, for example, you feed it tens of thousands of very long events. Be aware that raising this limit puts your system's resources at risk!
I'd make a copy of diff.py and add a new "maxlen" option, so that you can tweak the limit as you need to. You can add the option as follows.
# poor man's option parsing
for a in sys.argv[1:]:
    if a.startswith("maxlen="):
        where = a.find('=')
        # convert to int so it can be compared with len() and used in slices
        maxlen = int(a[where+1:])
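If you want something a little more defensive, the same idea can be wrapped in a small function. This is only a sketch: the name parse_maxlen and the choice to fall back to the default on a bad value are my own assumptions, not part of the shipped script, and it assumes the option reaches the script through sys.argv as in the loop above.

```python
import sys

DEFAULT_MAXLEN = 9000  # value used by the shipped diff.py


def parse_maxlen(argv, default=DEFAULT_MAXLEN):
    """Return the integer given as maxlen=<n> in argv, or the default.

    Hypothetical helper: non-numeric or non-positive values are ignored
    so that a typo in the search string does not crash the command.
    """
    maxlen = default
    for a in argv:
        if a.startswith("maxlen="):
            value = a.split("=", 1)[1]
            try:
                parsed = int(value)
            except ValueError:
                continue  # keep the current value on garbage input
            if parsed > 0:
                maxlen = parsed
    return maxlen


if __name__ == "__main__":
    # In a copy of diff.py this would replace the hard-coded assignment.
    maxlen = parse_maxlen(sys.argv[1:])
    print(maxlen)
```

Keeping 9000 as the fallback means searches that don't pass the option behave exactly as before.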