Getting Data In

set diff is very slow when matching 10 billion events

cyberportnoc
Explorer

source=/var/log/remote/192.168.1.1.log set diff [search "Built inbound" NOT "8.8.8.8" NOT "8.8.4.4" | rex field=_raw "Outside:(?<destinationip2>\d+.\d+.\d+.\d+){0,3}"  | rex field=_raw "Inside:(?<sourceip2>\d+.\d+.\d+.\d+){0,3}"] [search "Built outbound" outsideip=* | rex field=_raw "Outside:(?<destinationip2>\d+.\d+.\d+.\d+){0,3}" | rex field=_raw "Inside:(?<sourceip2>\d+.\d+.\d+.\d+){0,3}"]

Format of the messages:

Aug  3 17:08:58 192.168.3.10 %ASA-6-302013: Built inbound TCP connection 434619881 for Outside:192.168.1.2/50978 (192.168.20.18/590) to Inside:192.168.22.20/443 (192.168.26.5/443)

Aug  3 17:09:15 192.168.3.18 %ASA-6-302013: Built outbound TCP connection 434622811 for Outside:192.168.18/.10/183 (192.168.18.1/1885) to Inside:202.171.21.16/53576 (230.180.220.1/5356)

somesoni2
Revered Legend

What are you actually trying to compare? It looks like you want the combinations of destinationip2 and sourceip2 that appear in only one of the two event types (that is, not common to both). First, 10 billion records is too many to compare with set diff. Second, you are not reducing the number of fields being compared: right now the command compares every field of the events from both searches.

If my understanding of your requirement is correct, give this a try:

 source=/var/log/remote/192.168.1.1.log  ("Built inbound" NOT "8.8.8.8" NOT "8.8.4.4")  OR ("Built outbound" outsideip=*) | rex field=_raw "Outside:(?<destinationip2>\d+\.\d+\.\d+\.\d+)" | rex field=_raw "Inside:(?<sourceip2>\d+\.\d+\.\d+\.\d+)" | eval type=if(match(_raw,"Built inbound"),1,2) | stats dc(type) as type by destinationip2 sourceip2 | where type<2 | table destinationip2 sourceip2
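
To make the idea concrete outside Splunk, here is a minimal Python sketch of the same single-pass logic: extract the Outside/Inside IP pair from each event, record which directions ("inbound", "outbound") each pair was seen in, and keep the pairs seen in only one direction. The function name one_sided_pairs and the two 10.0.0.x sample lines are made up for illustration, and the 8.8.8.8/8.8.4.4 filter is left out. It illustrates the approach only and is not how Splunk implements it.

```python
import re
from collections import defaultdict

# Mirrors the two rex extractions: first IP after "Outside:" and after "Inside:".
PAIR_RE = re.compile(
    r"Built (?P<direction>inbound|outbound) .*?"
    r"Outside:(?P<destinationip2>\d+\.\d+\.\d+\.\d+)/.*?"
    r"Inside:(?P<sourceip2>\d+\.\d+\.\d+\.\d+)/"
)

def one_sided_pairs(lines):
    """Return sorted (destinationip2, sourceip2) pairs seen in only one direction."""
    directions = defaultdict(set)
    for line in lines:
        m = PAIR_RE.search(line)
        if m is None:
            continue  # not a "Built inbound/outbound" event, or IPs not parseable
        directions[(m["destinationip2"], m["sourceip2"])].add(m["direction"])
    # A pair with exactly one distinct direction is not common to both event types.
    return sorted(pair for pair, seen in directions.items() if len(seen) == 1)

sample = [
    # Event from the original post
    "Aug  3 17:08:58 192.168.3.10 %ASA-6-302013: Built inbound TCP connection 434619881 "
    "for Outside:192.168.1.2/50978 (192.168.20.18/590) to Inside:192.168.22.20/443 (192.168.26.5/443)",
    # Two made-up events sharing one IP pair in both directions
    "Aug  3 17:09:00 192.168.3.10 %ASA-6-302013: Built outbound TCP connection 1 "
    "for Outside:10.0.0.1/80 (10.0.0.1/80) to Inside:10.0.0.2/1234 (10.0.0.2/1234)",
    "Aug  3 17:09:01 192.168.3.10 %ASA-6-302013: Built inbound TCP connection 2 "
    "for Outside:10.0.0.1/80 (10.0.0.1/80) to Inside:10.0.0.2/1234 (10.0.0.2/1234)",
]

if __name__ == "__main__":
    for dest, src in one_sided_pairs(sample):
        print(dest, src)
```

Because each event is touched once and only the two extracted fields are kept, the work grows with the number of distinct IP pairs rather than with the full width of every event.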

cyberportnoc
Explorer

Actually, I previously joined on destinationip2 and that succeeded. Now I would like to see the log events that are not part of that inner join.


inventsekar
SplunkTrust

Hi, 10 billion seems like a very large number. Are you sure? Also, did you check the settings that affect set diff in the limits.conf file? Please update us.


inventsekar
SplunkTrust

10 billion! I think you may need to narrow the search time range and run multiple queries.

With the default values for the set command, it won't return 10 billion results.
http://docs.splunk.com/Documentation/Splunk/6.4.2/SearchReference/Set

Output limitations
There is a limit on the quantity of results that come out of the invoked subsearches that the set command receives to operate on. If this limit is exceeded, the input result set to the diff command is silently truncated.

If you have Splunk Enterprise, you can adjust this limit by editing the limits.conf file and changing the maxout value in the subsearch stanza. If this value is altered, the default quantity of results coming from a variety of subsearch scenarios are altered. Note that very large values might cause extensive stalls during the 'parsing' phase of a search, which is when subsearches run. The default value for this limit is 10000.

Result rows limitations
By default the set command attempts to traverse a maximum of 50000 items from each subsearch. If the number of input results from either search exceeds this limit, the set command silently ignores the remaining events. By default, the maxout setting for subsearches prevents the number of results from exceeding this limit.

If you have Splunk Enterprise, you can change this limit by editing the maxresultrows setting in the set stanza in the limits.conf file.
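
For reference, the two settings described above would sit in limits.conf roughly as follows. This is a sketch showing the default values quoted from the documentation; the file location in the comment assumes a standard system-level local override, and any larger values should be chosen with the parsing-phase warning above in mind.

```
# $SPLUNK_HOME/etc/system/local/limits.conf (assumed location for a local override)

[subsearch]
# Maximum number of results a subsearch returns (default: 10000)
maxout = 10000

[set]
# Maximum number of results the set command traverses from each subsearch (default: 50000)
maxresultrows = 50000
```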
