set diff is very slow when matching 10 billion records
source=/var/log/remote/192.168.1.1.log set diff [search "Built inbound" NOT "8.8.8.8" NOT "8.8.4.4" | rex field=_raw "Outside:(?<destinationip2>\d+.\d+.\d+.\d+){0,3}" | rex field=_raw "Inside:(?<sourceip2>\d+.\d+.\d+.\d+){0,3}"] [search "Built outbound" outsideip=* | rex field=_raw "Outside:(?<destinationip2>\d+.\d+.\d+.\d+){0,3}" | rex field=_raw "Inside:(?<sourceip2>\d+.\d+.\d+.\d+){0,3}"]
Format of the messages:
Aug 3 17:08:58 192.168.3.10 %ASA-6-302013: Built inbound TCP connection 434619881 for Outside:192.168.1.2/50978 (192.168.20.18/590) to Inside:192.168.22.20/443 (192.168.26.5/443)
Aug 3 17:09:15 192.168.3.18 %ASA-6-302013: Built outbound TCP connection 434622811 for Outside:192.168.18/.10/183 (192.168.18.1/1885) to Inside:202.171.21.16/53576 (230.180.220.1/5356)
What are you actually trying to compare? It seems you're trying to find the unique combinations of destinationip2 and sourceip2 (those not common to both types of events). First, 10 billion records is too many to compare. Second, you're not reducing the number of fields being compared (right now set diff compares all the fields from those two events).
If my understanding of your requirement is correct, give this a try:
source=/var/log/remote/192.168.1.1.log ("Built inbound" NOT "8.8.8.8" NOT "8.8.4.4") OR ("Built outbound" outsideip=*) | rex field=_raw "Outside:(?<destinationip2>\d+.\d+.\d+.\d+){0,3}" | rex field=_raw "Inside:(?<sourceip2>\d+.\d+.\d+.\d+){0,3}" | eval type=if(match(_raw,"Built inbound"),1,2) | stats dc(type) as types by destinationip2 sourceip2 | where types=1 | table destinationip2 sourceip2
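If you would rather keep set diff, the point about reducing the compared fields could look roughly like the sketch below. It assumes the same source and search terms as the question, uses stricter IP patterns than the original rex, and uses table to drop everything except the two extracted fields (including internal fields such as _raw and _time) so that only those two values are compared. The subsearch result limits still apply, so this is not expected to scale to billions of events.

```
| set diff
    [search source=/var/log/remote/192.168.1.1.log "Built inbound" NOT "8.8.8.8" NOT "8.8.4.4"
     | rex field=_raw "Outside:(?<destinationip2>\d+\.\d+\.\d+\.\d+)"
     | rex field=_raw "Inside:(?<sourceip2>\d+\.\d+\.\d+\.\d+)"
     | dedup destinationip2 sourceip2
     | table destinationip2 sourceip2]
    [search source=/var/log/remote/192.168.1.1.log "Built outbound" outsideip=*
     | rex field=_raw "Outside:(?<destinationip2>\d+\.\d+\.\d+\.\d+)"
     | rex field=_raw "Inside:(?<sourceip2>\d+\.\d+\.\d+\.\d+)"
     | dedup destinationip2 sourceip2
     | table destinationip2 sourceip2]
```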
Actually, I joined on destinationip2 before and that succeeded. Now I would like to see the logs that do not belong to the inner join.
Hi, 10 billion seems like a very large number. Are you sure? Also, did you check the settings for set diff in the limits.conf file? Please update us.
10 billion! I think you may need to narrow the time range of the query and run multiple queries.
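As a rough sketch of the time-range idea, assuming a single search over both event types and a one-hour window (the window size is an arbitrary choice, not something from the original post), the previous full hour could be searched like this. It counts distinct event types per IP pair and keeps the pairs seen in only one type. Note that splitting by time changes the result: a pair seen inbound in one window and outbound in another would show up as unmatched in both.

```
source=/var/log/remote/192.168.1.1.log earliest=-1h@h latest=@h ("Built inbound" OR "Built outbound")
| rex field=_raw "Outside:(?<destinationip2>\d+\.\d+\.\d+\.\d+)"
| rex field=_raw "Inside:(?<sourceip2>\d+\.\d+\.\d+\.\d+)"
| eval type=if(match(_raw,"Built inbound"),1,2)
| stats dc(type) as types by destinationip2 sourceip2
| where types=1
| table destinationip2 sourceip2
```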
With the default values for the set command, it won't return 10 billion results.
http://docs.splunk.com/Documentation/Splunk/6.4.2/SearchReference/Set
Output limitations
There is a limit on the quantity of results that come out of the invoked subsearches that the set command receives to operate on. If this limit is exceeded, the input result set to the diff command is silently truncated.
If you have Splunk Enterprise, you can adjust this limit by editing the limits.conf file and changing the maxout value in the subsearch stanza. If this value is altered, the default quantity of results coming from a variety of subsearch scenarios are altered. Note that very large values might cause extensive stalls during the 'parsing' phase of a search, which is when subsearches run. The default value for this limit is 10000.
Result rows limitations
By default the set command attempts to traverse a maximum of 50000 items from each subsearch. If the number of input results from either search exceeds this limit, the set command silently ignores the remaining events. By default, the maxout setting for subsearches prevents the number of results from exceeding this limit.
If you have Splunk Enterprise, you can change this limit by editing the maxresultrows setting in the set stanza in the limits.conf file.
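For reference, the two settings mentioned in the documentation excerpt would sit in limits.conf roughly as sketched below. The values shown are just the defaults quoted above, and the file path is the usual local override location, which may differ in your deployment. Raising either value is a trade-off, since the docs warn that very large values can stall the parsing phase.

```
# $SPLUNK_HOME/etc/system/local/limits.conf  (assumed location)

[subsearch]
# Maximum number of results a subsearch returns (default quoted above)
maxout = 10000

[set]
# Maximum number of items the set command traverses from each subsearch (default quoted above)
maxresultrows = 50000
```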