All Apps and Add-ons

Why is the map command, in the jellyfisher app, neither returning all results nor erroring out?

teresachila
Path Finder

I am using the jellyfisher app and I want to calculate the Jaro-Winkler distance between a list of words (1150 unique values, comparing every combination), which means it should return 1150x1150 = 1,322,500 results. I have the following query using the map command:

| inputlookup wordlist.csv | rename word as sw1  
| map [|inputlookup wordlist.csv  | rename word as sw2 | eval sw3="$sw1$" | jellyfisher jaro_winkler(sw2,sw3) | eval sid=$_serial_id$] maxsearches=1150

It works, except that it only returns the first 50,600 results (1150x44); the sid values displayed only go up to 44. There is no error in the search log, and I can see in the log that the search evaluated all the way up to sid=1150, but those results are not shown. Is there some configuration that is restricting this?
Thanks!

1 Solution

DalJeanis
Legend

First, as a general case, subsearches are limited to roughly 50K records. You can check the [subsearch] stanza in your limits.conf for the exact details.
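For reference, the stanza in question looks something like this (a sketch only; the values shown are common defaults and may differ on your deployment, so check your own limits.conf rather than trusting these numbers):

# limits.conf -- subsearch limits (illustrative values, not your actual settings)
[subsearch]
# maximum number of results a subsearch can return
maxout = 10000
# maximum number of seconds a subsearch is allowed to run
maxtime = 60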

In this case, since it sounds like results are being produced and then discarded, you might be running into some other limitation, such as how many megabytes of results you are allowed to keep, or how long the system will run your search before timing it out.

Consider sending each individual run to a separate file with something like this inside your map:

... | outputcsv myoutput.$_serial_id$.csv | where false()  ...

The first command writes the records out to a CSV file; the second discards the data stream so that any overall record limits are never hit. It won't help you with time limits, but you can deal with those any number of ways.
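Putting that together with the search from the question, the whole thing would look roughly like this (a sketch only, keeping the poster's field names and the jellyfisher syntax exactly as given in the question):

| inputlookup wordlist.csv | rename word as sw1
| map [| inputlookup wordlist.csv | rename word as sw2
       | eval sw3="$sw1$"
       | jellyfisher jaro_winkler(sw2,sw3)
       | eval sid=$_serial_id$
       | outputcsv myoutput.$_serial_id$.csv
       | where false()] maxsearches=1150

Each of the 1150 map iterations then lands in its own myoutput.N.csv on the search head, so the full 1,322,500 comparisons are preserved even though the search itself returns nothing to the UI.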


teresachila
Path Finder

Thanks! I can see all the files on the search head, and each file has the correct number of lines, so it isn't a time limit issue. I still don't get the results in the UI, but at least this is a workaround. Thanks!
