Potential bug in R Analytics App

jedatt01
Builder

Hi guys at Itility, I attended your session at .conf 2016. I've been playing around with your R app and frequently get inconsistent results back from R in Splunk when using the runRdo custom command. Example below.

The search below occasionally comes back with the correct results and populates Splunk with the test data frame. More often than not, however, it fails with an NA/NaN/Inf error (shown below).

| inputlookup iris.csv | runRdo script="set.seed(1); my_iris = dataset[-5]; species = dataset$species; kmeans_iris = kmeans(my_iris,3); kmeans_table = table(kmeans_iris$cluster,species); test = as.data.frame(kmeans_table); return(test);"


#error results
message                                                              session  status
NA/NaN/Inf in foreign function call (arg 1) In call: do_one(nmeth)   0        400


#correct results
Var1    Freq    species
1      50   Iris Setosa
2      0    Iris Setosa
3      0    Iris Setosa
1      0    Iris Versicolor
2      2    Iris Versicolor
3      48   Iris Versicolor
1      0    Iris Virginica
2      36   Iris Virginica
3      14   Iris Virginica

Please let me know what you think.

gwobben
Communicator

Thanks for using the R app! (and attending our presentation)

There are a couple of things you need to take into account:
1. Splunk is not consistent in the order of the columns (even when using the table or fields commands). This means that dataset[-5] will not consistently drop the same column. We haven't found a workaround yet; however, you can use column names in R.
2. Splunk is not aware of any data types and will always send out strings (even when it's obvious that your data is numeric). Our app will try to parse the data as numeric, but when that fails R will receive chars instead of numerics. It's safest to cast data types in R explicitly.

When debugging, you can use the parameter getResults=false, which will give you a link to the console output from R. If you call str() in your R script, the console will show the data types.

So back to your query. This example should work (works on my machine):

| inputlookup iris.csv 
| runRdo script="
    # Fix the random seed
    set.seed(1);

    # Store the dataset in a variable
    my_iris = dataset;

    # Separate the species column from the rest
    species = as.factor(my_iris$species);
    my_iris = my_iris[ , !(names(my_iris) %in% c('species'))];

    # Cast data types
    my_iris$petal_length = as.numeric(my_iris$petal_length);
    my_iris$sepal_length = as.numeric(my_iris$sepal_length);
    my_iris$petal_width = as.numeric(my_iris$petal_width);
    my_iris$sepal_width = as.numeric(my_iris$sepal_width);

    # Show summaries in the console, use getResults=false to see the link to the console
    str(species);
    str(my_iris);

    # Perform the kmeans
    kmeans_iris = kmeans(my_iris, 3);
    kmeans_table = table(kmeans_iris$cluster, species);

    # Return a dataframe
    return(as.data.frame(kmeans_table));" getResults=t

I hope this fixes your issue! We'd love to hear how you're using our app, so stay in touch!
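The two points above (unstable column order, everything arriving as strings) can also be illustrated outside Splunk and R. The following is a minimal Python sketch, not part of the R Analytics app: the two CSV samples and both function names are hypothetical, chosen only to mimic the same row arriving with two different column orders.

```python
import csv
import io

# Hypothetical samples: the same row, with every value as a string
# (as Splunk would send it), in two different column orders.
CSV_A = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris Setosa\n"
)
CSV_B = (
    "species,petal_width,sepal_length,petal_length,sepal_width\n"
    "Iris Setosa,0.2,5.1,1.4,3.5\n"
)

NUMERIC_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def drop_fifth_by_position(text):
    """Fragile: like dataset[-5], drops whatever happens to be column 5."""
    header = next(csv.reader(io.StringIO(text)))
    return header[:4] + header[5:]


def features_by_name(text):
    """Robust: selects columns by name and casts each value explicitly."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append([float(record[name]) for name in NUMERIC_COLUMNS])
    return rows


# Positional selection keeps 'species' when the order changes ...
print(drop_fifth_by_position(CSV_A))
print(drop_fifth_by_position(CSV_B))
# ... while name-based selection with explicit casts is stable.
print(features_by_name(CSV_A) == features_by_name(CSV_B))  # True
```

With the second ordering, the positional version leaves the text column `species` among the features, which is the kind of input that makes a numeric routine such as kmeans fail.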

jedatt01
Builder

Thank you this works perfectly! I can see now that the column order changes if I run the search multiple times. I will avoid using index references from now on and make sure to cast my data types as well.

gwobben
Communicator

Glad to hear it worked! I've added this question (and answer) to Splunkbase: https://splunkbase.splunk.com/app/3339/#/details

splunk47
New Member

Hi, for me nothing is printed after clicking the Run button in the script editor, not even an error.
Is OpenCPU mandatory for this? And can we install it on the same machine as the Splunk server?

Please respond ASAP.

gwobben
Communicator

Yes, OpenCPU is mandatory, and yes, you can install it on the same machine. Good luck!

splunk47
New Member

Would public.opencpu.org not work?

We don't have the rights to install OpenCPU right now, so I thought I'd use a public OpenCPU server.

gwobben
Communicator

Sure, that should work. Just be absolutely sure you're willing to send your data and your algorithm to some unfamiliar host and be aware that you cannot use libraries that are not installed on the OpenCPU server that you're using.

splunk47
New Member

But nothing happens when I click the Run button in the Splunk app.
The R console tab just stays hidden; at least some error should appear.
PS: I am using public.opencpu.org only.

splunk47
New Member

External search command 'runrpairs' returned error code 1. Script output = "error_message=ConnectionError at "/data/splunk_axpclp/lib/python2.7/site-packages/requests/adapters.py", line 375 : HTTPSConnectionPool(host='public.opencpu.org', port=443): Max retries exceeded with url: /ocpu/library/base/R/identity (Caused by : [Errno -2] Name or service not known) "

gwobben
Communicator

I've just tried public.opencpu.org on my own machine and it's working just fine. Please make sure that your Splunk machine is able to connect to public.opencpu.org (no network issues / firewalls) and that your configuration includes https as the protocol (go to apps -> manage apps -> click on setup next to the R Analytics app -> fill out "https://public.opencpu.org" and click save).

If this doesn't work, please share your setup.
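The "[Errno -2] Name or service not known" part of the error above usually means the host name could not be resolved from the Splunk machine. As a rough way to check both points (https in the configured URL, and name resolution), here is a minimal Python sketch. It is not part of the R Analytics app: the function names `describe_endpoint` and `can_resolve` are hypothetical, and it assumes a Python 3 interpreter is available on the Splunk host.

```python
import socket
from urllib.parse import urlparse


def describe_endpoint(url):
    """Return (host, port) for an https URL; reject anything else."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("expected an https:// URL, got %r" % url)
    if not parsed.hostname:
        raise ValueError("no host name found in %r" % url)
    # Fall back to the default HTTPS port when none is given.
    return parsed.hostname, parsed.port or 443


def can_resolve(host, port):
    """True if this machine can resolve the host name to an address."""
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        # Same class of failure as 'Name or service not known'.
        return False


# Parsing needs no network access.
print(describe_endpoint("https://public.opencpu.org"))  # ('public.opencpu.org', 443)
```

Running `can_resolve(*describe_endpoint("https://public.opencpu.org"))` on the Splunk server itself should return `True`; if it returns `False`, the problem is likely DNS, a proxy, or a firewall on that machine rather than the app.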
