Splunk R App uses default stringsAsFactors=TRUE

brian_from_fl
Explorer

First, thanks for changing the read.csv call to leave the _time field name unmodified in version 0.3.10.

I have found cases where the input data frame's string factors cause problems, and have not yet found a case where I actually needed the strings as factors. I don't know if it's worth it to change the read.csv call to add stringsAsFactors=FALSE, but here's what I do to remove them and convert them back to character strings, in case anyone else wishes to know:

factor_fields <- sapply(input, is.factor)
input[factor_fields] <- lapply(input[factor_fields], as.character)
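To illustrate the kind of problem I mean, here is a small self-contained sketch. The data frame is made up for the example (it is not what the R app actually produces); it just shows that calling as.numeric on a factor returns the internal level codes rather than the values, and that the two lines above fix it:

```r
# Hypothetical stand-in for the data frame the R app hands to a script.
input <- data.frame(
  host  = c("web01", "web02", "web01"),
  bytes = c("512", "2048", "128"),
  stringsAsFactors = TRUE
)

# Pitfall: as.numeric() on a factor yields the level codes,
# not the numbers spelled out by the labels.
codes <- as.numeric(input$bytes)

# Convert every factor column back to character, as in the snippet above.
factor_fields <- sapply(input, is.factor)
input[factor_fields] <- lapply(input[factor_fields], as.character)

# Now the conversion gives the real values.
values <- as.numeric(input$bytes)
```

After the conversion, values holds 512, 2048 and 128, whereas codes only held the positions of those strings among the sorted factor levels.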

As a bonus, here are the contents of a simple R script (let's call it subset.r) that removes the Splunk fields that aren't terribly useful (such as _raw and the date fields). The table command can be used to select a subset of fields, but you must know your subset in advance; this script determines a subset automatically, and it reduces the size of the resulting CSV export by about one-third:

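# Names of all fields in the incoming results
originalNames <- names(input)

# Fields to drop: the date fields and Splunk's internal fields (leading underscore)
remove_names <- originalNames[c(grep("^date", originalNames),
                                grep("^_", originalNames))]

# Plus other default fields that are rarely useful in an export
remove_names <- c(remove_names,
                  "splunk_server", "splunk_server_group",
                  "Label", "linecount", "punct", "eventtype")

# Keep _time (dropped above by the "^_" pattern), followed by every field
# that is not on the removal list
subsetNames <- c("_time",
                 originalNames[!(originalNames %in% remove_names)])

output <- subset(input, select=subsetNames)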
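One caveat: the script above assumes the results always contain _time. For searches where that may not hold, a slightly more defensive variant could look like the sketch below. The sample data frame is invented for the demonstration, and the names drop_pattern, drop_exact and keep are my own, not anything the R app defines:

```r
# Hypothetical sample of what a search might hand to the script.
input <- data.frame(
  "_time" = "2014-01-01T00:00:00", "_raw" = "GET /index.html",
  date_hour = 0, host = "web01", punct = "/.", status = 200,
  check.names = FALSE, stringsAsFactors = FALSE
)

# Same removal rules as above, expressed as one pattern and one list.
drop_pattern <- "^(date|_)"
drop_exact <- c("splunk_server", "splunk_server_group",
                "Label", "linecount", "punct", "eventtype")

all_names <- names(input)
keep <- all_names[!grepl(drop_pattern, all_names) &
                  !(all_names %in% drop_exact)]

# intersect() re-adds "_time" only when the search actually returned it.
keep <- c(intersect("_time", all_names), keep)

output <- input[, keep, drop = FALSE]
```

With this sample input, output keeps only the _time, host and status columns.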

Then whenever I need to export Splunk results as CSV (for example, to create a beautiful scatter plot in R using ggplot2), I just add the following to the end of my Splunk query:

| r subset.r

Bottom line: Would you agree to update your R app so that its read.csv call passes stringsAsFactors=FALSE?

Ideally, I'd like an R app that simply connects my script to the pipeline's stdout and requires me to call read.csv myself. Each of my R scripts could then set the specific options it needs and avoid post-processing to re-form the input into what the script requires.
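Roughly, a script under that model might look like the sketch below. To be clear, this is not how the R app works today: the idea that results arrive as CSV on the script's stdin, the helper name read_results, and the sample CSV text are all assumptions for illustration. The check.names = FALSE option is my guess at how to keep a field name such as _time unmodified:

```r
# Sketch only: assumes the app would pipe CSV results to the script's stdin.
read_results <- function(con) {
  # Each script picks the read.csv options it needs.
  read.csv(con, stringsAsFactors = FALSE, check.names = FALSE)
}

# Demonstrate with an in-memory connection instead of real stdin.
demo <- textConnection("_time,host,count\n2014-01-01T00:00:00,web01,3")
input <- read_results(demo)
close(demo)

# In the hypothetical app, the script would instead do something like:
#   input <- read_results(file("stdin"))
#   ... build output ...
#   write.csv(output, stdout(), row.names = FALSE)
```

Here input$host comes back as a plain character column and the _time name is left alone, so no post-processing is needed.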

Just a thought. Thanks for your consideration.


rfujara_splunk
Splunk Employee

I just created a new version of the R app that uses stringsAsFactors=FALSE for the read.csv call.

Plus, I updated the app's setup page allowing you to customize the read.csv and write.csv function call options.

I haven't uploaded the new version to apps.splunk.com yet, because I want you to test it first. I uploaded the package here.


brian_from_fl
Explorer

Thanks! I have installed your update (version 0.3.11) and verified that it works, both with my existing scripts, and then with my updated scripts.

I noticed your default stringsAsFactors=FALSE option in the r.conf file, and left it as-is.

I left the write.csv options empty, but it's nice to know there's a place to set them if and when a change is required, all with the existing R app.

Thank you again! We love it!
