
Putting Calculations in .conf Files

David
Splunk Employee

I have a CSV file input that is based on a data sampling method (it takes the per-second average for a counter and records the result every 10 minutes), and each value needs to be multiplied by 600 to get the real number. For example, a hits value of 5.39 means we actually had 3234 hits during that 10-minute window (5.39 * 600 = 3234).

Right now, when I do searches off the raw data, I have a very long macro that multiplies each of the 12 counters by 600 for every event: | eval hits=hits*600 | eval misses=misses*600... I'd like to move that into a .conf file so it doesn't need to be done every time.
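For reference, the macro today is roughly this shape in macros.conf (the macro name below is made up, and only two of the twelve counters are shown):

[scale_counters]
definition = eval hits=hits*600 | eval misses=misses*600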




I have one possible solution (detailed below), but it's not perfect. Are there any other viable options?




One Solution (with a problem):

Paolo suggested that I try a scripted lookup to solve the problem. This worked, though it slowed the search down by 25% (compared with 8% for the evals alone). That is perfectly acceptable for my needs, so it's not a problem. The other downside is that it seems you can't overwrite the original field: in effect, you can't do hits=hits*600, but you can do myhits=hits*600. You also can't fieldalias it afterward or use OUTPUT myhits AS hits, or the lookup will balk. That's not ideal in general, but because I happen to be renaming the fields anyway, it meets my needs.

To implement, I put the following in props.conf:

[MySourceType]
Lookup-LookupField1 = LookupField1 Field1 OUTPUT MyField1
Lookup-LookupField2 = LookupField2 Field2 OUTPUT MyField2
...
Lookup-LookupField15 = LookupField15 Field15 OUTPUT MyField15

And in transforms.conf:

[LookupField1]
external_cmd = MultiplyAll.py Field1 MyField1
external_type = python
fields_list = Field1, MyField1

...

[LookupField15]
external_cmd = MultiplyAll.py Field15 MyField15
external_type = python
fields_list = Field15, MyField15

Then add MultiplyAll.py to your app's bin dir (this is probably padded with extra code -- I'm new to Python, so I grabbed most of it from external_lookup.py):

import sys,os,csv

def main():
    # Expect the original and new field names as command-line arguments
    if len(sys.argv) != 3:
        print "Usage: python MultiplyAll.py [original field] [new field]"
        sys.exit(0)

    origf = sys.argv[1]
    newf = sys.argv[2]
    r = csv.reader(sys.stdin)
    w = None
    header = []
    first = True

    for line in r:
        if first:
            # The first row is the CSV header Splunk sends; both fields must be present
            header = line
            if origf not in header or newf not in header:
                print "Original and New fields must exist in CSV data"
                sys.exit(0)
            # Echo the header, then write the remaining rows through a DictWriter
            csv.writer(sys.stdout).writerow(header)
            w = csv.DictWriter(sys.stdout, header)
            first = False
            continue

        # Read the row into a dict keyed by header name, padding missing columns
        result = {}
        i = 0
        while i < len(header):
            if i < len(line):
                result[header[i]] = line[i]
            else:
                result[header[i]] = ''
            i += 1

        # Do the math
        if len(result[origf]) and len(result[newf]):
            # Both fields already populated: pass the row through unchanged
            w.writerow(result)
        elif len(result[origf]):
            # Only the original field is populated: compute the scaled value
            result[newf] = float(result[origf]) * 600
            w.writerow(result)
        elif len(result[newf]):
            # Only the new field is populated (reverse lookup): nothing to return
            pass

main()
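Once the automatic lookups above are in place, the multiplied values should simply appear at search time for that sourcetype, so a plain search can use the renamed fields directly without the eval macro, e.g. (sourcetype and field names match the example stanzas above):

sourcetype=MySourceType | timechart span=10m sum(MyField1)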
1 Solution

Paolo_Prigione
Builder

Even if you coded this into props.conf, the evaluation would still run at search time, so I see no benefit there. But that's not possible anyway 🙂

There are, however, alternatives:

  • If you want to report on long periods of time, you might wish to use summary indexing to pre-compute every sort of metric (see the sketch after this list)...
  • If these data show up in dashboards, you could use scheduled searches to feed the dashboards, so that Splunk won't have to run a search every time somebody loads the page. A walkthrough here.

I'm short of other options...
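As a very rough sketch of the summary indexing idea (the index name, span, and counter names are placeholders to adapt to your data), a scheduled search could pre-multiply the counters and write the results to a summary index with collect:

sourcetype=MySourceType
| bin _time span=10m
| stats sum(eval(hits*600)) AS total_hits, sum(eval(misses*600)) AS total_misses by _time
| collect index=summary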


David
Splunk Employee

What might be more problematic for some users (but not for me) is that you apparently can't use a lookup to overwrite the original field. I tried a few variations. If I had the actual lookup replace the same field that it was looking up, it just errored out, saying the lookup didn't exist. If I tried to fieldalias it afterward, it would just totally ignore the fieldalias (making me think it reads in the props.conf and then executes all fieldaliases before lookups, maybe?). That said, coincidentally I'm renaming all the fields in props.conf anyway, so this actually works perfectly for me.
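To make that concrete with the stanzas above, this is roughly the difference (same lookup, only the OUTPUT clause changes):

# errors out, claiming the lookup doesn't exist: output overwrites the field being looked up
Lookup-LookupField1 = LookupField1 Field1 OUTPUT Field1

# works: output goes to a differently named field
Lookup-LookupField1 = LookupField1 Field1 OUTPUT MyField1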


David
Splunk Employee

Okay, so the results are in. Performance-wise it's slower, as expected. Interestingly (or perhaps coincidentally), it is exactly the same speed as putting the evals after the stats command (per my other request). That means it slows the search by 25% instead of by 8%, which, while not ideal, isn't the end of the world either. So that's decent.
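For context, "putting the evals after the stats command" means moving the multiplication from the raw events to the aggregated rows -- roughly the difference between something like the following two searches (the span and field names are placeholders; both return the same totals, since the multiplication distributes over the sum):

sourcetype=MySourceType | eval hits=hits*600 | timechart span=10m sum(hits) AS hits

sourcetype=MySourceType | timechart span=10m sum(hits) AS hits | eval hits=hits*600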


David
Splunk Employee

I'll look into that, and let you know the result. BTW, on the performance impact side: doing 1.9 million evals results in a ~8 second delay, although there are some oddities along the way: http://splunk-base.splunk.com/answers/22655/why-is-eval-so-slow-after-stats


Paolo_Prigione
Builder

No, I don't, but it might be worth testing.
If it were a search command, you could make it "streaming", so that it executes while the search is still running. I don't know how lookups are managed.


David
Splunk Employee

Hmm. That's an interesting notion. Do you know if there is a performance overhead, or any surprises with a large number of passed values? I'm thinking there could pretty easily be something like 200,000 multiplications, and maybe a lot more if a user was doing a ridiculously long search.


Paolo_Prigione
Builder

Why not a scripted Python lookup, then? It would be applied automatically for that sourcetype (or source), and (I think) it might overwrite the destination fields themselves. It would go completely unnoticed.

David
Splunk Employee

I'm not as concerned about performance -- the search is performant enough for my needs at this point. My greater concern is making it easier for non-technical users to search the raw data on their own without having to add the eval macro (and to reduce the chance of error when building reports). I definitely use summary indexing and all that jazz. I'm looking for a way to automate fixing those values.


David
Splunk Employee

Bonus points for any opinion on the performance implications. I've been running on the assumption that evals are pretty cheap, and the queries I'm running aren't particularly complex, but if a search has to traverse 90,000 records, that's 12 * 90,000 = 1,080,000 evals...
