
Putting Calculations in .conf Files

David
Splunk Employee

I have a CSV file input that is based on a data sampling method (it takes the per-second average for a counter and records the result every 10 minutes), and each value needs to be multiplied by 600 to get the real number. For example, a hits value of 5.39 means we actually had 3234 hits during that 10-minute window (5.39 * 600 = 3234).

Right now, when I do searches off the raw data, I have a very long macro that multiplies each of the 12 counters by 600 for every event: | eval hits=hits*600 | eval misses=misses*600... I'd like to move that into a .conf file so it doesn't need to be done every time.
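For reference, the macro today is roughly this shape in macros.conf (the macro name below is made up, and only two of the twelve counters are shown):

[scale_counters]
definition = eval hits=hits*600 | eval misses=misses*600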




I have one possible solution (detailed below), but it's not perfect. Are there any other viable options?




One Solution (with a problem):

Paolo suggested that I try a scripted lookup to solve the problem. This worked, though it slowed the search down by 25% (compared with 8% for the evals alone). That is perfectly acceptable for my needs, so it's not a problem. The other downside is that it seems you can't overwrite the original field: in effect, you can't do hits=hits*600, but you can do myhits=hits*600. You also can't fieldalias it afterward or use OUTPUT myhits AS hits, or the lookup will balk. That's not ideal in general, but because I happen to be renaming the fields anyway, it meets my needs.

To implement, I put the following in props.conf:

[MySourceType]
Lookup-LookupField1 = LookupField1 Field1 OUTPUT MyField1
Lookup-LookupField2 = LookupField2 Field2 OUTPUT MyField2
...
Lookup-LookupField15 = LookupField15 Field15 OUTPUT MyField15

And in transforms.conf:

[LookupField1]
external_cmd = MultiplyAll.py Field1 MyField1
external_type = python
fields_list = Field1, MyField1

...

[LookupField15]
external_cmd = MultiplyAll.py Field15 MyField15
external_type = python
fields_list = Field15, MyField15

Then add MultiplyAll.py to your app's bin dir (this is probably padded with extra code -- I'm new to Python, so I grabbed most of it from external_lookup.py):

import sys,os,csv

def main():
    # Expect the original and new field names as command-line arguments
    if len(sys.argv) != 3:
        print "Usage: python MultiplyAll.py [original field] [new field]"
        sys.exit(0)

    origf = sys.argv[1]
    newf = sys.argv[2]
    r = csv.reader(sys.stdin)
    w = None
    header = []
    first = True

    for line in r:
        if first:
            # The first row is the CSV header Splunk sends; both fields must be present
            header = line
            if origf not in header or newf not in header:
                print "Original and New fields must exist in CSV data"
                sys.exit(0)
            # Echo the header, then write the remaining rows through a DictWriter
            csv.writer(sys.stdout).writerow(header)
            w = csv.DictWriter(sys.stdout, header)
            first = False
            continue

        # Read the row into a dict keyed by header name, padding missing columns
        result = {}
        i = 0
        while i < len(header):
            if i < len(line):
                result[header[i]] = line[i]
            else:
                result[header[i]] = ''
            i += 1

        # Do the math
        if len(result[origf]) and len(result[newf]):
            # Both fields already populated: pass the row through unchanged
            w.writerow(result)
        elif len(result[origf]):
            # Only the original field is populated: compute the scaled value
            result[newf] = float(result[origf]) * 600
            w.writerow(result)
        elif len(result[newf]):
            # Only the new field is populated (reverse lookup): nothing to return
            pass

main()
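Once the automatic lookups above are in place, the multiplied values should simply appear at search time for that sourcetype, so a plain search can use the renamed fields directly without the eval macro, e.g. (sourcetype and field names match the example stanzas above):

sourcetype=MySourceType | timechart span=10m sum(MyField1)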
1 Solution

Paolo_Prigione
Builder

Even if you coded this into props.conf, the evaluation would still run at search time, so I see no benefit there. But that's not possible anyway 🙂

There are, however, alternatives:

  • If you want to report on long periods of time, you might wish to use summary indexing to pre-compute every sort of metric (see the sketch after this list)...
  • If these data show up in dashboards, you could use scheduled searches to feed the dashboards, so that Splunk won't have to run a search every time somebody loads the page. A walkthrough here.

I'm short of other options...
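As a very rough sketch of the summary indexing idea (the index name, span, and counter names are placeholders to adapt to your data), a scheduled search could pre-multiply the counters and write the results to a summary index with collect:

sourcetype=MySourceType
| bin _time span=10m
| stats sum(eval(hits*600)) AS total_hits, sum(eval(misses*600)) AS total_misses by _time
| collect index=summary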


David
Splunk Employee

What might be more problematic for some users (but not for me) is that you apparently can't use a lookup to overwrite the original field. I tried a few variations. If I had the actual lookup replace the same field that it was looking up, it just errored out, saying the lookup didn't exist. If I tried to fieldalias it afterward, it would just totally ignore the fieldalias (making me think it reads in the props.conf and then executes all fieldaliases before lookups, maybe?). That said, coincidentally I'm renaming all the fields in props.conf anyway, so this actually works perfectly for me.
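To make that concrete with the stanzas above, this is roughly the difference (same lookup, only the OUTPUT clause changes):

# errors out, claiming the lookup doesn't exist: output overwrites the field being looked up
Lookup-LookupField1 = LookupField1 Field1 OUTPUT Field1

# works: output goes to a differently named field
Lookup-LookupField1 = LookupField1 Field1 OUTPUT MyField1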


David
Splunk Employee

Okay, so the results are in. Performance-wise it's slower, as expected. Interestingly (or perhaps coincidentally), it is exactly the same speed as putting the evals after the stats command (per my other request). That means it slows the search by 25% instead of by 8%, which, while not ideal, isn't the end of the world either. So that's decent.
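For context, "putting the evals after the stats command" means moving the multiplication from the raw events to the aggregated rows -- roughly the difference between something like the following two searches (the span and field names are placeholders; both return the same totals, since the multiplication distributes over the sum):

sourcetype=MySourceType | eval hits=hits*600 | timechart span=10m sum(hits) AS hits

sourcetype=MySourceType | timechart span=10m sum(hits) AS hits | eval hits=hits*600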


David
Splunk Employee

I'll look into that, and let you know the result. BTW, on the performance impact side: doing 1.9 million evals results in a ~8 second delay, although there are some oddities along the way: http://splunk-base.splunk.com/answers/22655/why-is-eval-so-slow-after-stats


Paolo_Prigione
Builder

No, I don't, but it might be worth testing.
If it were a search command, you could make it "streaming", so that it executes while the search is still running. I don't know how lookups are managed.


David
Splunk Employee

Hmm. That's an interesting notion. Do you know if there is a performance overhead, or any surprises with a large number of passed values? I'm thinking there could pretty easily be something like 200,000 multiplications, and maybe a lot more if a user was doing a ridiculously long search.


Paolo_Prigione
Builder

Why not a scripted Python lookup, then? It would be applied automatically for that sourcetype (or source), and (I think) it might overwrite the destination fields themselves. It would go completely unnoticed.

David
Splunk Employee

I'm not as concerned about performance -- the search is performant enough for my needs at this point. My greater concern is making it easier for non-technical users to search the raw data on their own without having to add the eval macro (and to reduce the chance of error when building reports). I definitely use summary indexing and all that jazz. I'm looking for a way to automate fixing those values.


David
Splunk Employee

Bonus points for any opinion on the performance implications. I've been running on the assumption that evals are pretty cheap, and the queries I'm running aren't particularly complex, but if a search has to traverse 90,000 records, that's 12 * 90,000 = 1,080,000 evals...
