I have several lines which look like :
2018-10-05 15:10:00.000, STEP="STEP1", VALUE="1965.00000", ZONE="CITY1", CODE="GOOD", DATE="201810051510"
I want to predict some fields with those lines. But first, I need to pre-process my data.
That's why I want to avg all the VALUE by every 5 minutes window and by STEP, ZONE, CODE.
I really don't know how to do this.
Here is an illustration. I want to transform something like :
2018-10-05 15:10:01.000, STEP="STEP1", VALUE=**"1965.00000"**, ZONE="CITY1", CODE="GOOD", DATE="201810051510"
2018-10-05 15:10:30.000, STEP="STEP1", VALUE=**"1546.00000"**, ZONE="CITY1", CODE="GOOD", DATE="201810051510"
2018-10-05 15:11:21.000, STEP="STEP1", VALUE=**"1452.00000"**, ZONE="CITY1", CODE="GOOD", DATE="201810051510"
2018-10-05 15:13:54.000, STEP="STEP1", VALUE=**"985.00000"**, ZONE="CITY1", CODE="GOOD", DATE="201810051510"
In one unique line per 5 minutes, STEP, ZONE, CODE something like :
2018-10-05 15:15:00.000, STEP="STEP1", VALUE=**"1487.00000"**, ZONE="CITY1", CODE="GOOD", DATE="201810051510"
Can you help me ?!
Try something like this:
index="yourindex" sourcetype="yoursourcetype"
| rex field=_raw "VALUE=\*\*\"(?<VAL>\d+\.\d+)\"\*\*"
| bin span=5m _time
| stats avg(VAL) as AVGVAL by _time STEP ZONE CODE
| eval AVGVAL=round(AVGVAL,2)
I had to do the rex because Splunk was auto-extracting **"1965.00000"** instead of just 1965.00000 for the VALUE field. If you want 5 decimal places, just change the 2 in the last line to 5. Here is a screenshot of the results of this search: