SA-Eventgen: Auto-extract fields from CSV header of data sample

guilmxm
Influencer

Hi,

For a future version of the Nmon Splunk App (https://splunkbase.splunk.com/app/1753/), I want to provide data samples and an Eventgen configuration so that people can test the App without having to deploy real clients or manage real Nmon data.

With this goal in mind, I am having some success generating data from my sample exports using Eventgen; my problem lies in field extraction.

Under normal circumstances, and for the performance data sourcetype, fields are extracted automatically using the CSV header generated by third-party conversion scripts:

[nmon_data]

FIELD_DELIMITER=,
FIELD_QUOTE="
HEADER_FIELD_LINE_NUMBER=1
INDEXED_EXTRACTIONS=csv
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false
TIMESTAMP_FIELDS=ZZZZ
TIME_FORMAT=%d-%m-%Y %H:%M:%S
KV_MODE=none
pulldown_type=true

Using the following Eventgen configuration, I can get data in, but only Splunk default fields (like host) are correctly extracted:

[LPAR_nmon_sample]
interval = 15
earliest = -15s
latest = now
count = 100
hourOfDayRate = { "0": 0.8, "1": 1.0, "2": 0.9, "3": 0.7, "4": 0.5, "5": 0.4, "6": 0.4, "7": 0.4, "8": 0.4, "9": 0.4, "10": 0.4, "11": 0.4, "12": 0.4, "13": 0.4, "14": 0.4, "15": 0.4, "16": 0.4, "17": 0.4, "18": 0.4, "19": 0.4, "20": 0.4, "21": 0.4, "22": 0.5, "23": 0.6 }
dayOfWeekRate = { "0": 0.7, "1": 0.7, "2": 0.7, "3": 0.5, "4": 0.5, "5": 1.0, "6": 1.0 }
randomizeCount = 0.2
randomizeEvents = true

mode = sample
sampletype = csv

fileName = LPAR_nmon_sample.csv
outputMode = splunkstream

index=nmon
host=sample.splunk.com
source=sample
sourcetype=nmon_data

# Host/User/pass only necessary if running outside of splunk!
splunkHost = localhost
splunkUser = admin
splunkPass = admin

## Replace timestamp
token.0.token = \d*-\d*-\d{4}\s\d{2}:\d{2}:\d{2}
token.0.replacementType = timestamp
token.0.replacement = %d-%m-%Y %H:%M:%S

The data sample looks like the original raw data generated by the App, with a header and data rows (tabular CSV):

index,host,sourcetype,source,type,serialnum,hostname,ZZZZ,interval,snapshots,PhysicalCPU,virtualCPUs,logicalCPUs,poolCPUs,entitled,weight,PoolIdle,"usedAllCPU_PCT","usedPoolCPU_PCT",SharedCPU,Capped,"EC_User_PCT","EC_Sys_PCT","EC_Wait_PCT","EC_Idle_PCT","VP_User_PCT","VP_Sys_PCT","VP_Wait_PCT","VP_Idle_PCT",Folded,"Pool_id","_indextime","_raw","_serial","_sourcetype","_time"
nmon,AAAAAAA,"nmon_data","/media/BIGDATA/splunk/var/run/nmon/var/csv_repository/AAAAAAA_26_MAR_2015_000003_LPAR_1283471_20150502154354.nmon.csv",LPAR,XXXXXXXX,AAAAAAA,"26-03-2015 00:00:43",240,359,"0.000",6,12,64,"2.50",128,"0.00","0.00","0.00",1,0,"0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00",0,,1430574239,"LPAR,XXXXXXXX,AAAAAAA,26-03-2015 00:00:43,240,359,0

With the configuration above, the data gets in correctly, but only the host field is extracted (the raw event contains all the data).

I also tried another Eventgen configuration that generates data files to be managed by the App's standard input:

[LPAR_nmon_sample]
interval = 15
earliest = -15s
latest = now
count = 100
hourOfDayRate = { "0": 0.8, "1": 1.0, "2": 0.9, "3": 0.7, "4": 0.5, "5": 0.4, "6": 0.4, "7": 0.4, "8": 0.4, "9": 0.4, "10": 0.4, "11": 0.4, "12": 0.4, "13": 0.4, "14": 0.4, "15": 0.4, "16": 0.4, "17": 0.4, "18": 0.4, "19": 0.4, "20": 0.4, "21": 0.4, "22": 0.5, "23": 0.6 }
dayOfWeekRate = { "0": 0.7, "1": 0.7, "2": 0.7, "3": 0.5, "4": 0.5, "5": 1.0, "6": 1.0 }
randomizeCount = 0.2
randomizeEvents = true

mode = sample
sampletype = csv

fileName = LPAR_nmon_sample.csv

outputMode = spool
spoolDir = $SPLUNK_HOME/var/run/nmon/var/csv_repository

## Replace timestamp
token.0.token = \d*-\d*-\d{4}\s\d{2}:\d{2}:\d{2}
token.0.replacementType = timestamp
token.0.replacement = %d-%m-%Y %H:%M:%S

In this case, the data is managed as if it were raw data, but since Eventgen won't generate the files with the CSV header, fields won't be correctly extracted.

So the question is: is there a way for Eventgen to extract fields from the CSV data sample header, the same way it does for the host field?

I have played with tokens without success; as far as I understand, they manipulate data in existing fields but do not create new fields.

Setting up extraction at search time would not be an answer: in the App context, the number, order, and names of fields may vary depending on the operating system generating the data. This is handled perfectly by CSV header extraction, but would not be by static field definitions or rex extractions.

Thank you very much in advance for any help. Eventgen is very powerful, and providing it within the Nmon App would be very helpful for people wanting to test the App in simulation mode.

Regards,

Guilhem


guilmxm
Influencer

I finally found a working solution; it's not perfect, but it works.

I use source:: stanzas in props.conf to intercept the sources related to the sample files, set the related fields in transforms.conf, and use spool mode in Eventgen to generate files that will be managed by the standard App inputs. A minimal sketch of the idea is shown below.
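For illustration, here is a minimal sketch of the approach. The source path pattern, the transforms stanza name, and the use of REPORT-based delimiter extraction are assumptions for this example (the actual settings ship with the App); the field list is taken from the LPAR header shown in the question.

# props.conf -- intercept the sample files spooled by Eventgen
# (path pattern is an example matching the spoolDir used above)
[source::...csv_repository/*LPAR_nmon_sample*]
sourcetype = nmon_data
REPORT-lpar_sample_fields = nmon_sample_lpar_fields

# transforms.conf -- static delimiter-based field list for this sample
# (hypothetical stanza name; field names come from the LPAR header)
[nmon_sample_lpar_fields]
DELIMS = ","
FIELDS = "type","serialnum","hostname","ZZZZ","interval","snapshots","PhysicalCPU","virtualCPUs","logicalCPUs","poolCPUs","entitled","weight","PoolIdle","usedAllCPU_PCT","usedPoolCPU_PCT","SharedCPU","Capped","EC_User_PCT","EC_Sys_PCT","EC_Wait_PCT","EC_Idle_PCT","VP_User_PCT","VP_Sys_PCT","VP_Wait_PCT","VP_Idle_PCT","Folded","Pool_id"

A static field list is acceptable here because the content of each sample file is fixed and known in advance, which is not the case for real Nmon data (as noted in the question).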

Anyone interested in these settings can find them starting with version 1.6.0 of the App.

I would have preferred an approach that uses the CSV header to extract fields, but for now, at least, that doesn't seem to be possible.

Guilhem
