Splunk Search

Parsing a field in csv having key value pairs and displaying it as header and value in table

swarjs
Explorer

We are trying to parse an entry in the csv which is of the below format,
2018-01-11 00:00:00,default.MS_2016,shekhar.com,Forward-Mapping,No,,10.102.31.216,1,0,2,0,0,0,1,0,0,0,0,1,0,0,1,1,0,1,0,8,0,0,hinf=10|cert=15|ipsec=80|

The value in the last column needs to be parsed and it should be displayed as separate columns and corresponding value.
Example: For the entry "hinf=10|cert=15|ipsec=80|"
The Table needs to display,
hinf cert ipsec
10 15 80

Could anyone please let me know if this is possible.

0 Karma

mayurr98
Super Champion

hey @swarjs

Try this run anywhere search

| makeresults | eval _raw="2018-01-11 00:00:00,default.MS_2016,shekhar.com,Forward-Mapping,No,,10.102.31.216,1,0,2,0,0,0,1,0,0,0,0,1,0,0,1,1,0,1,0,8,0,0,hinf=10|cert=15|ipsec=80|" | rex field=_raw "hinf=(?P<hinf>[^\|]+)\|cert=(?P<cert>[^\|]+)\|ipsec=(?P<ipsec>[^\|]+)" | table hinf cert ipsec 

In order to make it work go to Settings » Fields » Field extractions » Add new
put Name ,sourcetype and regex as hinf=(?P<hinf>[^\|]+)\|cert=(?P<cert>[^\|]+)\|ipsec=(?P<ipsec>[^\|]+)
save and search for your data and see the fields extracted

Let me know if this helps!

0 Karma

swarjs
Explorer

Thanks for the quick response!!
Sorry I probably missed out on adding more details in my question.

As a standalone search, your query is working fine. But in our scenario the last value in the CSV is a dynamic list of key-value pairs. So there is no fixed keyword and no fixed number of entries.

Example: The last column in the CSV entry can be any of the following,
hinfo=10|cert=15|ipsec=80|
AAAA=10|PTR=15|ipsec=80|A=4|CNAME=30|
LOC=2|

Following is the existing query for the report,
sourcetype=ib:dns:zone index=ib_ipam (view="*") * * * | msservers ms_primary | dedup view,zone_name | noop | rename view as View, zone_name as Zone, zone_format as Function, signed as Signed, hosts as Hosts, rr_total as "Total Records", rr_a as "A Records", rr_aaaa as "AAAA Records", rr_cname as "CNAME Records", rr_dhcid as "DHCID Records", rr_dname as "DNAME Records", rr_dnskey as "DNSKEY Records", rr_ds as "DS Records", rr_mx as "MX Records", rr_naptr as "NAPTR Records", rr_nsec as "NSEC Records", rr_nsec3param as "NSEC3PARAM Records", rr_nsec3 as "NSEC3 Records", rr_ns as "NS Records", rr_ptr as "PTR Records", rr_rrsig as "RRSIG Records", rr_soa as "SOA Records", rr_srv as "SRV Records", rr_tlsa as "TLSA Records", rr_txt as "TXT Records", rr_other as "Other Records", rr_lbdn as "LBDN" | eval Timestamp = strftime(_time, "%Y-%m-%d %H:%M:%S %Z") | table Timestamp, Zone, Function, Signed, Hosts, "LBDN", "Total Records", "A Records", "AAAA Records", "CNAME Records", "DHCID Records", "DNAME Records", "DNSKEY Records", "DS Records", "MX Records", "NAPTR Records", "NSEC Records", "NSEC3PARAM Records", "NSEC3 Records", "NS Records", "PTR Records", "RRSIG Records", "SOA Records", "SRV Records", "TLSA Records", "TXT Records", "Other Records"

In the query above, rr_aaaa, rr_a, rr_soa etc are the headers of the csv.

This needs enhancement to support the new column added in the CSV(rr_unknown = "hinfo=10|cert=15|ipsec=80|").

0 Karma

mayurr98
Super Champion

hey so you can write

| rex field=rr_unknown "hinf=(?P<hinf>[^\|]+)" | rex field=rr_unknown  "cert=(?P<cert>[^\|]+)" | rex field=rr_unknown "ipsec=(?P<ipsec>[^\|]+)" | table hinf cert ipsec

OR

| extract pairdelim="|", kvdelim="="

0 Karma

swarjs
Explorer

But the values that are populated in the last column are dynamic. So there is no fixed format for the key value pair and also there is no fixed count of the key-value pairs that would occur in this column. So using the above "rex" probably would not work since it expects particular field keys.

Could you please let me know how to write a generic version of the regex, where in the last column will be parsed based on key-value pairs. Is there a way that we could specify the "extract" call to work on a particular field, say "rr_unknown".

I am new to splunk queries, so please do not mind if my question sounds trivial.

0 Karma

mayurr98
Super Champion

can you give me sample input? with the exact field name and values?

0 Karma

swarjs
Explorer

CSV content:

timestamp,view,zone_name,zone_format,signed,grid_primary,ms_primary,rr_a,......rr_other,rr_total,hosts,rr_lbdn,rr_dhcid,rr_unknown
2018-01-12 00:00:00,default.MS_2016,shekhar.com,Forward-Mapping,No,,10.102.31.216,1,0,2,0,0,0,1,0,0,0,0,1,0,0,1,1,0,1,0,8,0,0,0,|hinfo=10|cert=20|loc=30|abc=67|
2018-01-12 00:00:00,default.MS_2016,zone.com,Forward-Mapping,No,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,|caa=89|cname=90|aaaa=100|

In the above CSV entries, the last column("rr_unknown"), could contain the key value pairs.
this column needs to be further parsed and each entry of that parsing entry needs to be displayed as a separate column in the table.

Example: The table display should look as follows,
Timestamp Zone 'A Record' 'AAAA Record' .. .. .. .. .. 'DHCID Record' 'CAA Record' 'CNAME Record' 'AAAA Record'
2018-01-12 00:00:00 zone.com 0 0 .. .. .. .. .. 0 89 90 100

The query that I am currently using is as follows,
sourcetype=ib:dns:zone index=ib_ipam (view="") | msservers ms_primary | dedup view,zone_name | noop | rename view as View, zone_name as Zone, zone_format as Function, signed as Signed, hosts as Hosts, rr_total as "Total Records", rr_a as "A Records", rr_aaaa as "AAAA Records", rr_cname as "CNAME Records", rr_dhcid as "DHCID Records", rr_dname as "DNAME Records", rr_dnskey as "DNSKEY Records", rr_ds as "DS Records", rr_mx as "MX Records", rr_naptr as "NAPTR Records", rr_nsec as "NSEC Records", rr_nsec3param as "NSEC3PARAM Records", rr_nsec3 as "NSEC3 Records", rr_ns as "NS Records", rr_ptr as "PTR Records", rr_rrsig as "RRSIG Records", rr_soa as "SOA Records", rr_srv as "SRV Records", rr_tlsa as "TLSA Records", rr_txt as "TXT Records", rr_other as "Other Records", rr_lbdn as "LBDN" | eval Timestamp = strftime(_time, "%Y-%m-%d %H:%M:%S %Z") | table Timestamp, Zone, Function, Signed, Hosts, "LBDN", "Total Records", "A Records", "AAAA Records", "CNAME Records", "DHCID Records", "DNAME Records", "DNSKEY Records", "DS Records", "MX Records", "NAPTR Records", "NSEC Records", "NSEC3PARAM Records", "NSEC3 Records", "NS Records", "PTR Records", "RRSIG Records", "SOA Records", "SRV Records", "TLSA Records", "TXT Records", "Other Records"

This query needs to be enhanced to accommodate the new column "rr_unknown" and display the values under this column as individual columns.

0 Karma

p_gurav
Champion

Hi swarjs,

After indexing is "hinf=10|cert=15|ipsec=80|" is coming in one field?

0 Karma

swarjs
Explorer

Yes correct. After extraction, "hinf=10|cert=15|ipsec=80|" will be in one field. Need to further process this and display it as separate columns in a table.

0 Karma

p_gurav
Champion

Then you can use field extractor to extract values , Or use rex command "| rex field=(replace field name where it is being captured) "hinf=(?P[^|]+)|cert=(?P[^|]+)|ipsec=(?P[^|]+)""

0 Karma

swarjs
Explorer

Hi p_gurav,
Thanks for the inputs. Sorry I have updated more details about my query below. One of the concern is that the last column of my csv is dynamic. ie. there is no fixed pattern for the key value pairs. Only the delimiters are fixed.

0 Karma
Get Updates on the Splunk Community!

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...