Getting Data In

How do I count how many times a value of one field appears in other fields in another CSV/lookup?

russell120
Communicator

Hi, consider the below CSV files:

quickscan.csv

      ip             mac
 11.11.11.11     ab:cd:ef:gh
 22.22.22.222    ij:jk:lm:no
 33.33.333.333   pq:rs:tu:vw

ScanRecord.csv

 unknown_ip    dangerous_ip      date      location    unknown_mac     blocked_mac 
22.22.22.222   22.22.22.222    20181025    Hogwarts    ij:jk:lm:no     ij:jk:lm:no
33.33.333.333                  20181025       US       pq:rs:tu:vw

How do I take the values of the ip and mac fields and compare them against the second CSV to display the number of (specified) fields that appear withinScanRecord.csv (the fields would be unknown_ip, dangerous_ip, unknown_mac, and blocked_mac)? The result should spit out a table that looks like this:

     ip           ip_appears        mac        mac_appears
22.22.22.222          2         ij:jk:lm:no         2
33.33.333.333         1         pq:rs:tu:vw         1

EDIT: The solution needs to be scale-able and not limited to the 2-3 rows/events I have above. In reality, each of these CSVs has hundreds of rows/events.

0 Karma

DalJeanis
Legend

Here's one method...

| inputcsv ScanRecord.csv

| rename COMMENT as "this section calculates the number of times that any IP or mac appears in the ScanRecord.csv"
| eval testfields=mvappend(unknown_ip,dangerous_ip,unknown_mac,blocked_mac)
| stats count as foundcount by testfields
| eval mac=case(match(testfields,":"),testfields)
| eval mac_appears=case(match(testfields,":"),foundcount)
| eval ip=case(NOT match(testfields,":"),testfields)
| eval ip_appears=case(NOT match(testfields,":"),foundcount)
| fields - testfields foundcount
| eval typerec="scan record so kill me later"

| rename COMMENT as "add in the quickscan.csv records"
| inputcsv append=t quickscan.csv

| rename COMMENT as "this section rolls the information over from the scan records to the quickscan records then drops the scan records"
| eventstats sum(mac_appears) as mac_appears by mac
| eventstats sum(ip_appears) as ip_appears by ip
| where isnull(typerec)

Here's some run-anywhere code that folks can use to review the method...

| makeresults 
| eval mydata="22.22.22.222,22.22.22.222,Hogwarts,ij:jk:lm:no,ij:jk:lm:no!!!!33.33.333.333,NULL,US,pq:rs:tu:vw,!!!!"
| makemv delim="!!!!" mydata 
| mvexpand mydata
| makemv delim="," mydata
| eval  unknown_ip=mvindex(mydata,0)
| eval  dangerous_ip=case(mvindex(mydata,1)!="NULL",mvindex(mydata,1))
| eval  location=mvindex(mydata,2)
| eval  unknown_mac=mvindex(mydata,3)
| eval  blocked_mac=mvindex(mydata,4)
| rename COMMENT as "the above simulates entry of your ScanRecord.csv"

| rename COMMENT as "this section calculates the number of times that any IP or mac appears in the ScanRecord.csv"
| eval testfields=mvappend(unknown_ip,dangerous_ip,unknown_mac,blocked_mac)
| stats count as foundcount by testfields
| eval mac=case(match(testfields,":"),testfields)
| eval mac_appears=case(match(testfields,":"),foundcount)
| eval ip=case(NOT match(testfields,":"),testfields)
| eval ip_appears=case(NOT match(testfields,":"),foundcount)
| fields - testfields foundcount
| eval typerec="scan record so kill me"

| rename COMMENT as "this append simulates entry of your quickscan.csv"
| append 
    [| makeresults 
     | eval mydata=mvappend("11.11.11.11,ab:cd:ef:gh","22.22.22.222,ij:jk:lm:no","33.33.333.333,pq:rs:tu:vw")
     | mvexpand mydata 
     | makemv delim="," mydata 
     | eval ip=mvindex(mydata,0), mac=mvindex(mydata,1) 
     | table ip mac
     ]

| rename COMMENT as "this section rolls the information over from the scan records to the quickscan records then drops the scan records"
| eventstats sum(mac_appears) as mac_appears by mac
| eventstats sum(ip_appears) as ip_appears by ip
| where isnull(typerec)
0 Karma

russell120
Communicator

In line 5 of the first example, or line 15 in the second example, are you supposed to use eval foundcount = mvcount(testfields) since testfields stores multiple values?

0 Karma

russell120
Communicator

@DalJeanis Nice! Wow, you know your stuff. I added an edit at the very bottom of my post. Basically, the solution you provide will need to be able to work with a CSV with hundreds of rows/ips/events.

0 Karma

kmaron
Motivator

am I correct in assuming that the same ip can show up for more than one record in ScanRecord.csv?

0 Karma

mstjohn_splunk
Splunk Employee
Splunk Employee

hi @russell120

Did the answer below solve your problem? If so, please resolve this post by approving it! If your problem is still not solved, keep us updated so that someone else can help ya. Thanks for posting!

0 Karma

russell120
Communicator

@kmaron No, the ip would only appear in only one record.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...