Splunk Search

Duplicate entries with continuous csv indexing

haker146
New Member

Hello, I'm writing to you with a small problem. I'm building a wi-fi monitoring system for my diploma thesis. I use the Kismet software, which creates a netxml file that I then parse to csv. I want to monitor this csv file continuously in splunk and watch changes in signal strength over time. Unfortunately, every reboot rebuilds my netxml file and creates a new csv with the same name, which causes more and more duplicates of the same network to appear, as shown in the screenshot below.

(screenshot of duplicate network entries in Splunk)

I am asking for help: what should I do so that these duplicates do not arise and there is only one entry per network?


msivill_splunk
Splunk Employee

Adding a timestamp to each event in the csv, and viewing each csv as a snapshot in time, means the duplicates can then be handled within Splunk. This also gives you the advantage of being able to plot how the signal changes over time in Splunk. Splunk can be thought of as a time series database, so adding events with the same data but different timestamps is fine.

Here is a generated-event query to show the concept of how signal strength changes over time:

| makeresults count=4 
| streamstats count 
| eval _time = _time - (count * 3600) 
| eval mac_address = "00:1D:0F:FB:40:4A", channel=6, signal = (random() % 10) - 74 
| timechart max(signal) AS Signal span=1h

Also, within Splunk you will be able to query the first-seen and last-seen values, so there is no need to generate these fields in the extract itself:

| makeresults count=4 
| streamstats count 
| eval _time = _time - (count * 3600) 
| eval mac_address = "00:1D:0F:FB:40:4A", channel=6, signal = count - 74 
| eval time=strftime(_time, "%y/%m/%d %H:%M:%S") 
| stats min(time) AS first_seen, max(time) AS last_seen by mac_address

To map a field in the csv extract to a Splunk timestamp, see https://docs.splunk.com/Documentation/Splunk/7.1.1/Data/HowSplunkextractstimestamps
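As a concrete illustration, a minimal props.conf sketch for a csv whose header has a column named time holding a timestamp in the format used above. The sourcetype name and column name here are assumptions for illustration, not taken from this thread:

[kismet_csv]
# Assumed sourcetype; parse the file as a headered csv
INDEXED_EXTRACTIONS = csv
# Assumed name of the csv column that carries the event timestamp
TIMESTAMP_FIELDS = time
TIME_FORMAT = %y/%m/%d %H:%M:%S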

The _time field in Splunk is where the timestamp is held.


haker146
New Member

@msivill
Thank you so much for the help. I still have one question: what if I want to make a table without repeating entries?


msivill_splunk
Splunk Employee

There is no concept of updating an event in Splunk. If you send the same data to Splunk twice, you will end up with two events. Using a timestamp when the events are saved into Splunk will help differentiate between them. The example above produces a view without repeating entries (though there will still be duplicate events within Splunk itself).
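For example, a minimal search-time sketch that collapses the duplicates into one row per network, reusing the generated events from above; latest() keeps the value from the most recent event by _time, so the table always shows the newest reading per mac_address:

| makeresults count=4 
| streamstats count 
| eval _time = _time - (count * 3600) 
| eval mac_address = "00:1D:0F:FB:40:4A", channel=6, signal = count - 74 
| stats latest(signal) AS signal, latest(channel) AS channel by mac_address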


haker146
New Member

What I wrote before was not quite right; my input file actually looks like this:

(screenshot of the csv input file)

I rebuild this file from scratch with a python script, based on the xml file, every minute, and the csv file is monitored continuously in splunk. On each rebuild the values for a given mac address change: the signal strength, last seen, and so on. And every time I rebuild this file, splunk shows me a new entry for each mac address, even if it was already there. My main point is that a new entry should not be added per mac address; instead the signal value etc. should be updated.


mdsnmss
SplunkTrust

When you reboot, it generates a new log file under the same name. Does it still contain the old entries? Are you able to control the name of the file it generates? It sounds like each time you reboot, the entire file gets reindexed. You should be able to control this using your inputs.conf. Could you share what your inputs.conf for this input looks like?
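For reference, a monitor input for a file like this typically looks something like the sketch below; the path, index, and sourcetype are placeholders for illustration, not taken from the original post:

[monitor:///opt/kismet/networks.csv]
# Assumed index and sourcetype names
index = main
sourcetype = kismet_csv
disabled = false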
