Getting Data In

duplicate event

msn2507
Path Finder

Hi all,
my inputs.conf is:

[monitor:///Users/user1/log.txt]
disabled = false
followTail = 1
sourcetype = log_test01
initCrcLength = 1024

The source file is shown below. Based on the settings above, I was expecting Splunk to show only one event and disregard the other two, but I am seeing all three events in search. Any help?

{
    "actions": [
        {
            "causes": [
                {
                    "shortDescription": "Started by timer"
                }, 
                {
                    "shortDescription": "Started by timer"
                }, 
                {
                    "shortDescription": "Started by timer"
                }
            ]
        }
    ], 
    "artifacts": [], 
    "building": true, 
    "builtOn": "", 
    "changeSet": {
        "items": [], 
        "kind": null
    }, 
    "culprits": [], 
    "description": null, 
    "duration": 0, 
    "estimatedDuration": 177034, 
    "executor": {}, 
    "fullDisplayName": "ABC 123 - Prod support #62", 
    "id": "2013-07-05_14-44-17", 
    "keepLog": false, 
    "number": 62, 
    "result": null, 
    "timestamp": 1372999457676, 
    "url": "localhost:8080/jenkins/job/jobname"
}
{
    "actions": [
        {
            "causes": [
                {
                    "shortDescription": "Started by timer"
                }, 
                {
                    "shortDescription": "Started by timer"
                }, 
                {
                    "shortDescription": "Started by timer"
                }
            ]
        }
    ], 
    "artifacts": [], 
    "building": true, 
    "builtOn": "", 
    "changeSet": {
        "items": [], 
        "kind": null
    }, 
    "culprits": [], 
    "description": null, 
    "duration": 0, 
    "estimatedDuration": 177034, 
    "executor": {}, 
    "fullDisplayName": "ABC 123 - Prod support #62", 
    "id": "2013-07-05_14-44-17", 
    "keepLog": false, 
    "number": 62, 
    "result": null, 
    "timestamp": 1372999457676, 
    "url": "localhost:8080/jenkins/job/jobname"
}
{
    "actions": [
        {
            "causes": [
                {
                    "shortDescription": "Started by timer"
                }, 
                {
                    "shortDescription": "Started by timer"
                }, 
                {
                    "shortDescription": "Started by timer"
                }
            ]
        }
    ], 
    "artifacts": [], 
    "building": true, 
    "builtOn": "", 
    "changeSet": {
        "items": [], 
        "kind": null
    }, 
    "culprits": [], 
    "description": null, 
    "duration": 0, 
    "estimatedDuration": 177034, 
    "executor": {}, 
    "fullDisplayName": "ABC 123 - Prod support #62", 
    "id": "2013-07-05_14-44-17", 
    "keepLog": false, 
    "number": 62, 
    "result": null, 
    "timestamp": 1372999457676, 
    "url": "localhost:8080/jenkins/job/jobname"
}

Drainy
Champion

I'm not sure you've understood your settings correctly.

Let's run through a few:

followTail tells Splunk to monitor the file from its current end and only consume new events as they arrive; it won't read any of the original data that is already there. It does nothing about duplicates.
initCrcLength controls how much of the start of the file Splunk reads when generating a CRC. That CRC is used to recognise files Splunk has already seen, so another file that matches the same CRC won't be consumed again. It doesn't affect the contents of the events, and Splunk doesn't re-run the CRC check every time it reads past the end of the CRC length.

If you really want to drop data at index time you need to route it to the nullQueue. Have a read of the docs on this:
http://docs.splunk.com/Documentation/Splunk/5.0.3/Deploy/Routeandfilterdatad
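
For reference, routing to the nullQueue is done with a props.conf/transforms.conf pair along these lines (a sketch only; the stanza name "setnull" and the regex are placeholders for whatever static pattern you can match):

props.conf:

[log_test01]
TRANSFORMS-null = setnull

transforms.conf:

[setnull]
REGEX = Started by timer
DEST_KEY = queue
FORMAT = nullQueue

Note this drops every event whose raw text matches the regex; it cannot compare an event against the previous one.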

Drainy
Champion

Not to mention you'd have to assume that every event arrives in lovely chronological order, which, when you're dealing with a lot of events from across a wide and busy network, isn't always the case 🙂


Ayn
Legend

No. Splunk won't do any comparisons like that (it would lead to horrible performance). Your options are either to make sure the duplicate events are never created in the first place, or to accept that duplicates will be indexed and remove them at search time using dedup.
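
For example, assuming Splunk has extracted the JSON fields automatically, a search along these lines would keep only the first of a set of identical events (id and timestamp are just fields from your sample; use whichever combination uniquely identifies an event for you):

sourcetype=log_test01 | dedup id, timestamp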

msn2507
Path Finder

Thanks Drainy.

I think the REGEX pattern to disregard the data should be set in transforms.conf.

I'm not sure how this could be done in my case, though: Splunk would have to look at the previous event, check whether the new event is the same, and if so send it to the nullQueue, otherwise index it.

Do you think this can be put in a single REGEX statement? If so, can you help me with that?


Ayn
Legend

Haha, well yes, that may well be true. 🙂


Drainy
Champion

Like, for noticing I'm alive!!


Ayn
Legend

Upvote because Drainy is alive!!
