Hi all,
My inputs.conf is:
[monitor:///Users/user1/log.txt]
disabled = false
followTail = 1
sourcetype = log_test01
initCRCLength = 1024
The source file is below. I am expecting Splunk to show only 1 event and disregard the other 2 based on the properties above; however, I am seeing all 3 events in the search. Any help?
{
"actions": [
{
"causes": [
{
"shortDescription": "Started by timer"
},
{
"shortDescription": "Started by timer"
},
{
"shortDescription": "Started by timer"
}
]
}
],
"artifacts": [],
"building": true,
"builtOn": "",
"changeSet": {
"items": [],
"kind": null
},
"culprits": [],
"description": null,
"duration": 0,
"estimatedDuration": 177034,
"executor": {},
"fullDisplayName": "ABC 123 - Prod support #62",
"id": "2013-07-05_14-44-17",
"keepLog": false,
"number": 62,
"result": null,
"timestamp": 1372999457676,
"url": "localhost:8080/jenkins/job/jobname"
}
{
"actions": [
{
"causes": [
{
"shortDescription": "Started by timer"
},
{
"shortDescription": "Started by timer"
},
{
"shortDescription": "Started by timer"
}
]
}
],
"artifacts": [],
"building": true,
"builtOn": "",
"changeSet": {
"items": [],
"kind": null
},
"culprits": [],
"description": null,
"duration": 0,
"estimatedDuration": 177034,
"executor": {},
"fullDisplayName": "ABC 123 - Prod support #62",
"id": "2013-07-05_14-44-17",
"keepLog": false,
"number": 62,
"result": null,
"timestamp": 1372999457676,
"url": "localhost:8080/jenkins/job/jobname"
}
{
"actions": [
{
"causes": [
{
"shortDescription": "Started by timer"
},
{
"shortDescription": "Started by timer"
},
{
"shortDescription": "Started by timer"
}
]
}
],
"artifacts": [],
"building": true,
"builtOn": "",
"changeSet": {
"items": [],
"kind": null
},
"culprits": [],
"description": null,
"duration": 0,
"estimatedDuration": 177034,
"executor": {},
"fullDisplayName": "ABC 123 - Prod support #62",
"id": "2013-07-05_14-44-17",
"keepLog": false,
"number": 62,
"result": null,
"timestamp": 1372999457676,
"url": "localhost:8080/jenkins/job/jobname"
}
I'm not sure you've understood your settings correctly.
Let's run through a few:
followTail tells Splunk to start monitoring the file from where it currently ends and consume only new events; it won't read any of the original data that is already there.
initCRCLength sets how much of the start of the file Splunk generates a CRC from. This doesn't affect the contents. It means that a CRC will be generated and any other files that match that CRC won't be consumed. It doesn't make Splunk re-run the CRC check every time it reads past another CRC length of data.
If you really want to cut data at index time, you need to route it to the nullQueue (Splunk's equivalent of /dev/null). Have a read of the docs on this here:
http://docs.splunk.com/Documentation/Splunk/5.0.3/Deploy/Routeandfilterdatad
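As a rough sketch of what that page describes, routing to the nullQueue takes a pair of stanzas like the ones below. The sourcetype log_test01 comes from the question; the transform name setnull and the REGEX are placeholders I've made up for illustration. Note that this pattern would match all three sample events, so it only shows the mechanism: the REGEX is tested against each event on its own and cannot compare one event with another.

```ini
# props.conf (on the indexer or heavy forwarder, where parsing happens)
[log_test01]
TRANSFORMS-null = setnull

# transforms.conf
[setnull]
# Placeholder pattern: any event whose raw text matches is discarded
REGEX = "building":\s*true
DEST_KEY = queue
FORMAT = nullQueue
```

Splunk needs a restart after these files are changed, and the filter only applies to data indexed from then on.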
Not to mention you'd have to assume that every event would arrive in a lovely chronological order, which, when you're dealing with a lot of events from across a wide and busy network, isn't always the case 🙂
No. Splunk won't do any comparisons like that (that would lead to horrible performance). Your options are either to make sure the duplicate events are never created in the first place, or to accept that duplicate events will be indexed and then deduplicate them at search time using dedup.
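For example, a search along these lines would collapse the three identical events into one. It is only a sketch, assuming the events are indexed under the log_test01 sourcetype from the question and that the raw text of the duplicates is byte-for-byte identical:

```spl
sourcetype=log_test01 | dedup _raw
```

If the JSON fields are extracted at search time, deduplicating on specific fields (for instance id and number from the sample) should work as well, which helps when duplicates differ only in whitespace.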
Thanks Drainy.
I think the REGEX pattern to disregard the data should be set in transforms.conf.
I am not sure how this can be done in my case, since Splunk would have to look at the previous event and check whether it is the same event; if it is, Splunk should send it to the null queue, otherwise it should index it.
Do you think this can be put in a single REGEX statement? If so, can you help me with that?
Haha, well yes that may be more true. 🙂
Like, for noticing I'm alive!!
Upvote because Drainy is alive!!