Getting Data In

How to remove duplicate values based on condition

Laya123
Communicator

Hi,

I am getting duplicate values I want to remove duplicate values where the condition will match.

Example

date-------------------------------------host--------------------id-----------------------------Status -------------------Project
3/10/2015 01:10:30------------------ A --------------------- 1------------------------- Activated--------------------A
3/10/2015 01:10:30------------------ A --------------------- 1------------------------- Disabled---------------------A
3/10/2015 02:10:35------------------ A --------------------- 2------------------------- Disabled---------------------
3/10/2015 02:10:30------------------ A --------------------- 2------------------------- Activated--------------------B
3/10/2015 01:10:30------------------ A --------------------- 2------------------------- Disabled---------------------B
3/10/2015 03:15:30------------------ B --------------------- 1------------------------- Activated--------------------C
3/10/2015 03:15:30------------------ B --------------------- 1------------------------- Disabled---------------------C
3/10/2015 04:10:30------------------ A --------------------- 3------------------------- Activated--------------------D
3/10/2015 04:10:30------------------ A --------------------- 3------------------------- Disabled---------------------D
3/10/2015 05:12:32------------------ A --------------------- 4------------------------- Disabled---------------------
3/10/2015 05:12:30------------------ A --------------------- 4------------------------- Activated--------------------E
3/10/2015 05:12:30------------------ A --------------------- 4------------------------- Disabled--------------------E
3/10/2015 06:10:30------------------ B --------------------- 2------------------------- Disabled--------------------F
3/10/2015 06:10:30------------------ B --------------------- 2------------------------- Disabled--------------------F
3/10/2015 06:10:32------------------ B --------------------- 3------------------------- Activated--------------------F
3/10/2015 06:10:32------------------ B --------------------- 3------------------------- Disabled--------------------F
3/10/2015 06:20:35------------------ B --------------------- 4------------------------- Disabled--------------------G
3/10/2015 06:20:32------------------ B --------------------- 4------------------------- Activated--------------------G
3/10/2015 06:20:32------------------ B --------------------- 4------------------------- Disabled--------------------G

I have used dedup to delete duplicate values. I expected results like wherever status is Activated for each ID (this ID is unique for each host). but dedup is retaining first value and deleting other duplicate values of IDs from each host. I am getting wrong results for some IDs means from above example for ID-2 for host 'A' actually project is B is Activated. but I am status is Disabled and Project is Blank.

means I want to delete duplicates of IDs for each host where the status of the ID should be Acitvated (if same ID is having Activated and Disabled I want Activated values in my output) but for some IDs the actual status Disabled (for these IDs we wont have status Activated) for that it should be Disabled only (from above example for host 'B' for Id-'2' the status is Disabled so the final status is also Disabled only)

my search|dedup ID host

I am expecting my output like:
date-------------------------------------host--------------------id-----------------------------Status -------------------Project
3/10/2015 01:10:30------------------ A --------------------- 1------------------------- Activated--------------------A
3/10/2015 02:10:30------------------ A --------------------- 2------------------------- Activated--------------------B
3/10/2015 03:15:30------------------ B --------------------- 1------------------------- Activated--------------------C
3/10/2015 04:10:30------------------ A --------------------- 3------------------------- Activated--------------------D
3/10/2015 05:12:30------------------ A --------------------- 4------------------------- Activated--------------------E
3/10/2015 06:10:30------------------ B --------------------- 2------------------------- Disabled--------------------F
3/10/2015 06:10:32------------------ B --------------------- 3------------------------- Activated--------------------F
3/10/2015 06:20:32------------------ B --------------------- 4------------------------- Activated--------------------G

Thanks in adnvance

0 Karma

lguinn2
Legend

Try this

yoursearchhere
| sort 0 host id Status -_time
| dedup host id Status
| eventstats count by host id
| where Status=="Activated" OR count==1

By sorting the data, you are controlling how dedup will process it. The where will save all Activated events - but if the event is Disabled, it will still be saved if it is the only event for host & id.

nick405060
Motivator

Good answer. This was my code for my use case:

streamstats count as number by LastFirst | eventstats count as days_badged by LastFirst | eval days_badged=days_badged-1 | eval searcher=if("$report_type$"="metrics","\d*","1") | where match(number,searcher)
0 Karma

Laya123
Communicator

Hi Thank you so much for your immediate response.

Its working but still I am getting some problems. I forgot to mention that I have other status also its not only Activated, sometimes it could be Published or Launched or Failed or Disabled but I want to check for particular ID the status count is more than one means combination of Activated and Disabled or Published and Disabled or Launched and Disabled or Failed and Disabled, I want to delete Disabled and retain the other status. but some times there is no any combination it will be only disable for that time I have to retain Disable for that particular ID.

Thanks in advance

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...