REST API Modular Input: How can you avoid duplicate data?

cadapt
New Member

I am using the REST TA ( https://apps.splunk.com/apps/id/rest_ta ) to pull data from an API that outputs CSV data. The API lets me pull either all events or the last 10 events, and since I need everything, I have to pull all of them every time. This means there are duplicated events every time the REST TA pulls data.

I need to use a key of some sort to avoid duplicates and do not know where to start. Searching this answers board and Stack Overflow has not turned up the answers I need.

Is there a way to manually specify the ID/key used to index the data? If so, that would presumably prevent duplicates, since the key cannot be duplicated. Alternatively, as I do on Elasticsearch, I could use a duplicate ID to overwrite existing data in the index that has the same key with the new data. That is also a possibility, since some of the data from the source could have changed and would need to be updated (for example, whether the issue is pending or resolved).

Thanks in advance.

0 Karma

shadowpanter
New Member

Did you find a solution?

0 Karma

Damien_Dallimor
Ultra Champion

Custom response handlers

Lots of examples

In your custom response handler, you'd need some way of tracking uniqueness based on your events, and then output only the new, unique events to Splunk.
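
As a rough illustration of the uniqueness tracking described above, here is a minimal Python sketch that fingerprints each CSV row and passes through only rows not seen before. The function names, the `key_fields` parameter, and the `id`/`status` columns in the usage note are all hypothetical; how this would be called from the add-on's response handler depends on the handler interface in your version and is not shown here.

```python
import csv
import hashlib
import io


def row_key(row, key_fields=None):
    """Return a stable fingerprint for one CSV row.

    key_fields is a hypothetical list of column names that identify an event.
    When it is None, the whole row is hashed, so a changed row counts as new.
    """
    names = key_fields if key_fields else sorted(row)
    joined = "\x1f".join("{}={}".format(name, row.get(name, "")) for name in names)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def new_rows(csv_text, seen, key_fields=None):
    """Yield only the rows whose fingerprint is not already in `seen`."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row_key(row, key_fields)
        if key not in seen:
            seen.add(key)  # remember it so later polls skip this row
            yield row
```

Calling `new_rows(text, seen, ["id"])` treats a row as a duplicate whenever its `id` was seen before. Leaving `key_fields` as `None` hashes every column, so a row whose status changed from pending to resolved would be emitted again as a new event. Neither variant overwrites anything that was already indexed.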

0 Karma

Damien_Dallimor
Ultra Champion

Does the API you are pulling data from have documentation?

If so, does this documentation have information on how to apply cursoring to your requests?

Typical cursoring approaches for REST APIs involve from/until timestamps in the HTTP request, or perhaps some sort of sequential event ID so that you only request events since that ID.

If there is nothing available to you at the API interface, then you will have to plug a custom response handler into the REST input stanza. This custom response handler could keep a log of event IDs, timestamps, etc., and then output to Splunk for indexing only the event data that is unique. Would probably be very easy to do.
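
The "log of event IDs" needs to survive between polling runs, so it has to live on disk somewhere. A minimal sketch of such a store, assuming string IDs and a JSON state file, could look like this. The `SeenStore` class and the file location are hypothetical and not part of the REST TA.

```python
import json
import os
import tempfile


class SeenStore:
    """Persist the set of already-indexed event ids between polling runs."""

    def __init__(self, path):
        self.path = path
        self.seen = set()
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as handle:
                self.seen = set(json.load(handle))

    def filter_new(self, event_ids):
        """Return the ids not seen before, in input order, and remember them."""
        fresh = []
        for event_id in event_ids:
            if event_id not in self.seen:
                self.seen.add(event_id)
                fresh.append(event_id)
        return fresh

    def save(self):
        """Write to a temp file, then rename it over the old state file."""
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(sorted(self.seen), handle)
        os.replace(tmp_path, self.path)
```

A handler would load the store, call `filter_new` on the IDs in the response, output only the matching events, and call `save` at the end. Since the API always returns everything, the state file grows with the full event history, which may matter for very large feeds.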

0 Karma

cadapt
New Member

Unfortunately, the API does not allow cursors or “events since”.

If I am using the REST TA as mentioned in the OP, what would I add for the customization you mentioned, and how and where would I add it? I am not using the REST input framework from Git, which I have seen referenced in relation to the custom response code.

Does the customization mean I need to write my own input module?

Is there no way to declare my unique key when the data comes in, with some sort of index-time field extraction, so that the event is either not inserted or updates the existing document?

0 Karma