Why isn't the Splunk REST API able to pull data frequently?

vgollapudi
Communicator

Hello Splunk Community !!

I've configured the Splunk REST API app in our environment. I could see data right after I first configured it, but it does not keep pulling data at regular intervals. I've tried setting the polling interval, but it didn't work (the default is 60 seconds). Currently, the setup for the REST API app looks like this:

  1. Installed the app on a Heavy Forwarder. It forwards events to the indexers, and I can see them in the Search Head UI.
  2. Edited props.conf to include the LINE_BREAKER and TIME_STAMP fields so that it splits events and assigns each event's timestamp based on the data received.

inputs.conf

[rest://CFTest]
auth_type = none
endpoint = https://api.cloudflare.com/client/v4/zones/CFZONE/logs/received?start=$start_time$&end=$end_time$&fields=RayID,ClientIP,EdgeStartTimestamp,ClientRequestHost&timestamps=rfc3339
http_header_propertys = X-Auth-Email=XXX@XXX.com,X-Auth-Key=XXXX
http_method = GET
index_error_response_codes = 1
response_type = json
sequential_mode = 0
sourcetype = cloudflare
streaming_request = 0
cookies = __cfduid=d2a7b8efd7e8cefe148fdb2a95369cf9d1522783367
disabled = 0
index = incapsula
polling_interval = 
backoff_time = 60

While investigating why logs are pulled so infrequently, I came across the following in splunkd.log. Interestingly, logs are pulled only around the timestamps at which these entries appear in splunkd.log; after that, nothing more is pulled.

04-04-2018 03:39:21.959 +0000 INFO  ExecProcessor - New scheduled exec process: python /opt/splunk/etc/apps/rest_ta/bin/rest.py
04-04-2018 03:39:21.959 +0000 INFO  ExecProcessor -     interval: run once
04-04-2018 03:39:21.959 +0000 INFO  ExecProcessor - interval="5 3 * * *" is a valid cron schedule

If I edit the inputs.conf listed above, I can again see logs around that particular timestamp. I don't know how this schedule is decided or how to change it to meet our requirement, which is to pull every minute.

Please let me know if anyone has come across this situation in their environment and what steps you took to resolve the issue.

Thanks
Venky

1 Solution

vgollapudi
Communicator

We worked around this problem by writing a Python script that queries the CloudFlare API and ingests the logs.
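The script itself was not shared in this thread. As a rough sketch only, a standalone poller for the endpoint in the inputs.conf above might build its request as follows. The function names, the minute-aligned window, and the use of `urllib` are assumptions made for illustration, not the poster's actual code; the zone ID and credentials are placeholders.

```python
import datetime
import urllib.parse
import urllib.request

# Base URL and field list are taken from the endpoint in the question.
API_BASE = "https://api.cloudflare.com/client/v4/zones"
FIELDS = "RayID,ClientIP,EdgeStartTimestamp,ClientRequestHost"
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_request(zone_id, email, api_key, now=None):
    """Build a GET request for the one-minute window ending six minutes ago."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    # Align the window to whole minutes (an assumption of this sketch).
    end = now.replace(second=0, microsecond=0) - datetime.timedelta(minutes=6)
    start = end - datetime.timedelta(minutes=1)
    query = urllib.parse.urlencode({
        "start": start.strftime(TIME_FORMAT),
        "end": end.strftime(TIME_FORMAT),
        "fields": FIELDS,
        "timestamps": "rfc3339",
    })
    url = f"{API_BASE}/{zone_id}/logs/received?{query}"
    headers = {"X-Auth-Email": email, "X-Auth-Key": api_key}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_logs(request, timeout=30):
    """Send the request and return the response body split into lines."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").splitlines()
```

Building the request separately from sending it keeps the time-window logic easy to check without touching the network. How the fetched lines reach Splunk (for example, printed to stdout by a scripted input) is left open here.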

skoelpin
SplunkTrust

I have it polling every 3 seconds successfully.

vgollapudi
Communicator

@skoelpin can you share how it was done?

Damien_Dallimor
Ultra Champion

It's very easy; the setup screen is even documented for you.

The polling interval can be either 1) a cron expression or 2) a polling interval in seconds

vgollapudi
Communicator

@Damien it didn't work for me. It only worked the first time.

skoelpin
SplunkTrust

Yeah, just follow the documentation. You set the endpoint and polling interval. It's very simple. What are the server specs of the system accepting the requests? Is your system able to carry the load?

vgollapudi
Communicator

We worked around this problem by writing a Python script that queries the CloudFlare API and ingests the logs.

prescilianoneto
Path Finder

Awesome. Is it publicly available? If not, could you please share it with me?

vgollapudi
Communicator

It's not publicly available.

prescilianoneto
Path Finder

No problem, I wrote my own code and now it is publicly available: https://github.com/presciliano/cloudflarelogstosplunk

pruthvikrishnap
Contributor

Hi vgollapudi,

Try adding the following in inputs.conf:
http_header_propertys = content-type=application/json,accept=application/json
and set streaming_request = 1

In props.conf:

TRUNCATE = 0
INDEXED_EXTRACTIONS = JSON
KV_MODE = none

Let me know if this helps.

vgollapudi
Communicator

The problem is with the cron job schedule; we are currently using a script to pull the events.

Damien_Dallimor
Ultra Champion

If you read the docs, the polling interval field can also be a polling interval in seconds.

vgollapudi
Communicator

This is the tokens.py

import datetime

now = datetime.datetime.now()

def sometoken():
    return 'zoo'

def sometokenlist():
    return ['goo','foo','zoo']

def datetoday():
    today = datetime.date.today()
    return today.strftime('%Y-%m-%d')

def start_time():
    start_time = now - datetime.timedelta(minutes=7)
    return start_time.strftime('%Y-%m-%dT%H:%M:%SZ')

def end_time():
    end_time = now - datetime.timedelta(minutes=6)
    return end_time.strftime('%Y-%m-%dT%H:%M:%SZ')
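One thing worth checking in the tokens.py above, although nobody in this thread confirmed it as the cause: `now` is assigned once, when the module is imported. If the app keeps the module loaded between polls (an assumption, not verified here), `start_time()` and `end_time()` would return the same window on every poll. In addition, `datetime.datetime.now()` returns local time while the format string ends in `Z`, which denotes UTC. A minimal sketch that evaluates the time on each call, assuming Python 3, could look like this. The optional `now` argument and the `_utc_now` helper are hypothetical additions for testing only.

```python
import datetime

TIME_FORMAT = '%Y-%m-%dT%H:%M:%SZ'


def _utc_now():
    # Evaluated on every call, so each poll sees the current UTC time.
    return datetime.datetime.now(datetime.timezone.utc)


def start_time(now=None):
    # Start of the window: seven minutes before the current time.
    now = now or _utc_now()
    return (now - datetime.timedelta(minutes=7)).strftime(TIME_FORMAT)


def end_time(now=None):
    # End of the window: six minutes before the current time.
    now = now or _utc_now()
    return (now - datetime.timedelta(minutes=6)).strftime(TIME_FORMAT)
```

Called with no arguments, both functions behave like the originals except that the window moves forward with the clock.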

Damien_Dallimor
Ultra Champion

Any log errors?

Try this search : index=_internal ExecProcessor error rest.py

There is no externally configurable polling_type parameter in the REST Mod Input.

There is only polling_interval

The App automatically determines if this field is an integer or a cron pattern, sets an internal variable to indicate the type of polling and then executes the polling loop accordingly.

If the polling_interval parameter is left blank, it defaults to 60, so the app will poll your REST endpoint every minute.
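The detection described above can be illustrated with a small sketch. This is not the app's source code; the function name and return shape are hypothetical, and only the behavior (integer means seconds, anything else is treated as cron, blank means 60) comes from the description in this post.

```python
DEFAULT_POLLING_INTERVAL = 60  # seconds; the default stated in this thread


def classify_polling_interval(raw_value):
    """Return ('interval', seconds) or ('cron', expression)."""
    value = (raw_value or "").strip()
    if not value:
        # Blank or missing: fall back to the default interval.
        return ("interval", DEFAULT_POLLING_INTERVAL)
    try:
        return ("interval", int(value))
    except ValueError:
        # Not an integer, so treat it as a cron pattern (not validated here).
        return ("cron", value)
```

For reference, in standard cron syntax `5 3 * * *` means once a day at 03:05, whereas a one-minute schedule is written `* * * * *`. For a one-minute poll, an integer value of `60` is the simpler choice.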

vgollapudi
Communicator

Hello Damien,

Thanks for responding to my question. I tried the search you suggested; it only reports CloudFlare errors, nothing about the REST API app. I'm still unsure why it works when I update inputs.conf by changing one of its key-value pairs, but not on its own through polling_interval. Even if I don't provide a polling interval, it should work, based on your comment that the REST endpoint is polled every minute.

Thanks
Venky

prescilianoneto
Path Finder

Hello Venky
Did you manage to solve this problem and send Cloudflare logs to Splunk? I'm trying the same here, so any advice will be greatly appreciated.
Cheers,
Presciliano

jconger
Splunk Employee

The code that issues the REST call is basically an infinite loop that goes to sleep the number of seconds you specify for the polling_interval. Here is a snippet:

if polling_type == 'interval':                         
    time.sleep(float(polling_interval))

For cron, it is similar, but a little different:

if polling_type == 'cron':
    next_cron_firing = cron_iter.get_next(datetime)
    while get_current_datetime_for_cron() != next_cron_firing:
        time.sleep(float(10))

Since you don't have a polling_interval in your inputs.conf, your polling_type should be 'interval' and the default 60 seconds should apply.
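To make the interval case concrete, here is a self-contained sketch of such a loop. It is an illustration only, not the app's code: `poll`, `fetch`, `max_polls`, and the injectable `sleep` are hypothetical names added so the loop can be exercised without waiting.

```python
import time


def poll(fetch, polling_interval=60, max_polls=None, sleep=time.sleep):
    """Call fetch(), sleep polling_interval seconds, and repeat."""
    polls = 0
    # max_polls=None means loop forever, as a long-running input would.
    while max_polls is None or polls < max_polls:
        try:
            fetch()
        except Exception as exc:
            # Design choice of this sketch: report the failure and keep polling.
            print(f"poll failed: {exc}")
        polls += 1
        sleep(float(polling_interval))
    return polls
```

In a loop of this shape, an exception that escaped the loop body would end the process, and polling would stop until the input is restarted. Whether that is what happened in this case is not established in the thread.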

The only thing I see that looks a little wonky is the tokens in your endpoint. These tokens have to be defined in bin/tokens.py.

vgollapudi
Communicator

Hello jconger, I have added those two tokens to bin/tokens.py. I don't know how polling_type is related to the interval.

Damien_Dallimor
Ultra Champion

Please read my answer below; polling_type is irrelevant (I wrote it).

vgollapudi
Communicator

Yes Damien, I agree with you.
