
How do I unzip a file when pulling it from REST API?

tamduong16
Contributor

The REST API input that I set up in Splunk calls a REST endpoint, and the response it receives is a zip file. Inside this zip file there are 2 CSV files, but I only need to index 1 of them (in this case, the file name is ENDPOINT_CDR_DETAIL_ALL_CSV). However, I only see 3 options for the response type: text, xml, and json. Does Splunk have an option, maybe a response handler, to unzip the file and index only 1 of the 2 files?

The name and form of the file:
[screenshot not available]

Content inside the zip file:
[screenshot not available]

1 Solution

Damien_Dallimor
Ultra Champion

In rest_ta/bin/responsehandlers.py, add a custom response handler. A pseudo example:

class ZipFileResponseHandler:

    def __init__(self, **args):
        # Regex naming the CSV file(s) to index, passed in from the config stanza
        self.csv_file_to_index = args['csv_file_to_index']

    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):
        import zipfile, io, re
        # Open the zip from the raw response bytes, entirely in memory
        file = zipfile.ZipFile(io.BytesIO(response_object.content))
        for info in file.infolist():
            # Only index members whose filename matches the regex
            if re.match(self.csv_file_to_index, info.filename):
                filecontent = file.read(info)
                print_xml_stream(filecontent)

In your config stanza, apply this handler:

[screenshot of the config stanza not available]

The csv_file_to_index parameter value in this example is a Python regex, such as:

  1. ENDPOINT_CDR_DETAIL_ALL_CSV\.csv for an exact filename to extract from the zip
  2. .*CDR_DETAIL.*\.csv$ for a pattern for the filename(s) to extract from the zip
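
To try the two patterns outside Splunk, here is a minimal sketch that applies them to an in-memory zip using the same re.match call as the handler. The second file name (ENDPOINT_CDR_SUMMARY_CSV.csv), the CSV contents, and the helper names build_sample_zip and matching_members are made up for illustration; only ENDPOINT_CDR_DETAIL_ALL_CSV comes from the question.

```python
import io
import re
import zipfile


def build_sample_zip():
    """Return the bytes of a small zip holding two CSV files (hypothetical data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("ENDPOINT_CDR_DETAIL_ALL_CSV.csv", "id,duration\n1,30\n")
        # Hypothetical name for the second, unwanted file
        zf.writestr("ENDPOINT_CDR_SUMMARY_CSV.csv", "id,total\n1,30\n")
    return buf.getvalue()


def matching_members(zip_bytes, pattern):
    """Return the names of archive members whose filename matches the regex."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [info.filename for info in zf.infolist()
                if re.match(pattern, info.filename)]


sample = build_sample_zip()
exact = matching_members(sample, r"ENDPOINT_CDR_DETAIL_ALL_CSV\.csv")
loose = matching_members(sample, r".*CDR_DETAIL.*\.csv$")
print(exact, loose)
```

Note that re.match only anchors at the start of the name, so pattern 1 would also accept a name with extra characters after .csv. If that matters, end the pattern with $ or use re.fullmatch (Python 3.4+).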

tamduong16
Contributor

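This is my version of the code:

import zipfile
import StringIO  # Python 2 module

class ZipFileResponseHandler:

    def __init__(self, **args):
        pass

    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):
        # Open the zip from the response body in memory
        file = zipfile.ZipFile(StringIO.StringIO(response_object.content))
        for name in file.namelist():
            # Keep only the file whose name contains "ENDPOINT"
            if "ENDPOINT" in name:
                data = file.read(name)
                data = data.split('\n')
                # Skip the CSV header row and send each remaining line as an event
                for element in data[1:]:
                    print_xml_stream(element)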
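
For anyone adapting this to Python 3, the header-skipping part could look roughly like the sketch below. It is not the add-on's code: rows_without_header is a hypothetical helper, the UTF-8 encoding is an assumption about the CSV, and the sample archive is made up. It uses splitlines() instead of split('\n'), so a trailing newline or Windows line endings do not produce empty or \r-terminated rows.

```python
import io
import zipfile


def rows_without_header(zip_bytes, name_fragment, encoding="utf-8"):
    """Yield data rows (header skipped) from members whose name contains name_fragment."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name_fragment not in name:
                continue
            # ZipFile.read returns bytes, so decode before splitting into lines
            lines = zf.read(name).decode(encoding).splitlines()
            for line in lines[1:]:  # lines[0] is the CSV header
                if line.strip():    # ignore blank lines
                    yield line


# Made-up archive with Windows line endings to exercise the helper
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("ENDPOINT_CDR_DETAIL_ALL_CSV.csv", "id,duration\r\n1,30\r\n2,45\r\n")
    zf.writestr("OTHER.csv", "x\r\n9\r\n")

rows = list(rows_without_header(demo.getvalue(), "ENDPOINT"))
print(rows)
```

Inside the handler, each yielded row would be passed to print_xml_stream in place of the print call.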

Damien_Dallimor
Ultra Champion

I suggest using the REST API Modular Input and plugging in a custom response handler to perform the unzipping for you, along with any other preprocessing you require.

Here is an example in another answer.

tamduong16
Contributor

Could you give me more information on how to make the handler pass only the specific file to the indexer?


kamlesh_vaghela
SplunkTrust

Hi,

Can you please let me know how you call the REST API: using a script, or something else?


tamduong16
Contributor

I was able to download the REST API from Splunk, but for now I'm not using any script. Do you think I could do this by writing a script that runs every minute and calls the API URL? Again, that's if the script lets me unzip the file and pick the file I want. Thanks!


kamlesh_vaghela
SplunkTrust

Yes,
you can create a scripted input that downloads and extracts the files for you.

Create inputs.conf in your app and put the configuration below in the file.

[script:///opt/splunk/etc/apps/yourapp/bin/scriptedfile.py]
disabled = 0
interval = 60

This runs the script every 60 seconds. You can change the interval as required.

Create bin/scriptedfile.py and write the code for the REST API call (file download) and the extraction of the files.

Scripted Input docs:
https://docs.splunk.com/Documentation/SplunkCloud/6.6.3/AdvancedDev/ScriptedInputsIntro
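
As a rough starting point, scriptedfile.py could look something like the sketch below. It assumes Python 3, an endpoint that needs no authentication, a UTF-8 CSV, and a zip small enough to hold in memory. API_URL is a placeholder you would have to fill in, and extract_csv_text and main are hypothetical names. Only the file name ENDPOINT_CDR_DETAIL_ALL_CSV comes from the question.

```python
import io
import re
import sys
import urllib.request
import zipfile

API_URL = ""  # placeholder (assumption): set this to your REST endpoint
CSV_PATTERN = r"ENDPOINT_CDR_DETAIL_ALL_CSV"  # file name from the question


def extract_csv_text(zip_bytes, pattern, encoding="utf-8"):
    """Return the decoded contents of every archive member whose name matches pattern."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [zf.read(name).decode(encoding)
                for name in zf.namelist() if re.match(pattern, name)]


def main(url):
    # Download the whole zip into memory
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = resp.read()
    for text in extract_csv_text(payload, CSV_PATTERN):
        # A scripted input's stdout is what Splunk indexes
        sys.stdout.write(text)


# Nothing runs until API_URL is filled in
if __name__ == "__main__" and API_URL:
    main(API_URL)
```

The second CSV is never written to stdout, so it is not indexed. Note that each run prints the whole file again, so you may need your own logic to avoid indexing the same rows every interval.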


tamduong16
Contributor

Or, if the REST API can't do this, is there an alternative way?
