Website Input: Set field names to be HTML element, instead of attribute

cmodyssey
Explorer

Hi,

I have the below example XML to scrape:

    <COOK>
       <COOK_NAME>Cook</COOK_NAME>
       <COOK_TEMP>738</COOK_TEMP>
       <COOK_SET>3560</COOK_SET>
       <COOK_STATUS>0</COOK_STATUS>
    </COOK>
    <FOOD1>
       <FOOD1_NAME>Food1</FOOD1_NAME>
       <FOOD1_TEMP>OPEN</FOOD1_TEMP>
       <FOOD1_SET>1800</FOOD1_SET>
       <FOOD1_STATUS>4</FOOD1_STATUS>
    </FOOD1>
    <FOOD2>
       <FOOD2_NAME>Food2</FOOD2_NAME>
       <FOOD2_TEMP>OPEN</FOOD2_TEMP>
       <FOOD2_SET>1800</FOOD2_SET>
       <FOOD2_STATUS>4</FOOD2_STATUS>
    </FOOD2>
    <FOOD3>
       <FOOD3_NAME>Food3</FOOD3_NAME>
       <FOOD3_TEMP>OPEN</FOOD3_TEMP>
       <FOOD3_SET>1800</FOOD3_SET>
       <FOOD3_STATUS>4</FOOD3_STATUS>
    </FOOD3>
    <OUTPUT_PERCENT>100</OUTPUT_PERCENT>

I am extracting the data I want with the following CSS Selector:

cook_temp,cook_set,food1_temp,food1_set,output_percent

This results in the following events:

response_size="1235" match_2="3560" match_1="725" match="725" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="577.230930328" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
response_size="1235" match_2="3560" match_1="721" match="721" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="567.966938019" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
response_size="1235" match_2="3560" match_1="721" match="721" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="565.255880356" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
response_size="1235" match_2="3560" match_1="722" match="722" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="576.737880707" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
response_size="1235" match_2="3560" match_1="721" match="721" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="572.259187698" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"
response_size="1235" match_2="3560" match_1="721" match="721" match="3560" match="OPEN" match="1800" match="100" match_5="100" request_time="569.040060043" match_3="OPEN" match_4="1800" encoding="ascii" response_code="200" raw_match_count="5"

I would like to change each match_(number) field to be the name of the HTML element (COOK_TEMP, COOK_SET, etc.). As far as I can tell, setting "Name Attributes" would not help, as that option is tailored to naming fields from HTML attributes, not HTML elements.

Is there a way to configure this to use HTML elements?

If not, is there an edit I can make to /opt/splunk/etc/apps/website_input/bin/web_input.py to do this? I don't mind having some "non-standard" Website Input code on my system, but I don't know Python that well.

Thanks in advance,

Richard.

LukeMurphey
Champion

I think I can support this in the app natively. My main concern when writing this app was to support HTML, but I like the ability to handle XML too.

I opened a ticket to look into this and am considering several options: http://lukemurphey.net/issues/1145.

Update

Version 1.2 now has the ability to use the tag names as the field names. Just check the "Use Tag Name as Field Name" option. This version isn't the default yet; you will have to manually select it. Let me know if it works for you.
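
As an illustration only (this is not the app's code), the idea of using tag names as field names can be sketched with Python's standard-library xml.etree.ElementTree. The sample is an abbreviated copy of the XML from the question, wrapped in a hypothetical <ROOT> element so that it parses as a single document; `fields_by_tag_name` and `WANTED` are made-up names for this sketch.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample from the question, wrapped in a hypothetical <ROOT>
# element (an assumption) so that it parses as one XML document.
SAMPLE = """
<ROOT>
  <COOK>
    <COOK_NAME>Cook</COOK_NAME>
    <COOK_TEMP>738</COOK_TEMP>
    <COOK_SET>3560</COOK_SET>
  </COOK>
  <FOOD1>
    <FOOD1_TEMP>OPEN</FOOD1_TEMP>
    <FOOD1_SET>1800</FOOD1_SET>
  </FOOD1>
  <OUTPUT_PERCENT>100</OUTPUT_PERCENT>
</ROOT>
"""

# The elements we want to extract (hypothetical name for this sketch).
WANTED = {"COOK_TEMP", "COOK_SET", "FOOD1_TEMP", "FOOD1_SET", "OUTPUT_PERCENT"}


def fields_by_tag_name(xml_text, wanted):
    """Return {tag name: text} for every element whose tag is in `wanted`."""
    fields = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag in wanted:
            # Use the element's own tag as the field name instead of match_N.
            fields[element.tag] = (element.text or "").strip()
    return fields


if __name__ == "__main__":
    print(fields_by_tag_name(SAMPLE, WANTED))
```

Each extracted value is keyed by its element name rather than by its position in the match list, which is the behaviour asked for in the question.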

cmodyssey
Explorer

Hi,

Thanks for looking into this.

I have just finished modifying /opt/splunk/etc/apps/website_input/bin/web_input.py so that it includes element names in the field names.

Here are the details of what I've done.

Change:

                # Unescape the text in case it includes HTML entities
                match_text = cls.unescape(WebInput.get_text(match))

To:

                # Unescape the text in case it includes HTML entities
                match_text = cls.unescape(WebInput.get_text(match))

                printable_match = "%s" % (match)
                re_result = re.search('Element (.*) at', printable_match)
                element = re_result.group(1)

Change:

                    if not field_made:
                        if output_matches_as_mv:
                            result['match'].append(match_text)

To:

                    if not field_made:
                        if output_matches_as_mv:
                            #result['match'].append(match_text)
                            result['match_' + element] = match_text
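
A possible alternative to parsing the printed form of the match, sketched under assumptions: the "Element ... at" text suggests that `match` is an lxml element, and such elements expose their name through a `.tag` attribute. The sketch below uses the standard library's xml.etree.ElementTree (whose elements also have `.tag`) so that it runs on its own; `tag_from_repr` and `field_name_for` are hypothetical helper names, not functions in web_input.py. Note that the regex version needs `import re` in the module, and that calling `.group(1)` raises an AttributeError when the pattern does not match, which the guard below avoids.

```python
import re
import xml.etree.ElementTree as ET


def tag_from_repr(printable_match):
    """Regex approach from the patch above, with a guard for a failed match."""
    re_result = re.search(r"Element (\S+) at", printable_match)
    return re_result.group(1) if re_result else None


def field_name_for(element, prefix="match_"):
    """Build the field name from the element's tag attribute directly."""
    return prefix + element.tag


# Stand-in for one matched element (assumption: the real match has .tag too).
element = ET.fromstring("<COOK_TEMP>738</COOK_TEMP>")
result = {field_name_for(element): element.text}

if __name__ == "__main__":
    print(result)
    # An lxml-style printed form, written by hand for this example.
    print(tag_from_repr("<Element COOK_TEMP at 0x7f00>"))
```

Reading `.tag` avoids depending on the exact wording of the element's printed form, which is an implementation detail of the parsing library.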

My coding is not to a high standard, as it's my first time working with Python.

I wanted to believe Splunk's claim that it can take in any data from any source, and I have got there after a lot of trial and error and debug log lines!

Cheers,

Richard.

LukeMurphey
Champion

FYI: I have a solution for this that I am testing now.

cmodyssey
Explorer

Hi,

Thanks for all your work on version 1.2.

I have upgraded to that version this morning and it works perfectly 🙂

It's really good to get the data I need from the work that you've done to web_input.py, rather than mine.

Thanks again,

Richard.

cmodyssey
Explorer

Hi Luke,

Thanks for sticking with this. It will be good to get an official solution to this, instead of the modifications that I have done.

Looking forward to the new version 🙂

Richard.

LukeMurphey
Champion

Did that work for you? You can accept the answer to let me know it worked too.

cmodyssey
Explorer

Hi,

I had not seen your update about version 1.2, so I'm glad you commented, as it made me aware of it, thanks.

I will upgrade to that version to try it and report back here on how I got on.

Thanks again,

Richard.
