Website Input App - How do I scrape pages with multiple layers?

thaddeuslim
Explorer

Hi All, seeking help on this!

What I am trying to achieve here is to scrape the page for the health status of all the devices in
http://demo.stor2rrd.com/?menu=7acecf0&tab=0

However, it only scrapes the home page source, which does not contain the health status of the devices.

I also tried enabling web crawl, but it does not seem to help in this instance.

Regards,
Thaddeus

LukeMurphey
Champion

The problem is that part of the page is rendered using JavaScript. There are two approaches to work around this:

Option 1: Use the URL where the data comes from
Use the following URL instead: http://demo.stor2rrd.com/stor2rrd-cgi/glob_hs.sh?_=1542302484447

That URL returns just the health data, which should work for you.
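For anyone who wants to check that endpoint outside the app, here is a minimal Python sketch. It assumes the `_` parameter is a millisecond timestamp of the kind browsers commonly append to avoid cached responses; that is a guess based on its value, not something the site documents. The function names `build_health_url` and `fetch_health_html` are made up for this example.

```python
import time
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the URL above.
HEALTH_ENDPOINT = "http://demo.stor2rrd.com/stor2rrd-cgi/glob_hs.sh"


def build_health_url(now_ms=None):
    """Return the endpoint URL with a millisecond timestamp in the "_" parameter.

    Assumption: "_" is only a cache-busting value, so any current
    timestamp should be accepted by the server.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return HEALTH_ENDPOINT + "?" + urlencode({"_": now_ms})


def fetch_health_html(timeout=10):
    """Download the health-status HTML fragment (needs network access)."""
    with urllib.request.urlopen(build_health_url(), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `build_health_url(1542302484447)` reproduces the exact URL above, and `fetch_health_html()` returns the raw markup so you can see what the app will be parsing.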

Option 2: Use a browser to render the content
You can also use a browser to render the content. To do that, select a different browser from the first page of the wizard. Make sure to hit the "test browser" button to ensure that the browser is installed and working. I recommend Option 1, though, since this approach is more complex and takes longer.

Other thoughts
That page is structured in an odd way, which is going to make parsing difficult. I found that the selector below will provide a list of all of the devices that are down:

.hsnok + td
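To illustrate what that selector matches: `+` is the CSS adjacent sibling combinator, so `.hsnok + td` picks each `td` that immediately follows a sibling element with the class `hsnok`. The Python sketch below imitates that rule using only the standard library. The sample markup, the device names, and the `hsok` class are invented for the example and are not copied from the real page; `AdjacentTdCollector` and `devices_down` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class AdjacentTdCollector(HTMLParser):
    """Collect the text of every <td> that directly follows a sibling with a given class."""

    def __init__(self, marker_class="hsnok"):
        super().__init__()
        self.marker_class = marker_class
        # One frame per open element: [tag, last_child_had_marker_class]
        self.stack = [["#root", False]]
        self.capture_depth = None  # stack depth of the <td> being captured
        self.buffer = []
        self.matches = []

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1]
        classes = (dict(attrs).get("class") or "").split()
        # Match: a <td> whose previous element sibling carried the marker class.
        if tag == "td" and parent[1] and self.capture_depth is None:
            self.capture_depth = len(self.stack) + 1
            self.buffer = []
        # Remember whether this element is a marker for the next sibling.
        parent[1] = self.marker_class in classes
        if tag not in VOID_TAGS:
            self.stack.append([tag, False])

    def handle_endtag(self, tag):
        # Ignore stray end tags that do not close any open element.
        if tag not in [frame[0] for frame in self.stack[1:]]:
            return
        while len(self.stack) > 1:
            closed = self.stack.pop()
            if self.capture_depth is not None and len(self.stack) < self.capture_depth:
                self.matches.append("".join(self.buffer).strip())
                self.capture_depth = None
            if closed[0] == tag:
                break

    def handle_data(self, data):
        if self.capture_depth is not None:
            self.buffer.append(data)


# Hypothetical markup: a status cell followed by a device-name cell.
SAMPLE_HTML = """
<table>
  <tr><td class="hsok"></td><td>storage-a</td></tr>
  <tr><td class="hsnok"></td><td>storage-b</td></tr>
  <tr><td class="hsnok"></td><td>switch-c</td></tr>
</table>
"""


def devices_down(html):
    """Return the text of each cell that `.hsnok + td` would select."""
    collector = AdjacentTdCollector()
    collector.feed(html)
    collector.close()
    return collector.matches
```

With the sample markup, `devices_down(SAMPLE_HTML)` returns `["storage-b", "switch-c"]`. In the app itself you only need to enter the selector; this is just a way to see which cells it picks.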

thaddeuslim
Explorer

Hi Luke,

Thank you for the approaches provided.

With regard to Option 1, could you share how you generated the URL?
