MiniSoup HTML Parser (Independent Publisher) (Preview)

A lightweight HTML parsing library, inspired by Beautiful Soup, for analyzing HTML elements and extracting their content

This connector is available in the following products and regions:

Service Class Regions
Logic Apps Standard All Logic Apps regions except the following:
     -   Azure Government regions
     -   Azure China regions
     -   US Department of Defense (DoD)
Contact
Name MiniSoup Support
URL https://github.com/DEmodoriGatsuO/MiniSoup
Email [email protected]
Connector Metadata
Publisher Shogo Shindo
Website https://github.com/DEmodoriGatsuO/MiniSoup
Privacy policy https://github.com/DEmodoriGatsuO/MiniSoup/blob/main/PRIVACY.md
Categories Data;Website

Throttling Limits

Name Calls Renewal Period
API calls per connection 100 60 seconds
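
The limit above (100 calls per connection per 60 seconds) can be respected on the caller's side before any request is sent. The following is a minimal sketch, assuming a sliding 60-second window; how the service actually measures the renewal period is not stated here, and the class name `SlidingWindowLimiter` is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window.

    Assumption: a sliding window is a conservative stand-in for the
    documented "100 calls per 60 seconds" limit.
    """

    def __init__(self, max_calls=100, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested
        self.calls = deque()    # timestamps of calls still inside the window

    def try_acquire(self):
        """Record the call and return True if it fits in the window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False
```

A caller would check `try_acquire()` before each action and wait or queue the call when it returns `False`.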

Actions

Extract Values from HTML Elements

Extracts specific attribute values from HTML elements matching the provided selector

Fetch HTML Content

Fetches HTML content from a specified URL

Find All Matching Elements

Finds all HTML elements matching the specified tag name and optional attributes

Parse HTML Table

Parses an HTML table into structured data with headers and rows

Select HTML Elements

Selects HTML elements matching the provided selector

Extract Values from HTML Elements

Extracts specific attribute values from HTML elements matching the provided selector

Parameters

Name Key Required Type Description
html
html True string

HTML content to be parsed

selector
selector True string

CSS selector or XPath for targeting elements

attribute
attribute True string

Attribute to extract from the selected elements. Use 'text' for inner text, 'html' for inner HTML, or the name of a specific attribute

selector_type
selector_type string

Type of selector to use

Returns

Name Path Type Description
success
success boolean

Indicates whether the operation was successful

values
values array of string

Array of extracted values from the matching elements

count
count integer

Number of values extracted
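
As an illustration of the parameters and return fields above, the sketch below assembles the action's inputs and reads its output as plain dictionaries. The helper names are hypothetical, and the accepted `selector_type` values are not listed in this reference, so the value is passed through unvalidated.

```python
def build_extract_request(html, selector, attribute, selector_type=None):
    """Assemble the inputs using the parameter keys from the table above."""
    required = (("html", html), ("selector", selector), ("attribute", attribute))
    for name, value in required:
        if not value:
            raise ValueError(f"'{name}' is required")
    body = {"html": html, "selector": selector, "attribute": attribute}
    # selector_type is optional; omit it entirely when not supplied.
    if selector_type is not None:
        body["selector_type"] = selector_type
    return body


def read_extract_response(response):
    """Return the extracted values, or raise if the action reported failure."""
    if not response.get("success"):
        raise RuntimeError("Extract Values reported success = false")
    return list(response.get("values", []))
```

For example, `build_extract_request(page_html, "a", "href")` describes a request for the `href` attribute of every matching `a` element, where `page_html` is whatever HTML string the caller already holds.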

Fetch HTML Content

Fetches HTML content from a specified URL

Parameters

Name Key Required Type Description
url
url True string

URL to fetch HTML content from

Returns

Name Path Type Description
success
success boolean

Indicates whether the operation was successful

html
html string

HTML content retrieved from the specified URL

Find All Matching Elements

Finds all HTML elements matching the specified tag name and optional attributes

Parameters

Name Key Required Type Description
html
html True string

HTML content to be parsed

tag_name
tag_name True string

HTML tag name to search for

id
id string

Filter by element ID

class
class string

Filter by element class

Returns

Name Path Type Description
success
success boolean

Indicates whether the operation was successful

elements
elements array of HtmlElement

Array of HTML elements that match the specified tag name and attributes

count
count integer

Number of elements found

Parse HTML Table

Parses an HTML table into structured data with headers and rows

Parameters

Name Key Required Type Description
html
html True string

HTML content containing the table

table_selector
table_selector string

CSS selector to locate the HTML table element

header_rows_exist
header_rows_exist boolean

Whether the table has header rows

Returns

Name Path Type Description
success
success boolean

Indicates whether the operation was successful

Headers
data.Headers array of string

Column headers extracted from the table

Rows
data.Rows array of array

Table rows, each containing an array of cell values

items
data.Rows array of string
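
The `data.Headers` and `data.Rows` fields above are often easier to work with as one record per row. The following is a minimal sketch, assuming the response has already been decoded into a dictionary with the paths listed above; the function name `table_to_records` and the fallback column names `col0`, `col1`, ... are hypothetical.

```python
def table_to_records(response):
    """Turn a Parse HTML Table result into a list of {header: cell} dicts."""
    if not response.get("success"):
        raise RuntimeError("Parse HTML Table reported success = false")
    data = response.get("data") or {}
    headers = data.get("Headers") or []
    rows = data.get("Rows") or []
    if not headers:
        # No header row was returned: fall back to positional column names.
        width = max((len(row) for row in rows), default=0)
        headers = [f"col{i}" for i in range(width)]
    # zip() stops at the shorter sequence, so extra cells in a row are dropped.
    return [dict(zip(headers, row)) for row in rows]
```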

Select HTML Elements

Selects HTML elements matching the provided selector

Parameters

Name Key Required Type Description
html
html True string

HTML content to be parsed

selector
selector True string

CSS selector or XPath for targeting elements

selector_type
selector_type string

Type of selector to use

Returns

Name Path Type Description
success
success boolean

Indicates whether the operation was successful

elements
elements array of HtmlElement

Array of HTML elements that match the specified selector

count
count integer

Number of elements found
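
A common follow-up to this action is pulling one attribute out of every returned element. The sketch below assumes the output has been decoded into a dictionary and that each entry in `elements` carries an `attributes` object as described under HtmlElement; the function name `collect_attribute` is hypothetical.

```python
def collect_attribute(response, name):
    """Collect one attribute from every element in a Select result."""
    if not response.get("success"):
        raise RuntimeError("Select HTML Elements reported success = false")
    found = []
    for element in response.get("elements", []):
        value = (element.get("attributes") or {}).get(name)
        # Elements that lack the attribute are skipped rather than padded.
        if value is not None:
            found.append(value)
    return found
```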

Definitions

HtmlElement

Represents an HTML element with its properties and attributes

Name Path Type Description
tag
tag string

The HTML tag name of the element (e.g., 'div', 'span', 'a')

outerHtml
outerHtml string

The complete HTML of the element including the element itself

innerHtml
innerHtml string

The HTML content inside the element, which may include other elements

innerText
innerText string

The text content inside the element with all HTML tags removed

attributes
attributes object

All attributes of the element as name-value pairs

isSelfClosing
isSelfClosing boolean

Indicates whether the element is a self-closing tag (e.g., <img>, <br>)
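
For callers that prefer typed objects over raw dictionaries, the definition above could be mirrored as follows. This is a sketch only: the Python field names and the `from_dict` helper are illustrative choices, and the defaults for missing keys are an assumption, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HtmlElement:
    tag: str
    outer_html: str = ""
    inner_html: str = ""
    inner_text: str = ""
    attributes: Dict[str, str] = field(default_factory=dict)
    is_self_closing: bool = False

    @classmethod
    def from_dict(cls, raw):
        """Build an HtmlElement from the property names listed above."""
        return cls(
            tag=raw.get("tag", ""),
            outer_html=raw.get("outerHtml", ""),
            inner_html=raw.get("innerHtml", ""),
            inner_text=raw.get("innerText", ""),
            attributes=dict(raw.get("attributes") or {}),
            is_self_closing=bool(raw.get("isSelfClosing", False)),
        )
```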