GSA Site Scanning (Independent Publisher) (Preview)
Explore insights into the health and compliance of US federal websites. Through automated scans, this service generates detailed data on website policy compliance and best practices, such as Digital Analytics Program (DAP) adoption, U.S. Web Design System (USWDS) usage, and robots.txt and sitemap.xml availability, supporting the accessibility and management of government digital assets.
This connector is available in the following products and regions:
Service | Class | Regions |
---|---|---|
Logic Apps | Standard | All Logic Apps regions except Azure Government regions, Azure China regions, and US Department of Defense (DoD) |
Power Automate | Premium | All Power Automate regions except US Government (GCC), US Government (GCC High), China Cloud operated by 21Vianet, and US Department of Defense (DoD) |
Power Apps | Premium | All Power Apps regions except US Government (GCC), US Government (GCC High), China Cloud operated by 21Vianet, and US Department of Defense (DoD) |
Contact | |
---|---|
Name | Richard Wilson |
URL | https://www.richardawilson.com/ |
Email | [email protected] |
Connector Metadata | |
---|---|
Publisher | Richard Wilson |
Website | https://open.gsa.gov/api/site-scanning-api |
Privacy policy | https://www.gsa.gov/technology/government-it-initiatives/digital-strategy/terms-of-service-for-developer-resources |
Categories | IT Operations |
Creating a connection
The connector supports the following authentication types:
Authentication type | Description | Applicable regions | Shareable |
---|---|---|---|
Default | Parameters for creating a connection. | All regions | Not shareable |
Default
Applicable: All regions
Parameters for creating a connection.
This is not a shareable connection. If the Power App is shared with another user, that user will be prompted to create a new connection explicitly.
Name | Type | Description | Required |
---|---|---|---|
GSA API Key | securestring | The GSA API key which can be obtained from https://open.gsa.gov/api/site-scanning-api/ | True |
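Outside the connector, the same key authenticates direct calls to the API. The sketch below is a minimal illustration that assumes the api.data.gov convention of sending the key in an X-Api-Key header, plus a base URL of https://api.gsa.gov/technology/site-scanning/v1 with a /websites path; confirm both against the GSA documentation before relying on them. It only builds the request, so nothing is sent until the request is passed to urllib.request.urlopen.

```python
import urllib.request

# Assumed base URL for the Site Scanning API (verify against the GSA documentation).
BASE_URL = "https://api.gsa.gov/technology/site-scanning/v1"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request, assuming the X-Api-Key header convention."""
    if not api_key:
        raise ValueError("A GSA API key is required")
    req = urllib.request.Request(f"{BASE_URL}{path}", method="GET")
    req.add_header("X-Api-Key", api_key)  # assumed header name (api.data.gov style)
    req.add_header("Accept", "application/json")
    return req


# Usage (performs a network call, so it is left commented out):
# with urllib.request.urlopen(build_request("/websites", "YOUR_KEY")) as resp:
#     print(resp.status)
```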
Throttling Limits
Name | Calls | Renewal Period |
---|---|---|
API calls per connection | 100 | 60 seconds |
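The connector platform enforces this limit itself. For scripts that call the API directly and want to stay within a similar budget, a rolling-window limiter is one option; the sketch below is a generic client-side technique, not a description of how the platform enforces its quota. The clock is injectable so the limiter can be exercised without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side limiter allowing at most max_calls in any rolling period (seconds)."""

    def __init__(self, max_calls: int = 100, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so tests need not sleep
        self.calls: Deque[float] = deque()  # timestamps of calls inside the window

    def _prune(self, now: float) -> None:
        # Forget calls that are at least one full period old.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if a slot is free; otherwise return False."""
        now = self.clock()
        self._prune(now)
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until a slot frees up (0.0 if one is free now)."""
        now = self.clock()
        self._prune(now)
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])
```

In a simple script, calling time.sleep(limiter.wait_time()) before each try_acquire() keeps requests within 100 calls per 60 seconds.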
Actions
Action | Description |
---|---|
Perform Website Analysis | Returns aggregate counts (total items, agencies, and unique final URL base domains) for websites that match filters such as target URL domain, final URL domain, and scan status. |
Retrieve Website Information | Fetches details about websites, including the target and final URLs, ownership, scan status, and analytics detection. |
Retrieve Website Information by URL | Fetches detailed information about a website based on the specified URL. |
Perform Website Analysis
Returns aggregate counts (total items, agencies, and unique final URL base domains) for websites that match filters such as target URL domain, final URL domain, and scan status.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Target URL Domain | target_url_domain | | string | The domain name plus top-level domain (TLD) of the Target URL, where the scanner starts. Contrast with the Final URL Domain, where the scan ends after following redirects. |
Final URL Domain | final_url_domain | | string | The domain name plus top-level domain (TLD) of the Final URL, where the scanner ends up after following redirects from the Target URL. |
Final URL Live | final_url_live | | boolean | Whether the Final URL is live, that is, returned an HTTP status code in the 2xx family. |
Target URL Redirects | target_url_redirects | | boolean | Whether the Target URL redirects, that is, returned a 3xx HTTP status code. Scanners have caching disabled, so 304 status codes do not occur. |
Target URL Agency Owner | target_url_agency_owner | | string | The agency that owns or operates the website associated with the Target URL. |
Target URL Bureau Owner | target_url_bureau_owner | | string | The bureau that owns or operates the website associated with the Target URL. |
Scan Status | primary_scan_status | | string | The status of the website scan and any known reason for failure. The value unknown_error is reserved for errors not yet encoded in the system. |
DAP Detected at Final URL | dap_detected_final_url | | boolean | Whether the Digital Analytics Program (DAP) is detected at the Final URL. |
Returns
- Body
- AnalysisDto
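When calling the API directly, the optional filters above would be sent as query parameters. This sketch assumes that unset filters are simply omitted and that booleans are sent as lowercase true/false; neither convention is stated in this document, and the agency name in the usage line is only illustrative.

```python
from typing import Union
from urllib.parse import urlencode

FilterValue = Union[str, bool, int, None]


def build_query(**filters: FilterValue) -> str:
    """Encode optional filters as a URL query string.

    Filters set to None are left out, and booleans are written as lowercase
    "true"/"false" (an assumption about what the API accepts).
    """
    params = {}
    for key, value in filters.items():
        if value is None:
            continue  # unset filter: do not send it
        # Check bool explicitly, because str(True) would give "True".
        params[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return urlencode(params)


# build_query(target_url_agency_owner="General Services Administration",
#             final_url_live=True, dap_detected_final_url=None)
# returns 'target_url_agency_owner=General+Services+Administration&final_url_live=true'
```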
Retrieve Website Information
Fetches details about websites, including the target and final URLs, ownership, scan status, and analytics detection.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Target URL Domain | target_url_domain | | string | The domain name plus top-level domain (TLD) of the Target URL, where the scanner starts, as opposed to the Final URL, where the scanner ends after redirects. |
Final URL Domain | final_url_domain | | string | The domain name plus top-level domain (TLD) of the Final URL, where the scanner ends after following redirects from the Target URL. |
Final URL Live | final_url_live | | boolean | Whether the Final URL is live, that is, returned an HTTP status code in the 2xx family. |
Target URL Redirects | target_url_redirects | | boolean | Whether the Target URL redirects (true if a 3xx HTTP status code is returned). Scanners have caching disabled, so 304 status codes do not occur. |
Target URL Agency Owner | target_url_agency_owner | | string | The agency that owns or operates the website associated with the Target URL. |
Target URL Bureau Owner | target_url_bureau_owner | | string | The bureau that owns or operates the website associated with the Target URL. |
Scan Status | primary_scan_status | | string | The status of the scan and any known reason for failure. The value unknown_error is reserved for errors not yet encoded in the system. |
DAP Detected at Final URL | dap_detected_final_url | | boolean | Whether the Digital Analytics Program (DAP) is detected at the Final URL. |
Limit | limit | | integer | The number of items to return in a single page of results. |
Page | page | | integer | The page number of the results to retrieve. |
Returns
- Body
- PaginatedWebsiteResponseDto
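Because this action pages through results with the limit and page parameters, a caller working with the API directly has to keep requesting pages until the last one. The sketch below assumes responses shaped like PaginatedWebsiteResponseDto (see Definitions) and 1-based page numbers, and stops once the current page reaches meta.totalPages; checking for an empty links.next would work equally well. fetch_page is a hypothetical function you supply to perform the HTTP request.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_websites(fetch_page: Callable[[int, int], Page],
                  limit: int) -> Iterator[Dict[str, Any]]:
    """Yield every website item across all pages of a paginated response.

    fetch_page(page, limit) is a hypothetical caller-supplied function that performs
    one request and returns the decoded PaginatedWebsiteResponseDto-style body.
    """
    page = 1  # assumes 1-based page numbers
    while True:
        body = fetch_page(page, limit)
        yield from body.get("items", [])
        total_pages = body.get("meta", {}).get("totalPages", 0)
        # Stop after the last page; an empty result set reports 0 pages.
        if page >= total_pages:
            break
        page += 1
```

Using a generator keeps memory flat: each page is fetched only when the consumer has exhausted the previous one.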
Retrieve Website Information by URL
Fetches detailed information about a website based on the specified URL.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Website URL | url | True | string | The URL of the website to retrieve information for. This should include the domain name and any relevant path components. |
Returns
- Body
- WebsiteApiResultDto
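If the website URL is placed in the request path, it has to be percent-encoded so that characters such as "/" and ":" do not split it into extra path segments. This sketch assumes a base URL of https://api.gsa.gov/technology/site-scanning/v1 and a /websites/{url} path template; whether the API expects a bare domain such as 18f.gov or accepts encoded full URLs is not stated here, so treat both as assumptions.

```python
from urllib.parse import quote

# Assumed base URL and path template; verify against the GSA API documentation.
BASE_URL = "https://api.gsa.gov/technology/site-scanning/v1"


def website_url(target: str) -> str:
    """Build the per-website lookup URL, encoding the target as one path segment."""
    # safe="" also encodes "/" and ":" so a URL with a path stays a single segment.
    return f"{BASE_URL}/websites/{quote(target, safe='')}"
```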
Definitions
AnalysisDto
Name | Path | Type | Description |
---|---|---|---|
Total Analyzed Items | total | number | The total number of items analyzed. |
Total Agencies Analyzed | totalAgencies | number | The total number of agencies for which the websites were analyzed. |
Total Final URL Base Domains | totalFinalUrlBaseDomains | number | The total number of unique final URL base domains analyzed. |
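Client code consuming AnalysisDto may find a typed wrapper convenient. The sketch below maps the camelCase JSON paths above onto a small dataclass; the class name Analysis is hypothetical, and the values in the usage line are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Analysis:
    """Typed view of an AnalysisDto response body (hypothetical helper)."""
    total: int
    total_agencies: int
    total_final_url_base_domains: int

    @classmethod
    def from_json(cls, body: Mapping[str, Any]) -> "Analysis":
        # The schema types these as "number"; they are counts, so convert to int.
        return cls(
            total=int(body["total"]),
            total_agencies=int(body["totalAgencies"]),
            total_final_url_base_domains=int(body["totalFinalUrlBaseDomains"]),
        )


# Analysis.from_json({"total": 120, "totalAgencies": 4, "totalFinalUrlBaseDomains": 37})
```

A missing key raises KeyError, which surfaces schema changes early instead of silently defaulting to zero.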
PaginatedWebsiteResponseDto
Name | Path | Type | Description |
---|---|---|---|
Website Items | items | array of WebsiteApiResultDto | An array of website results. |
First Page Link | links.first | string | A link to the first page of results. |
Last Page Link | links.last | string | A link to the last page of results. |
Next Page Link | links.next | string | A link to the next page of results. On the last page of results, this is an empty string. |
Previous Page Link | links.previous | string | A link to the previous page of results. On the first page of results, this is an empty string. |
Current Page | meta.currentPage | number | The current page number. |
Item Count | meta.itemCount | number | The number of items in the PaginatedWebsiteResponseDto.items array. |
Items Per Page | meta.itemsPerPage | number | The number of items per page. This should equal the limit query parameter. |
Total Items | meta.totalItems | number | The total number of items that match the query. |
Total Pages | meta.totalPages | number | The total number of pages, calculated as ceil(totalItems / itemsPerPage). |
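A page count that covers a partial final page has to round up: with 25 matching items and 10 items per page, pages 1 and 2 hold 10 items each and page 3 holds the remaining 5, so ceil(25 / 10) = 3, whereas floor(25 / 10) = 2 would leave the last 5 items unreachable. The helpers below reproduce the meta fields under that rule and assume 1-based page numbers; they are an illustration, not the API's own code.

```python
def total_pages(total_items: int, items_per_page: int) -> int:
    """Pages needed to hold total_items at items_per_page each (ceiling division)."""
    if items_per_page <= 0:
        raise ValueError("items_per_page must be positive")
    return -(-total_items // items_per_page)  # integer ceiling without floats


def item_count(total_items: int, items_per_page: int, current_page: int) -> int:
    """Expected meta.itemCount for a 1-based page number."""
    start = (current_page - 1) * items_per_page
    return max(0, min(items_per_page, total_items - start))
```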
WebsiteApiResultDto
Name | Path | Type | Description |
---|---|---|---|
Canonical Link | canonical_link | string | The canonical link, if a canonical link tag is present. |
Cloud.gov Pages Hosting | cloud_dot_gov_pages | boolean | Indicates whether the final URL is hosted using Cloud.gov Pages. |
Content Management System (CMS) | cms | string | The content management system used to host the final URL. |
DAP Detected at Final URL | dap_detected_final_url | boolean | Whether the Digital Analytics Program (DAP) is present at the final URL. |
DAP Parameters at Final URL | dap_parameters_final_url | object | An object with the Digital Analytics Program parameter keys and values found at the final URL. |
DNS Hostname | dns_hostname | string | The domain of the underlying system, which often indicates the use of a cloud or CDN provider. |
Final URL | final_url | string | The URL reached after following any redirects from the target URL. |
Final URL MIME Type | final_url_MIMEType | string | The MIME type of the final URL, extracted from the Content-Type header. |
Final URL Domain | final_url_domain | string | The domain name plus top-level domain of the final URL. |
Final URL Live | final_url_live | boolean | Whether the final URL returned an HTTP status code in the 2xx family. |
Final URL Same Domain | final_url_same_domain | boolean | Whether the final URL is in the same domain as the target URL. A value of false implies a redirect. |
Final URL Same Website | final_url_same_website | boolean | Whether the final URL is the same website as the target URL; false means the final URL has a different domain or path. |
Final URL Status Code | final_url_status_code | number | The HTTP status code of the final URL. |
Final URL Website | final_url_website | string | The website of the final URL, including any subdomain along with the domain name and top-level domain. |
Main Element Presence at Final URL | main_element_present_final_url | boolean | Whether a `<main>` element is present at the final URL. |
Open Graph Article Modified Date at Final URL | og_article_modified_final_url | string | The Open Graph article modified tag, if available at the final URL. |
Open Graph Article Published Date at Final URL | og_article_published_final_url | string | The Open Graph article published tag, if available at the final URL. |
Open Graph Description at Final URL | og_description_final_url | string | The Open Graph description tag, if found at the final URL. |
Open Graph Title at Final URL | og_title_final_url | string | The Open Graph title tag, if found at the final URL. |
Robots.txt Crawl Delay | robots_txt_crawl_delay | integer | The crawl delay value in seconds, if present in the robots.txt file. |
Robots.txt Detected | robots_txt_detected | boolean | Indicates whether the robots.txt file is detected. |
Robots.txt Final URL | robots_txt_final_url | string | The final URL of the robots.txt file after any redirects. |
Robots.txt Final URL MIME Type | robots_txt_final_url_MIMETYPE | string | The MIME type of the robots.txt file, extracted from the Content-Type header. |
Robots.txt Final URL Live | robots_txt_final_url_live | boolean | Indicates whether the HTTP status code of the robots.txt final URL is in the 2xx family. |
Robots.txt Final URL Size in Bytes | robots_txt_final_url_size_in_bytes | number | The size of the robots.txt file in bytes. |
Robots.txt Final URL Status Code | robots_txt_final_url_status_code | number | The HTTP status code of the robots.txt final URL. |
Robots.txt Target URL Redirects | robots_txt_target_url_redirects | boolean | Indicates whether the target robots.txt URL redirects. This targets the robots.txt file specifically. |
Scan Date | scan_date | string | The date and time when the scan was performed. |
Scan Status | primary_scan_status | string | The success status of the Core Scan. |
Sitemap.xml URL Count | sitemap_xml_count | integer | The number of `<url>` elements found in the sitemap.xml file. |
Sitemap.xml Detected | sitemap_xml_detected | boolean | Indicates whether the sitemap.xml file is found. |
Sitemap.xml Final URL | sitemap_xml_final_url | string | The final URL of the sitemap.xml file after any redirects. |
Sitemap.xml Final URL MIME Type | sitemap_xml_final_url_MIMETYPE | string | The MIME type of the sitemap.xml final URL, extracted from the Content-Type header. |
Sitemap.xml Final URL Filesize | sitemap_xml_final_url_filesize | integer | The file size of the sitemap.xml file in bytes. |
Sitemap.xml Final URL Live | sitemap_xml_final_url_live | boolean | Indicates whether the sitemap.xml final URL status code is in the 2xx family. |
Sitemap.xml Final URL Status Code | sitemap_xml_final_url_status_code | number | The HTTP status code of the sitemap.xml file. |
Sitemap.xml PDF URL Count | sitemap_xml_pdf_count | integer | The number of URLs in the sitemap.xml file that have a PDF extension. |
Sitemap.xml Target URL Redirects | sitemap_xml_target_url_redirects | boolean | Indicates whether the sitemap.xml URL redirects. This targets the sitemap.xml file specifically. |
Sourced from DAP List | source_list_dap | boolean | Indicates whether the Digital Analytics Program provided this URL for the Target URL List. |
Sourced from Federal Domains List | source_list_federal_domains | boolean | Indicates whether the List of Federal Domains provided this URL for the Target URL List. |
Sourced from Other Lists | source_list_other | boolean | Indicates whether a manually maintained list of additional websites provided this URL for the Target URL List. |
Sourced from Pulse CIO List | source_list_pulse | boolean | Indicates whether the pulse.cio.gov Snapshot provided this URL for the Target URL List. |
Target URL | target_url | string | The URL the scanner starts the scan with. |
Target URL 404 Test | target_url_404_test | boolean | Indicates whether the target URL properly handles 404s, tested by requesting a UUID-based pathname. |
Target URL Agency Owner | target_url_agency_owner | string | The agency that owns the target URL. |
Target URL Government Branch | target_url_branch | string | The branch of government that the URL is associated with. |
Target URL Bureau Owner | target_url_bureau_owner | string | The bureau that owns the target URL. |
Target URL Domain | target_url_domain | string | The base domain (domain name plus top-level domain) of the target URL. |
Target URL Redirects | target_url_redirects | boolean | Indicates whether the target URL redirects. |
Third-party Service Count | third_party_service_count | number | The number of third-party services found. |
Third-party Service Domains | third_party_service_domains | array of string | The domains of third-party services that the final URL makes outbound calls to. A service counts as third-party if its hostname does not match the hostname of the final URL. |
USWDS Count | uswds_count | number | The sum of all U.S. Web Design System (USWDS) likelihood heuristics. |
USWDS Favicon | uswds_favicon | number | The presence of the USWDS US flag favicon in the HTML source. Presence adds 20 points to the USWDS likelihood heuristic. |
USWDS Favicon in CSS | uswds_favicon_in_css | number | The presence of the USWDS US flag favicon in the CSS source. Presence adds 20 points to the USWDS likelihood heuristic. |
USWDS Inline CSS | uswds_inline_css | number | The number of occurrences of .usa- CSS classes in the inline HTML source. |
USWDS Public Sans Font | uswds_publicsans_font | number | The presence of the Public Sans font in the CSS source. Presence adds 20 points to the USWDS likelihood heuristic. |
USWDS Semantic Version | uswds_semantic_version | string | The semantic version string of USWDS. |
USWDS Source Sans Font | uswds_source_sans_font | number | The presence of the Source Sans font in the CSS source. Presence adds 5 points to the USWDS likelihood heuristic. |
USWDS String Occurrences | uswds_string | number | The number of times the string uswds occurs in the HTML source. |
USWDS String in CSS | uswds_string_in_css | number | The number of occurrences of uswds in the CSS source. |
USWDS Tables | uswds_tables | number | A calculation of (number of `<table>` elements) * -10. |
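The USWDS fields above are point-based heuristics that roll up into uswds_count. The sketch below recomputes a score from a WebsiteApiResultDto-style dict, assuming uswds_count is the plain sum of the numeric uswds_* heuristic fields listed here (with uswds_semantic_version, a string, excluded). The scanner's exact formula is not given in this document, so treat the result as a cross-check against the reported uswds_count rather than a replacement for it.

```python
from typing import Any, Mapping

# Numeric heuristic fields from WebsiteApiResultDto assumed to feed uswds_count.
USWDS_COMPONENTS = (
    "uswds_favicon",
    "uswds_favicon_in_css",
    "uswds_inline_css",
    "uswds_publicsans_font",
    "uswds_source_sans_font",
    "uswds_string",
    "uswds_string_in_css",
    "uswds_tables",  # negative contribution: -10 per <table> element
)


def uswds_score(site: Mapping[str, Any]) -> float:
    """Sum the USWDS heuristic components, treating missing or null values as 0."""
    return sum(site.get(field) or 0 for field in USWDS_COMPONENTS)
```

A mismatch between uswds_score(site) and site["uswds_count"] would suggest the scanner weights or includes signals beyond these fields.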