Sync and manage websites

How to sync public URLs to the Knowledge Hub and enable this content for Fin.

Written by Beth-Ann Sher
Updated over a month ago

If you’d like to add website content to the Knowledge Hub and enable it for Fin AI Agent and Fin AI Copilot, you can do this by syncing the public URL for that site.

Get started

Go to Knowledge Hub > Sources, scroll down to “Websites”, and click the Sync button to get started.

Now enter the URL of your external support content. Use the top-level URL that your content sits under (for example, https://myhelpcenter.com).

This fetches the pages at the URL you provide, along with the pages nested under it.

Tip:

You can also add website content by clicking the New content button and selecting the Webpages option.

Advanced settings

Additional URLs

Website structures can vary. To make sure that we sync your most relevant content, we recommend adding additional URLs for any specific subpages you want included.

For example, if you enter https://myhelpcenter.com/help as the primary URL above, you might also want to add a specific URL such as https://myhelpcenter.com/help/index.html.

URL globs to exclude

To exclude certain pages you don’t want to sync content from, you can add a list of URL globs.

What is a URL glob?

A glob is a string of literal and/or wildcard characters used to match file paths or URLs. Globbing is the act of locating files on a filesystem using one or more globs. URL globs are also useful for matching a range of URLs that are mostly the same, with only a small portion changing between them.

For example, the URL glob https://{store,docs}.example.com/** matches all URLs starting with https://store.example.com/ or https://docs.example.com/, and https://example.com/**/*\?*foo=* matches URLs on example.com that contain a foo query parameter with any value.
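To check how a glob behaves before adding it, you can test it locally. The sketch below is only an illustration: it assumes common glob semantics (* stays within one path segment, ** crosses "/" boundaries, {a,b} lists alternatives, and a backslash escapes the next character). The function names glob_to_regex and is_excluded are hypothetical, and the matcher used by the actual web sync may differ in edge cases.

```python
import re


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a URL glob into a regex (simplified, assumed semantics)."""
    out = []
    i = 0
    in_braces = False
    while i < len(glob):
        ch = glob[i]
        if ch == "\\" and i + 1 < len(glob):
            # A backslash escapes the next character, so \? is a literal "?".
            out.append(re.escape(glob[i + 1]))
            i += 2
            continue
        if ch == "*":
            if glob[i:i + 3] == "**/":
                out.append("(?:.*/)?")  # zero or more whole path segments
                i += 3
                continue
            if glob[i:i + 2] == "**":
                out.append(".*")        # anything, including "/"
                i += 2
                continue
            out.append("[^/]*")         # anything within one path segment
        elif ch == "?":
            out.append("[^/]")          # exactly one non-slash character
        elif ch == "{":
            out.append("(?:")
            in_braces = True
        elif ch == "}" and in_braces:
            out.append(")")
            in_braces = False
        elif ch == "," and in_braces:
            out.append("|")             # comma separates alternatives
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("^" + "".join(out) + "$")


def is_excluded(url: str, globs: list) -> bool:
    """Return True if the URL matches at least one exclusion glob."""
    return any(glob_to_regex(g).match(url) for g in globs)


exclude_globs = [
    "https://{store,docs}.example.com/**",
    r"https://example.com/**/*\?*foo=*",
]
for candidate in [
    "https://store.example.com/cart/1",
    "https://www.example.com/cart",
    "https://example.com/a/b?x=1&foo=2",
]:
    print(candidate, is_excluded(candidate, exclude_globs))
```

Under these assumptions, the first and third URLs are excluded, and the second is kept because www is not one of the listed subdomains.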

CSS selectors to exclude

To avoid scraping content from specific sections of a page, you can add a list of the CSS selectors you want to ignore.

This is useful to skip irrelevant page content. The value must be a valid CSS selector as accepted by the document.querySelectorAll() function. By default, we already remove common navigation elements, headers, footers, modals, scripts, and inline images.

Clickable CSS selector

This allows DOM elements that match the CSS selector to be clicked during the web sync process.

This is useful for expanding collapsed sections, in order to capture their text content. The value must be a valid CSS selector as accepted by the document.querySelectorAll() function.

Examples: [aria-expanded="false"] and #expand_section

Complex conditions can also be described with a CSS selector. In CSS, chaining selectors without spaces creates an AND-like condition. For example, .button.blue.small will match only elements with all three classes.

Using a comma (,) as a separator works like OR. For example, .button, .blue, h1 targets all elements with class button, or class blue, or first-level headings.
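The AND and OR behavior can be illustrated with a toy matcher. This is a minimal sketch and not how the web sync evaluates selectors (that is done by the browser's document.querySelectorAll()). It only understands tag names, .class, and #id tokens, and the function names matches_compound and matches are hypothetical.

```python
import re


def matches_compound(selector: str, tag: str, classes: set, elem_id: str = "") -> bool:
    """Match one compound selector such as 'h1', '.button.blue' or '#expand_section'."""
    # Split the selector into an optional tag name plus .class and #id tokens.
    parts = re.findall(r"[.#]?[\w-]+", selector.strip())
    for part in parts:
        if part.startswith("."):
            if part[1:] not in classes:
                return False  # AND: every chained class must be present
        elif part.startswith("#"):
            if part[1:] != elem_id:
                return False
        elif part != tag:
            return False
    return bool(parts)


def matches(selector_list: str, tag: str, classes: set, elem_id: str = "") -> bool:
    """Match a comma-separated selector list: any alternative may match (OR)."""
    return any(
        matches_compound(alternative, tag, classes, elem_id)
        for alternative in selector_list.split(",")
    )


# Chained classes: all three are required.
print(matches(".button.blue.small", "a", {"button", "blue", "small"}))
print(matches(".button.blue.small", "a", {"button", "blue"}))

# Comma-separated list: one matching alternative is enough.
print(matches(".button, .blue, h1", "h1", set()))
print(matches(".button, .blue, h1", "p", {"small"}))
```

The first and third calls match. The second fails because the small class is missing, and the fourth fails because none of the three alternatives apply.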

Wait to load CSS selector

To target content that may have a delay in appearing on the page, you can add a CSS selector that will make the web scraper wait before scraping content.

This is useful for pages where the default way of detecting that content has loaded (waiting for the network to go idle) fails. Setting this option completely disables the default behavior, and the page will be processed only if the element specified by this selector appears.

The value must be a valid CSS selector as accepted by the document.querySelectorAll() function.

XML Sitemap

To access pages that might not be reachable from the initial URLs, you can enable XML Sitemap for a more robust web sync on websites that provide a sitemap.

If this option is enabled, the web scraper will look for sitemaps on the domains of the provided source URL and enqueue matching URLs in the same way as links found on crawled pages. You can also reference a sitemap.xml file directly by adding it as an additional URL, e.g. https://www.example.com/sitemap.xml
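To get a feel for which pages a sitemap could contribute, you can list its URLs yourself. The sketch below parses a small inline sitemap in the standard sitemaps.org format and keeps the entries that start with a given source URL. The prefix filter is an assumption about what "matching" means here, and sitemap_urls and SAMPLE_SITEMAP are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample data standing in for a downloaded sitemap.xml file.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/help/getting-started</loc></url>
  <url><loc>https://www.example.com/help/billing</loc></url>
  <url><loc>https://www.example.com/blog/news</loc></url>
</urlset>"""


def sitemap_urls(xml_text: str, source_url: str) -> list:
    """Return the sitemap entries that start with the source URL."""
    root = ET.fromstring(xml_text)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    # Assumption: only URLs under the source URL are treated as matching.
    return [u for u in urls if u.startswith(source_url)]


for url in sitemap_urls(SAMPLE_SITEMAP, "https://www.example.com/help"):
    print(url)
```

With this sample, the two /help pages are listed and the /blog page is filtered out.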


Manage website sources

Once the sync is complete, you’ll receive an email notification and the website will appear as a synced source under Knowledge Hub > Content.

If you click into a website source, you can preview and manage the individual pages that were synced from the public URL.

Website sources are read-only and can’t be edited within the Knowledge Hub; they must be edited at the source.

Configure settings

When you view a website source in the Knowledge Hub, you’ll find a "Details" panel on the right which contains:

  • Data: View the content type, language, creation date, and last update (when it was last synced with the source).

  • Fin settings: Enable or disable the source for Fin AI Agent and Fin AI Copilot. When enabled, the content becomes available to customers through Fin AI Agent and to teammates via Fin AI Copilot, respectively.

  • Fin AI Agent Audience: Ensure customers only get answers and see content that is relevant for them.

  • Link: The public URL for this website source.

  • Folder: The folder where this public URL lives in the Knowledge Hub. You can’t change the folder of synced content.

Make it available to Fin

To make a website source available to Fin AI Agent and/or Fin AI Copilot, go to the Knowledge Hub and view the source.

From the Details panel, scroll down to “Fin settings” and choose whether to toggle on:

  • Available for Fin AI Agent - This setting will make the public URL available for Fin AI Agent to use when responding to customers (it will respect any audience rules).

  • Available for Fin AI Copilot - This setting will make the public URL available for Fin AI Copilot to use when answering teammates’ questions in the inbox via the Copilot panel.

Learn how to set up Fin AI Agent for your customers or enable your team to use Fin AI Copilot in the inbox.

Teammates require access to Fin AI Copilot to use it in the inbox. This can be managed for each teammate from Settings > Teammates.

Make it available to a specific audience

If this website source is only relevant for a specific subset of customers, you can use audience filters to make it visible only to those customers.

First, you’ll need to create and define the audience you want to target.

Then go to the Knowledge Hub and view the source. From the Details panel, scroll down to “Audience” and use the dropdown to select one of your pre-defined audiences.

Note:

  • The default audience for public URLs is “Everyone”.

  • Fin AI Agent will respect any audience you apply to a public URL and will only use its content to answer customers who match the audience rules.

  • Fin AI Copilot currently does not use audience rules when answering teammates in the inbox.

Re-sync or remove a website as a source

If you’d like to re-sync or remove a public URL as a source, go to Knowledge Hub > Sources, scroll down to “Websites”, and click Manage next to the source. Here, you can choose to Re-sync or Remove the source.


Troubleshooting website sync

When importing website content to enable Fin, you need to enter the public URL. This will search for all pages nested under that URL and import them into your Knowledge Hub for Fin AI Agent to use.

If the importer didn’t return the number of pages you expected, there are a few possible reasons:

The URL provided isn’t the top-level URL

The website sync works by going to the URL you provide and then searching for all pages nested under that URL. These pages must have the same URL pattern as the URL you provide.

For example, if the URL you provide is https://myhelpcenter.com/home, then all pages you want to import must include the /home prefix in the URL, e.g. https://myhelpcenter.com/home/article. If they do not, remove the prefix and use the most basic URL stem, e.g. https://myhelpcenter.com, then try the import again.
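If you have a list of the pages you expected to see, a quick local check can show which ones fall outside the URL you provided. This is a minimal sketch that assumes "nested under" means the page URL begins with the provided URL followed by a slash. The helper names is_nested_under and pages_outside_prefix are hypothetical, and the sync's own matching rules may differ.

```python
def is_nested_under(start_url: str, page_url: str) -> bool:
    """Return True if page_url is the start URL itself or sits beneath it."""
    prefix = start_url.rstrip("/")
    return page_url == prefix or page_url.startswith(prefix + "/")


def pages_outside_prefix(start_url: str, page_urls: list) -> list:
    """List the pages that would not be found under the start URL."""
    return [url for url in page_urls if not is_nested_under(start_url, url)]


expected_pages = [
    "https://myhelpcenter.com/home/article",
    "https://myhelpcenter.com/faq/billing",
]

# With the /home prefix, the FAQ page falls outside the synced URL pattern.
print(pages_outside_prefix("https://myhelpcenter.com/home", expected_pages))

# With the basic URL stem, every expected page is covered.
print(pages_outside_prefix("https://myhelpcenter.com", expected_pages))
```

If the first list is not empty, switching to the shorter URL stem is the fix described above.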

The URL is private

If the content you want to use is behind a login, Fin won't be able to access or import it.

Page limits

You can sync up to 10 different public URLs, and Fin will sync a maximum of 3,000 pages from each source. Syncing can sometimes fail if there is a very large amount of content on a single page.

If you’re using Intercom Articles to enable Fin AI Agent or Fin AI Copilot, these will be available in the Knowledge Hub automatically as public articles and don’t need to be imported.

