Building a webcrawler knowledge base

Hi, I am trying to build a knowledge base that crawls the websites of some committees of the Hong Kong legislature (e.g. https://www.legco.gov.hk/en/legco-business/committees/panel.html?commerce-industry&2026#meetings). My intention is to have a web crawler read the agendas and digest the agenda item documents (usually PDFs) on a regular basis, for use by a custom agent. Unfortunately, the crawler does not seem to be able to read the PDF documents, despite my having configured the crawl depth and the types of files to be read. When I query the agent that relies on the space attached to the knowledge base, it says it knows about the PDF file but cannot read it. Please see the screenshot.

Any advice would be much appreciated. Thanks.

Hi @Bryan_Ha and welcome to the Qlik Community!

Are you able to verify that the PDFs you're trying to access are directly accessible (without authentication or any further permissions)?
Are the PDF files stored within the space before you query your agent, or is the agent simply attempting to read them from the source?
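One quick way to check the first question is to fetch a PDF link directly from a small script, outside the crawler. This is a minimal sketch using only the Python standard library; the URL you pass in would be one of the agenda PDF links from the committee page (none is assumed here):

```python
import urllib.request


def looks_like_pdf(first_bytes: bytes) -> bool:
    """A real PDF begins with the magic bytes %PDF (allowing leading whitespace)."""
    return first_bytes.lstrip()[:4] == b"%PDF"


def check_pdf_url(url: str, timeout: float = 30.0) -> tuple[int, str, bool]:
    """Fetch the start of a URL and report the HTTP status, the declared
    Content-Type, and whether the payload actually looks like a PDF."""
    req = urllib.request.Request(url, headers={"User-Agent": "pdf-check/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        status = resp.status
        ctype = resp.headers.get("Content-Type", "")
        head = resp.read(1024)  # first bytes are enough for the magic-byte check
    return status, ctype, looks_like_pdf(head)
```

If the request redirects to an HTML login or error page, the Content-Type will come back as `text/html` and the magic-byte check will fail, which would explain why the crawler can index the link but the agent cannot ingest the document.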

The information it is returning (the meeting overview notes): is that being pulled from the PDFs, or is it available somewhere else as well? I'm wondering whether the agent is able to pull part of the information but not all of it.