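// Build a page requester with the default crawl configuration and content extractor.
var pageRequester = new PageRequester(new CrawlConfiguration(), new WebContentExtractor());
var crawledPage = await pageRequester.MakeRequestAsync(validUri).ConfigureAwait(false);
// Log the requested URL and the numeric HTTP status code.
Log.Logger.Information("{@Result}", new
{
    url = crawledPage.Uri,
    status = Convert.ToInt32(crawledPage.HttpResponseMessage.StatusCode)
});
// Return the extracted page text.
return crawledPage.Content.Text;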
I am trying to crawl this page:
https://www.tzb-info.cz/kontakty
by passing it as validUri to the code shown above.
That website sets a less common charset in its header, like this:
The result is that Content.Text is always empty, even though the response status code indicates success.
If I try to read the response stream directly, I get this exception:
If I change the CharSet on the response manually, I am then able to read the stream:
This is my workaround for now.
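For readers who want to try something similar, here is a minimal sketch of that kind of workaround. It is not the exact code from this report. It assumes the response body is available as a standard System.Net.Http HttpContent, and the class name CharSetWorkaround, the helper name ReadWithCharSetAsync, and the choice of replacement charset are all hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class CharSetWorkaround
{
    // Hypothetical helper: overrides the charset declared in the Content-Type
    // header, then lets HttpContent decode the body using that charset.
    public static async Task<string> ReadWithCharSetAsync(HttpContent content, string charSet)
    {
        // Assumption: if the server sent no Content-Type at all, treat the body as HTML.
        if (content.Headers.ContentType == null)
        {
            content.Headers.ContentType = new MediaTypeHeaderValue("text/html");
        }

        // Replace whatever charset the server declared with the caller's choice.
        content.Headers.ContentType.CharSet = charSet;

        // ReadAsStringAsync picks its encoding from the (now overridden) header.
        return await content.ReadAsStringAsync().ConfigureAwait(false);
    }
}
```

With an HttpResponseMessage named response, this could be called as `await CharSetWorkaround.ReadWithCharSetAsync(response.Content, "utf-8")`. Note that forcing a charset the page was not actually encoded in can garble non-ASCII characters. On .NET Core, legacy code pages such as iso-8859-2 may also need `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` before they can be resolved, though whether that applies here is an assumption.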
Is it a bug that the "iso-8859-2" charset is not being interpreted correctly? Or am I missing some configuration or setup needed to handle this charset?