[aseek-devel] additional debug info and HTML parser

From: Danil Pismenny (no email)
Date: Sat Apr 13 2002 - 11:11:07 EDT


I posted this message earlier with the wrong subject and nobody
replied, so I am reposting it.

  I've added debug output that shows which sites (URLs) are added to
  the database, which are disallowed, and why. Which loglevel should I
  use for it? I use the DEBUG loglevel now, but perhaps the INFO
  loglevel would be more useful.
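
  To make the question concrete, here is a rough sketch of the kind of
  per-URL message I mean. It is written in Python for brevity, not taken
  from the aseek sources; the logger name "indexer" and the function
  report_url are hypothetical. It assumes the accept/reject decision is
  logged at INFO, which is only one possible answer to the question.

```python
import logging

log = logging.getLogger("indexer")  # hypothetical logger name


def report_url(url, accepted, reason=""):
    """Log the decision for one URL and return the logged message."""
    if accepted:
        msg = "added: %s" % url
    else:
        # Include the reason so a rejected URL can be diagnosed later.
        msg = "disallowed: %s (%s)" % (url, reason)
    # Assumption: one line per URL is cheap enough for INFO.
    log.info(msg)
    return msg
```

  With this split, the per-tag parser trace could stay at DEBUG, since
  it produces many lines per page.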

  Also, some pages that are added to the database are not parsed, or
  are parsed with errors (there are buggy tags in those pages). I added
  debug output that shows which tags are parsed. Would this output be
  useful to anybody else?

  The HTML parser is very strict. I've patched it to parse tag
  attributes whose values mix quote styles (e.g. <meta content='asdas")
  and tags that are not closed (a tag is automatically closed if it is
  longer than 1024 chars and a '<' symbol is encountered). Any comments?
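
  The two rules can be sketched roughly as follows. This is a Python
  illustration, not the actual patch: the names read_tag and MAX_TAG_LEN
  are hypothetical, and it assumes that any quote character may close a
  quoted value and that the length is counted from the opening '<'.

```python
MAX_TAG_LEN = 1024  # limit from the message; the name is hypothetical


def read_tag(html, start, max_len=MAX_TAG_LEN):
    """Return (tag_text, next_pos) for the tag whose '<' is at html[start]."""
    in_quote = False
    pos = start + 1
    while pos < len(html):
        ch = html[pos]
        if ch == "<" and pos - start > max_len:
            # Overlong tag and a new '<': assume the tag was never
            # closed and end it just before the new one.
            return html[start:pos], pos
        if ch in ('"', "'"):
            # Either quote character toggles quoting, so a value opened
            # with ' and closed with " is accepted.
            in_quote = not in_quote
        elif ch == ">" and not in_quote:
            return html[start:pos + 1], pos + 1
        pos += 1
    # End of input: return whatever is left as the tag.
    return html[start:], len(html)
```

  One known cost of this simple toggle: an apostrophe inside a
  double-quoted value (content="it's") also flips the quote state, so
  such a tag ends only through the 1024-char rule or at end of input.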

-- 
Danil Pismenny
http://dapi.chaz.ru/






