Hash :
f933c898
Author :
Date :
2012-09-07T19:32:12
Keep non-significant blank nodes in HTML parser
For https://bugzilla.gnome.org/show_bug.cgi?id=681822
Regardless of whether the option HTML_PARSE_NOBLANKS is set, blank nodes
are removed from an HTML document. For example:
<html>
<head>
<title>This is a test.</title>
</head>
<body>
<p>This is a test.</p>
</body>
</html>
is read as:
<html><head><title>This is a test.</title></head><body>
<p>This is a test.</p>
</body></html>
This commit changes the default behaviour so that blank nodes are kept; the
old behaviour remains available through the parser flag HTML_PARSE_NOBLANKS.
Based on original patch from Igor Ignatyuk <igor_ignatiouk@hotmail.com>
* HTMLparser.c: change the various places in the parser where the
ignorable_space SAX callback was called without checking the parser flag
* xmllint.c: make sure we use the new flag even for HTML parsing
* result/HTML/*: this modifies the output of a number of tests
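
The HTMLparser.c change described above amounts to consulting the parser flag before routing a whitespace-only text run. The following is a minimal, self-contained sketch of that idea only. It does not use libxml2 and is not the actual patch: ToyParser, TOY_PARSE_NOBLANKS and toy_deliver_text are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real option; NOT a libxml2 identifier. */
#define TOY_PARSE_NOBLANKS (1 << 0)

/* Hypothetical toy parser state; NOT a libxml2 type. */
typedef struct {
    int options;            /* bit set of TOY_PARSE_* flags */
    int kept_blank_runs;    /* blank runs delivered as ordinary character data */
    int dropped_blank_runs; /* blank runs sent down the ignorable-space path */
    int text_runs;          /* runs containing at least one non-blank character */
} ToyParser;

/* Return 1 if the run consists only of whitespace characters. */
static int toy_is_blank(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (!isspace((unsigned char)s[i]))
            return 0;
    }
    return 1;
}

/*
 * Route one text run. Non-blank text is always kept. A blank run is kept
 * by default and only treated as ignorable when the caller asked for it
 * with TOY_PARSE_NOBLANKS, which is the flag check the commit message
 * says was missing.
 */
void toy_deliver_text(ToyParser *p, const char *s)
{
    assert(p != NULL && s != NULL);

    size_t len = strlen(s);
    if (len == 0)
        return; /* nothing to deliver */

    if (!toy_is_blank(s, len)) {
        p->text_runs++;
        return;
    }

    if (p->options & TOY_PARSE_NOBLANKS)
        p->dropped_blank_runs++;
    else
        p->kept_blank_runs++;
}
```

With options left at 0, a run such as "\n" between two tags is counted as kept, mirroring the new default. Setting TOY_PARSE_NOBLANKS routes the same run to the ignorable path, mirroring the old behaviour.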
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head> <title>Linux Today</title>
</head>
<body bgcolor="White" link="Blue" text="Black" vlink="Black" alink="Red">
<center>
<table border="0" width="100%" cellspacing="0" cellpadding="0">
<tr bgcolor="#FFFFFF">
<td height="90">
<a href="http://linuxtoday.com/cgi-bin/click.pl?adnum=49"><img src="/pics/door_linux.gif" border="0" width="468" height="60" alt="Atipa Linux solutions. Your reliable cluster, server, and workstation solution. Win a Free Celeron Linux Workstation!"></a>
</td>
<td>
<img src="/pics/lt.gif" vspace="5" alt="Linux Today Logo"><br><font size="-1"><a href="http://linux.com">linux.com</a> partner</font><p></p>
</td>
</tr>
</table>
<font size="2" face="Helvetica">
[ <a href="http://linuxtoday.com/">headlines</a> |
<a href="http://features.linuxtoday.com/">features</a> |
<a href="http://commercial.linuxtoday.com/">commercial</a> |
<a href="http://security.linuxtoday.com/">security</a> |
<a href="http://jobs.linuxtoday.com/">jobs</a> |
<a href="http://linuxtoday.com/volt/">volt</a> |
<a href="http://linuxtoday.com/contrib.pl">contribute/submit</a> |
<a href="http://linuxtoday.com/advertise/">advertise</a> |
<a href="http://linuxtoday.com/search.html">search</a> |
<a href="http://linuxtoday.com/digests/">site digests</a> |
<a href="http://linuxtoday.com/mail-lists">mailing lists</a> |
<a href="http://linuxtoday.com/about/">about us</a> |
<a href="http://linuxtoday.com/linkus.html">link us</a> ]</font>
</center>
<p>
</p>
</body>
</html>