XHTML 1.0 Transitional parser?

chevy9294 · 10 months ago

XHTML 1.0 Transitional parser?

gsfraley@lemmy.world · 10 months ago

I would try another HTML5 parser. HTML5 is something of a unification of HTML and XHTML, so getting into the syntax differences between the two with an XML parser is probably going to be an uphill battle. That said, I’m curious what the first line is; the document could just be malformed entirely.

chevy9294 · 10 months ago

That's the first line:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

I thought it was HTML because everything on the web is HTML. Then the first line made me think it was XHTML, which should be parsed with an XML parser, but I did not know that Transitional is a mix that can't be parsed with anything.

gsfraley@lemmy.world · 10 months ago

Hmm, doctype declarations are sort of like the markup equivalent of headers. Usually parsers read them to know what flavor to expect and then go parse the rest of the page separately. You shouldn’t have to do this, but if you chop off that first line and run it through a standard HTML parser it might work fine.
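As a rough illustration of the "standard HTML parser" route, here is a minimal sketch that assumes Python and its standard-library `html.parser`, which is lenient about markup that is not well-formed XML. The `TagCollector` class, the `collect` helper, and the `SAMPLE` page are made up for the example; only the doctype line comes from this thread. The doctype is left in place here, since this parser simply reports it and moves on.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records the doctype and every start tag seen."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.tags = []

    def handle_decl(self, decl):
        # Called for <!DOCTYPE ...>; decl is the text between "<!" and ">".
        self.doctype = decl

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <br />.
        self.tags.append(tag)


# Made-up sample: XHTML-style doctype, but the body is not well-formed XML
# (unclosed <b>, and &nbsp; with no DTD loaded).
SAMPLE = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Demo</title></head>
<body><p>First<br />second &nbsp; <b>unclosed</p></body>
</html>'''


def collect(markup):
    """Feed markup to the lenient parser and return the collector."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser


if __name__ == "__main__":
    result = collect(SAMPLE)
    print(result.doctype)
    print(result.tags)
```

Whether this is enough depends on what you need from the page; a lenient parser will not raise on bad nesting, but it also will not repair it for you.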

chevy9294 · 10 months ago

That's the first thing I tried, and it still fails somewhere deep in the HTML, where I probably shouldn’t skip a line.
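One way to see exactly where a strict XML parser gives up is to catch the error and read its position. This is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`; `first_xml_error` and the `BROKEN` sample are hypothetical names for illustration, not taken from the page in question.

```python
import xml.etree.ElementTree as ET


def first_xml_error(markup):
    """Return (line, column, message) for the first well-formedness
    error a strict XML parser hits, or None if the markup parses."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based
        return line, column, str(err)
    return None


# Made-up sample: <br> is never closed, which HTML allows but XML does not.
BROKEN = "<html>\n<body>\n<p>one<br>two</p>\n</body>\n</html>"

if __name__ == "__main__":
    print(first_xml_error(BROKEN))
    print(first_xml_error("<p>ok<br /></p>"))
```

Knowing the line that trips the parser usually shows whether the page is HTML-style markup behind an XHTML doctype, in which case a lenient HTML parser is probably the better fit.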