Commit 9d0541dd2f82a2058994ee2894aeae32ad4b90db

Nick Wellnhofer 2023-06-22T18:06:53

parser: Make xmlSwitchEncoding always skip the BOM

Chromium calls xmlSwitchEncoding from the start document handler and relies on this function to skip the BOM. Commit 98840d40 changed the behavior when switching to UTF-16 since inspecting the input buffer at this point is fragile. Revert part of the commit to also skip a potential (decoded UTF-8) BOM when switching to UTF-16.

Make sure that we do this only at the start of an input stream to avoid U+FEFF characters being lost.

BOM handling should ultimately be moved to the parsing code to avoid such bugs.

See https://bugs.chromium.org/p/chromium/issues/detail?id=1451026
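As a rough illustration of the "only at the start of an input stream" rule, here is a minimal C sketch. It assumes a simplified, hypothetical `input_stream` struct and a hypothetical `skip_utf8_bom` helper, neither of which is libxml2's actual `xmlParserInput` code. Once UTF-16 input has been decoded to UTF-8, a leading BOM shows up as the byte sequence EF BB BF. The sketch drops it only when the read position is still at the very beginning of the buffer, so that a U+FEFF appearing later in the document is kept as an ordinary character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified input state (not libxml2's xmlParserInput). */
typedef struct {
    const unsigned char *base; /* start of the decoded UTF-8 buffer */
    const unsigned char *cur;  /* current read position */
    size_t len;                /* total length of the buffer in bytes */
} input_stream;

/* Skip a UTF-8 encoded BOM (EF BB BF), but only if nothing has been
 * consumed yet. Returns 1 if a BOM was skipped, 0 otherwise. */
static int skip_utf8_bom(input_stream *in) {
    assert(in != NULL);
    if (in->cur != in->base)
        return 0; /* mid-stream: U+FEFF is content, not a BOM */
    if (in->len >= 3 &&
        in->cur[0] == 0xEF && in->cur[1] == 0xBB && in->cur[2] == 0xBF) {
        in->cur += 3; /* advance past the three BOM bytes */
        return 1;
    }
    return 0;
}
```

Calling it twice on a buffer that starts with a BOM skips the BOM once and then returns 0. Calling it with `cur` already advanced past the start leaves the EF BB BF bytes untouched.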