Stop using doc->charset outside parser code

doc->charset does not specify the in-memory encoding, which is always
UTF-8.