1. html-encoding-sniffer
Sniff the encoding from a HTML byte stream
html-encoding-sniffer
Package: html-encoding-sniffer
Created by: jsdom
Last modified: Sun, 12 Nov 2023 07:38:15 GMT
Version: 4.0.0
License: MIT
Downloads: 81,969,230
Repository: https://github.com/jsdom/html-encoding-sniffer

Install

npm install html-encoding-sniffer
yarn add html-encoding-sniffer

Determine the Encoding of a HTML Byte Stream

This package implements the HTML Standard's encoding sniffing algorithm in all its glory. The most interesting part of this is how it pre-scans the first 1024 bytes in order to search for certain <meta charset>-related patterns.

 const htmlEncodingSniffer = require("html-encoding-sniffer");
const fs = require("fs");

const htmlBytes = fs.readFileSync("./html-page.html");
const sniffedEncoding = htmlEncodingSniffer(htmlBytes);

The passed bytes are given as a Uint8Array; the Node.js Buffer subclass of Uint8Array will also work, as shown above.

The returned value will be a canonical encoding name (not a label). You might then combine this with the whatwg-encoding package to decode the result:

 const whatwgEncoding = require("whatwg-encoding");
const htmlString = whatwgEncoding.decode(htmlBytes, sniffedEncoding);

Options

You can pass two potential options to htmlEncodingSniffer:

 const sniffedEncoding = htmlEncodingSniffer(htmlBytes, {
  transportLayerEncodingLabel,
  defaultEncoding
});

These represent two possible inputs into the encoding sniffing algorithm:

  • transportLayerEncodingLabel is an encoding label that is obtained from the "transport layer" (probably a HTTP Content-Type header), which overrides everything but a BOM.
  • defaultEncoding is the ultimate fallback encoding used if no valid encoding is supplied by the transport layer, and no encoding is sniffed from the bytes. It defaults to "windows-1252", as recommended by the algorithm's table of suggested defaults for "All other locales" (including the en locale).

Credits

This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.

RELATED POST

10 Must-Know Windows Shortcuts That Will Save You Time

10 Must-Know Windows Shortcuts That Will Save You Time

Arrays vs Linked Lists: Which is Better for Memory Management in Data Structures?

Arrays vs Linked Lists: Which is Better for Memory Management in Data Structures?

Navigating AWS Networking: Essential Hacks for Smooth Operation

Navigating AWS Networking: Essential Hacks for Smooth Operation

Achieving Stunning Visuals with Unity's Global Illumination

Achieving Stunning Visuals with Unity's Global Illumination

Nim's Hidden Gems: Lesser-known Features for Writing Efficient Code

Nim's Hidden Gems: Lesser-known Features for Writing Efficient Code