# jsdom

jsdom is a pure-JavaScript implementation of many web standards, notably the WHATWG [DOM](https://dom.spec.whatwg.org/) and [HTML](https://html.spec.whatwg.org/multipage/) Standards, for use with Node.js. In general, the goal of the project is to emulate enough of a subset of a web browser to be useful for testing and scraping real-world web applications.

The latest versions of jsdom require newer Node.js versions; see the `package.json` `"engines"` field for details.

## Basic usage

```js
const jsdom = require("jsdom");
const { JSDOM } = jsdom;
```

To use jsdom, you will primarily use the `JSDOM` constructor, which is a named export of the jsdom main module. Pass the constructor a string. You will get back a `JSDOM` object, which has a number of useful properties, notably `window`:

```js
const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
console.log(dom.window.document.querySelector("p").textContent); // "Hello world"
```

(Note that jsdom will parse the HTML you pass it just like a browser does, including implied `<html>`, `<head>`, and `<body>` tags.)

The resulting object is an instance of the `JSDOM` class, which contains a number of useful properties and methods besides `window`. In general, it can be used to act on the jsdom from the "outside," doing things that are not possible with the normal DOM APIs. For simple cases, where you don't need any of this functionality, we recommend a coding pattern like

```js
const { window } = new JSDOM(`...`);
// or even
const { document } = (new JSDOM(`...`)).window;
```

Full documentation on everything you can do with the `JSDOM` class is below, in the section "`JSDOM` Object API".

## Customizing jsdom

The `JSDOM` constructor accepts a second parameter which can be used to customize your jsdom in the following ways.

### Simple options

```js
const dom = new JSDOM(``, {
  url: "https://example.org/",
  referrer: "https://example.com/",
  contentType: "text/html",
  includeNodeLocations: true,
  storageQuota: 10000000
});
```

- `url` sets the value returned by `window.location`, `document.URL`, and `document.documentURI`, and affects things like resolution of relative URLs within the document and the same-origin restrictions and referrer used while fetching subresources. It defaults to `"about:blank"`.
- `referrer` just affects the value read from `document.referrer`. It defaults to no referrer (which reflects as the empty string).
- `contentType` affects the value read from `document.contentType`, as well as how the document is parsed: as HTML or as XML. Values that are not an [HTML MIME type](https://mimesniff.spec.whatwg.org/#html-mime-type) or an [XML MIME type](https://mimesniff.spec.whatwg.org/#xml-mime-type) will throw. It defaults to `"text/html"`. If a `charset` parameter is present, it can affect [binary data processing](#encoding-sniffing).
- `includeNodeLocations` preserves the location info produced by the HTML parser, allowing you to retrieve it with the `nodeLocation()` method (described below). It also ensures that line numbers reported in exception stack traces for code running inside ` `); // The script will not be executed, by default: console.log(dom.window.document.getElementById("content").children.length); // 0 ``` To enable executing scripts inside the page, you can use the `runScripts: "dangerously"` option: ```js const dom = new JSDOM(`
`, { runScripts: "dangerously" }); // The script will be executed and modify the DOM: console.log(dom.window.document.getElementById("content").children.length); // 1 ``` Again we emphasize to only use this when feeding jsdom code you know is safe. If you use it on arbitrary user-supplied code, or code from the Internet, you are effectively running untrusted Node.js code, and your machine could be compromised. If you want to execute _external_ scripts, included via ` `, { runScripts: "outside-only" }); // run a script outside of JSDOM: dom.window.eval('document.getElementById("content").append(document.createElement("p"));'); console.log(dom.window.document.getElementById("content").children.length); // 1 console.log(dom.window.document.getElementsByTagName("hr").length); // 0 console.log(dom.window.document.getElementsByTagName("p").length); // 1 ``` This is turned off by default for performance reasons, but is safe to enable. Note that in the default configuration, without setting `runScripts`, the values of `window.Array`, `window.eval`, etc. will be the same as those provided by the outer Node.js environment. That is, `window.eval === eval` will hold, so `window.eval` will not run scripts in a useful way. We strongly advise against trying to "execute scripts" by mashing together the jsdom and Node global environments (e.g. by doing `global.window = dom.window`), and then executing scripts or test code inside the Node global environment. Instead, you should treat jsdom like you would a browser, and run all scripts and tests that need access to a DOM inside the jsdom environment, using `window.eval` or `runScripts: "dangerously"`. 
This might require, for example, creating a browserify bundle to execute as a ``, { url: "https://example.com/", runScripts: "dangerously", resources: { userAgent: "Mellblomenator/9000", dispatcher: new ProxyAgent("http://127.0.0.1:9001"), interceptors: [ requestInterceptor((request, context) => { // Override the contents of this script to do something unusual. if (request.url === "https://example.com/some-specific-script.js") { return new Response("window.someGlobal = 5;", { headers: { "Content-Type": "application/javascript" } }); } // Return undefined to let the request proceed normally }) ] } }); ``` The context object passed to the interceptor includes `element` (the DOM element that initiated the request, or `null` for requests that are not from DOM elements). For example: ```js requestInterceptor((request, { element }) => { if (element) { console.log(`Element ${element.localName} is requesting ${request.url}`); } // Return undefined to let the request proceed normally }) ``` To be clear on the flow: when something in your jsdom fetches resources, first the request is set up by jsdom, then it is passed through any `interceptors` in the order provided, then it reaches any provided `dispatcher` (defaulting to [`undici`'s global dispatcher](https://undici.nodejs.org/#/?id=undicigetglobaldispatcher)). If you use jsdom's `requestInterceptor()`, returning promise fulfilled with a `Response` will prevent any further interceptors from running, or the base dispatcher from being reached. > [!WARNING] > All resource loading customization is ignored when scripts inside the jsdom use synchronous `XMLHttpRequest`. This is a technical limitation as we cannot transfer dispatchers or interceptors across a process boundary. ### Virtual consoles Like web browsers, jsdom has the concept of a "console". 
This records both information sent directly from the page, via scripts executing inside the document using the `window.console` API, and information from the jsdom implementation itself. We call the user-controllable console a "virtual console", to distinguish it from the Node.js `console` API and from the inside-the-page `window.console` API.

By default, the `JSDOM` constructor will return an instance with a virtual console that forwards all its output to the Node.js console. This includes both jsdom output (such as not-implemented warnings or CSS parsing errors) and in-page `window.console` calls. To create your own virtual console and pass it to jsdom, you can override this default by doing

```js
const virtualConsole = new jsdom.VirtualConsole();
const dom = new JSDOM(``, { virtualConsole });
```

Code like this will create a virtual console with no behavior. You can give it behavior by adding event listeners for all the possible console methods:

```js
virtualConsole.on("error", () => { ... });
virtualConsole.on("warn", () => { ... });
virtualConsole.on("info", () => { ... });
virtualConsole.on("dir", () => { ... });
// ... etc. See https://console.spec.whatwg.org/#logging
```

(Note that it is probably best to set up these event listeners _before_ calling `new JSDOM()`, since errors or console-invoking script might occur during parsing.)

If you simply want to redirect the virtual console output to another console, like the default Node.js one, you can do

```js
virtualConsole.forwardTo(console);
```

There is also a special event, `"jsdomError"`, which will fire with error objects to report errors from jsdom itself. This is similar to how error messages often show up in web browser consoles, even if they are not initiated by `console.error`.

As mentioned above, the default behavior for jsdom is to send these to the Node.js console.
This is done via `console.error(jsdomError.message)`, or in the case of `"unhandled-exception"`-type jsdom errors that occur from scripts running in the jsdom, via `console.error(jsdomError.cause.stack)`. Using `forwardTo()` will give the same behavior.

If you want a non-default behavior, you can customize it in the following ways:

```js
// Do not send any jsdom errors to the Node.js console:
virtualConsole.forwardTo(console, { jsdomErrors: "none" });

// Send only certain jsdom errors to the Node.js console, ignoring others:
virtualConsole.forwardTo(console, { jsdomErrors: ["unhandled-exception", "not-implemented"] });

// Customize the handling of all jsdom errors:
virtualConsole.forwardTo(console, { jsdomErrors: "none" });
virtualConsole.on("jsdomError", err => {
  switch (err.type) {
    case "unhandled-exception": {
      // ... process ...
      break;
    }
    case "css-parsing": {
      // ... process in some other way ...
      break;
    }
    // ... etc. ...
  }
});
```

The details for each type of jsdom error, listed by their `type` property, are:

- `"css-parsing"`: an error parsing CSS stylesheets
  - `cause` property: the exception object from our CSS parser library, [`rrweb-cssom`](https://github.com/rrweb-io/CSSOM)
  - `sheetText` property: the full text of the stylesheet that we attempted to parse
- `"not-implemented"`: an error emitted when certain stub methods from [unimplemented parts of the web platform](#unimplemented-parts-of-the-web-platform) are called
- `"resource-loading"`: an error [loading resources](#loading-subresources), e.g. due to a network error or a bad response code from the server
  - `cause` property: the exception object from the internal Node.js network calls jsdom made when retrieving the resource, or from the developer's custom resource loader
  - `url` property: the URL of the resource that jsdom attempted to fetch
- `"unhandled-exception"`: a [script execution](#executing-scripts) error that was not handled by a `Window` `"error"` event listener
  - `cause` property: the original exception object

### Cookie jars

Like web browsers, jsdom has the concept of a cookie jar, storing HTTP cookies. Cookies that have a URL on the same domain as the document, and are not marked HTTP-only, are accessible via the `document.cookie` API. Additionally, all cookies in the cookie jar will impact the fetching of subresources.

By default, the `JSDOM` constructor will return an instance with an empty cookie jar. To create your own cookie jar and pass it to jsdom, you can override this default by doing

```js
const cookieJar = new jsdom.CookieJar(store, options);
const dom = new JSDOM(``, { cookieJar });
```

This is mostly useful if you want to share the same cookie jar among multiple jsdoms, or prime the cookie jar with certain values ahead of time.

Cookie jars are provided by the [tough-cookie](https://www.npmjs.com/package/tough-cookie) package. The `jsdom.CookieJar` constructor is a subclass of the tough-cookie cookie jar which by default sets the `looseMode: true` option, since that [matches better how browsers behave](https://github.com/whatwg/html/issues/804). If you want to use tough-cookie's utilities and classes yourself, you can use the `jsdom.toughCookie` module export to get access to the tough-cookie module instance packaged with jsdom.

### Intervening before parsing

jsdom allows you to intervene in the creation of a jsdom very early: after the `Window` and `Document` objects are created, but before any HTML is parsed to populate the document with nodes:

```js
const dom = new JSDOM(`<p>Hello</p>`, {
  beforeParse(window) {
    window.document.childNodes.length === 0;
    window.someCoolAPI = () => { /* ... */ };
  }
});
```

This is especially useful if you want to modify the environment in some way, for example adding shims for web platform APIs jsdom does not support.

## `JSDOM` object API

Once you have constructed a `JSDOM` object, it will have the following useful capabilities:

### Properties

The property `window` retrieves the `Window` object that was created for you.

The properties `virtualConsole` and `cookieJar` reflect the options you pass in, or the defaults created for you if nothing was passed in for those options.

### Serializing the document with `serialize()`

The `serialize()` method will return the [HTML serialization](https://html.spec.whatwg.org/#html-fragment-serialisation-algorithm) of the document, including the doctype:

```js
const dom = new JSDOM(`<!DOCTYPE html>hello`);

dom.serialize() === "<!DOCTYPE html><html><head></head><body>hello</body></html>";

// Contrast with:
dom.window.document.documentElement.outerHTML === "<html><head></head><body>hello</body></html>";
```

### Getting the source location of a node with `nodeLocation(node)`

The `nodeLocation()` method will find where a DOM node is within the source document, returning the [parse5 location info](https://www.npmjs.com/package/parse5#options-locationinfo) for the node:

```js
const dom = new JSDOM(
  `<p>Hello
    <img src="foo.jpg">
  </p>`,
  { includeNodeLocations: true }
);

const document = dom.window.document;
const bodyEl = document.body; // implicitly created
const pEl = document.querySelector("p");
const textNode = pEl.firstChild;
const imgEl = document.querySelector("img");

console.log(dom.nodeLocation(bodyEl));   // null; it's not in the source
console.log(dom.nodeLocation(pEl));      // { startOffset: 0, endOffset: 39, startTag: ..., endTag: ... }
console.log(dom.nodeLocation(textNode)); // { startOffset: 3, endOffset: 13 }
console.log(dom.nodeLocation(imgEl));    // { startOffset: 13, endOffset: 32 }
```

Note that this feature only works if you have set the `includeNodeLocations` option; node locations are off by default for performance reasons.

### Interfacing with the Node.js `vm` module using `getInternalVMContext()`

The built-in [`vm`](https://nodejs.org/api/vm.html) module of Node.js is what underpins jsdom's script-running magic. Some advanced use cases, like pre-compiling a script and then running it multiple times, benefit from using the `vm` module directly with a jsdom-created `Window`.

To get access to the [contextified global object](https://nodejs.org/api/vm.html#vm_what_does_it_mean_to_contextify_an_object), suitable for use with the `vm` APIs, you can use the `getInternalVMContext()` method:

```js
const { Script } = require("vm");

const dom = new JSDOM(``, { runScripts: "outside-only" });
const script = new Script(`
  if (!this.ran) {
    this.ran = 0;
  }

  ++this.ran;
`);

const vmContext = dom.getInternalVMContext();

script.runInContext(vmContext);
script.runInContext(vmContext);
script.runInContext(vmContext);

console.assert(dom.window.ran === 3);
```

This is somewhat-advanced functionality, and we advise sticking to normal DOM APIs (such as `window.eval()` or `document.createElement("script")`) unless you have very specific needs.

Note that this method will throw an exception if the `JSDOM` instance was created without `runScripts` set.

### Reconfiguring the jsdom with `reconfigure(settings)`

The `top` property on `window` is marked `[Unforgeable]` in the spec, meaning it is a non-configurable own property and thus cannot be overridden or shadowed by normal code running inside the jsdom, even using `Object.defineProperty`.

Similarly, at present jsdom does not handle navigation (such as setting `window.location.href = "https://example.com/"`); doing so will cause the virtual console to emit a `"jsdomError"` explaining that this feature is not implemented, and nothing will change: there will be no new `Window` or `Document` object, and the existing `window`'s `location` object will still have all the same property values.

However, if you're acting from outside the window, e.g. in some test framework that creates jsdoms, you can override one or both of these using the special `reconfigure()` method:

```js
const dom = new JSDOM();

dom.window.top === dom.window;
dom.window.location.href === "about:blank";

dom.reconfigure({ windowTop: myFakeTopForTesting, url: "https://example.com/" });

dom.window.top === myFakeTopForTesting;
dom.window.location.href === "https://example.com/";
```

Note that changing the jsdom's URL will impact all APIs that return the current document URL, such as `window.location`, `document.URL`, and `document.documentURI`, as well as the resolution of relative URLs within the document, and the same-origin checks and referrer used while fetching subresources. It will not, however, perform navigation to the contents of that URL; the contents of the DOM will remain unchanged, and no new instances of `Window`, `Document`, etc. will be created.

## Convenience APIs

### `fromURL()`

In addition to the `JSDOM` constructor itself, jsdom provides a promise-returning factory method for constructing a jsdom from a URL:

```js
JSDOM.fromURL("https://example.com/", options).then(dom => {
  console.log(dom.serialize());
});
```

The returned promise will fulfill with a `JSDOM` instance if the URL is valid and the request is successful. Any redirects will be followed to their ultimate destination.

The options provided to `fromURL()` are similar to those provided to the `JSDOM` constructor, with the following additional restrictions and consequences:

- The `url` and `contentType` options cannot be provided.
- The `referrer` option is used as the HTTP `Referer` request header of the initial request.
- The `resources` option also affects the initial request; this is useful if you want to, for example, configure a proxy (see above).
- The resulting jsdom's URL, content type, and referrer are determined from the response.
- Any cookies set via HTTP `Set-Cookie` response headers are stored in the jsdom's cookie jar. Similarly, any cookies already in a supplied cookie jar are sent as HTTP `Cookie` request headers.

### `fromFile()`

Similar to `fromURL()`, jsdom also provides a `fromFile()` factory method for constructing a jsdom from a filename:

```js
JSDOM.fromFile("stuff.html", options).then(dom => {
  console.log(dom.serialize());
});
```

The returned promise will fulfill with a `JSDOM` instance if the given file can be opened. As usual in Node.js APIs, the filename is given relative to the current working directory.

The options provided to `fromFile()` are similar to those provided to the `JSDOM` constructor, with the following additional defaults:

- The `url` option will default to a file URL corresponding to the given filename, instead of to `"about:blank"`.
- The `contentType` option will default to `"application/xhtml+xml"` if the given filename ends in `.xht`, `.xhtml`, or `.xml`; otherwise it will continue to default to `"text/html"`.

### `fragment()`

For the very simplest of cases, you might not need a whole `JSDOM` instance with all its associated power. You might not even need a `Window` or `Document`! Instead, you just need to parse some HTML, and get a DOM object you can manipulate. For that, we have `fragment()`, which creates a `DocumentFragment` from a given string:

```js
const frag = JSDOM.fragment(`<p>Hello</p><p><strong>Hi!</strong>`);

frag.childNodes.length === 2;
frag.querySelector("strong").textContent === "Hi!";
// etc.
```

Here `frag` is a [`DocumentFragment`](https://developer.mozilla.org/en-US/docs/Web/API/DocumentFragment) instance, whose contents are created by parsing the provided string. The parsing is done using a `