JavaScript Query Engines Thursday, 9 September 2010

By Garrett Smith, with editorial input from Scott Sauyet and Andrew Paulos. Technical and editorial feedback from kangax, and a ton of feedback from Diego Perini and John-David Dalton.

Most popular javascript libraries these days have a CSS Selector query engine. The concept originated from CSSQuery and was popularized by jQuery. The idea is to match elements in the DOM based on a CSS selector string.

The W3C Selectors API Level 1, a Candidate Recommendation since 2009, was started in 2006, based on CSS selectors.

CSS2 selectors have been around for over 12 years. The syntax and concepts are easy to grasp and are well known — or are they?

What's the difference between the W3C Selectors API and those found in javascript libraries? They're both based on CSS Selectors, right? Aren't they all about the same?

It turns out they're not. There are many significant differences between CSS Selectors[CSS2] and the CSS Selector query engines defined in javascript libraries.

The differences are explained and demonstrated by the library examples.

When considering a Javascript library, it is important to review the source code in order to make an informed decision about its quality.

Library Examples

The examples demonstrate problems primarily in jQuery, but also in YUI 2, YUI 3, Ext-JS, and Sencha. Listing every bug in each major library would have been too much to cover in an already lengthy article.

CSS3 Compliance

To be CSS2 compliant, a CSS Selector Engine must follow the lexical grammar defined in the CSS2 specification to parse selector strings and perform correct matching on elements.

None of the libraries reviewed are compliant with any edition of CSS. The conformance violations are pretty obvious: Broken parsing, incorrect matching, errors thrown on CSS1 selectors and proprietary syntax extensions. These and other problems are explained below.

Although any library author is free to make any design decision he chooses, if the design decisions violate the CSS specifications and drafts (and these do), then the code cannot honestly be said to be CSS3 compliant.

For example, jQuery.com claims CSS3 Compliance and Ext-JS claims "DomQuery supports most of the CSS3 selectors spec, along with some custom selectors and basic XPath". Whether Ext supports more than half of CSS3 selectors depends on the browser; the claim of "basic XPath" support is false (possibly outdated documentation from previous editions which borrowed from jQuery).

What do the Libraries Do?

The current javascript library APIs do not adhere to CSS2 selectors[CSS2]. Most implement nonstandard extensions and different behavior for standard selectors. All of them fail to implement many pseudo-classes of CSS1-CSS3. Some match properties instead of attributes for attribute selectors. They all tend to copy each other, and the respective documentation of each doesn't always reflect reality. They tend to change substantially between releases: removing support for XPath, redesigning around document.querySelectorAll, removing some selectors and adding others. They tend to work differently in IE, depending on the mode. The article will elaborate on cases of these things happening in javascript libraries.

Cross Browser Consistency?

A few superficial tests in this article demonstrate significant problems in the cross browser behavior of these libraries. More inconsistencies are revealed in Diego Perini's Index of CSS selector tests.

The supported browser list in jQuery includes IE 6.0+, FF 2+, Safari 3.0+, Opera 9.0+, and Chrome. However, all libraries tested have results within that set of browsers that are inconsistent with the spec, inconsistent between browsers, and, in the case of selector extensions, inconsistent with other libraries.

Problems Overview

Problems in jQuery, YUI2, YUI 3, Ext, and Sencha include:

  1. Broken by Design
    • Fundamentally broken abstractions and browser inconsistency.
    • Native First Dual Approach, or NFD - Inconsistent. Variance based on NodeSelector presence/absence or errors thrown and handled with the library's fallback.
    • Syntax Extensions - nonstandard and inconsistent.
    • Incorrect Documentation.
  2. Broken Parsing
    • Fails to ignore various whitespace in attribute values, as a[name= bar\n\t] (YUI2, Ext);
    • Not throwing errors on invalid, unhandled input, instead returning a result that is either empty or contains elements (All: YUI2, YUI3, Ext, Sencha, jQuery).
    • Parse multiple adjacent whitespace as multiple descendant selector. (Ext)
    • Fail to parse certain whitespace in descendant selector (YUI 2).
    • Splitting input on "," — this breaks attribute selectors where the attribute value contains a comma (e.g. "[title='Hey, Joe']"); (Ext, Sencha).
  3. Broken Matching
    • Universal selector mismatches
    • Attributes vs properties mistakes
    • Pseudo-class selectors returning every element or throwing errors
    • Case sensitivity applied to case-insensitive attributes

Problems Details

  1. Broken by Design

    The biggest problems are the design issues, which include the NFD approach and reliance on fundamentally broken abstractions. Some problems, such as pseudo-class related bugs, are seen in parsing and matching. These bugs cannot be as neatly fixed.

    • Fundamentally Broken Abstractions

      A fundamentally broken abstraction is an abstraction that cannot function consistently across browsers.

      The reviewed libraries' query selector engines are an example of a fundamentally broken abstraction. (Though the problems in the libraries reviewed go beyond the bugs that are seen in the query engines).

      You might be thinking something like: "Hey, nothing's perfect, right?", or "The libraries have a large user base; they can't be that broken, can they?" or "Can't the bugs be fixed?" Those things have all been said, and although fixing the bugs might seem like the right thing to do, in reality it can get complicated.

      For an example of fundamentally broken abstractions, see Sencha Touch--Support 2 browsers in just 228K!, SproutCore--over 20000 lines of new code!, and more of the library excerpts in this article below.

      Dependencies, Consistency, and Change

      Any change to a low-level abstraction propagates to its dependencies.

      Fixing bugs in a low level module creates instability which can break things (like jQuery plugins or widgets). Before attempting to fix any bug, the author must first get an understanding of the problem(s) caused by his code. In the case of a fundamentally broken abstraction, one choice may be to not attempt to fix the bug but to leave it and deprecate the method. If the bug is a core part of the library, it may be possible to refactor the library to not use that method. If that cannot be done (as is the case for the reviewed libraries) then not using the library is probably the best option.

      Most libraries that use a selector engine do so at a lower level. Each bug in a library's selector engine propagates to a higher level. If the selector engine's behavior is changed, as by fixing a bug, that change is propagated to all of the higher level dependencies. Such behavioral changes cause instability. The alternative to making such changes (and causing instability) is to not fix the bugs.

      The infamous dojo.isArray is one example of a bug in a low-level abstraction that was not fixed, despite having been pointed out in many discussions over many years on comp.lang.javascript, the es-discuss mailing list, and most recently on Ajaxian.com. The problems with the method are that it doesn't work cross-frame, it can return non-boolean values (0, null, undefined), and it has a useless statement (typeof it == "array"). The method will, however, have consistent results across browsers.
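The non-boolean return values and the dead typeof test can be reproduced with a small sketch. `looseIsArray` is a hypothetical check written only to exhibit the flaws described above (it is not quoted from Dojo's source), and `strictIsArray` is one common alternative, offered here as an assumption about what a fix could look like rather than as a recommendation from the article.

```javascript
// Hypothetical check that exhibits the flaws described above.
// Not Dojo's actual source.
function looseIsArray(it) {
  // `it && ...` hands back `it` itself when it is falsy (0, null, undefined).
  // `typeof` never yields "array", so the second test can never be true.
  return it && (it instanceof Array || typeof it == "array");
}

// One common alternative: compare the internal class name. This always
// returns a boolean and does not depend on which frame's Array
// constructor created the object.
function strictIsArray(it) {
  return Object.prototype.toString.call(it) === "[object Array]";
}
```

The first function is at least consistent across browsers, which is the point made above: the bug is stable, so leaving it alone does not introduce new variance.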

      However, sometimes an abstraction fails to work consistently across browsers. This is often due to a limitation in a browser.

      The author who has come to recognize such a problem in the code is faced with the decision to either attempt to get the abstraction working correctly across browsers, leave it alone, or fix it to work correctly in some cases, while attempting to minimize change.

      Due to the generalized nature of the function, every case cannot be addressed. The result of attempting to make it work is bloat and complexity. The code becomes difficult for the author to understand and clients of the API are confused by inconsistent results, both with various inputs to the function and with versions of the API. The paradox is that if the author does not fix the bugs, then some instability can be avoided, but at a cost of inconsistency between browsers.

      One example of this is the attributes problem in Internet Explorer 7 and below (IE8 fixed most of the issues). Generalized functions that attempt to make IE correctly read attributes are more trouble and effort than they are worth. Instead, the problems of reading attributes can be recognized as a limitation, and the system can be designed so that it does not need to read them. That avoids the associated problems altogether.

      The best solution for such abstractions is to know what you are doing. Do not create them in the first place.

      Any library that relies on a broken query selector at the core is just as broken as its query engine. Fixing the query engine bugs causes instability and the browser inconsistencies are unacceptable.

    • Native-First Dual Approach

      The most significant query selector problem is the design approach that I am calling a native-first dual (NFD) approach. NFD creates great inconsistencies between different browsers running the same code. The approach is to first try to use document.querySelectorAll where it is supported; where it is unsupported, or where calling it throws an error, a fallback selector matching engine is used.

      Because an error will happen when any proprietary selector is used, the code path taken varies, depending not just on the browser, but on the selector supplied. The library addresses these errors by wrapping every call to document.querySelectorAll in a try / catch. For jQuery, in the catch block, the query selector engine is called oldSizzle. In some cases, oldSizzle will throw an error where document.querySelectorAll would have returned a result, such as with :focus.

      jQuery's oldSizzle does not support the same input and standard selectors as querySelectorAll. The differences noticeable in the results of simple queries can vary widely across browsers, as seen in the examples further on. Any library that uses NFD (Ext and YUI, among those) will exhibit the same problems.

      Libraries that use NFD include jQuery, YUI 3, and Ext-js, among others. Sencha, which is related to Ext-js, uses a different approach. YUI 2 does not use document.querySelectorAll.

      Native-first Dual Approach Diagram
      <Native QSA Support?>
       Y              N
       |              |
       |              |
      [Try Use QSA]   +--[Use oldSizzle]
        |                /   |  
       <error thrown?>  /  <oldSizzle Supports Input?>
        Y             N/         Y            N
        |             /          |      [Throw error]
        |            /|          |            |           
      [Use oldSizzle] |   [perform match      |
                      |    and return result] |
                      |          |            |
                 [return result] |            |
                      |          |            |
                     END        END          END
        

      Diagram of native-first dual approach. Notice the three different possible endings.
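The control flow in the diagram can be sketched in a few lines. Everything here is hypothetical: `makeNfdQuery`, `stubNative` and `stubFallback` are names invented for illustration, the stubs stand in for querySelectorAll and for a library's own engine, and the `path` field exists only to make the chosen code path visible. It is not the code of jQuery or any other library.

```javascript
// Sketch of the native-first dual (NFD) control flow.
function makeNfdQuery(nativeQuery, fallbackQuery) {
  return function query(selector) {
    if (typeof nativeQuery === "function") {
      try {
        return { path: "native", result: nativeQuery(selector) };
      } catch (e) {
        // The native engine rejected the selector: fall through.
      }
    }
    // The fallback may itself throw on input it does not support.
    return { path: "fallback", result: fallbackQuery(selector) };
  };
}

// Stub "native" engine: rejects the nonstandard != operator.
function stubNative(selector) {
  if (selector.indexOf("!=") !== -1) throw new SyntaxError("invalid selector");
  return ["native match for " + selector];
}

// Stub "fallback" engine: accepts != but does not know :link.
function stubFallback(selector) {
  if (selector.indexOf(":link") !== -1) throw new Error("unsupported: " + selector);
  return ["fallback match for " + selector];
}

const modernQuery = makeNfdQuery(stubNative, stubFallback); // browser with native support
const legacyQuery = makeNfdQuery(null, stubFallback);       // browser without it
```

With these stubs, `modernQuery(":link")` returns a native result, `modernQuery("a[rel!=nofollow]")` silently switches to the fallback, and `legacyQuery(":link")` throws: the same three endings as in the diagram, selected by both the environment and the selector string.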

      In addition to the code path variations, native support is buggy. For example, in Internet Explorer 8, <option selected>text</option> isn't matched by [selected] but is matched by [selected=selected].

      The NFD approach is the most significant and fundamental mistake that a selectors library can make. It is broken by design.

    • Syntax Extensions

      Some examples of syntax extensions include variations on what jQuery calls bare words attribute selectors, [att!=val], CSS Style value selectors (in Ext), and even user-defined selectors.

      A W3C-compliant Selectors engine is required to throw errors on any invalid syntax in the selector, such as those extensions defined by jQuery.

      Instead of throwing an error, jQuery interprets [att!=val] as a property selector (described below). How a library interprets the syntax extension is nonstandard, proprietary, and may vary between libraries.

      Ext provides additional syntax extensions to match style values. For example, to match all the elements whose visibility is "inherit", one would use:

        [
          Ext.query("{visibility=inherit}").length,
          Ext.query("{visibility=visible}").length
        ]
      

      That code running on the Ext test page results in:

      Internet Explorer (all versions)
      18, 0
      Safari 4, Opera 10.6, Firefox 3.6*, Firefox 2
      0, 18

      The "visibility=inherit" result in Internet Explorer is an array of 18 elements, and 0 in other browsers. This is due to the fact that Ext.query relies on Ext.Dom.getStyle, which checks currentStyle in IE and calls getComputedStyle in other browsers.

      The result is only achieved when there is no trailing whitespace, as "{visibility=visible}", and not "{visibility=visible }".

      *Firefox results with plugins disabled. Some plugins such as Firebug add nodes to the document which will affect the result you see in your browser.

    • Incorrect Documentation

      The documentation for most of the libraries tends to be out of sync with what the code actually does. The most egregious offenders are Sencha and Ext-js.

      This is yet another compelling reason for anyone evaluating a library to carefully review the source code. The code explains exactly what it does. Does the code clearly reflect what is stated in the documentation?

      If the source code turns out to be written obscurely, for example with long methods of high complexity, it may be best to avoid the library on that basis. At some point a part of the application will inevitably need to be debugged, and long, complicated methods such as those found in jQuery can be painfully time consuming to step through.

  2. Broken Parsing

    CSS2.1 defines the grammar by which tokens are matched. None of the libraries tested are compliant with that grammar. Most fail in very obvious ways. Some of the problems include:

    • Fails to ignore various whitespace in attribute values, as a[name= bar\n\t] (YUI2, Ext);
    • Not throwing errors on invalid, unhandled input.

      Matches invalid selectors ">>>", "[name=]", "[a >= 2]", "#---", returning a result that is either empty or contains elements (All: YUI2, YUI3, Ext, Sencha, jQuery).

    • Parse multiple adjacent whitespace as multiple descendant selector. (Ext)
    • Fail to parse certain whitespace in descendant selector (YUI 2).
    • Splitting input on "," — this breaks attribute selectors where the attribute value contains a comma (e.g. "[title='Hey, Joe']"); (Ext, Sencha).

    Fails to ignore various whitespace in attribute values

    Of the tested libraries, jQuery seems to be the only one that is able to parse (though not according to standard) and ignore extraneous whitespace in attribute selectors (though it fails to match attribute values properly).

    Given the HTML

      <a name="bar">
    
    And selector string:
      "a[name= bar\n\t]";
    

    YUI2 and Ext will not match the a element.

    Not throwing errors on Invalid, Unhandled Input

    All of the tested libraries allow invalid selectors such as "#---".

    YUI and Ext both fail on the descendant selector, as explained below.

    Ext and Sencha split the input on ",", and so will fail with the basic selector '[title="Hello, user"]'. Of course, it will also fail for any valid identifier that contains an escaped comma, as in "#x\\,", which is a perfectly valid selector and works perfectly fine when supplied as an argument to document.querySelector.
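To see why splitting on "," is not enough, compare it with a splitter that tracks quotes, brackets and backslash escapes. `splitSelectorGroup` is a hypothetical helper written for this illustration; it does no validation (an unclosed quote or bracket is not reported) and it leaves surrounding whitespace on each part for a later parsing step.

```javascript
// Split a selector group on top-level commas only. Commas inside quoted
// strings, inside [...] or (...), or preceded by a backslash are kept.
function splitSelectorGroup(input) {
  const parts = [];
  let current = "";
  let quote = null; // the open quote character, if any
  let depth = 0;    // nesting depth of [] and ()
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "\\" && i + 1 < input.length) {
      current += ch + input[++i]; // copy the escape pair untouched
    } else if (quote) {
      if (ch === quote) quote = null;
      current += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      current += ch;
    } else if (ch === "[" || ch === "(") {
      depth++;
      current += ch;
    } else if ((ch === "]" || ch === ")") && depth > 0) {
      depth--;
      current += ch;
    } else if (ch === "," && depth === 0) {
      parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts;
}
```

A plain `'[title="Hello, user"]'.split(",")` yields two fragments, neither of which is a valid selector, while the helper above returns the selector as a single part. The same holds for "#x\\,".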

    The fallback query selector engines in javascript libraries do not follow the lexical grammar defined in CSS2. A library that accepts invalid selectors suffers more problems when it uses an NFD approach. With NFD, no invalid syntax extensions can safely be allowed, because any such allowance creates more possibility for variance (depending on browser, version, selector string, etc.; see NFD above).

    Descendant Selector

    A Descendant Selector is two or more selectors separated by whitespace. Whitespace is defined in CSS as: Only the characters "space" (U+0020), "tab" (U+0009), "line feed" (U+000A), "carriage return" (U+000D), and "form feed" (U+000C) can occur in white space. Other space-like characters, such as "em-space" (U+2003) and "ideographic space" (U+3000), are never part of white space.

    YUI 2 and Ext 3.2.1 both fail on the descendant selector.

    Fail to parse certain whitespace in descendant selector

    YUI 2 fails by inconsistently throwing errors with anything other than U+0020 (space). For example, using a tab character, as in "html\u0009body" will, depending on the browser, throw an error with YUI2.

    Parsing multiple adjacent whitespace as multiple descendant selector

    Ext 3.2.1 fails by treating multiple adjacent whitespace as multiple selectors, thus:

    Ext.query("html  body"); // two spaces 
    

    - matches 0 elements, depending on the browser.
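Both failures come down to how whitespace is recognized. A minimal sketch, assuming a selector that consists only of simple selectors joined by descendant combinators: `splitDescendants` is a hypothetical helper that treats any run of the five CSS whitespace characters as one combinator. It deliberately ignores other combinators, attribute selectors and strings.

```javascript
// The five characters CSS treats as white space: space, tab, line feed,
// carriage return, form feed. JavaScript's \s is broader (it also matches
// em-space and ideographic space), so it is not used here.
const CSS_WHITESPACE = /[ \t\n\r\f]+/;

// Split a chain of simple selectors joined only by descendant combinators.
function splitDescendants(selector) {
  return selector
    .replace(/^[ \t\n\r\f]+|[ \t\n\r\f]+$/g, "") // strip leading/trailing CSS whitespace
    .split(CSS_WHITESPACE);                      // a run of whitespace is ONE combinator
}
```

Under this sketch "html\u0009body" and "html  body" (two spaces) both split into the same two parts as "html body", while "html\u2003body" stays a single token because em-space is not CSS whitespace.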

  3. Broken Matching

    • Universal selector mismatches
    • Attributes vs properties mistakes
    • Pseudo-class selectors returning every element or throwing errors
    • Case sensitivity applied to case-insensitive attributes

    Universal selector mismatches

    The universal selector, written "*", matches any single element in the document tree (CSS 2.1). The selector is broken in jQuery (see test).

    Attributes vs properties mistakes

    Attributes are string values that the browser parses from the HTML source code. Properties reflect an object's state with any value type (number, boolean, function, etc).

    Most libraries have significant problems with attribute matching, beginning with the progenitor of the confusion: jQuery. These problems are shown below in the jQuery Attributes vs Properties examples.

    Pseudo-class problems

    Pseudo-class problems include returning every element or throwing errors inconsistently.

    In an NFD-based library, when the fallback is used, pseudo-classes such as :focus and :active will either return every element or throw errors. For example:

    :link[rel!=nofollow]; // force fallback with custom != selector.
    
    Ext
    TypeError: Ext.DomQuery.pseudos[name] is not a function
    jQuery:
    Syntax error, unrecognized expression: Syntax error, unrecognized expression: link
    YUI 2:
    [] // (empty result)
    YUI 3:
    Error thrown and not caught: name: TypeError, message: methodName is undefined

    These same libraries will all return a match for valid selector syntax :link in a browser that supports document.querySelectorAll because they use the NFD approach.

    The :link pseudoclass is specified in CSS to match all unvisited links. Most browsers that implement NodeSelector for :link match all links, regardless of whether or not they have been visited. This is allowed by Selectors Level 3 Working Draft and is done to prevent scripts from examining a user's history.

    Throwing errors in one browser while returning a match in another is not interoperable. It would be better to either throw an error for :link everywhere or to support :link everywhere by matching on all links.

    Case Insensitive Attribute Values Treated Case-sensitively

    Are attribute values case sensitive or case insensitive?

    The CSS2 specification states:

    The case-sensitivity of attribute names and values in selectors depends on the document language.

    In HTML 4, each attribute definition includes information about the case-sensitivity of its value. Examples of case-insensitive (CI) attribute values include INPUT element's type and name attributes and the FORM element's action and method attributes, among many others. Some case sensitive (CS) attribute values include the global id attribute, and, for the A element, the name attribute.

    Thus, [method=GeT] must match <form method='get'> while [name=Q] would match <input name="q" type="text"> and not <a name="q">.
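A per-element lookup table is one way to express that rule. The sketch below is illustrative only: `CASE_INSENSITIVE_VALUES` lists just a few of the HTML 4 examples named above, anything not listed is compared case-sensitively, and `attrValueEquals` is a hypothetical helper, not part of any library discussed here.

```javascript
// Case-sensitivity of attribute values, keyed by "element attribute".
// Illustrative data only; a real map would have to cover every
// attribute definition in the document language.
const CASE_INSENSITIVE_VALUES = {
  "input type": true,
  "input name": true,
  "form method": true
};

// Compare an attribute's actual value with the value from the selector,
// honoring the per-element case-sensitivity table above.
function attrValueEquals(tagName, attrName, actual, expected) {
  const key = tagName.toLowerCase() + " " + attrName.toLowerCase();
  if (CASE_INSENSITIVE_VALUES[key] === true) {
    return actual.toLowerCase() === expected.toLowerCase();
  }
  return actual === expected;
}
```

With this table, `attrValueEquals("form", "method", "get", "GeT")` and `attrValueEquals("input", "name", "q", "Q")` are true, while `attrValueEquals("a", "name", "q", "Q")` is false.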

    To add to the confusion, HTML5 defines a global case sensitivity map that conflicts with what is defined by HTML 4 for some specific element attributes. For example, HTML 5 states that NAME is case-sensitive. Contrast to HTML 4, where NAME is specified as being case-insensitive for INPUT (but case-sensitive for a).

    Browser implementations vary.

    Internet Explorer 8 and below will correctly match the INPUT element's CI NAME attribute value in a case-insensitive manner while many other browsers will not.

    For the most consistent and interoperable behavior, authors are advised to not rely on case-insensitive attribute value matching for NodeSelector but to instead supply the case in the selector string as it appears in the source markup.

    Although most libraries account for case insensitivity in element and attribute names, they do not account for case insensitive attribute values.

    While implementations vary, the javascript library query engines pass the variance right on to their callers, providing inconsistent results.

    A javascript library could provide consistent cross-browser results by either

    • supporting no attribute selectors
    • providing a case-sensitivity map.

    NWMatcher provides a case-sensitivity map. However it does not do so on a per element basis, but instead element-agnostically. NWMatcher follows the recommendation from HTML5.

    Not all browsers will follow that case sensitivity map, which is a part of a draft.

    Attribute selectors involve conflicts and interdependencies between the working drafts HTML5 and CSS2.1 (PR), the official standard HTML 4.01, and conflicting implementations.

    A program using lower case attribute values except in cases for ID, CLASS, and NAME, can avoid many of the differences, however that doesn't change the problem of a javascript library that runs on a page with form name="F1" and uses a query selector [name="f1"].

    A javascript query library can avoid these problems by disallowing attribute selectors altogether. It can do this by throwing an error on any unsupported syntax. The strategy is explained below.

jQuery Selectors Quiz

Not long before writing this article, I published a quiz of 10 multiple-choice questions of simple, common selectors being applied to 9 lines of HTML. I did this after the jQuery team had tweeted about an article demonstrating invalid selector syntax using jQuery's "bare words" attribute/property selector to match property values instead of attributes. The confusion that jQuery has helped propagate was to blur the distinction between properties and attributes. That confusion was shown in the article, which had espoused such invalid techniques and which jQuery had endorsed.

Tweet

Some Good and Advanced jQuery Techniques - http://bit.ly/dli5EN 9:01 AM May 9th

It seems surprising that the jQuery team would promote that, especially in light of all the attention that has been paid to the broken ATTR function (explained below). My response was to ask the reader what that code actually does.

For each correct answer submitted, the quiz-taker is presented with the explanation of why the answer is correct. At the bottom of the quiz are two example documents that display the results of each question.

Example HTML from Quiz

<img width="600" src="logo.gif" id="imageOne" style="display: none" alt="Write less">
<img src="logo.gif" id="imageTwo" alt="Do More!">
<img width="100" src="logo.gif" id="imageThree"  alt="Write less">
<img width="0" src="logo.gif" id="imageFour"  alt="Write less">
<input type="image" width="600" src="logo.gif" id="inputOne" alt="Write less">
<input type="image" width="100" src="logo.gif" id="inputTwo" alt="Do More!">
<input type="image" src="logo.gif" id="inputThree" alt="Write less">
<input type="image" src="logo.gif" width="0" id="inputFour" alt="Write less">
<pre>-</pre>

Test results (standards, quirks).

No tricky edge cases, however as shown in the test, the answers vary between browsers, with surprising results.

The jQuery-tweeted article espouses the use of $('img[width=600]') to get "All the images whose width is 600px". That's different from what the W3C Selectors API (draft)[SELECT] specifies.

If the attribute's value were quoted, as img[width="600"], then standard behavior for that query should match img elements whose width attribute is exactly the value "600", never mind whether it has been rendered at 600px.

In contrast, the Selectors API[SELECT] specifies that an error should be thrown when invalid syntax is supplied. Since 600 is neither a string nor an identifier, the entire selector is invalid. A compliant Selectors implementation must throw an error with that.

The Selectors API Level 1[SELECT] states:

If the given group of selectors is invalid ([SELECT], section 13), the implementation must raise a SYNTAX_ERR exception.

CSS 2.1[CSS2] states:

Attribute values must be identifiers or strings.

And also:

When a user agent cannot parse the selector (i.e., it is not valid CSS2.1), it must ignore the selector and the following declaration block (if any) as well.

Selector Extensions and CSS3 Compliance

Many selector libraries do not throw an error when given invalid selector syntax. Instead, the library interprets the invalid selector as a property selector (described below).

In the case of jQuery being passed an attribute selector, the ATTR function is used to match a property value.

Other libraries will do different things. Some may match attribute values while others do not. None of the javascript libraries are CSS3 compliant.

Before looking at the results of how jQuery handles attribute selectors, some definition of terms is in order.

Attribute Selectors

Standard CSS 2.1 attribute selectors match attributes defined in the source document. Any attribute value must be either a string or an identifier. In CSS2.1, a string is delimited either by single or double quote marks and an identifier is defined:

CSS Identifier

In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A1 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit, or a hyphen followed by a digit. Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F".

The definition is unfortunately looser than what is defined by the lexical grammar of CSS, which disallows an identifier beginning with a hyphen followed by a hyphen. However, the libraries don't match either definition (see also CSS WG bug #174).
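A regular expression built from the grammar's productions makes the rule concrete. This is a sketch under assumptions: `isCssIdentifier` is a hypothetical helper, the productions are transcribed by hand from the CSS 2.1 lexical grammar, and the lower bound for non-ASCII characters (U+00A1) is taken from the definition quoted above. Characters outside the Basic Multilingual Plane are accepted as surrogate pairs.

```javascript
// Building blocks of the identifier production, as regex source text.
const unicodeSrc = /\\[0-9a-fA-F]{1,6}(?:\r\n|[ \t\r\n\f])?/.source;
const escapeSrc = "(?:" + unicodeSrc + "|" + /\\[^\r\n\f0-9a-fA-F]/.source + ")";
const nonasciiSrc = /[\u00A1-\uFFFF]/.source;
const nmstartSrc = "(?:[_a-zA-Z]|" + nonasciiSrc + "|" + escapeSrc + ")";
const nmcharSrc = "(?:[_a-zA-Z0-9-]|" + nonasciiSrc + "|" + escapeSrc + ")";

// ident: optional hyphen, one name-start character, then name characters.
const IDENT = new RegExp("^-?" + nmstartSrc + nmcharSrc + "*$");

function isCssIdentifier(text) {
  return IDENT.test(text);
}
```

Under this sketch "imageOne", "-a" and "B\\26 W\\3F" are identifiers, while "600", "-1" and "---" are not, which is why img[width=600] and "#---" are invalid selectors.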

CSS identifier is also used in class and ID selectors.

CSS 2.1 defines four attribute selectors:

[att]

Match when the element sets the "att" attribute, whatever the value of the attribute.

[att=val]

Match when the element's "att" attribute value is exactly "val".

[att~=val]

Represents an element with the att attribute whose value is a white space-separated list of words, one of which is exactly "val". If "val" contains white space, it will never represent anything (since the words are separated by spaces). If "val" is the empty string, it will never represent anything either.

[att|=val]

Represents an element with the att attribute, its value either being exactly "val" or beginning with "val" immediately followed by "-" (U+002D). This is primarily intended to allow language subcode matches (e.g., the hreflang attribute on the a element in HTML) as described in RFC 3066 ([RFC3066]) or its successor. For lang (or xml:lang) language subcode matching, please see the :lang pseudo-class.

CSS3 Attribute Selectors

[att*=val]

Match element whose "att" attribute value contains the substring "val".

[att^=val]

Match element whose "att" attribute value begins with the string "val".

[att$=val]

Match element whose "att" attribute value ends with the string "val".
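The seven standard operators are small enough to sketch as one function over string values. `matchesAttr` is a hypothetical helper: `actual` is the attribute's string value (or null when the attribute is absent), `operator` is "" for the bare [att] form, and comparison is case-sensitive throughout. Any other operator, including [att!=val], is rejected with an error instead of being given a private meaning.

```javascript
// Compare an attribute value against a selector value for each operator.
function matchesAttr(actual, operator, expected) {
  if (actual === null) return false; // the attribute is not set at all
  switch (operator) {
    case "":   // [att]
      return true;
    case "=":  // [att=val]
      return actual === expected;
    case "~=": // one word of a whitespace-separated list
      if (expected === "" || /[ \t\n\r\f]/.test(expected)) return false;
      return actual.split(/[ \t\n\r\f]+/).indexOf(expected) !== -1;
    case "|=": // exactly val, or val followed by "-"
      return actual === expected || actual.indexOf(expected + "-") === 0;
    case "^=": // prefix; an empty val represents nothing
      return expected !== "" && actual.indexOf(expected) === 0;
    case "$=": // suffix; an empty val represents nothing
      return expected !== "" && actual.slice(-expected.length) === expected;
    case "*=": // substring; an empty val represents nothing
      return expected !== "" && actual.indexOf(expected) !== -1;
    default:
      throw new SyntaxError("Unsupported attribute operator: " + operator);
  }
}
```

Note that the function only ever sees attribute strings. Nothing in the standard operators refers to an element's properties or rendered state.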

Property Matching

Dynamic object properties can be of any value and reflect the object's state. Matching attribute selectors against properties is nonstandard. This is what jQuery does most of the time.

jQuery Attribute (Property) Selector Syntax Extensions

jQuery defines additional nonstandard extensions, for example, an incomplete list of just two:
[att!=val]
Represents an element whose property att is either undefined or is not val.
:animated
Select all elements that are in the progress of an animation at the time the selector is run.

The :animated selector is inherently coupled to jQuery.

Other javascript libraries copy some of the jQuery selectors but implement them differently. Rather than trying to match against property values, the other libraries match against attribute values in more cases, though still often matching properties in MSIE.

What Does jQuery Do?

jQuery does what the blog article says it does. Well, in a few browsers, and depending on the rendering mode and the CSS that has been applied to the elements. What jQuery does varies widely across browsers.

Bare Words Attribute Values Test

jQuery bare words attribute selector performs property matching in the examples in the article.

'img[width=600]'
Opera 10.5
    imageOne
    imageTwo
Firefox 3.6, Firefox 2, Safari 4, Chrome 4:
    imageOne
    imageTwo
    imageThree
    imageFour
IE6 and IE7 (standards mode), IE8 (EmulateIE7)
(empty result)
IE6 and IE7 (quirks mode), IE8 and IE9 (either mode)
    imageTwo
    imageThree
    imageFour

Cross Browser Results Analysis

The results above show inconsistent results from recent versions of browsers that jQuery supports.

In fact, in IE8 alone, jQuery can result in three possible different results. This is because in IE8, NodeSelector is unavailable in both quirks mode and IE7 mode. Property values can vary between those modes. This leaves the possibility for jQuery attribute selectors to match attributes, or one of two different property values, depending on if the document is in quirks mode.

Had the selector's attribute value been a string (surrounded by quotation marks), as img[width='600'], then following the Selectors API, it must match all img elements whose width content attribute is exactly the value "600".

However, because jQuery uses querySelectorAll first (NFD), img[width='600'] will match img elements whose width attribute is "600" where querySelectorAll is available and, in browsers that lack querySelectorAll, will match img elements whose width property is 600.

jQuery Property Matching Example

$("body[ownerDocument]", "html").length
All tested browsers:
1

$("html body[ownerDocument]").length
Safari 4, Firefox 3.6, IE8, 9, Chrome 4, Opera 10
0
Firefox 2, IE6 and 7 (either mode), IE8 and 9 (quirks mode)
1

The example shows:

  1. jQuery performs property matching of ownerDocument in some browsers
  2. when a context parameter is passed, the property matching occurs in all tested browsers

It is a bad idea to try to read ownerDocument this way. However, some might actually think it is a good idea to read an input's checked property this way: a classic mistake, and one which unfortunately made it into the core of jQuery.

Attributes vs Properties

The basic difference between attributes and properties is that attributes are string values that the browser parses from the HTML source code and properties reflect an object's state with any value type (number, boolean, function, etc).

jQuery has never handled attributes properly[1][2][3][4][5][6]. jQuery is designed in such a way that does not clearly distinguish attributes from properties. The most common versions of Internet Explorer have this same problem.

jQuery/Sizzle ATTR Matcher

The source code for Sizzle shows how object properties, before attributes, are matched.

ATTR: function(elem, match){
    var name = match[1],
    result = Expr.attrHandle[ name ] ?
        Expr.attrHandle[ name ]( elem ) :
        elem[ name ] != null ?
         elem[ name ] :
         elem.getAttribute( name ),
        value = result + "",
        type = match[2],
        check = match[4];

The line:

elem[ name ] != null ? elem[ name ]

uses the element's property value unless that value is null or undefined, in which case getAttribute is used as a fallback.
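The effect of that lookup order can be illustrated outside of any browser. What follows is a minimal sketch, not Sizzle's code: resolvePropertyFirst mimics the order in the excerpt (without the Expr.attrHandle step), resolveAttributeOnly is a hypothetical alternative, and fakeImg is a plain object standing in for an img whose width attribute is "600" but whose width property is some other number (400 is invented for the illustration).

```javascript
// Mimics the lookup order in the ATTR excerpt: property first, attribute second.
function resolvePropertyFirst(elem, name) {
    return elem[name] != null ? elem[name] : elem.getAttribute(name);
}

// Hypothetical alternative: consult only the content attribute.
function resolveAttributeOnly(elem, name) {
    return elem.getAttribute(name);
}

// Stand-in for an img element; this is not a real DOM node.
var fakeImg = {
    width: 400, // property value (a number), assumed to differ from the attribute
    getAttribute: function (name) {
        var attrs = { width: "600" }; // attribute value (a string)
        return attrs.hasOwnProperty(name) ? attrs[name] : null;
    }
};

var byProperty = resolvePropertyFirst(fakeImg, "width") + "";  // "400"
var byAttribute = resolveAttributeOnly(fakeImg, "width") + ""; // "600"
```

With the property-first order, a selector such as img[width=600] would not match this object, even though its attribute value is exactly "600".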

It would seem to make more sense to use elem.getAttribute instead. However, that would still leave problems with MSIE's completely broken implementation of attributes prior to IE8.

getAttribute(att, 2) cannot be used safely in IE because IE throws errors in some cases and returns wrong values in others, such as strange numbers for values of boolean attributes (MSDN).

input.getAttribute("disabled", 2); // Result number 0 in IE.

Treating properties as attributes appears to have been a fundamental design oversight in early jQuery. Changing the method to resolve attribute values instead would change the behavior of programs that use jQuery, jQuery UI, and any and all plugin dependencies.

So while changing ATTR to match attributes would make sense, it would not be practically possible in IE (due to bugs in IE). IE bugs aside, changes to ATTR would result in a substantial change propagation to any and all dependencies. The problem cannot easily be fixed, as jQuery is a public API and public APIs are forever.

jQuery applies attribute selectors to match object properties, but where querySelectorAll is implemented and does not throw an error, jQuery resolves attributes.

Attribute values and properties are completely different things. Performing attribute selector matching by testing elements' property values, as jQuery does, is a significant deviation from the way standard attribute selectors work.

CSS 2.1 Identifier: Type, ID, and Class Selectors

The CSS production for Identifier is used not only for attributes, but also Type (element name), ID, and class selectors.

Type Selector

A type selector matches the name of a document language element type. For example, "html" would match the HTML element in an HTML document. "2" would result in an error because it is not an identifier. However, jQuery("2") uses its native-first dual approach: the native call throws the expected error, jQuery catches it, and then falls back to oldSizzle.

document.querySelectorAll("2"); // Error.
jQuery("2");// Result of 0 objects matched.

ID Selector

The ID selector is "#" followed by an identifier; B&W? is not an identifier and so #B&W? is not a valid ID selector. However, in jQuery, the production for identifier is not matched; here again, jQuery uses its native-first dual approach: the native call throws an error, jQuery catches it, and falls back to oldSizzle.

document.querySelectorAll("#B&W?"); // Error.
jQuery("#B&W?"); // Result of 0 objects matched.

Class Selector

The Class selector is "." followed by an identifier.

document.querySelectorAll(".B&W?"); // Error.
jQuery(".B&W?"); // Result of 0 objects matched.

Most libraries have problems with these selectors.
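For reference, the identifier production can be checked with a regular expression built from the CSS 2.1 token definitions (ident, nmstart, nmchar, nonascii, escape). This is a sketch under assumptions: isIdentifier is a hypothetical helper name, and nonascii is approximated as any UTF-16 code unit from U+00A0 upwards.

```javascript
// Building blocks, following the CSS 2.1 grammar's token definitions.
var nonascii = "[\\u00a0-\\uffff]";
var unicodeEscape = "\\\\[0-9a-fA-F]{1,6}(?:\\r\\n|[ \\t\\r\\n\\f])?";
var escapeSeq = "(?:" + unicodeEscape + "|\\\\[^\\r\\n\\f0-9a-fA-F])";
var nmstart = "(?:[_a-zA-Z]|" + nonascii + "|" + escapeSeq + ")";
var nmchar = "(?:[_a-zA-Z0-9-]|" + nonascii + "|" + escapeSeq + ")";

// ident: -?{nmstart}{nmchar}*
var IDENT_EXP = new RegExp("^-?" + nmstart + nmchar + "*$");

// Hypothetical helper: true if str matches the identifier production.
function isIdentifier(str) {
    return typeof str === "string" && IDENT_EXP.test(str);
}

isIdentifier("html");       // true
isIdentifier("2");          // false: starts with a digit
isIdentifier("B&W?");       // false: & and ? are not name characters
isIdentifier("B\\&W\\?");   // true: the same characters, escaped
```

A library that ran a check like this before doing anything else could reject "2", "#B&W?" and ".B&W?" in every browser, instead of depending on whether the native call happens to throw.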

Ext-JS documentation uses invalid syntax and misleads the reader by falsely stating:

The use of @ and quotes are optional. For example, div[@foo='bar'] is also a valid attribute selector.

No! The @ in an attribute selector is not valid CSS selector syntax. Quotes are optional in CSS selectors only when the attribute value is an identifier. Omitting quotes in Ext (the big one) may result in an error being thrown, depending on whether the attribute value is an Identifier; when an error is thrown, the fallback is used.

The same documentation is used for Sencha, and when an invalid query is passed in Sencha, the result is a javascript error. An XPath attribute selector using [@foo='bar'] would cause an error to be thrown in any browser. The difference with Sencha is that the error is not caught; no fallback is provided.

Other Libraries: A Peek at YUI, Ext-JS, and "My Library"

YUI 2

YUI 2 supports some jQuery extensions, but for other extensions, and even for some standard CSS selectors, it returns wrong results. For example, ":link" and ":disabled" return every element in the document.

YUI 2 supports only U+0020 white space in selectors and throws errors on anything else. Mootools 1.2 has the same problem, throwing an error if the selector contains whitespace it can't recognize (such as tab). Contrast this with what is specified for whitespace in CSS2.
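The CSS 2.1 grammar treats space, tab, line feed, carriage return and form feed as whitespace. A rough sketch of splitting a selector on the descendant combinator with that definition follows; splitOnDescendantCombinator is a hypothetical name, and the sketch deliberately ignores quoted strings and the other combinators.

```javascript
// Whitespace as defined by the CSS 2.1 grammar: [ \t\r\n\f]+
var CSS_WHITESPACE = /[ \t\r\n\f]+/;

// Hypothetical helper: splits "ul\tli" into ["ul", "li"].
// Does not handle whitespace inside quoted attribute values.
function splitOnDescendantCombinator(selector) {
    var trimmed = selector.replace(/^[ \t\r\n\f]+|[ \t\r\n\f]+$/g, "");
    return trimmed === "" ? [] : trimmed.split(CSS_WHITESPACE);
}

splitOnDescendantCombinator("ul\tli");      // ["ul", "li"]
splitOnDescendantCombinator("ul \n li ");   // ["ul", "li"]
```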

What's Next?

Upon learning that the library creates more cross browser problems than it solves, the next logical step for the library user should be to stop using it, to remove it, and not to jump blindly to another library.

The next step for developers realizing that their query library is failing to live up to what was promised is to learn how to create reusable, forwards-compatible abstractions that follow standards and work consistently across browsers.

For Library Users and Management

There is no substitute for knowledge (Deming). One who wants to build RIAs must read all of the pertinent specifications and all of the pertinent browser documentation. He should have a good foundation of OO principles and methodologies.

The following advice is offered to the reader:

  • RTM - (ECMA, CSS2.1, Selectors, DOM 2 HTML, HTML4, HTML5, also MDC and MSDN).
  • Test across many browsers, including older browsers, to test degradation paths.
  • Get code reviews
  • Ask smart questions

Everyone in the company should be focused on the successful production of a quality product.

If management is making decisions about javascript (libraries or otherwise) and they are not technically qualified to make technical assessments of quality, then they are effectively hurting the company and they need to stop doing that.

For Library Authors

Reusable abstractions are useful to fulfill software requirements quickly. The concept of a javascript library must evolve beyond its current state.

Independent, Cohesive parts

The libraries reviewed are highly interdependent.

The javascript programming language allows interface-based design without having to create an actual interface. For example, if a method `elementHasClass` is needed, then that method could easily exist independently; such methods do exist in YUI, for example. There should be no need to depend on the concretion of the entire YUI core, just that method and whatever it depends upon.
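As an illustration of how little such a method needs, here is a standalone sketch. It is not YUI's implementation; it assumes only that the element exposes a string className, and it returns false otherwise.

```javascript
// Standalone sketch: true if the element's className contains the token.
// No library core is required. Returns false when className is not a string.
function elementHasClass(el, token) {
    var className = el && el.className;
    if (typeof className !== "string" || !token || /[ \t\r\n\f]/.test(token)) {
        return false;
    }
    var padded = " " + className.replace(/[ \t\r\n\f]+/g, " ") + " ";
    return padded.indexOf(" " + token + " ") !== -1;
}

// Plain objects stand in for elements here.
elementHasClass({ className: "panel  active" }, "panel"); // true
elementHasClass({ className: "panel  active" }, "pan");   // false
```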

An interface does not need to "change the way you write javascript." That would be the task of an IoC-type framework.

An interface does not promote dependence on one Big Thing. An interface should do one thing.

Fundamentally Broken Abstractions

A fundamentally broken abstraction cannot function consistently across browsers in all contexts. What can be done about such problems?

One approach is to try to make the abstraction work in all contexts. A few extreme examples of that are in the "APE.dom.getOffsetCoords" function I wrote several years back. Another is David Mark's attribute reading function attr.

As seen in these examples, modules that have more browser-difference workarounds become increasingly complex, have more edge cases, and are much harder for humans to digest.

To avoid a linear dependence on any one abstraction, the object that uses the abstraction can be configured so that the abstraction can be switched for another one with the same interface.
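A minimal sketch of that idea follows. All of the names (PanelList, nativeAdapter, stubAdapter, find) are hypothetical; the point is only that the consumer is handed an object with a one-method interface and never names a concrete library.

```javascript
// Hypothetical consumer. It depends on an interface, find(selector, root),
// and not on any particular query library.
function PanelList(root, queryAdapter) {
    this.root = root;
    this.queryAdapter = queryAdapter;
}

PanelList.prototype.getPanels = function () {
    return this.queryAdapter.find("ul.panel", this.root);
};

// Two interchangeable adapters with the same interface.
var nativeAdapter = {
    find: function (selector, root) {
        return root.querySelectorAll(selector);
    }
};

var stubAdapter = {
    find: function (selector, root) {
        return []; // e.g. for environments where nothing suitable exists
    }
};
```

Switching from one adapter to the other is then a matter of configuration, for example new PanelList(root, stubAdapter), and the consumer's code does not change.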

Speculative Generality

An interface that is built for "users" tends to lead to too much generality, as seen in the popular libraries, burgeoning with features, complexity, and bugs.

An interface that addresses differences in browsers should follow standards and use feature testing to derive strong inferences about the client environment and should limit what it does to the least capable environments.

Code Review

Blind acceptance of library code caused the Ajax library problem. Too many awful APIs, too much misinformation and the result is a catastrophe.

To avoid the mistakes, libraries will need more peer review, and that starts with you, reader. The next time you want to evaluate a library, look carefully at the source code.

Technical management (at least in America) thinks that they can get away with copying what everybody else is doing but they don't realize that this is hurting quality. Ajax development has become an extreme case of the blind leading the blind.

Focus on Quality

Teams that focus on short term costs and "getting things done" sacrifice quality. Sacrificing quality creates technical debt. Technical debt hurts quality and increases the effort (and cost) of maintaining the code.

The most important step that a company can take is to focus on fulfilling its goals with quality solutions. A company that focuses on quality will, in the long run, reduce the costs of production.

Quality code comes from knowledgeable developers and / or teams. Developers who learn to value quality will show greater pride of craftsmanship. This will have a ripple effect of improved quality within the development team and within the company.

You cannot afford to do things wrongly.

Cross Browser Abstractions - Wrapper with a Fallback

What follows is a strategy for developing a consistent interface, limited by the least common denominator.

The strategy uses a mixture of standard features, where those are available, and compatible fallbacks where they are missing or found buggy (by capability tests).

Although the example of the concept is about Selector queries, the conceptual pattern and strategy itself is applicable to many situations.

Filter the Input

A query selector that behaves consistently across browsers must verify that all inputs behave consistently across all known implementations.

The library can decide which selectors will be unsupported and filter them out. Some selectors, such as :visited, are not possible to implement and not very useful anyway. Supporting attribute selectors for IE is more trouble than it is worth.
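One way to filter the input is to accept only what the library has decided to support and reject everything else before any matching is attempted. The sketch below assumes a library that supports only type, ID and class selectors joined by the descendant combinator; isSupportedSelector is a hypothetical name, and identifiers are simplified to ASCII for brevity.

```javascript
// Simplified (ASCII-only) identifier; the full CSS production also
// allows non-ASCII characters and escapes.
var IDENT = "-?[_a-zA-Z][_a-zA-Z0-9-]*";

// One compound: a type (or *) with optional #id/.class parts,
// or one or more #id/.class parts on their own.
var COMPOUND = "(?:(?:\\*|" + IDENT + ")(?:[#.]" + IDENT + ")*|(?:[#.]" + IDENT + ")+)";

// Compounds separated by CSS whitespace (the descendant combinator).
var SUPPORTED = new RegExp("^" + COMPOUND + "(?:[ \\t\\r\\n\\f]+" + COMPOUND + ")*$");

function isSupportedSelector(selector) {
    return typeof selector === "string" && SUPPORTED.test(selector);
}

isSupportedSelector("ul.panel");        // true
isSupportedSelector("#nav li.item");    // true
isSupportedSelector("a:visited");       // false: pseudo-classes filtered out
isSupportedSelector("img[width=600]");  // false: attribute selectors filtered out
```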

Using NodeSelector

If the library chooses to use NodeSelector, then it must follow the specification to the letter. It must not allow any invalid selectors. It must not extend the CSS2 selectors syntax.

Capability Tests

To determine if the browser provides a sufficient implementation of document.querySelectorAll, perform capability tests to check not only for the existence of document.querySelectorAll, but also for known problems with the supported selectors[13].
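A capability test of this kind might be structured as below. This is a sketch under assumptions: isHostMethod, isQsaUsable and rejectsInvalidSelector are hypothetical names, and the single probe shown (an invalid selector must throw) is only an example; a real list of probes would come from the known bugs in the implementations the library targets.

```javascript
// Host methods may report "object" or "unknown" for typeof in older IE.
function isHostMethod(obj, name) {
    var type = typeof obj[name];
    return type === "function" || (type === "object" && !!obj[name]) || type === "unknown";
}

// Example probe: "2" is not an identifier, so a conforming
// implementation throws when asked to match it.
function rejectsInvalidSelector(doc) {
    try {
        doc.querySelectorAll("2");
    } catch (e) {
        return true;
    }
    return false;
}

// True only if querySelectorAll exists and every probe passes.
function isQsaUsable(doc, probes) {
    if (!doc || !isHostMethod(doc, "querySelectorAll")) {
        return false;
    }
    for (var i = 0; i < probes.length; i++) {
        try {
            if (!probes[i](doc)) {
                return false;
            }
        } catch (e) {
            return false; // a probe that blows up counts as a failure
        }
    }
    return true;
}
```

In a page, the result could be computed once, for example as isQsaUsable(document, [rejectsInvalidSelector]), and stored in a flag.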

For example, a CSS1-compliant query selector engine could employ a strategy where if the selector did not match a validity constraint, then an error would be thrown.

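// isValidSelector, InvalidSelectorError, IS_QSA_SUPPORTED and
// makeQueryFallback are assumed to be defined elsewhere by the library.
function makeQuery(selector, doc) {
    // Reject unsupported input up front, in every browser.
    if(!isValidSelector(selector)) {
        throw new InvalidSelectorError(selector);
    }
    doc = doc || document;
    // IS_QSA_SUPPORTED holds the stored result of the capability tests.
    if(IS_QSA_SUPPORTED) {
        return doc.querySelectorAll(selector);
    } else {
        return makeQueryFallback(selector, doc);
    }
}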

Consistently throwing an error for unsupported selectors avoids the inconsistencies seen with the native-first dual approach.

The library function could safely use doc.querySelectorAll as a fallback, so long as the known implementations consistently support all of the selectors that the library supports[12]. And, in case you didn't notice, this strategy will work cross-frame, unlike every other selector engine.

  <Is input Supported?>
   |           |
   Y           N - [Throw Error] - END
   |
  <Native QSA Support?>
   |               |
   Y - [Use QSA]   N - [Use fallback]

This is somewhat similar to the strategy used by Dojo, which documents that some selectors are unsupported.

The Value of Queries

The selector APIs in the several prominent javascript libraries reviewed are so broken that they obviously cannot be relied on.

The value of selectors that work as specified by the specifications has not been established.

There are several alternatives to using selectors. Anyone, though especially those who are using a broken selectors API, should question how much value the abstraction provides. He should compare that value, positive or negative, to the alternatives.

Drawbacks to Queries

The program design approach of using DOM traversal to select nodes and then performing an action on one or more of them is usually much less efficient than standard alternatives.

When DOM traversal is performed on page load, it can increase page load time, especially if the action inside the loop triggers a recalc (also commonly called "reflow"). For a large document, this can cause the page to become unresponsive, as seen on the WHATWG HTML5 draft specification "full version" [14]. However, even a small delay (200ms) adds to the overall page load time (a critical performance moment). Such issues have a greater impact on slower systems than on fast developer systems.

Query Matching Strategy

Most uses of queries don't allow for the common traversal pattern of finding an ancestor. That traversal pattern is often needed when using event delegation strategies, where the callback needs to find an ancestor matching particular criteria, usually ID, className or tagName.

var sel = new Selector("ul.panel");

function clickCallback(ev) {
  var target = DomUtils.getTarget(ev);
  if(sel.test(target)) {
    panelListClickHandler(ev);
  }
}

To handle this functionality, the Selector.test method could use Element.matchesSelector(txt) (after capability testing, of course). This is implemented in Gecko as Element.mozMatchesSelector and in webkit as Element.webkitMatchesSelector.
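The capability test for the method name might look like the sketch below. The function names are hypothetical, and the candidate list contains only the method names mentioned in this article.

```javascript
// Candidate method names, in order of preference.
var MATCH_METHOD_CANDIDATES = [
    "matchesSelector",
    "mozMatchesSelector",
    "webkitMatchesSelector"
];

// Returns the first candidate the element implements, or null.
function findMatchMethodName(el) {
    for (var i = 0; i < MATCH_METHOD_CANDIDATES.length; i++) {
        var name = MATCH_METHOD_CANDIDATES[i];
        var type = typeof el[name];
        if (type === "function" || (type === "object" && el[name]) || type === "unknown") {
            return name;
        }
    }
    return null;
}

// Evaluates to false if el is missing or no method name was found.
function elementMatches(el, methodName, selectorText) {
    return !!(el && methodName && el[methodName] && el[methodName](selectorText));
}
```

In a page, the name would be resolved once, for example from document.documentElement, and a null result would send Selector.test down its fallback path.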

Since selector traversal and parsing are slower, another alternative would be to support only simple selectors without attributes, limiting support to type (element), class, and ID.
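A sketch of such a limited selector object is shown below, together with the ancestor walk that delegation callbacks usually need. SimpleSelector and findAncestor are hypothetical names, identifiers are simplified to ASCII, and only a single simple selector (no combinators) is accepted.

```javascript
// Parses one simple selector such as "ul.panel" or "#nav".
function SimpleSelector(text) {
    var exp = /^(\*|-?[_a-zA-Z][_a-zA-Z0-9-]*)?((?:[#.]-?[_a-zA-Z][_a-zA-Z0-9-]*)*)$/;
    var m = exp.exec(text);
    if (!m || text === "") {
        throw new Error("Unsupported selector: " + text);
    }
    this.tagName = m[1] && m[1] !== "*" ? m[1].toLowerCase() : null;
    this.ids = [];
    this.classNames = [];
    var parts = m[2].match(/[#.][^#.]+/g) || [];
    for (var i = 0; i < parts.length; i++) {
        var list = parts[i].charAt(0) === "#" ? this.ids : this.classNames;
        list.push(parts[i].slice(1));
    }
}

// Evaluates to false if el is not element-like.
SimpleSelector.prototype.test = function (el) {
    var i;
    if (!el || typeof el.tagName !== "string") {
        return false;
    }
    if (this.tagName && el.tagName.toLowerCase() !== this.tagName) {
        return false;
    }
    for (i = 0; i < this.ids.length; i++) {
        if (el.id !== this.ids[i]) {
            return false;
        }
    }
    var padded = " " + String(el.className || "").replace(/[ \t\r\n\f]+/g, " ") + " ";
    for (i = 0; i < this.classNames.length; i++) {
        if (padded.indexOf(" " + this.classNames[i] + " ") === -1) {
            return false;
        }
    }
    return true;
};

// Walks up from el (inclusive) to the first node the selector matches.
function findAncestor(el, selector) {
    for (var node = el; node; node = node.parentNode) {
        if (selector.test(node)) {
            return node;
        }
    }
    return null;
}
```

A click callback could then call findAncestor(target, sel) and act only when the result is not null, which answers the "is the click inside a panel?" question directly.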

Alternatives

The "find something, do something" approach has efficient alternatives.

If the "do something" action is adding an event handler to various nodes, then that action can be replaced by using event delegation. This is done by adding an event listener to a common ancestor.
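A bare-bones sketch of delegation follows. It assumes the W3C event model (addEventListener and ev.target), so older IE would need its own branch; delegate and the matcher callback are hypothetical names.

```javascript
// Adds one listener to root. When an event arrives, walks up from the
// event target to root and calls handler for the first node that
// matcher accepts.
function delegate(root, type, matcher, handler) {
    function listener(ev) {
        var node = ev.target;
        while (node) {
            if (matcher(node)) {
                handler.call(node, ev, node);
                return;
            }
            if (node === root) {
                break; // do not look beyond the common ancestor
            }
            node = node.parentNode;
        }
    }
    root.addEventListener(type, listener, false);
    return listener;
}
```

One listener on the common ancestor replaces a listener per matched node, and no query has to run on page load.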

If the "do something" action is modifying styles, then the script can add a className token to a common ancestor of the matched nodes, allowing the browser to apply the cascade to descendant nodes. An example of this is linked from the design section of the code guidelines for comp.lang.javascript.

Conclusion

The current javascript library APIs do not adhere to the CSS2 specification for selectors.

Library documentation for selectors often does not differentiate between standard selectors or nonstandard extensions. For libraries that use the NFD approach, query results vary widely not only between browsers, but even in the same browser, depending on something as trivial as an unquoted attribute value.

The native-first dual approach does not normalize browser behavior. Instead, it amplifies the differences between browsers that have native support and those that don't.

Design problems are not limited to the query engines, but include other parts of the library and extend to their dependencies. Other such design problems seen in libraries include browser detection, fake method overloading, and useless methods that don't do what their name indicates. All at a cost of increased bytes and instability.

Any javascript developer who uses jQuery, YUI, Ext-JS, or Sencha either has not read the source code enough, is not capable of understanding the problems, or has read and understood the problems but has not appreciated the consequences deeply. The use of any one of these libraries is a substantially irresponsible and uninformed decision that puts quality and long-term success of his project at risk.

Today's Ajax libraries are interdependent monoliths that promise what is not practically possible. The problems with javascript libraries can be avoided by favoring simple interface-based design that avoids browser issues.

References

Specifications and Drafts

Other References

JQuery Issues and Discussions

Other Javascript Library Discussions

NodeSelector Bugs and Discussions

Tests

Posted by default at 12:15 AM in JavaScript

 

 

Comment: Diego Perini at Thu, 9 Sep 7:53 AM
Garrett,
really great and in-depth writing on the subject.

Not only you wrote the best article about the state of selector engines in libraries but you also pointed out the technical aspects of the problems with examples, explained current misinterpreting of specifications and offered indications/solutions on how to better approach these problems.

Hope libraries authors will pick up your heads-up.

Again, my compliments for the article.
Comment: Thomas Aylott at Thu, 9 Sep 1:49 PM
TLDR :P

j/k

I'll have to set aside an afternoon to read this epic tome. Though, I might need to take a nap halfway through.

Any plan to release it in volumes? Maybe breaking it up into chapters would help.
Comment: Dean Edwards at Thu, 9 Sep 3:11 PM
I'd just like to point out that base2 is another library that does not suffer these problems. Diego and I have been competing recently to produce the most accurate selector engine. :)
Comment: Thomas Aylott at Fri, 10 Sep 9:34 AM
When I was doing some research into all the most used js selector engines I found that very few do any of their own testing and that the few tests they do write aren't covering enough.

That is why we decided to write our own implementation agnostic selector engine test suite. The SlickSpec http://mootools.net/slickspec/SlickSpec/

MooTools brought the world the SlickSpeed selector comparison tool which helped engines compete on speed, and now we bring you the SlickSpec to help engines compete on accuracy. 

The SlickSpec combines all the (non-broken) tests from all engines test frameworks and quite a few more we cooked up. It runs the tests against any selector implementation you care to throw at it. It is not limited to only a small subset of the selectors you "should" use, it tests custom selectors and syntax extensions too.

It runs the test suite in all document types: strict, quirks, semi-standard, XML, XHTML as XML, Ajax XML DOM, SVG, iframe document, dom fragment, orphaned tree segment and maybe more, I forget.

We also forked MooTools' Selectors.js and made major changes to it. We created a standalone CSS Selector Parser that has it's own extensive unit tests. We now support XML documents, orphaned trees and all that jazz.

Slick.js will be released alongside MooTools 1.3 very soon.

The new era of engines competing on accuracy has begun.
Comment: David Mark at Fri, 10 Sep 2:55 PM
I'd like to point out that My Library suffers from less such problems (though I know it isn't perfect). The more egregious missteps (e.g. properties vs. attributes) are certainly avoided. And it doesn't sniff browsers either. ;)

http://www.cinsoft.net/slickspeed.html

With regard to the "major" libraries, it's quite a horror show; isn't it?

Regardless, as I've always said, CSS selector queries are a complete waste of time. The last thing you want to do is to dump another layer of (unreliable) abstraction on top of the already complex task of cross-browser scripting. They don't save time or money; though they do create jobs for incompetent developers. :(
Comment: David Mark at Fri, 10 Sep 3:01 PM
At a glance, you've got a couple of typos. "Depening" and the name test comparison appears to be mistyped (something about IE8 matching a name attribute case sensitively).

And I don't see any "peek" at My Library. (?) As mentioned, the query engine is not perfect (and I doubt I'll ever bother to make it perfect). Though as I've been pointing out for some time, it beats the hell out of jQuery (and the rest of the "majors") in every way imaginable (and not just for queries). :)
Comment: Garrett at Sat, 11 Sep 2:08 PM
@Thomas, trimming/editing may be a good idea.

I've gotten a ton of helpful feedback from Diego and JDD (literally several pages of emails), and I'm working on incorporating the added suggestions, but I'm having trouble updating the entry with changes. It seems that the entry is pushing up against the VARCHAR limit of 65,535 bytes!

http://dev.mysql.com/doc/refman/5.1/en/char.html

Omitting some end tags which are optional in HTML 4 caused my XHTML site to not validate, and I just don't have enough room to add those closing LI tags. There is still gibberish and spelling errors and the explanations aren't great. It's just too damned long, I know.

Agreed that library selector engines' test coverage and even the official w3c selectors API test [SELTEST] coverage is inadequate. Thanks to its respective authors, but the official test needs to be expanded to cover each section of the selectors API as it pertains to HTML, including various flavors you mention, and possibly SVG.

A JS selectors API interface that does not use NFD can safely omit features of the Selectors API that would be awkward or provide little value. Some things are just impossible to match (:visited, for example) and other things are so troublesome that the added complexity isn't very well justified by the functionality (attribute selectors).

CSS1 Simple selector functionality could be somewhat beneficial because it can be used with the "Query Matching Strategy" mentioned in the article. This can be super useful for delegation strategies, such as to answer questions, e.g. "Is click event inside panel?" and the current Selectors API doesn't provide for this functionality.

A CSS1 simple selector parser can be implemented in very little code. The matching for the selector parser can be set to test ID, tagName, and className. That doesn't provide for attribute matching but it avoids the problem with dealing with IE attribute problems.

Though matchesSelector could be used, it is more awkward and not as efficient as creating and saving a Selector object which a hand-rolled fallback can mimic.

http://www.mail-archive.com/public-webapps@w3.org/msg05467.html

Since matchesSelector isn't implemented, one could use mozMatchesSelector and webkitMatchesSelector and that requires yet more feature tests to determine which method name to use, so...

if(el && el[SELECTOR_MATCH_METHOD] && el[SELECTOR_MATCH_METHOD](str)) {
// ...
}


Compared to what was proposed years ago:

if(mySelector.test(el)) {
// evaluates to false if el is not an element.
}


David Anderson argued for this feature back in 2006 and some of those ideas have made it into Diego's selector API (and possibly Dean's though I've not looked at that yet).
Comment: Diego Perini at Mon, 13 Sep 7:57 AM
Dean & David,
maybe non cited libraries do not have these problems,
you should appreciate not being listed in bug reports !
;-)
Comment: David Mark at Wed, 15 Sep 12:57 AM
@Diego

As mentioned, I know My Library's query module has at least some of these issues (just far fewer than scripts like jQuery). Bug reports are always welcome as they may cover problems I don't know about. :)

 
