This repository has been archived by the owner on Feb 2, 2021. It is now read-only.

csslexer lockup the browser #2037

Open
renvieir opened this issue Oct 16, 2019 · 1 comment

Comments

renvieir commented Oct 16, 2019

lexCss takes too much time to perform a regex match and thus blocks the UI.

The code snippet below is from line 239, applied to a font-family property.

const regex = /\uFEFF|U[+][0-9A-F?]{1,6}(?:-[0-9A-F]{1,6})?|url[(][\t\n\f ]*(?:"(?:'|[^'"\n\f\\]|\\[\s\S])*"|'(?:"|[^'"\n\f\\]|\\[\s\S])*'|(?:[\t\x21\x23-\x26\x28-\x5b\x5d-\x7e]|[\u0080-\ud7ff\ue000-\ufffd]|[\ud800-\udbff][\udc00-\udfff]|\\(?:[0-9a-fA-F]{1,6}[\t\n\f ]?|[\u0020-\u007e\u0080-\ud7ff\ue000\ufffd]|[\ud800-\udbff][\udc00-\udfff]))*)[\t\n\f ]*[)]|(?!url[(])-?(?:[a-zA-Z_]|[\u0080-\ud7ff\ue000-\ufffd]|[\ud800-\udbff][\udc00-\udfff]|\\(?:[0-9a-fA-F]{1,6}[\t\n\f ]?|[\u0020-\u007e\u0080-\ud7ff\ue000\ufffd]|[\ud800-\udbff][\udc00-\udfff]))(?:[a-zA-Z0-9_-]|[\u0080-\ud7ff\ue000-\ufffd]|[\ud800-\udbff][\udc00-\udfff]|\\(?:[0-9a-fA-F]{1,6}[\t\n\f ]?|[\u0020-\u007e\u0080-\ud7ff\ue000\ufffd]|[\ud800-\udbff][\udc00-\udfff]))*[(]|(?:@?-?(?:[a-zA-Z_]|[\u0080-\ud7ff\ue000-\ufffd]|[\ud800-\udbff][\udc00-\udfff]|\\(?:[0-9a-fA-F]{1,6}[\t\n\f ]?|[\u0020-\u007e\u0080-\ud7ff\ue000\ufffd]|[\ud800-\udbff][\udc00-\udfff]))|#)(?:[a-zA-Z0-9_-]|[\u0080-\ud7ff\ue000-\ufffd]|[\ud800-\udbff][\udc00-\udfff]|\\(?:[0-9a-fA-F]{1,6}[\t\n\f ]?|[\u0020-\u007e\u0080-\ud7ff\ue000\ufffd]|[\ud800-\udbff][\udc00-\udfff]))*|"(?:'|[^'"\n\f\\]|\\[\s\S])*"|'(?:"|[^'"\n\f\\]|\\[\s\S])*'|[-+]?(?:[0-9]+(?:[.][0-9]+)?|[.][0-9]+)(?:%|-?(?:[a-zA-Z_]|[\u0080-\ud7ff\ue000-\ufffd]|[\ud800-\udbff][\udc00-\udfff]|\\(?:[0-9a-fA-F]{1,6}[\t\n\f ]?|[\u0020-\u007e\u0080-\ud7ff\ue000\ufffd]|[\ud800-\udbff][\udc00-\udfff]))(?:[a-zA-Z0-9_-]|[\u0080-\ud7ff\ue000-\ufffd]|[\ud800-\udbff][\udc00-\udfff]|\\(?:[0-9a-fA-F]{1,6}[\t\n\f ]?|[\u0020-\u007e\u0080-\ud7ff\ue000\ufffd]|[\ud800-\udbff][\udc00-\udfff]))*)?|<!--|-->|[\t\n\f ]+|\/(?:[*][^*]*[*]+(?:[^/][^*]*[*]+)*\/|\/[^\n\f]*)|[~|^$*]=|[^"'\\/]|\/(?![/*])/gi;
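// Input taken from a font-family declaration; inside the template literal each `\\` is one backslash.
const str = `font-family: \\00E5\\00BE\\00AE\\00E8\\00BD\\00AF\\00E9\\203A\\2026\\00E9\\00BB\\2018;`;

// Time a single global match over the whole string.
const startTime = performance.now();
str.match(regex);
const endTime = performance.now();
console.warn(`It took ${(endTime - startTime) / 1000} seconds to execute`);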

A single execution can take ~10 seconds.

renvieir changed the title from "csslexer lockup the browsert" to "csslexer lockup the browser" on Oct 16, 2019

codeworrior commented Apr 3, 2020

We got a customer incident for the same issue (in the context of UI5).

I analysed it and think it is an example of catastrophic backtracking. The regex in the csslexer is composed of several sub-expressions, one of them for FUNCTION:

var FUNCTION = '(?!url[(])' + IDENT + '[(]';

For illustration purposes, I've expanded IDENT. Then FUNCTION looks like

var FUNCTION = '(?!url[(])' + '-?' + NMSTART + NMCHAR + '*' + '[(]';

The repetition NMCHAR* followed by a required '(' makes this a prime candidate for backtracking. In addition, NMSTART and NMCHAR both allow alternative interpretations of digits that occur after a backslash (either as part of a Unicode escape sequence or as an escaped ASCII character). Together, these set the stage for catastrophic backtracking with exponential runtime.
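To make the ambiguity concrete, here is a rough sketch that counts how many ways a repeated NMCHAR could split an input into tokens. It assumes a simplified, ASCII-only stand-in for NMCHAR (name character, hex escape with 1 to 6 digits plus optional whitespace, or backslash plus one printable ASCII character); `tokenLengthsAt` and `countParses` are hypothetical helper names, not part of the csslexer. The count is a property of this simplified model, not a measurement of what a regex engine actually does.

```javascript
// Lengths of all tokens that the simplified NMCHAR could match at index i.
function tokenLengthsAt(s, i) {
  const lengths = [];
  // Alternative 1: a plain name character.
  if (/[a-zA-Z0-9_-]/.test(s[i] || '')) lengths.push(1);
  if (s[i] === '\\') {
    // Alternative 2: hex escape with 1..6 digits and optional trailing whitespace.
    for (let k = 1; k <= 6 && /[0-9a-fA-F]/.test(s[i + k] || ''); k++) {
      lengths.push(1 + k);
      if (/[\t\n\f ]/.test(s[i + k + 1] || '')) lengths.push(2 + k);
    }
    // Alternative 3: backslash followed by one printable ASCII character.
    if (/[\u0020-\u007e]/.test(s[i + 1] || '')) lengths.push(2);
  }
  return lengths;
}

// Number of distinct ways to split the whole string into NMCHAR tokens.
function countParses(s) {
  const ways = new Array(s.length + 1).fill(0);
  ways[0] = 1;
  for (let i = 0; i < s.length; i++) {
    if (ways[i] === 0) continue;
    for (const len of tokenLengthsAt(s, i)) ways[i + len] += ways[i];
  }
  return ways[s.length];
}

console.log(countParses('\\00E5'));            // 5
console.log(countParses('\\00E5'.repeat(2)));  // 25
console.log(countParses('\\00E5'.repeat(12))); // 244140625
```

Under these assumptions each four-digit escape can be read in five ways, so twelve of them (as in the reported font-family value) give 5^12 candidate splits. When the trailing '(' never matches, a backtracking engine may end up trying a large share of them.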

Luckily, the article "Regex: Emulate Atomic Grouping (and Possessive Quantifiers) with LookAhead" describes a solution to this known issue of regular expressions.
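As a small illustration of that technique (using a toy pattern, not the csslexer one): a lookahead captures what the sub-expression would match, and a back reference then consumes exactly that text. Since the engine does not backtrack into a completed lookahead, the captured part cannot "give back" characters. The names `plain` and `atomicLike` are only for this sketch.

```javascript
// Ordinary greedy quantifier: a+ may give back one 'a' so that 'ab' can match.
const plain = /a+ab/;

// Emulated atomic group: the lookahead captures all leading 'a's,
// \1 consumes exactly that capture, and nothing is given back afterwards.
const atomicLike = /(?=(a+))\1ab/;

console.log(plain.test('aaab'));      // true
console.log(atomicLike.test('aaab')); // false

// When no give-back is needed, both variants agree.
console.log(/a+b/.test('aaab'));           // true
console.log(/(?=(a+))\1b/.test('aaab'));   // true
```

The first pair shows that the rewrite is not always behavior-preserving: it only matches the same inputs when the following part of the pattern never needs characters that the atomic part has already consumed.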

Applying the proposed pattern to the FUNCTION sub-expression seems to fix the performance problem:

var FUNCTION = '(?!url[(])' + '(?=(' + IDENT + '))\\1' + '[(]';

Further testing is needed (e.g. whether the mandatory capturing group causes negative side-effects), and cross-browser support needs to be checked: the lookahead (?=...) and the back reference \1 might not be supported everywhere.
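A minimal sketch of how the rewritten FUNCTION sub-expression behaves, assuming simplified ASCII-only versions of ESCAPE, NMSTART and NMCHAR (the real csslexer definitions also cover non-ASCII ranges). The pattern is anchored with `^` here only to test it in isolation; in the lexer it is one alternative of a larger expression.

```javascript
// Simplified, ASCII-only stand-ins for the lexer's building blocks (assumptions).
const ESCAPE = '\\\\(?:[0-9a-fA-F]{1,6}[\\t\\n\\f ]?|[\\u0020-\\u007e])';
const NMSTART = '(?:[a-zA-Z_]|' + ESCAPE + ')';
const NMCHAR = '(?:[a-zA-Z0-9_-]|' + ESCAPE + ')';
const IDENT = '-?' + NMSTART + NMCHAR + '*';

// FUNCTION with the emulated atomic group around IDENT.
const FUNCTION = '(?!url[(])' + '(?=(' + IDENT + '))\\1' + '[(]';
const functionRe = new RegExp('^' + FUNCTION, 'i');

// Function tokens still match; note the extra capturing group in the result.
const m = 'rgb(0,0,0)'.match(functionRe);
console.log(m[0], m[1]); // rgb( rgb

// url( is still excluded by the negative lookahead.
console.log(functionRe.test('url(x)')); // false

// Twelve escapes without a trailing '(' now fail after a single pass over IDENT.
const escaped = '\\00E5'.repeat(12) + ';';
console.log(functionRe.test(escaped)); // false
```

The extra group (`m[1]` above) is the side-effect mentioned earlier: if FUNCTION is embedded in a larger expression that relies on group numbers, those numbers shift by one.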

openui5bot pushed a commit to SAP/openui5 that referenced this issue Apr 7, 2020
The caja-html-sanitizer uses a complex regular expression to tokenize
CSS styles before parsing and sanitizing them. That regular expression
contains a sub-expression that allows alternative interpretations of
arbitrarily long input sequences and therefore might cause extensive
backtracking (see googlearchive/caja#2037).

To prevent this, the corresponding sub-expression has been rewritten to
simulate an atomic group (or possessive quantifier) as described in
https://instanceof.me/post/52245507631/regex-emulate-atomic-grouping-with-lookahead.

Change-Id: Ia9e8e038d8b4cf5a7cf2684a14877fc08bf1be80
BCP: 002075129500001766652020