| 1 | +<!DOCTYPE qhelp PUBLIC |
| 2 | +"-//Semmle//qhelp//EN" |
| 3 | +"qhelp.dtd"> |
| 4 | +<qhelp> |
| 5 | + |
| 6 | + <overview> |
| 7 | + <p> |
| 8 | + |
| 9 | + When a character in a string literal or regular expression |
| 10 | + literal is preceded by a backslash, it is interpreted as part of an |
| 11 | + escape sequence. For example, the escape sequence <code>\n</code> in a |
| 12 | + string literal corresponds to a single <code>newline</code> character, |
| 13 | + and not the <code>\</code> and <code>n</code> characters. |
| 14 | + |
| 15 | + There are two Go escape sequences that could produce surprising results. |
| 16 | + First, <code>regexp.Compile("\a")</code> matches the bell character, whereas |
| 17 | + <code>regexp.Compile("\\A")</code> matches the start of text and |
| 18 | + <code>regexp.Compile("\\a")</code> is a Vim (but not Go) regular expression |
| 19 | + matching any alphabetic character. Second, <code>regexp.Compile("\b")</code> |
| 20 | + matches a backspace, whereas <code>regexp.Compile("\\b")</code> matches a |
| 21 | + word boundary. Confusing one for the other could lead to a regular expression |
| 22 | + matching much more or much less often than expected, with potential security |
| 23 | + consequences. |
| 24 | + |
| 25 | + Note that this is less of a problem than in some other languages because in Go, |
| 26 | + only valid escape sequences are accepted, both in an ordinary string |
| 27 | + (for example, <code>s := "\k"</code> will not compile as there is no such |
| 28 | + escape sequence) and in regular expressions (for example, |
| 29 | + <code>regexp.MustCompile("\\k")</code> will panic as <code>\k</code> does not |
| 30 | + refer to a character class or other special token according to Go's regular |
| 31 | + expression grammar). |
| 32 | + |
| 33 | + </p> |
| 34 | + |
| 35 | + </overview> |
| 36 | + |
| 37 | + <recommendation> |
| 38 | + <p> |
| 39 | + |
| 40 | + Ensure that the right number of backslashes is used when |
| 41 | + escaping characters in strings and regular |
| 42 | + expressions. |
| 43 | + |
| 44 | + </p> |
| 45 | + </recommendation> |
| 46 | + |
| 47 | + <example> |
| 48 | + |
| 49 | + <p>The following example code fails to check for a forbidden word in an input string:</p> |
| 50 | + <sample src="SuspiciousCharacterInRegexp.go"/> |
| 51 | + <p>The check does not work, but can be fixed by escaping the backslash:</p> |
| 52 | + <sample src="SuspiciousCharacterInRegexpGood.go"/> |
| 53 | + <p> |
| 54 | + Alternatively, you can use backtick-delimited raw string literals. |
| 55 | + For example, the <code>\b</code> in <code>regexp.Compile(`hello\bworld`)</code> |
| 56 | + matches a word boundary, not a backspace character, as within backticks <code>\b</code> is not an |
| 57 | + escape sequence. |
| 58 | + </p> |
| 59 | + |
| 60 | + </example> |
| 61 | + |
| 62 | + <references> |
| 63 | + <li>golang.org: <a href="https://golang.org/pkg/regexp/">Overview of the Regexp package</a>.</li> |
| 64 | + <li>Google: <a href="https://github.com/google/re2/wiki/Syntax">Syntax of regular expressions accepted by RE2</a>.</li> |
| 65 | + </references> |
| 66 | +</qhelp> |
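
The referenced sample files are not shown in this diff. As a rough illustration of the behaviour the help text describes, here is a minimal sketch. The pattern names, the `containsForbidden` helper, and the word `forbidden` are hypothetical and are not taken from `SuspiciousCharacterInRegexp.go` or `SuspiciousCharacterInRegexpGood.go`.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns for illustration only; these are assumptions,
// not the contents of the sample files referenced by the qhelp.
var (
	// In an interpreted string literal, "\b" is a backspace (U+0008),
	// so this pattern looks for literal backspace characters.
	backspacePattern = regexp.MustCompile("\bforbidden\b")
	// "\\b" reaches the regexp engine as \b, which is a word boundary.
	escapedPattern = regexp.MustCompile("\\bforbidden\\b")
	// In a raw string literal the compiler leaves backslashes alone,
	// so \b is again passed through as a word boundary.
	rawPattern = regexp.MustCompile(`\bforbidden\b`)
)

// containsForbidden reports whether the input matches the given pattern.
func containsForbidden(re *regexp.Regexp, input string) bool {
	return re.MatchString(input)
}

func main() {
	input := "a forbidden word"
	fmt.Println(containsForbidden(backspacePattern, input)) // false: no backspace in input
	fmt.Println(containsForbidden(escapedPattern, input))   // true
	fmt.Println(containsForbidden(rawPattern, input))       // true
}
```

The first pattern compiles without error, which is why the mistake is easy to miss. It only matches input that contains actual backspace characters around the word.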