Questions tagged [regex]

Regular expressions provide a declarative language to match patterns within strings. They are commonly used for string validation, parsing, and transformation. Specify the language (PHP, Python, etc) or tool (grep, VS Code, Google Analytics, etc) that you are using. Do not post questions asking for an explanation of what a symbol means or what a particular regular expression will match.

IMPORTANT NOTE: Requests to explain a regular expression pattern or construct will be closed as duplicates of the canonical post "What does this regex mean", which covers regular expression constructs in detail. That post also links to many popular online regular expression testers, where the meanings of regex constructs can be looked up. One such tool is Regex101.

Regular expressions are a powerful formalism for pattern matching in strings. They are available in a variety of dialects (also known as flavors) in a number of programming languages and text-processing tools, as well as many specialized applications. The term "Regular expression" is typically abbreviated as "RegEx" or "regex".

Before asking a question here, please take the time to review the following brief guidelines.

How To Ask

  • Specify what tool or language you are using.

    Regexes are everywhere. Different languages like Python, PHP and Java all use regexes, but with minor differences. Many different tools use regexes as well, from grep to most text editors to Google Analytics, also with their own differences. Specify the tool or language in your question. (Perhaps see also Why are there so many different regular expression dialects?)

  • Be clear about what you need.

    Keep in mind that regex dialects are different; the lowest common denominator will usually be quite different from what is possible and recommended for a tool with a modern, souped-up regex engine. (See previous section.)

    Also, are you looking for a regular expression for input validation (which needs to be rather strict), or do you need one for information extraction (which can be somewhat relaxed)?

    If your question relates to regular expressions in the strict computer science/automata theory sense, please state this explicitly.

    For most other questions, you should always include sample input, expected output, and an outline of what you have tried, and where you are stuck. Often, an example of what you do not want to match is also very helpful, and important to know.

  • Show us what you tried.

    A link to one of the many online regex testing tools (see link section) with your attempt and some representative data can do wonders.

    However, keep in mind, again, that there are many different regular expression dialects. (See earlier bullet points.) A result from an online tool for JavaScript or PHP does not necessarily work in Python or Java or sed or Awk or ... what have you.

    Even if you cannot post your problem online, showing us your best attempt helps us focus on what you need help with.

  • Search for duplicates.

    Before posting, check if your issue has already been solved by somebody else asking something similar. See also the following section.
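
As a small illustration of why the tool or language matters, here is a hedged sketch using Python's re module. It assumes the pattern dogs?\> was written for GNU grep, where \> anchors at the end of a word. Python's re has no such operator and treats \> as a literal > character, so the same pattern silently means something else there. The variable names are illustrative only.

```python
import re

# In GNU grep, \> matches at the end of a word. Python's re module has no
# such operator: a backslash before '>' simply yields a literal '>'.
grep_style = re.compile(r"dogs?\>")

# The closest Python spelling of the intended pattern uses \b instead.
python_style = re.compile(r"dogs?\b")

print(grep_style.search("hot dogs"))    # no literal '>' in the text, so no match
print(grep_style.search("dog>"))        # matches only because of the literal '>'
print(python_style.search("hot dogs"))  # matches 'dogs' at the word boundary
```

A pattern that is accepted without error can still behave differently from one tool to the next, which is why a question should always name the tool.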

Avoid Common Problems and Pitfalls

There are some common recurring beginner topics.

  • Do not assume that the tool you are using supports precisely the syntax of another tool.

    While modern Perl/Ruby/Python/PHP/Java regular expression support is widespread, you cannot assume that it is universal. In particular, many older tools (Awk, sed, grep, lex, etc.), as well as some newer ones (JavaScript, many text editors), use different dialects, some of which do not necessarily support e.g. non-capturing parentheses (?:...), non-greedy quantifiers *?, backreferences (\1, \2, etc.), common character class abbreviations (\t, \d, POSIX character classes [[:class:]]), arbitrary repetition {m,n}, or lookarounds such as (?=...), (?<=...), and (?!...).

    If your question is not specific to any particular implementation, try the tag. This will generally imply a fairly minimal set of operators, corresponding to the ones specified in the common mathematical definition of regular languages.

  • Understand the difference between "glob" expressions and true regular expressions.

    Glob patterns are a less potent pattern matching language, which is commonly used for file name wildcards. In glob, * means "anything", while a lone * in a regular expression is, in fact, a syntax error in some dialects (though many engines will silently ignore it, rather than issue a warning; and others still will see it as a literal *).

    For the record, the regex way to say (as much as possible of) "anything" is .* where the "any single character (except newline, usually)" . metacharacter is repeated zero or more times (*). But see below about how "any character" and greediness are sometimes problematic.

    See also What are the differences between glob-style patterns and regular expressions?

  • Specifying a single repetition is unnecessary.

    Using {1} as a single-repetition quantifier is harmless but never useful. It is basically an indication of inexperience and/or confusion.

    h{1}t{1}t{1}p{1} matches the same string as the simpler expression http (or ht{2}p for that matter) but as you can see, the redundant {1} repetitions only make it harder to read.

  • Square brackets are commonly misunderstood or misused.

    Beginners often attempt to use square brackets for everything, including grouping. While [Jun][Jul] may look like a regex for matching months, it actually matches JJ, Ju, Jl, uJ, uu, ul, nJ, nu, or nl; not Jun or Jul. [Jun|Jul] is a wasteful way to write the functionally identical [|Junl]—it matches any one character from the set comprising |, J, u, l, and n.

    For the record, [abc] defines a character class which matches a single character which can be a or b or c. The proper way to express alternation is (Jun|Jul|Aug) in many dialects (though BRE and related dialects will need backslashes; \(Jun\|Jul\|Aug\) for traditional grep et al.) or, somewhat more parsimoniously, (Ju[nl]|Aug). The round parentheses (as opposed to the square brackets of character classes) perform grouping, and the | operator indicates matching alternatives.

    See also What is the difference between square brackets and parentheses in a regex?

  • Negation is tricky.

    Related to the previous point, beginners will use negated character classes to attempt to restrict what can be matched. For example, to match turn but not turned, the pattern turn[^ed] does not do what you want: it matches turn followed by any single character which is not e or d (so it will not match turner, for example, nor turn at the very end of the input).

    In fact, traditional regex syntax does not allow this to be expressed easily. With ERE, you could say turn($|[^e]|e$|e[^d]) to say that turn can be followed by nothing, or by a character which is not e, or by e if that e is not in turn followed by d. Modern regular expression dialects have an extension called lookarounds which allows you to say turn(?!ed), but make sure your tool supports this syntax before plunging ahead.

    Notice also how the character class negation operator is distinct from the beginning of line anchor (^[abc] matches a, b, or c at beginning of the line, whereas [^abc] matches a single character which is not a, b, or c).

    See also the next bullet point.

  • If there is a way to match, the engine will find it.

    A common beginner's mistake is to supply useless optional leading or trailing elements. The trailing s? in dogs? does nothing to prevent a match on doggone or endogenous. If you want to prevent those, you will need to elaborate—perhaps something like dogs?\> (provided your dialect supports the final word boundary operator and provided that's what you mean).

    As it is, the regular expression dogs? will match exactly the same strings as just dog (though if your application captures the match, only the former will capture a trailing s if there is one).

  • Matches are greedy.

    The regex a.*b will match the entire string "abbbbbb" because * will always match as much as possible. Say a[^ab]*b if that's what you mean, or use non-greedy matching if your dialect supports it.

  • Watch what you capture.

    If you use grouping parentheses, the parentheses define what is captured into a backreference. If you edit in parentheses for grouping purposes, make sure you are not renumbering your backreferences.

    Also, in particular, watch out for (abc){2,3} which only captures the last occurrence of abc in the matched string. If you want the repetition to be part of the capture, it needs to be inside the parentheses, like this: ((abc){2,3})

  • Don't use regex for everything!

    In particular, using (typically line-oriented) traditional regex tools to handle structured formats like HTML, XML, JSON, configuration files with block structure (Apache, nginx, many name servers, etc.) is likely to fail, or to produce incorrect results in numerous corner cases.

    Asking for HTML regexes tends to be met with negative reactions. The reasoning extends to all structured formats. If there is a parser for it, use that instead.
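
Several of the pitfalls above can be checked directly. The following is a minimal sketch using Python's re module, chosen only for illustration; other dialects may behave differently. The helper first_match is a hypothetical convenience function, and \b stands in for the \> word-end operator mentioned above because Python's re does not provide \>.

```python
import re
from typing import Optional


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the text of the first match of pattern in text, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


cases = [
    # Square brackets: each [...] matches exactly one character.
    (r"^[Jun][Jul]$", "Jun"),     # no match: the pattern is two characters long
    (r"^[Jun][Jul]$", "nu"),      # matches 'nu'
    (r"^(Ju[nl]|Aug)$", "Jul"),   # grouping with alternation matches 'Jul'
    # Negation: a negated class still has to consume one character.
    (r"turn[^ed]", "turner"),     # no match: 'e' is excluded
    (r"turn[^ed]", "turn"),       # no match: nothing left to consume
    (r"turn(?!ed)", "turner"),    # lookahead only rejects 'ed'; matches 'turn'
    (r"turn(?!ed)", "turned"),    # no match
    # Optional trailing elements do not restrict anything.
    (r"dogs?", "doggone"),        # matches 'dog'
    (r"dogs?\b", "doggone"),      # no match: a word boundary is required
    # Greediness.
    (r"a.*b", "abbbbbb"),         # greedy: matches the whole string
    (r"a.*?b", "abbbbbb"),        # non-greedy: matches 'ab'
    (r"a[^ab]*b", "abbbbbb"),     # explicit class: matches 'ab'
]

results = {(pattern, text): first_match(pattern, text) for pattern, text in cases}
for (pattern, text), matched in results.items():
    print(f"{pattern!r:22} on {text!r:12} -> {matched!r}")

# Captures: a quantified group keeps only its last repetition.
target = "abcabcabc"
last_only = re.fullmatch(r"(abc){2,3}", target).group(1)    # 'abc'
whole_run = re.fullmatch(r"((abc){2,3})", target).group(1)  # 'abcabcabc'
print(last_only, whole_run)
```

Running a table of patterns against both inputs that should match and inputs that should not is also a convenient way to prepare the sample data a question should include.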

Further Reading

Learning regular expressions


Documentation for JavaScript

Online sandboxes (for testing and publishing regexes online)

  • RegexPlanet (supports a variety of flavors to choose from)
  • Regexpal (ECMAScript flavor, as implemented by JavaScript)
  • Regexhero (.NET flavor)
  • (.NET flavor with link sharing capability)
  • RegExr v2.1 (in JavaScript)
  • RegExr v1.0 (ECMAScript flavor, as implemented by Adobe Flash)
  • Rubular (Ruby flavor)
  • (Java-applet with source code)
  • (German; probably Java flavor)
  • regex101 (supports ECMAScript (JavaScript), Python, PHP (PCRE 16-bit), Golang, and Java; generates an explanation of the pattern)
  • (generates graphical representation for ECMAScript flavor)
  • debuggex (generates graphical representation and shows processing of pattern – JavaScript, Python, and PCRE-compatible)
  • (Web validator for Python regular expressions)
  • (Visual debugging of regular expressions for JavaScript)
  • Ultrapico Expresso (a standalone tool for testing .NET regular expressions)
  • Pythex (Quick way to test your Python regular expressions)

Online Regex generator (for building Regular Expressions via simplified input)

Other links

Regex Uses:

Regular expressions are useful in a wide variety of text processing tasks, and more generally string processing, where the data need not be textual. Common applications include data validation, data scraping (especially web scraping), data wrangling, simple parsing, the production of syntax highlighting systems, and many other tasks.

While regular expressions would be useful on Internet search engines, processing them across the entire database could consume excessive computer resources depending on the complexity and design of the regex. Although in many cases system administrators can run regex-based queries internally, most search engines do not offer regex support to the public. Notable exceptions include searchcode and, previously, Google Code Search, which was shut down in 2012.
Google also offers re2, a fast, safe, thread-friendly C++ alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python: it does not backtrack and guarantees that running time grows linearly with input size.

258926 questions
79 answers

How can I validate an email address in JavaScript?

I'd like to check if the user input is an email address in JavaScript, before sending it to a server or attempting to send an email to it, to prevent the most basic mistyping. How could I achieve this?
34 answers

Regular expression to match a line that doesn't contain a word

I know it's possible to match a word and then reverse the matches using other tools (e.g. grep -v). However, is it possible to match lines that do not contain a specific word, e.g. hede, using a regular…
73 answers

How can I validate an email address using a regular expression?

Over the years I have slowly developed a regular expression that validates most email addresses correctly, assuming they don't use an IP address as the server part. I use it in several PHP programs, and it works most of the time. However, from time…
18 answers

What is a non-capturing group in regular expressions?

How are non-capturing groups, i.e., (?:), used in regular expressions and what are they good for?
27 answers

How do you use a variable in a regular expression?

I would like to create a String.replaceAll() method in JavaScript and I'm thinking that using a regex would be most terse way to do it. However, I can't figure out how to pass a variable in to a regex. I can do this already which will replace all…
JC Grubbs
23 answers

How do you access the matched groups in a JavaScript regular expression?

I want to match a portion of a string using a regular expression and then access that parenthesized substring: var myString = "something format_abc"; // I want "abc" var arr = /(?:^|\s)format_(.*?)(?:\s|$)/.exec(myString); console.log(arr); //…
4 answers

\d less efficient than [0-9]

I made a comment yesterday on an answer where someone had used [0123456789] in a regex rather than [0-9] or \d. I said it was probably more efficient to use a range or digit specifier than a character set. I decided to test that out today and found…
3 answers

Negative matching using grep (match lines that do not contain foo)

How do I match all lines not matching a particular pattern using grep? I tried this: grep '[^foo]'
10 answers

Is there a regular expression to detect a valid regular expression?

Is it possible to detect a valid regular expression with another regular expression? If so please give example code below.
15 answers

Check whether a string matches a regex in JS

I want to use JavaScript (I can also use jQuery) to do check whether a string matches the regex ^([a-z0-9]{5,})$, and get a true or false result. match() seems to check whether part of a string matches a regex, not the whole thing. Does it solve the…
45 answers

How to validate phone numbers using regex

I'm trying to put together a comprehensive regex to validate phone numbers. Ideally it would handle international formats, but it must handle US formats, including the following: 1-234-567-8901 1-234-567-8901 x1234 1-234-567-8901 ext1234 1 (234)…
Nicholas Trandem
64 answers

What is the best regular expression to check if a string is a valid URL?

How can I check if a given string is a valid URL address? My knowledge of regular expressions is basic and doesn't allow me to choose from the hundreds of regular expressions I've already seen on the web.
Vitor Silva
15 answers

Regular Expressions: Is there an AND operator?

Obviously, you can use the | (pipe?) to represent OR, but is there a way to represent AND as well? Specifically, I'd like to match paragraphs of text that contain ALL of a certain phrase, but in no particular order.
43 answers

Regex for password must contain at least eight characters, at least one number and both lower and uppercase letters and special characters

I want a regular expression to check that: A password contains at least eight characters, including at least one number and includes both lower and uppercase letters and special characters, for example #, ?, !. It cannot be your old password or…
Swapnil Tatkondawar