4089

Over the years I have slowly developed a regular expression that validates most email addresses correctly, assuming they don't use an IP address as the server part.

I use it in several PHP programs, and it works most of the time. However, from time to time I get contacted by someone that is having trouble with a site that uses it, and I end up having to make some adjustment (most recently I realized that I wasn't allowing four-character TLDs).

What is the best regular expression you have or have seen for validating emails?

I've seen several solutions that use functions that use several shorter expressions, but I'd rather have one long complex expression in a simple function instead of several short expression in a more complex function.

Braiam
  • 1
  • 11
  • 47
  • 78
acrosman
  • 12,814
  • 10
  • 39
  • 55
  • 9
    The regex that can validate that an IDNA is correctly formatted does not fit in stackexchange. (the rules on canonicalisation ate really tortuous and particularly ill-suited to regex processing) – Jasen Aug 29 '17 at 23:51
  • 12
    Why you should not do this: [Can it cause harm to validate email addresses with a regex?](https://stackoverflow.com/q/48055431/6699433) – klutt Jan 09 '18 at 14:30
  • The regexes may be **variable** as in some cases, an email con can contain a space, and in other times, it cannot contain any spaces. – Ṃųỻịgǻňạcểơửṩ Jul 23 '18 at 04:21
  • You can check Symfonys regex for loose and strict check: https://github.com/symfony/symfony/blob/5.x/src/Symfony/Component/Validator/Constraints/EmailValidator.php – Did May 16 '21 at 16:15
  • Using just regex can harm server security but if it is just as an input pattern, i suggest use this: https://stackoverflow.com/questions/5601647/html5-email-input-pattern-attribute/65442112#65442112 – MMMahdy-PAPION Jun 07 '21 at 21:42
  • use this expression for email id - `^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$` – Pritesh Jul 19 '21 at 06:08
  • Similar: *[What's the best way to validate an email address in JavaScript?](https://stackoverflow.com/questions/46155/)* – Peter Mortensen Feb 16 '22 at 23:20
  • Late to the party, but since PHP was mentioned, consider using [filter_var()](https://www.php.net/manual/en/function.filter-var.php) – Mavelo Mar 23 '22 at 03:19
  • Does this answer your question? [How can I validate an email address in JavaScript?](https://stackoverflow.com/questions/46155/how-can-i-validate-an-email-address-in-javascript) – TylerH May 04 '22 at 13:18
  • See also [ Regex validation of email addresses according to RFC5321/RFC5322 ](https://stackoverflow.com/questions/13992403/regex-validation-of-email-addresses-according-to-rfc5321-rfc5322) – Michael Freidgeim Nov 25 '22 at 05:14
  • There's an article from March 2023 available via Medium (which may have limited accessibility, therefore) [On the practicality of regex for email address processing](https://betterprogramming.pub/on-the-practicality-of-regex-for-email-address-processing-78280ab006c3) which may be of interest. – Jonathan Leffler Jul 19 '23 at 19:12

73 Answers73

3302

The fully RFC 822 compliant regex is inefficient and obscure because of its length. Fortunately, RFC 822 was superseded twice and the current specification for email addresses is RFC 5322. RFC 5322 leads to a regex that can be understood if studied for a few minutes and is efficient enough for actual use.

One RFC 5322 compliant regex can be found at the top of the page at http://emailregex.com/ but uses the IP address pattern that is floating around the internet with a bug that allows 00 for any of the unsigned byte decimal values in a dot-delimited address, which is illegal. The rest of it appears to be consistent with the RFC 5322 grammar and passes several tests using grep -Po, including cases domain names, IP addresses, bad ones, and account names with and without quotes.

Correcting the 00 bug in the IP pattern, we obtain a working and fairly fast regex. (Scrape the rendered version, not the markdown, for actual code.)

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

or:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Here is diagram of finite state machine for above regexp which is more clear than regexp itself enter image description here

The more sophisticated patterns in Perl and PCRE (regex library used e.g. in PHP) can correctly parse RFC 5322 without a hitch. Python and C# can do that too, but they use a different syntax from those first two. However, if you are forced to use one of the many less powerful pattern-matching languages, then it’s best to use a real parser.

It's also important to understand that validating it per the RFC tells you absolutely nothing about whether that address actually exists at the supplied domain, or whether the person entering the address is its true owner. People sign others up to mailing lists this way all the time. Fixing that requires a fancier kind of validation that involves sending that address a message that includes a confirmation token meant to be entered on the same web page as was the address.

Confirmation tokens are the only way to know you got the address of the person entering it. This is why most mailing lists now use that mechanism to confirm sign-ups. After all, anybody can put down president@whitehouse.gov, and that will even parse as legal, but it isn't likely to be the person at the other end.

For PHP, you should not use the pattern given in Validate an E-Mail Address with PHP, the Right Way from which I quote:

There is some danger that common usage and widespread sloppy coding will establish a de facto standard for e-mail addresses that is more restrictive than the recorded formal standard.

That is no better than all the other non-RFC patterns. It isn’t even smart enough to handle even RFC 822, let alone RFC 5322. This one, however, is.

If you want to get fancy and pedantic, implement a complete state engine. A regular expression can only act as a rudimentary filter. The problem with regular expressions is that telling someone that their perfectly valid e-mail address is invalid (a false positive) because your regular expression can't handle it is just rude and impolite from the user's perspective. A state engine for the purpose can both validate and even correct e-mail addresses that would otherwise be considered invalid as it disassembles the e-mail address according to each RFC. This allows for a potentially more pleasing experience, like

The specified e-mail address 'myemail@address,com' is invalid. Did you mean 'myemail@address.com'?

See also Validating Email Addresses, including the comments. Or Comparing E-mail Address Validating Regular Expressions.

Regular expression visualization

Debuggex Demo

Braiam
  • 1
  • 11
  • 47
  • 78
bortzmeyer
  • 34,164
  • 12
  • 67
  • 91
  • 256
    You said "There is no good regular expression." Is this general or specific to e-mail address validation? – Tomalak Oct 14 '08 at 14:33
  • 58
    @Tomalak: only for email addresses. As bortzmeyer said, the RFC is extremely complicated – Luk Oct 14 '08 at 16:23
  • 46
    The linux journal article you mention is factually wrong in several respects. In particular Lovell clearly hasn't read the errata to RFC3696 and repeats some of the errors in the published version of the RFC. More here: http://www.dominicsayers.com/isemail – Dominic Sayers Apr 08 '09 at 15:56
  • 7
    Note that [the current HTML5 spec](http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html#e-mail-state-(type=email)) includes a regex and ABNF for email-type input validation that is deliberately more restrictive than the original RFCs. – Synchro Sep 09 '14 at 16:14
  • 18
    RFC 822, section 6.2.4. specifically and explicitly allows capital letters, but this answer does not. https://www.w3.org/Protocols/rfc822/#z26 Perhaps the author of this answer intended for their regex to be applied insensitively. If so, that should be made explicit in the body of the answer. – Jared Beck Apr 10 '19 at 22:36
  • 1
    I use ^[a-zA-Z0-9+.-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9]{2,63}$ – Sandra Aug 04 '21 at 17:22
  • Tra@XDentalLab.com; and emariemcadams@Gmail.com; seem to fail – Dan Krueger Aug 14 '21 at 04:42
  • Please re-check the ending part. I'm looking at this sub-string "\x21-\x5a\x53-\x7f" and it seems odd to me, because the first part overlaps with the second one. I checked it in the RFC and it is consistent, but it is still unclear. – VitaliG Oct 11 '21 at 10:20
  • 2
    @DanKrueger You should have the `i` regex flag set for this to work. – Makonede Oct 27 '21 at 00:05
  • This regex allows number as inicial email id and commas. – Luis Alfredo Serrano Díaz Jan 10 '22 at 18:59
  • Emails in the format `email+TEST@example.com` fail. The `\+[a-zA-Z0-9]` fails. – Charles Kornoelje Jan 10 '22 at 22:04
  • My solution following the spec [Beyond Regex](https://vquilon.github.io/beyond-regex/panels/visualizer#!embed=true&flags=ig&re=(%3F%3A%5E%7C%20)(%5B%5Cw!%23%24%25%26'*%2B%5C-%5C%2F%3D%3F%5E_%60%7B%7C%7D~%5D%2B%5C.%5B%5Cw!%23%24%25%26'*%2B%5C-%5C%2F%3D%3F%5E_%60%7B%7C%7D~%5D%2B%7C%22(%3F%3A%5B%5Cw!%23%24%25%26'*%2B%5C-%5C%2F%3D%3F%5E_%60%7B%7C%7D~()%2C%3A%3B%3C%3E%40%5B%5C%5D.%5D%7C%5C%5C%5B%22%5C%5C%5D)%2B%22)%40(%3F%3A(%3F%3A%5Ba-z0-9%5D(%3F%3A%5Ba-z0-9-%5D*%5Ba-z0-9%5D)%3F%5C.)%2B%5Ba-z0-9%5D(%3F%3A%5Ba-z0-9-%5D*%5Ba-z0-9%5D)%3F)) – Víctor Quilón Jan 18 '22 at 10:42
  • What does \x01-\x08 as part of the character class allowable in double-quotes signify? The \x08 appears to be a backspace. Is this allowed in an email address? Stumped! – Sun Bee Mar 02 '22 at 23:01
  • 2
    @CharlesKornoelje in python you can use `re.IGNORECASE` to make match regardless of case – Jonathan May 11 '22 at 20:27
  • Incidentally, these kinds of drawings are known as "railroad diagrams" or "[syntax diagrams](https://en.wikipedia.org/wiki/Syntax_diagram)." There is [a similar such drawing for JSON](https://www.json.org/json-en.html). – Tom Jul 15 '22 at 22:13
  • the given regex allows emails with whitespace, according to regex101.com – user391838 Aug 04 '22 at 09:12
  • 4
    Thanks to the post, I now have to go back to therapy for the trauma attempting to read and understand the diagrams and charts has given me. I now wake up in the middle of the night with a dark shadowy figure over my bed, and all I can hear is the screaming of the innocents harmed by regex. P̴L̸E̷A̵S̸E̵ ̴D̴E̸A̴R̵ ̵G̴O̶D̸ ̸M̸A̶K̶E̸ ̴I̵T̷ ̸S̷T̶O̷P̸ ̸P̴L̸E̵A̵S̵E̷ – unixandria Aug 08 '22 at 15:33
  • "...can be understood if studied for a few minutes.." - which tells me the IQ of the poster in the 180+ range.. – Will Croxford Aug 16 '22 at 10:12
  • If I use your regular expression in grep it gives errors, could you please post a working grep command line sample? – sw. Sep 24 '22 at 22:49
  • The same issue happens in Perl. – sw. Sep 24 '22 at 23:08
  • 1
    Your regex chokes on capital letters in an address. Just worth noting you should convert to all lowercase before doing the comparison – Stephen R Oct 12 '22 at 21:36
  • 1
    Version that accepts uppercase: `(?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])` – Stephen R Oct 12 '22 at 21:53
  • Oddly email@email.io? seems to pass. I'm using DB2 REXEXP_LIKE so maybe it's flubbing up on this fairly complex regexp. – David Bradley Oct 26 '22 at 02:10
  • 1
    This does not work for email addresses with æ,ø or å in the domain name even though these are valid – Metareven Nov 28 '22 at 12:45
  • Have tried this regex with Java? Doesn't work for me, even after escaping the " and \ – Michał Jabłoński Jan 19 '23 at 17:49
  • php ready for preg_match ```";(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])\sRechnungsnummer;i"``` – 8ctopus Apr 04 '23 at 12:05
  • Why post a show regex making it sound RFC5322 complaint ? If you say its not, why post it ? A simple Google search shows RFC5322 email addresses character number limits: local part `^(?=.{1,64}@)` # 64 max chars, Domain part `(?=.{1,255}$)` # 255 max chars. Email addresses should have anchors or equivalent to confine its definition. Obviously you can't sit a length assertion in the middle after `@` for the domain without referencing an end anchor. This post is useless and not even a good example. It's Click-Bait. – sln May 11 '23 at 20:43
  • 1
    http://emailregex.com/ doesn't seem to be what it used to anymore... Seems like something completely irrelevant now – brand5i2g May 16 '23 at 19:58
  • I've copied the content of emailregex.com here https://piotr.gg/regexp/email-address-regular-expression-that-99-99-works.html if anyone interested. – piotr.gradzinski Jun 01 '23 at 07:36
850

You should not use regular expressions to validate email addresses.

Instead, in C# use the MailAddress class, like this:

try {
    address = new MailAddress(address).Address;
} catch(FormatException) {
    // address is invalid
}

The MailAddress class uses a BNF parser to validate the address in full accordance with RFC822.

If you plan to use the MailAddress to validate the e-mail address, be aware that this approach accepts the display name part of the e-mail address as well, and that may not be exactly what you want to achieve. For example, it accepts these strings as valid e-mail addresses:

  • "user1@hotmail.com; user2@gmail.com"
  • "user1@hotmail.com; user2@gmail.com; user3@company.com"
  • "User Display Name user3@company.com"
  • "user4 @company.com"

In some of these cases, only the last part of the strings is parsed as the address; the rest before that is the display name. To get a plain e-mail address without any display name, you can check the normalized address against your original string.

bool isValid = false;

try
{
    MailAddress address = new MailAddress(emailAddress);
    isValid = (address.Address == emailAddress);
    // or
    // isValid = string.IsNullOrEmpty(address.DisplayName);
}
catch (FormatException)
{
    // address is invalid
}

Furthermore, an address having a dot at the end, like user@company. is accepted by MailAddress as well.

If you really want to use a regex, here it is:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
 \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
 \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>

@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>

@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
 \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
SLaks
  • 868,454
  • 176
  • 1,908
  • 1,964
  • 49
    You'll find that the MailAddress class in .NET 4.0 is far better at validating email addresses than in previous versions. I made some significant improvements to it. – Jeff Tucker Dec 15 '09 at 09:56
  • 10
    I think it sort of... doesn't work... for simpler ids. a@b doesn't validate. ar@b.com matches only till ar@b , the .com is not matched. However, something like "I am me"@[10.10.10.10] does work! :) – Raze Dec 15 '09 at 11:24
  • 4
    Are you sure it's correct? http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx – Joshua Drake Dec 15 '09 at 16:09
  • To clarify, the part that I didn't write is the regex, not the `MailAddress` code. – SLaks Dec 23 '09 at 23:02
  • 2
    Would have been nice from MS to put this in a MailAdress.Check()-method so normal people (not against you SLaks!) are able to realize this! @Slaks: Did you invent a time-stopping-machine for getting this knowledge? I'll buy one! Need 128-hour-days! – Alexander Schmidt Feb 08 '12 at 18:34
  • 11
    Be warned that these RFC compliant regex validators will let through a lot of email addresses that you probably wouldn't want to accept such as "a" which is a valid email address in perl's Email::Valid (which uses that huge regex), and can be exploited for XSS https://rt.cpan.org/Public/Bug/Display.html?id=75650 – Matthew Lock Sep 28 '12 at 06:03
  • 14
    @MatthewLock: That is no worse than `fake@not-a-real-domain.name`. You **must not** rely on email validation to prevent XSS. – SLaks Sep 28 '12 at 17:19
  • 4
    Just FYI: Microsoft does provide a "recommended" Regular Expression for this task at [How to: Verify that Strings Are in Valid Email Format](https://msdn.microsoft.com/en-us/library/01escwtf.aspx). But, just after they explain how the RegEx works, they throw in "Instead of using a regular expression to validate an email address, you can use the System.Net.Mail.MailAddress class." :) – Solomon Rutzky Feb 18 '15 at 03:45
  • 2
    @MatthewLock Why wouldn't you want to accept that?! What if that's actually someone's email address? Do you also reject surnames containing "null"? – user253751 Mar 02 '16 at 01:38
  • 14
    @MatthewLock: **No.** You need to _escape_ SQL queries (or, better yet, use parameters). Sanitization is not a proper defense. – SLaks Mar 02 '16 at 14:49
  • There are 3 trailing spaces in the (Markdown) source for the regular expression, and they are also in the rendered output. Some of them *apparently* due to lines not conforming to 84 characters. Are they significant? Or do they even ruin the regular expression? (Not rhetorical questions.) – Peter Mortensen Feb 13 '22 at 15:28
  • Upvoting for the suggestion to use `MailAddress`, not for the regular expression. (Not that I think the RE is wrong, just that I would personally try really really hard to avoid using such a [admittedly clever] monstrosity in production code. It would be almost impossibly fragile, very very hard to inspect, and difficult to maintain.) – Mark Meuer Jul 08 '22 at 18:35
  • Also, it's worth noting that in .NET 6 (not sure when it was introduced) there is now a `MailAddress.TryCreate` method, which allows you to avoid having to catch an exception just to test the address. – Mark Meuer Jul 08 '22 at 18:44
  • I don't know if it's changed in .NET 6, but the old `MailAddress` parser [could definitely be tricked](https://stackoverflow.com/questions/7173401/why-does-mailaddress-think-johngmail-is-a-valid-email-address/7176931#comment105829115_7176931). – Michael Aug 17 '22 at 15:03
  • This regex is dubious. Sections are blindly cut and pasted. There are 24 capture groups of this subsection verbatim `\[([^\[\]\r\\]|\\.)*\]` where most are enclosed in an outer group `(?: ...)*` that is quantified. Clearly this regex is for show and is meaningless. – sln May 11 '23 at 20:21
610

This question is asked a lot, but I think you should step back and ask yourself why you want to validate email adresses syntactically? What is the benefit really?

  • It will not catch common typos.
  • It does not prevent people from entering invalid or made-up email addresses, or entering someone else's address for that matter.

If you want to validate that an email is correct, you have no choice than to send a confirmation email and have the user reply to that. In many cases you will have to send a confirmation mail anyway for security reasons or for ethical reasons (so you cannot e.g. sign someone up to a service against their will).

iMyke
  • 19
  • 1
  • 10
JacquesB
  • 41,662
  • 13
  • 71
  • 86
  • 125
    It might be worth checking that they entered something@something into the field in a client side validation just to catch simple mistakes - but in general you are right. – Martin Beckett Aug 25 '09 at 16:25
  • 148
    @olavk: if someone enters a typo (eg: `me@hotmail`), they're obviously not going to get your confirmation email, and then where are they? They're not on your site any more and they're wondering why they couldn't sign up. Actually no they're not - they've completely forgotten about you. However, if you could just do a basic sanity check with a regex while they're still with you, then they can catch that error straight away and you've got a happy user. – nickf Jun 02 '10 at 13:53
  • 5
    @JacquesB: You make an excellent point. Just because it passes muster per the RFC doesn’t mean it is really that user’s address. Otherwise all those `president@whitehouse.gov` addresses indicate a very netbusy commander-in-chief. :) – tchrist Nov 07 '10 at 20:09
  • 56
    It doesn't have to be black or white. If the e-mail looks wrong, let the user know that. If the user still wants to proceed, let him. Don't force the user to conform to your regex, rather, use regex as a tool to help the user know that there might be a mistake. – ninjaneer Feb 18 '14 at 02:56
  • 2
    Security. If you validate email addresses properly and only allow safe characters chances of the email address being used in some sort of malicious way reduces drastically. For example in email header exploits. – Gellweiler Nov 05 '21 at 10:49
  • Quicker feedback to the user typing in the email address letting them know this has no chance of working is worth it. Especially if they're entering many. But yes, one of my pet peeves are lazy companies that don't even bother to send a confirmation email. I have a couple of dozen people around the world that constantly type my email address for things. – David Bradley Oct 26 '22 at 02:15
  • In my case I had only to match e-mail addresses in order to transform them to valid `mailto:` links. But you've got a point there! – digijay Mar 18 '23 at 23:56
543

It all depends on how accurate you want to be. For my purposes, where I'm just trying to keep out things like bob @ aol.com (spaces in emails) or steve (no domain at all) or mary@aolcom (no period before .com), I use

/^\S+@\S+\.\S+$/

Sure, it will match things that aren't valid email addresses, but it's a matter of getting common simple errors.

There are any number of changes that can be made to that regex (and some are in the comments for this answer), but it's simple, and easy to understand, and is a fine first attempt.

Andy Lester
  • 91,102
  • 13
  • 100
  • 152
  • 9
    It does not match foobar@dk which is a valid and working email address (although probably most mail servers won't accept it or will add something.com.) – bortzmeyer Oct 14 '08 at 19:30
  • 2
    Yes, that's right, it's not RFC-compliant, but that's usually not an issue. – Andy Lester Feb 15 '09 at 04:50
  • It won't match anything with a three part host name, e.g. .co.uk and .com.au domains). – Richard Mar 04 '09 at 18:26
  • 3
    Yes, it will. I suggest you try it yourself. $ perl -le'print q{foo@bar.co.uk} =~ /^\S+@\S+\.\S+$/ ? q{Y} : q{N}' – Andy Lester Mar 06 '09 at 04:51
  • 7
    @Richard: `.` is included in `\S`. – David Thornley Dec 17 '09 at 18:48
  • 1
    @bortzmeyer: Yeah, well. It doesn’t match `postmaster`, either, which I am pretty sure is going to be a deliverable address. :) – tchrist Nov 07 '10 at 20:16
  • \S includes @ so it will also match a@@b.c – JJJ Oct 16 '12 at 06:18
  • 66
    JJJ: Yes, it will match a lot of crap. It will match &$*#$(@$0(%))$#.)&*)(*$, too. For me, I'm more concerned with catching the odd fumble-finger typo like `mary@aolcom` than I am complete garbage. YMMV. – Andy Lester Oct 16 '12 at 16:03
  • 14
    Just to control for `@` signs: `/^[^\s@]+@[^\s@]+\.[^\s@]{2,}$/` http://jsfiddle.net/b9chris/mXB96/ – Chris Moschini Aug 04 '14 at 21:32
  • 27
    And another common typo: two consecutive dots in domain name or a comma instead of a dot. `^[^\s@]+@([^\s@.,]+\.)+[^\s@.,]{2,}$` – Piskvor left the building Sep 24 '15 at 09:12
  • 3
    We want explanation about this :) . People come here to see Why it is the way it is. Please consider Regex explanation too! Not everyone is advanced enough to know what you wrote there without explanation. Thanks – Pratik Joshi Dec 06 '15 at 10:47
  • 4
    Spaces are actually allowed in email addresses, provided they are backslash-escaped inside a quoted-pair. `"an\ inbox"@example.com` is a valid email address, and will deliver to an inbox named `an inbox`. – awwright Sep 11 '20 at 06:20
  • 3
    @awwright Yes, spaces are allowed in email addresses. However, I suggest that it's far more likely that you'll stop someone from mistakenly entering a space in an email address than you'll find someone whose email address really is `"Bob\ Smith"@example.com`. – Andy Lester Sep 11 '20 at 13:51
  • 1
    I like this one, it covers all common cases, BUT, you need to change it to `^\S+@\S+\.\w+$` to avoid allowing e.g. `john@doe..` for example. With the `\w` instead we require an ending of `.[at least one letter or number]` – Anders Dec 22 '21 at 08:55
  • `/.+@\S+\.\S+$/` since spaces are allowed in email addresses as long as they are wrapped in quotes. https://regex101.com/r/PUMajv/2 – Caleb Taylor Dec 28 '21 at 02:02
  • Perhaps list some examples of *false positives* and some examples of *false negatives*, e.g., from the comments here (e.g. with some annotation as to why it is not bad thing, pros/cons, caveats, etc.)? Comments may be deleted at any time. – Peter Mortensen Feb 13 '22 at 14:39
373

It depends on what you mean by best: If you're talking about catching every valid email address use the following:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
 \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
 \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
 \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)

(http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html) If you're looking for something simpler but that will catch most valid email addresses try something like:

"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"

EDIT: From the link:

This regular expression will only validate addresses that have had any comments stripped and replaced with whitespace (this is done by the module).

Good Person
  • 1,437
  • 2
  • 23
  • 42
  • 64
    Can you give me an example of some `email address` that wrongly passes through the second one, but is caught by the longer regex? – Lazer May 15 '10 at 18:32
  • 4
    Much though I did once love it, that’s an RFC 822 validator, not [an RFC 5322](http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses/1917982#1917982) one. – tchrist Nov 07 '10 at 20:17
  • 28
    @Lazer in..valid@example.com would be a simple example. You aren't allowed to have two consecutive unquoted dots in the local-part. – Randal Schwartz Dec 06 '11 at 18:04
  • 1
    I tried to implement this but no parser is taking it. What was a target that was able to compile it? – Mikhail Jul 25 '12 at 06:17
  • 5
    @Mikhail perl but you shouldn't actually use it. – Good Person Jan 08 '13 at 18:48
  • 2
    Also, a@a will pass on the first regex but fail in the second regex. Yes, some real emails don't have a . after @ – priomsrb Jun 05 '13 at 01:24
  • 2
    you can also quote anything you want before the @, like: "something@gmail.com"@foo.com is an address with account name something@gmail.com stored in the domain foo.com – chad Jun 21 '13 at 18:43
  • Just in case, if you want to know what that regular expression does, head here http://regex101.com/r/xU9bO0 – thefourtheye Oct 26 '13 at 03:53
  • 2
    [This](https://www.debuggex.com/i/kt6_PU2Sz2jP_K51.png) makes it quite easier to understand... Haha! – sp00m Nov 14 '13 at 15:13
  • 1
    It is better not to use regex, than using such complex, non debuggable regex. You would be better off doing string parsing and enforcing rules than such long regex. http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html – iankit Jan 26 '14 at 19:03
  • This regex still validates: !!@192.168.0.555 (which is an invalid IP) – Josue Alexander Ibarra Apr 11 '14 at 21:09
  • 2
    The 2nd regex wrongly validates email@internet.com. (email with dot in the end) – rsc May 28 '14 at 09:05
  • 3
    @RSC that is a FQDN which is fine – Good Person May 28 '14 at 23:16
  • Your "simple" expression doesn't handle single quotes in the username portion. Eg brian.o'connor@domain.com (which is valid) – NickG Jul 30 '15 at 10:22
  • Note that it's RFC 822, which is pretty outdated standard and doesn't cover a lot of new cases introduced by new versions. – RReverser Aug 21 '15 at 16:21
  • Has anyone actually tried this "RFC822 compatible" regex? It does not work for me - https://regex101.com/r/gM6lE7/3 –  Nov 14 '15 at 19:26
  • I have a compliant ajv validator... but on top of that, I don't want capital letters either... hence why I can use an even simpler version of this. I require my clients to send me lower case emails only. – Ray Foss Nov 09 '19 at 06:45
  • This expression validates folding whitespace, which is part of the MIME message and not considered part of the email address. It is not useful, unless you are parsing an address out of a MIME message. – awwright Aug 31 '20 at 21:29
334

Per the W3C HTML5 specification:

^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$

Context:

A valid e-mail address is a string that matches the ABNF production […].

Note: This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the “@” character), too vague (after the “@” character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.

/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Josh Stodola
  • 81,538
  • 47
  • 180
  • 227
  • 13
    This is interesting. It's a violation of RFC, but a willful one and it makes sesne. Real world example: gmail ignores dots in the part before @, so if your email is test@gmail.com you can send emails to test.@gmail.com or test....@gmail.com, both of those addresses are invalid according to RFC, but valid in real world. – valentinas Jan 16 '13 at 05:04
  • 1
    I think last part should be '+' instead of '*': ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+$ – mmmmmm Jan 21 '13 at 12:12
  • 11
    @mmmmmm `john.doe@localhost` is valid. For sure, in a real world application (i.e. a community), I'd like your suggest to replace * by + – rabudde Feb 01 '13 at 10:03
  • 3
    @valentinas Actually, the RFC does *not* preclude these local parts, but they have to be quoted. `"test...."@gmail.com` is perfectly valid according to the RFC and semantically equivalent to `test....@gmail.com`. – Rinke Nov 17 '14 at 09:01
  • I get an error while trying to send email using python through my company's relay if I try to send to an address with a .@ or ..@. Actually that is also the case with a _@. I rather remove those before sending than trusting that the recipient will do it. – ndvo Feb 11 '16 at 11:31
  • hmmm..., phpstorm says: `(...+...)* might be exploited (ReDoS, Regular Expression Denial of Service)` – steven Jul 27 '17 at 19:50
  • soo..... underscores can't be in domain names? i.e. this is considered invalid `joe@foo_bar.com`? – AndyPerlitch Nov 02 '18 at 21:11
  • Perl errors with "Use of ?PATTERN? without explicit operator is deprecated" – J. McNerney Jun 24 '19 at 13:27
299

[UPDATED] I've collated everything I know about email address validation at http://isemail.info, which now not only validates, but it also diagnoses problems with email addresses. I agree with many of the comments here that validation is only part of the answer; see my essay What is a valid email address?.

is_email() remains, as far as I know, the only validator that will tell you definitively whether a given string is a valid email address or not. I've uploaded a new version at http://isemail.info/

I collated test cases from Cal Henderson, Dave Child, Phil Haack, Doug Lovell, RFC 5322 and RFC 3696. 275 test addresses in all. I ran all these tests against all the free validators I could find.

I'll try to keep this page up-to-date as people enhance their validators. Thanks to Cal, Michael, Dave, Paul and Phil for their help and cooperation in compiling these tests and constructive criticism of my own validator.

People should be aware of the errata against RFC 3696 in particular. Three of the canonical examples are in fact invalid addresses. And the maximum length of an address is 254 or 256 characters, not 320.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Dominic Sayers
  • 1,783
  • 2
  • 20
  • 26
  • [This validator](http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses/1917982#1917982) also seems correct. [...time passes...] Hm, looks like it is just RFC 5322, not 3693 or errata thereto. – tchrist Nov 07 '10 at 20:11
  • 2
    Very nice. Here we not only get a nice essay, we get a validation tester as well as a library to download. Nice answer! – bgmCoder Apr 09 '13 at 20:49
  • 2
    Your validator doesn't support punycode (RFC 3492). name@öäü.at can be a valid address. (it translates to name@xn--4ca9at.at) – Josef Mar 25 '15 at 07:28
  • Hi @Josef. You should try to validate `name@xn--4ca9at.at` since this code is about validation, not interpretation. If you'd like to add a punycode translator then I'm happy to accept a pull request at https://github.com/dominicsayers/isemail – Dominic Sayers Apr 27 '15 at 18:19
207

It’s easy in Perl 5.10 or newer:

/(?(DEFINE)
   (?<address>         (?&mailbox) | (?&group))
   (?<mailbox>         (?&name_addr) | (?&addr_spec))
   (?<name_addr>       (?&display_name)? (?&angle_addr))
   (?<angle_addr>      (?&CFWS)? < (?&addr_spec) > (?&CFWS)?)
   (?<group>           (?&display_name) : (?:(?&mailbox_list) | (?&CFWS))? ;
                                          (?&CFWS)?)
   (?<display_name>    (?&phrase))
   (?<mailbox_list>    (?&mailbox) (?: , (?&mailbox))*)

   (?<addr_spec>       (?&local_part) \@ (?&domain))
   (?<local_part>      (?&dot_atom) | (?&quoted_string))
   (?<domain>          (?&dot_atom) | (?&domain_literal))
   (?<domain_literal>  (?&CFWS)? \[ (?: (?&FWS)? (?&dcontent))* (?&FWS)?
                                 \] (?&CFWS)?)
   (?<dcontent>        (?&dtext) | (?&quoted_pair))
   (?<dtext>           (?&NO_WS_CTL) | [\x21-\x5a\x5e-\x7e])

   (?<atext>           (?&ALPHA) | (?&DIGIT) | [!#\$%&'*+-/=?^_`{|}~])
   (?<atom>            (?&CFWS)? (?&atext)+ (?&CFWS)?)
   (?<dot_atom>        (?&CFWS)? (?&dot_atom_text) (?&CFWS)?)
   (?<dot_atom_text>   (?&atext)+ (?: \. (?&atext)+)*)

   (?<text>            [\x01-\x09\x0b\x0c\x0e-\x7f])
   (?<quoted_pair>     \\ (?&text))

   (?<qtext>           (?&NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e])
   (?<qcontent>        (?&qtext) | (?&quoted_pair))
   (?<quoted_string>   (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))*
                        (?&FWS)? (?&DQUOTE) (?&CFWS)?)

   (?<word>            (?&atom) | (?&quoted_string))
   (?<phrase>          (?&word)+)

   # Folding white space
   (?<FWS>             (?: (?&WSP)* (?&CRLF))? (?&WSP)+)
   (?<ctext>           (?&NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e])
   (?<ccontent>        (?&ctext) | (?&quoted_pair) | (?&comment))
   (?<comment>         \( (?: (?&FWS)? (?&ccontent))* (?&FWS)? \) )
   (?<CFWS>            (?: (?&FWS)? (?&comment))*
                       (?: (?:(?&FWS)? (?&comment)) | (?&FWS)))

   # No whitespace control
   (?<NO_WS_CTL>       [\x01-\x08\x0b\x0c\x0e-\x1f\x7f])

   (?<ALPHA>           [A-Za-z])
   (?<DIGIT>           [0-9])
   (?<CRLF>            \x0d \x0a)
   (?<DQUOTE>          ")
   (?<WSP>             [\x20\x09])
 )

 (?&address)/x
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Abigail
  • 1
  • 1
  • 2
  • 2
  • 22
    Would love to see this in Python – tdc Dec 15 '11 at 16:36
  • 4
    I think that only a subset of the `addrspec` part is really relevant to the question. Accepting more than that and forwarding it though some other part of the system that is not ready to accept full RFC5822 addresses is like shooting is your own foot. – dolmen Dec 17 '11 at 13:53
  • 4
    Great (+1) but technically it's not a regex of course... (which would be impossible since the grammar is not regular). – Rinke Jan 03 '13 at 21:41
  • 12
    regexes stopped being regular some time ago. It is a valid Perl 'regex' though! – rjh Mar 10 '14 at 15:00
  • 4
    I set up a test for this regex on IDEone: http://ideone.com/2XFecH However, it doesn't fair "perfectly." Would anyone care to chime in? Am I missing something? – Mike Jul 30 '14 at 17:56
  • I used to use this with preg_match in PHP and it always worked flawlessly. Unfortunately it now fails (for whatever reason) for all addresses with more than 13 characters before the @ in PHP 7.3. – cgogolin Aug 05 '20 at 19:58
  • This includes FWS, CFWS, and comments, which are features of a MIME message–not part of the email address. – awwright Aug 31 '20 at 18:14
  • 2
    How is it possible? I can't find any @ in here? – Thomas Weller Sep 25 '20 at 11:44
  • Why is it easier in Perl 5.10? What features are used? How is "@" covered? - a numeric code of some kind? How can we know it is correct near "`!#\$%&'*+-/=?^_`"? – Peter Mortensen Feb 13 '22 at 15:41
  • Unfortunally this doesnt work with PHP >= 7.3 and more than 15 characters before @ because of 'catastrophic backtracking' as a regex tester says. – Merilix2 Apr 11 '22 at 16:43
  • I obtain an error in Perl 5.34.0: (? (?&ALPHA) | (?&DIGIT) | [ <-- HERE !#\$%&'*+-/ at parsing-test.pl – sw. Sep 25 '22 at 13:37
  • It seems you should replace it like this? (?&ALPHA) | (?&DIGIT) | [\!\#\$\%\&\'\*\+\-\/\=\?\^\_\`\{\|\}\~]) – sw. Sep 25 '22 at 14:19
  • @ThomasWeller: fwiw, it is on the addr_spec definition's line – Olivier Dulac Mar 03 '23 at 18:13
197

I use

^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$

Which is the one used in ASP.NET by the RegularExpressionValidator.

eric.christensen
  • 3,191
  • 4
  • 29
  • 35
Per Hornshøj-Schierbeck
  • 15,097
  • 21
  • 80
  • 101
  • 37
    Boo! My (ill-advised) address of `!@mydomain.net` is rejected. – Phrogz Jan 19 '11 at 21:35
  • 5
    According to this page http://data.iana.org/TLD/tlds-alpha-by-domain.txt there is no domains with just a single character in top level e.g. "something.c", "something.a", here is version that support at least 2 characters: "something.pl", "something.us": `^\\w+([-+.']\\w+)*@\\w+([-.]\\w+)*\\.\\w{2,}([-.]\\w+)*$` – Tomasz Szulc Nov 19 '15 at 12:53
  • 6
    @Wayne Whitty. You have hit upon the primary issue of whether to cater for the vast majority of addresses, or ALL, including ones that nobody would use, except to test email validation. – Patanjali Nov 28 '15 at 03:13
  • @TomaszSzulc extra back slash in your answer is confusing, I just corrected it and 2 chars domains names support is working, ^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w{2,}([-.]\w+)*$ – Aqib Mumtaz Nov 30 '15 at 11:16
  • 2
    We want explanation about this :) . People come here to see Why it is the way it is. Please consider Regex explanation too! Not everyone is advanced enough to know what you wrote there without explanation. Thanks – Pratik Joshi Dec 06 '15 at 10:47
  • 1
    `^\w+([-+.']|\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$` allows stuff like `shabadoo--morgen@gmail.com` – ganqqwerty Dec 07 '15 at 14:25
  • 3
    this fails on `simon-@hotmail.com` which is in fact valid (a customer of ours had a similar address)` – Simon_Weaver Feb 10 '17 at 00:43
  • @Simon_Weaver: Yes, this fails on emails with local-part that ends in `-`, `+`, `.` and `'`. – Voicu Jan 04 '18 at 00:54
  • 2
    If people choose to provide a `+` sign at the end of their local-part (which is valid for Gmail for example), why would we restrict that by using this regex? – Voicu Jan 04 '18 at 00:57
149

I don't know about best, but this one is at least correct, as long as the addresses have their comments stripped and replaced with white space.

Seriously. You should use an already-written library for validating emails. The best way is probably to just send a verification e-mail to that address.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Chris Vest
  • 8,642
  • 3
  • 35
  • 43
  • 3
    As far as I know, some libraries are wrong, too. I vaguely remember that PHP PEAR had such a bug. – bortzmeyer Oct 14 '08 at 14:34
  • That page also has a disclaimer at the bottom about a couple of things from the spec. that the regexp does not support. – Chris Vest Oct 14 '08 at 14:37
  • 7
    That’s an RFC 822 spec, not an [RFC 5322](http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses/1917982#1917982) spec. – tchrist Nov 07 '10 at 20:12
  • 14
    Ultimately, he's right in that the only way to truly *validate* an email address is to send an email to it and await a reply. – Blazemonger Oct 26 '11 at 19:43
141

Quick answer

Use the following regex for input validation:

([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+

Addresses matched by this regex:

  • have a local part (i.e. the part before the @-sign) that is strictly compliant with RFC 5321/5322,
  • have a domain part (i.e. the part after the @-sign) that is a host name with at least two labels, each of which is at most 63 characters long.

The second constraint is a restriction on RFC 5321/5322.

Elaborate answer

Using a regular expression that recognizes email addresses could be useful in various situations: for example to scan for email addresses in a document, to validate user input, or as an integrity constraint on a data repository.

It should however be noted that if you want to find out if the address actually refers to an existing mailbox, there's no substitute for sending a message to the address. If you only want to check if an address is grammatically correct then you could use a regular expression, but note that ""@[] is a grammatically correct email address that certainly doesn't refer to an existing mailbox.

The syntax of email addresses has been defined in various RFCs, most notably RFC 822 and RFC 5322. RFC 822 should be seen as the "original" standard and RFC 5322 as the latest standard. The syntax defined in RFC 822 is the most lenient and subsequent standards have restricted the syntax further and further, where newer systems or services should recognize obsolete syntax, but never produce it.

In this answer I’ll take “email address” to mean addr-spec as defined in the RFCs (i.e. jdoe@example.org, but not "John Doe"<jdoe@example.org>, nor some-group:jdoe@example.org,mrx@exampel.org;).

There's one problem with translating the RFC syntaxes into regexes: the syntaxes are not regular! This is because they allow for optional comments in email addresses that can be infinitely nested, while infinite nesting can't be described by a regular expression. To scan for or validate addresses containing comments you need a parser or more powerful expressions. (Note that languages like Perl have constructs to describe context free grammars in a regex-like way.) In this answer I'll disregard comments and only consider proper regular expressions.

The RFCs define syntaxes for email messages, not for email addresses as such. Addresses may appear in various header fields and this is where they are primarily defined. When they appear in header fields addresses may contain (between lexical tokens) whitespace, comments and even linebreaks. Semantically this has no significance however. By removing this whitespace, etc. from an address you get a semantically equivalent canonical representation. Thus, the canonical representation of first. last (comment) @ [3.5.7.9] is first.last@[3.5.7.9].

Different syntaxes should be used for different purposes. If you want to scan for email addresses in a (possibly very old) document it may be a good idea to use the syntax as defined in RFC 822. On the other hand, if you want to validate user input you may want to use the syntax as defined in RFC 5322, probably only accepting canonical representations. You should decide which syntax applies to your specific case.

I use POSIX "extended" regular expressions in this answer, assuming an ASCII compatible character set.

RFC 822

I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I'll try to fix the expression as soon as possible.

([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*]))*

I believe it's fully compliant with RFC 822 including the errata. It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. Where an erratum has been published I give a separate expression for the corrected grammar rule (marked "erratum") and use the updated version as a subexpression in subsequent regular expressions.

As stated in paragraph 3.1.4. of RFC 822 optional linear white space may be inserted between lexical tokens. Where applicable I've expanded the expressions to accommodate this rule and marked the result with "opt-lwsp".

CHAR        =  <any ASCII character>
            =~ .

CTL         =  <any ASCII control character and DEL>
            =~ [\x00-\x1F\x7F]

CR          =  <ASCII CR, carriage return>
            =~ \r

LF          =  <ASCII LF, linefeed>
            =~ \n

SPACE       =  <ASCII SP, space>
            =~  

HTAB        =  <ASCII HT, horizontal-tab>
            =~ \t

<">         =  <ASCII quote mark>
            =~ "

CRLF        =  CR LF
            =~ \r\n

LWSP-char   =  SPACE / HTAB
            =~ [ \t]

linear-white-space =  1*([CRLF] LWSP-char)
                   =~ ((\r\n)?[ \t])+

specials    =  "(" / ")" / "<" / ">" / "@" /  "," / ";" / ":" / "\" / <"> /  "." / "[" / "]"
            =~ [][()<>@,;:\\".]

quoted-pair =  "\" CHAR
            =~ \\.

qtext       =  <any CHAR excepting <">, "\" & CR, and including linear-white-space>
            =~ [^"\\\r]|((\r\n)?[ \t])+

dtext       =  <any CHAR excluding "[", "]", "\" & CR, & including linear-white-space>
            =~ [^][\\\r]|((\r\n)?[ \t])+

quoted-string  =  <"> *(qtext|quoted-pair) <">
               =~ "([^"\\\r]|((\r\n)?[ \t])|\\.)*"
(erratum)      =~ "(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"

domain-literal =  "[" *(dtext|quoted-pair) "]"
               =~ \[([^][\\\r]|((\r\n)?[ \t])|\\.)*]
(erratum)      =~ \[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]

atom        =  1*<any CHAR except specials, SPACE and CTLs>
            =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+

word        =  atom / quoted-string
            =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"

domain-ref  =  atom

sub-domain  =  domain-ref / domain-literal
            =~ [^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]

local-part  =  word *("." word)
            =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))*
(opt-lwsp)  =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))*

domain      =  sub-domain *("." sub-domain)
            =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))*
(opt-lwsp)  =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))*

addr-spec   =  local-part "@" domain
            =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))*
(opt-lwsp)  =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")((\r\n)?[ \t])*(\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*")((\r\n)?[ \t])*)*@((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*])(((\r\n)?[ \t])*\.((\r\n)?[ \t])*([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]|(\r\n)?[ \t]))*(\\\r)*]))*
(canonical) =~ ([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*")(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|"(\n|(\\\r)*([^"\\\r\n]|\\[^\r]))*(\\\r)*"))*@([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*])(\.([^][()<>@,;:\\". \x00-\x1F\x7F]+|\[(\n|(\\\r)*([^][\\\r\n]|\\[^\r]))*(\\\r)*]))*

RFC 5322

I arrived at the following regular expression. I invite everyone to try and break it. If you find any false positives or false negatives, please post them in a comment and I'll try to fix the expression as soon as possible.

([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|\[[\t -Z^-~]*])

I believe it's fully compliant with RFC 5322 including the errata. It only recognizes email addresses in their canonical form. For a regex that recognizes (folding) whitespace see the derivation below.

The derivation shows how I arrived at the expression. I list all the relevant grammar rules from the RFC exactly as they appear, followed by the corresponding regex. For rules that include semantically irrelevant (folding) whitespace, I give a separate regex marked "(normalized)" that doesn't accept this whitespace.

I ignored all the "obs-" rules from the RFC. This means that the regexes only match email addresses that are strictly RFC 5322 compliant. If you have to match "old" addresses (as the looser grammar including the "obs-" rules does), you can use one of the RFC 822 regexes from the previous paragraph.

VCHAR           =   %x21-7E
                =~  [!-~]

ALPHA           =   %x41-5A / %x61-7A
                =~  [A-Za-z]

DIGIT           =   %x30-39
                =~  [0-9]

HTAB            =   %x09
                =~  \t

CR              =   %x0D
                =~  \r

LF              =   %x0A
                =~  \n

SP              =   %x20
                =~  

DQUOTE          =   %x22
                =~  "

CRLF            =   CR LF
                =~  \r\n

WSP             =   SP / HTAB
                =~  [\t ]

quoted-pair     =   "\" (VCHAR / WSP)
                =~  \\[\t -~]

FWS             =   ([*WSP CRLF] 1*WSP)
                =~  ([\t ]*\r\n)?[\t ]+

ctext           =   %d33-39 / %d42-91 / %d93-126
                =~  []!-'*-[^-~]

("comment" is left out in the regex)
ccontent        =   ctext / quoted-pair / comment
                =~  []!-'*-[^-~]|(\\[\t -~])

(not regular)
comment         =   "(" *([FWS] ccontent) [FWS] ")"

(is equivalent to FWS when leaving out comments)
CFWS            =   (1*([FWS] comment) [FWS]) / FWS
                =~  ([\t ]*\r\n)?[\t ]+

atext           =   ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "/" / "=" / "?" / "^" / "_" / "`" / "{" / "|" / "}" / "~"
                =~  [-!#-'*+/-9=?A-Z^-~]

dot-atom-text   =   1*atext *("." 1*atext)
                =~  [-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*

dot-atom        =   [CFWS] dot-atom-text [CFWS]
                =~  (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*(([\t ]*\r\n)?[\t ]+)?
(normalized)    =~  [-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*

qtext           =   %d33 / %d35-91 / %d93-126
                =~  []!#-[^-~]

qcontent        =   qtext / quoted-pair
                =~  []!#-[^-~]|(\\[\t -~])

(erratum)
quoted-string   =   [CFWS] DQUOTE ((1*([FWS] qcontent) [FWS]) / FWS) DQUOTE [CFWS]
                =~  (([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)?
(normalized)    =~  "([]!#-[^-~ \t]|(\\[\t -~]))+"

dtext           =   %d33-90 / %d94-126
                =~  [!-Z^-~]

domain-literal  =   [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
                =~  (([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)?
(normalized)    =~  \[[\t -Z^-~]*]

local-part      =   dot-atom / quoted-string
                =~  (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)?
(normalized)    =~  [-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+"

domain          =   dot-atom / domain-literal
                =~  (([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)?
(normalized)    =~  [-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|\[[\t -Z^-~]*]

addr-spec       =   local-part "@" domain
                =~  ((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?"(((([\t ]*\r\n)?[\t ]+)?([]!#-[^-~]|(\\[\t -~])))+(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?)"(([\t ]*\r\n)?[\t ]+)?)@((([\t ]*\r\n)?[\t ]+)?[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*(([\t ]*\r\n)?[\t ]+)?|(([\t ]*\r\n)?[\t ]+)?\[((([\t ]*\r\n)?[\t ]+)?[!-Z^-~])*(([\t ]*\r\n)?[\t ]+)?](([\t ]*\r\n)?[\t ]+)?)
(normalized)    =~  ([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|\[[\t -Z^-~]*])

Note that some sources (notably W3C) claim that RFC 5322 is too strict on the local part (i.e. the part before the @-sign). This is because "..", "a..b" and "a." are not valid dot-atoms, while they may be used as mailbox names. The RFC, however, does allow for local parts like these, except that they have to be quoted. So instead of a..b@example.net you should write "a..b"@example.net, which is semantically equivalent.

Further restrictions

SMTP (as defined in RFC 5321) further restricts the set of valid email addresses (or actually: mailbox names). It seems reasonable to impose this stricter grammar, so that the matched email address can actually be used to send an email.

RFC 5321 basically leaves alone the "local" part (i.e. the part before the @-sign), but is stricter on the domain part (i.e. the part after the @-sign). It allows only host names in place of dot-atoms and address literals in place of domain literals.

The grammar presented in RFC 5321 is too lenient when it comes to both host names and IP addresses. I took the liberty of "correcting" the rules in question, using this draft and RFC 1034 as guidelines. Here's the resulting regex.

([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)])

Note that depending on the use case you may not want to allow for a "General-address-literal" in your regex. Also note that I used a negative lookahead (?!IPv6:) in the final regex to prevent the "General-address-literal" part to match malformed IPv6 addresses. Some regex processors don't support negative lookahead. Remove the substring |(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+ from the regex if you want to take the whole "General-address-literal" part out.

Here's the derivation:

Let-dig         =   ALPHA / DIGIT
                =~  [0-9A-Za-z]

Ldh-str         =   *( ALPHA / DIGIT / "-" ) Let-dig
                =~  [0-9A-Za-z-]*[0-9A-Za-z]

(regex is updated to make sure sub-domains are max. 63 characters long - RFC 1034 section 3.5)
sub-domain      =   Let-dig [Ldh-str]
                =~  [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?

Domain          =   sub-domain *("." sub-domain)
                =~  [0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*

Snum            =   1*3DIGIT
                =~  [0-9]{1,3}

(suggested replacement for "Snum")
ip4-octet       =   DIGIT / %x31-39 DIGIT / "1" 2DIGIT / "2" %x30-34 DIGIT / "25" %x30-35
                =~  25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]

IPv4-address-literal    =   Snum 3("."  Snum)
                        =~  [0-9]{1,3}(\.[0-9]{1,3}){3}

(suggested replacement for "IPv4-address-literal")
ip4-address     =   ip4-octet 3("." ip4-octet)
                =~  (25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}

(suggested replacement for "IPv6-hex")
ip6-h16         =   "0" / ( (%x49-57 / %x65-70 /%x97-102) 0*3(%x48-57 / %x65-70 /%x97-102) )
                =~  0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}

(not from RFC)
ls32            =   ip6-h16 ":" ip6-h16 / ip4-address
                =~  (0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}

(suggested replacement of "IPv6-addr")
ip6-address     =                                      6(ip6-h16 ":") ls32
                    /                             "::" 5(ip6-h16 ":") ls32
                    / [                 ip6-h16 ] "::" 4(ip6-h16 ":") ls32
                    / [ *1(ip6-h16 ":") ip6-h16 ] "::" 3(ip6-h16 ":") ls32
                    / [ *2(ip6-h16 ":") ip6-h16 ] "::" 2(ip6-h16 ":") ls32
                    / [ *3(ip6-h16 ":") ip6-h16 ] "::"   ip6-h16 ":"  ls32
                    / [ *4(ip6-h16 ":") ip6-h16 ] "::"                ls32
                    / [ *5(ip6-h16 ":") ip6-h16 ] "::"   ip6-h16
                    / [ *6(ip6-h16 ":") ip6-h16 ] "::"
                =~  (((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::

IPv6-address-literal    =   "IPv6:" ip6-address
                        =~  IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)

Standardized-tag        =   Ldh-str
                        =~  [0-9A-Za-z-]*[0-9A-Za-z]

dcontent        =   %d33-90 / %d94-126
                =~  [!-Z^-~]

General-address-literal =   Standardized-tag ":" 1*dcontent
                        =~  [0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+

address-literal =   "[" ( IPv4-address-literal / IPv6-address-literal / General-address-literal ) "]"
                =~  \[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)]

Mailbox         =   Local-part "@" ( Domain / address-literal )
                =~  ([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)*|\[((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}|IPv6:((((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){6}|::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){5}|[0-9A-Fa-f]{0,4}::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){4}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):)?(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){3}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,2}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){2}|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,3}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,4}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,5}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}))?::)|(?!IPv6:)[0-9A-Za-z-]*[0-9A-Za-z]:[!-Z^-~]+)])

User input validation

A common use case is user input validation, for example on an html form. In that case it's usually reasonable to preclude address-literals and to require at least two labels in the hostname. Taking the improved RFC 5321 regex from the previous section as a basis, the resulting expression would be:

([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+

I do not recommend restricting the local part further, e.g. by precluding quoted strings, since we don't know what kind of mailbox names some hosts allow (like "a..b"@example.net or even "a b"@example.net).

I also do not recommend explicitly validating against a list of literal top-level domains or even imposing length-constraints (remember how ".museum" invalidated [a-z]{2,4}), but if you must:

([-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@([0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?\.)*(net|org|com|info|etc...)

Make sure to keep your regex up-to-date if you decide to go down the path of explicit top-level domain validation.

Further considerations

When only accepting host names in the domain part (after the @-sign), the regexes above accept only labels with at most 63 characters, as they should. However, they don't enforce the fact that the entire host name must be at most 253 characters long (including the dots). Although this constraint is strictly speaking still regular, it's not feasible to make a regex that incorporates this rule.

Another consideration, especially when using the regexes for input validation, is feedback to the user. If a user enters an incorrect address, it would be nice to give a little more feedback than a simple "syntactically incorrect address". With "vanilla" regexes this is not possible.

These two considerations could be addressed by parsing the address. The extra length constraint on host names could in some cases also be addressed by using an extra regex that checks it, and matching the address against both expressions.

None of the regexes in this answer are optimized for performance. If performance is an issue, you should see if (and how) the regex of your choice can be optimized.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Rinke
  • 6,095
  • 4
  • 38
  • 55
  • 3
    RFC [6532](https://tools.ietf.org/html/rfc6532) updates [5322](https://tools.ietf.org/html/rfc5322) to allow and include full, clean UTF-8. Additional details [here](http://stackoverflow.com/a/31066998/2350426). –  Jun 26 '15 at 17:30
  • According to wikipedia seems that the local part, when dotted, has a limitation of 64 chars per part, and also the RFC 5322 refers to the dotted local part to be interpretted with the restrictions of the domains. For example `arbitrary-long-email-address-should-be-invalid-arbitrary-long-email-address-should-be-invalid.and-the-second-group-also-should-not-be-so-long-and-the-second-group-also-should-not-be-so-long@example.com` should not validate. I suggest changing the "+" signs in the first group (name before the optional dot) and in the second group (name after the following dots) to `{1,64}` – Xavi Montero May 22 '17 at 00:35
  • As the comments are limited in size, here is the resulting regex I plan to use, which is the one at the beginning of this answer, plus limitting the size in the local part, plus adding a back-slash prior to the "/" symbol as required by PHP and also in regex101.com: In PHP I use: `$emailRegex = '/^([-!#-\'*+\/-9=?A-Z^-~]{1,64}(\.[-!#-\'*+\/-9=?A-Z^-~]{1,64})*|"([]!#-[^-~ \t]|(\\[\t -~]))+")@[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([0-9A-Za-z-]{0,61}[0-9A-Za-z])?)+$/';` – Xavi Montero May 22 '17 at 00:39
  • CAUTION: For some reason, StackOverflow adds hidden characters when copying from the rendered markdown. Copy it into the regex101.com and you'll see black dots there. You have to remove them and correct the string... Maybe if integrated in the answer, there they are correctly copiable. Sorry for the inconvenience. I don't want to add a new answer as this one is the proper one. Also I don't want to directly edit unless the community thinks this should be integrated into it. – Xavi Montero May 22 '17 at 00:48
  • @XaviMontero Thaks for contributing Xavi! Do you have a reference to the RFC stating the 64 character limit on local part labels? If so, I would gladly adjust the answer. – Rinke May 22 '17 at 11:21
  • @Rinke, your longest regex accepts a@[a-a:::::aa:aa:aa:aa:aa:aa:aa:aa:aa:aa:aa:aa:aa:aa]. Could you (or anyone else) explain why it's valid? I also suggest these small improvements ^ and $ would be helpful, as well as (?=.{0,64}@.{0,255}$) just after the ^. – ByteEater Jul 30 '21 at 01:30
  • One more thing: the last ] isn't escaped, making it illegal in ECMAScript regexes with the u flag (and that's what e.g. the pattern attribute in HTML sets without the possibility to unset). – ByteEater Jul 30 '21 at 01:54
  • Actually, this should be more performant than the above check without increasing length: (?=.{1,64}@.{1,255}$). – ByteEater Jul 30 '21 at 20:56
118

The email addresses I want to validate are going to be used by an ASP.NET web application using the System.Net.Mail namespace to send emails to a list of people.

So, rather than using some very complex regular expression, I just try to create a MailAddress instance from the address. The MailAddress constructor will throw an exception if the address is not formed properly. This way, I know I can at least get the email out of the door. Of course this is server-side validation, but at a minimum you need that anyway.

protected void emailValidator_ServerValidate(object source, ServerValidateEventArgs args)
{
    try
    {
        var a = new MailAddress(txtEmail.Text);
    }
    catch (Exception ex)
    {
        args.IsValid = false;
        emailValidator.ErrorMessage = "email: " + ex.Message;
    }
}
Michael
  • 8,362
  • 6
  • 61
  • 88
davcar
  • 471
  • 2
  • 5
  • 13
  • 4
    A good point. Even if this server validation rejects some valid address then it is not a problem since you will not be able to send to this address using this particular server technology anyway. Or you can try doing the same things using any third party emailing library you use instead of the default tools. – User Jun 16 '09 at 10:59
  • 1
    I really like how this leverages .Net framework code - no sense in reinventing the wheel. This is excellent. Simple, clean, and assures you can actually send the email. Great work. – Cory House Aug 15 '10 at 19:43
  • ... yes and for the those interested in how it validates have a look at the code in Reflector - there's quite a bit of it - and it ain't a regular expression! – Tom Carter Sep 17 '10 at 08:07
  • 2
    Just a note: the MailAddress class doesn't match RFC5322, if you just want to use it for validation (and not sending as well, in which case it's a moot point as mentioned above). See: http://stackoverflow.com/questions/6023589/how-to-make-regex-work-on-corner-case-email-address/6023604#6023604 – porges May 31 '11 at 05:06
  • Just a minor issue: if you want to make your server side validator code more reusable (either in this case or generally), I suggest to use `args.Value` instead of referencing the field like `txtEmail.Text` hard-coded. The latter one will bound your validator to the single control instance, that may be OK, as long you have a single e-mail field, but not recommended otherwise. – pholpar Aug 21 '19 at 11:23
  • Good answer, but basically a duplicate of https://stackoverflow.com/a/1903368/9117. Also, it's worth noting that in .NET 6 (not sure when it was introduced) there is now a `MailAddress.TryCreate` method, which allows you to avoid having to catch an exception just to test the address. – Mark Meuer Jul 08 '22 at 18:44
78

There are plenty examples of this out on the Internet (and I think even one that fully validates the RFC - but it's tens/hundreds of lines long if memory serves).

People tend to get carried away validating this sort of thing. Why not just check it has an @ and at least one . and meets some simple minimum length? It's trivial to enter a fake email and still match any valid regex anyway. I would guess that false positives are better than false negatives.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Draemon
  • 33,955
  • 16
  • 77
  • 104
  • 1
    Yes, but **which** RFC? :) This [RFC‐5322–validator ](http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses/1917982#1917982) is only around forty lines long. – tchrist Nov 07 '10 at 20:20
  • 17
    A . is not required. A TLD can have email addresses, or there could be an IPv6 address – Sijmen Mulder Feb 15 '11 at 12:58
  • 3
    RFCs are not the end of the story: ICANN does not allow 'dotless' domains any more: https://www.icann.org/news/announcement-2013-08-30-en – Synchro Sep 09 '14 at 16:28
64

This regex is from Perl's Email::Valid library. I believe it to be the most accurate, and it matches all of RFC 822. And, it is based on the regular expression in the O'Reilly book:

Regular expression built using Jeffrey Friedl's example in Mastering Regular Expressions (http://www.ora.com/catalog/regexp/).

$RFC822PAT = <<'EOF';
[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\
xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xf
f\n\015()]*)*\)[\040\t]*)*(?:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\x
ff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n\015
"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\
xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80
-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*
)*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\
\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\
x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x8
0-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n
\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x
80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^
\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040
\t]*)*)*@[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([
^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\
\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\
x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-
\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()
]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\
x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\04
0\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\
n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\
015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?!
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\
]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\
x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\01
5()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*|(?:[^(\040)<>@,;:".
\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]
)|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^
()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037]*(?:(?:\([^\\\x80-\xff\n\0
15()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][
^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)|"[^\\\x80-\xff\
n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^()<>@,;:".\\\[\]\
x80-\xff\000-\010\012-\037]*)*<[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?
:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-
\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:@[\040\t]*
(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015
()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()
]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\0
40)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\
[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\
xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*
)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80
-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x
80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t
]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\
\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])
*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x
80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80
-\xff\n\015()]*)*\)[\040\t]*)*)*(?:,[\040\t]*(?:\([^\\\x80-\xff\n\015(
)]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\
\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*@[\040\t
]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\0
15()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015
()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(
\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|
\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80
-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()
]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x
80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^
\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040
\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".
\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff
])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\
\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x
80-\xff\n\015()]*)*\)[\040\t]*)*)*)*:[\040\t]*(?:\([^\\\x80-\xff\n\015
()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\
\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)?(?:[^
(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-
\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\
n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|
\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))
[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff
\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\x
ff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(
?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\
000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\
xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\x
ff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)
*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*@[\040\t]*(?:\([^\\\x80-\x
ff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-
\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)
*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\
]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\]
)[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-
\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\x
ff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(
?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80
-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<
>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x8
0-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:
\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]
*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)
*\)[\040\t]*)*)*>)
EOF
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Evan Carroll
  • 78,363
  • 46
  • 261
  • 468
49

As you're writing in PHP I'd advice you to use the PHP built-in validation for emails.

filter_var($value, FILTER_VALIDATE_EMAIL)

If you're running a PHP version lower than 5.3.6, please be aware of this issue: Bug #53091: Crashes when I try to filter a text of > 2264 characters

If you want more information how this built-in validation works, see here: Does PHP's filter_var FILTER_VALIDATE_EMAIL actually work?

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
SimonSimCity
  • 6,415
  • 3
  • 39
  • 52
  • gets a vote up, exactly what I was going to say. Doesn't handle IDN's but converting to puny code beforehand solves this. PHP>=5.3 has idn_to_ascii() for this. One of the best and easiest ways for validating an email. – Taylor Jan 25 '12 at 23:00
46

Cal Henderson (Flickr) wrote an article called Parsing Email Addresses in PHP and shows how to do proper RFC (2)822-compliant email address parsing.

You can also get the source code in PHP, Python, and Ruby which is Creative Commons licensed.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
adnam
  • 178
  • 2
  • 4
44

I never bother creating with my own regular expression, because chances are that someone else has already come up with a better version. I always use regexlib to find one to my liking.

Kon
  • 27,113
  • 11
  • 60
  • 86
  • Nice site, however it's a bit weird that their built-in email spam protection partially hides some patterns. Especially those related to email matching. :D – Tim Wißmann Jul 09 '21 at 00:07
  • 1
    That link is a search query, e.g. ***"Search Results: 4128 regular expressions found."***. Which one in particular? How do you choose? – Peter Mortensen Feb 13 '22 at 14:32
43

One simple regular expression which would at least not reject any valid email address would be checking for something, followed by an @ sign and then something followed by a period and at least 2 somethings. It won't reject anything, but after reviewing the spec I can't find any email that would be valid and rejected.

email =~ /.+@[^@]+\.[^@]{2,}$/

spig
  • 1,667
  • 6
  • 22
  • 29
  • 5
    This is what I was looking for. Not very restrictive, but makes sure there is only 1 @ (as we're parsing a list and want to make sure there are no missing commas). FYI, you can have an @ on the left if it's in quotes: [Valid_email_addresses](http://en.wikipedia.org/wiki/Email_address#Valid_email_addresses), but it's pretty fringe. – Josh Nov 11 '11 at 06:16
  • 3
    After using it, realized it didn't work exactly. `/^[^@]+@[^@]+\.[^@]{2}[^@]*$/` actually checks for 1 @ sign. Your regex will let multiple through because of the .* at the end. – Josh Nov 11 '11 at 06:31
  • 2
    Right. I'm not trying to reject all invalid, just keep from rejecting a valid email address. – spig Nov 14 '11 at 17:48
  • 2
    It would be far better to use this: `/^[^@]+@[^@]+\.[^@]{2,4}$/` making sure that it ends with 2 to 4 non @ characters. As @Josh pointed out it now allows an extra @ in the end. But you can also change that as well to: `/^[^@]+@[^@]+\.[^a-z-A-Z]{2,4}$/` since all top level domains are a-Z characters. you can replace the `4` with `5` or more allowing top level domain names to be longer in the future as well. – FLY Jan 14 '13 at 10:51
  • @FLY, **ka@foo.** returns correct. Is it supposed to, by the standards? – SexyBeast Nov 22 '15 at 01:31
  • @Cupidvogel `\.[a-z-A-Z]{2,4}$` should make sure it ends with a . followed by 2, 3 or 4 a-Z characters. But this is also a simple check. This would also allow `ka@(*#&foo.bar` Note the ^ is missing since the ^ add a check in a group to not match. Which is wrong in my previous comment. – FLY Nov 23 '15 at 11:19
  • 1
    What if non ASCII char would used where dozens of servers support only, though ... At least, `A-Za-z0-9` then ... – Artfaith Sep 01 '20 at 05:02
39

There is not one which is really usable. I discuss some issues in my answer to Is there a PHP library for email address validation?, it is discussed also in Is regular expression recognition of an email address hard?.

In short, don't expect a single, usable regex to do a proper job. And the best regex will validate the syntax, not the validity of an e-mail (jhohn@example.com is correct, but it will probably bounce...).

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
PhiLho
  • 40,535
  • 6
  • 96
  • 134
  • Correct me if I’m wrong, but I believe that PHP uses PCRE patterns. If so, you should be able to craft something similar to [Abigail’s RFC 5322 pattern](http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses/1917982#1917982). – tchrist Nov 07 '10 at 20:24
  • @tchrist: not sure if PCRE has caught up to this syntax (which I discover). If so, not sure if PHP's PCRE has caught up to this version of PCRE... Well, if I understand correctly this syntax, you can as well use a PEG parser, much clearer and complete than a regex anyway. – PhiLho Nov 10 '10 at 14:51
  • PCRE *has* caught up to it, but perhaps PHP has not caught up with PCRE. ☹ – tchrist Nov 10 '10 at 15:09
30

You could use the one employed by the jQuery Validation plugin:

/^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i
chaos
  • 122,029
  • 33
  • 303
  • 309
  • this seems to be doing a good job. It allowed: `a-b'c_d.e@f-g.h` but was able to catch the inappropriate variations, such as `a-b'c_d.@f-g.h` and `a-b'c_d.e@f-.h` – dsdsdsdsd Apr 16 '14 at 11:52
  • 2
    Do you have some reference for it? – norok2 Sep 01 '21 at 07:18
29

Since May 2010, non-Latin (Chinese, Arabic, Greek, Hebrew, Cyrillic and so on) domain names exist on the Internet. Everyone has to change the email regex used, because those characters are surely not to be covered by [a-z]/i nor \w. They will all fail.

After all, the best way to validate the email address is still to actually send an email to the address in question to validate the address. If the email address is part of user authentication (register/login/etc), then you can perfectly combine it with the user activation system. I.e. send an email with a link with an unique activation key to the specified email address and only allow login when the user has activated the newly created account using the link in the email.

If the purpose of the regex is just to quickly inform the user in the UI that the specified email address doesn't look like in the right format, best is still to check if it matches basically the following regex:

^([^.@]+)(\.[^.@]+)*@([^.@]+\.)+([^.@]+)$

Simple as that. Why on earth would you care about the characters used in the name and domain? It's the client's responsibility to enter a valid email address, not the server's. Even when the client enters a syntactically valid email address like aa@bb.cc, this does not guarantee that it's a legit email address. No one regex can cover that.

BalusC
  • 1,082,665
  • 372
  • 3,610
  • 3,555
  • 5
    I agree the sending an authentication message is usually the best way for this kind of stuff, syntactically correct and valid are not the same. I get frustrated when I get made to type my email address twice for "Confirmation" as if I can't look at what I typed. I only copy the first one to the second anyway, it seems to be becoming used more and more. – PeteT Feb 02 '10 at 15:05
  • 2
    agree! but this regex i don't think is valid because it allow `spaces` after the `@.` eg. `test@test.ca com net` is consider a valid email by using the above regex where as it should be returning invalid. – CB4 Nov 08 '17 at 17:54
27

For the most comprehensive evaluation of the best regular expression for validating an email address please see this link; "Comparing E-mail Address Validating Regular Expressions"

Here is the current top expression for reference purposes:

/^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i
Eric Schoonover
  • 47,184
  • 49
  • 157
  • 202
  • 1
    spoon16: That link isn’t really correct. Its statement that there can be no perfect pattern for validating email addresses is patently fault. You **can**, but you have to make sure that you follow the RFC right down to the letter. And you have to pick the right RFC, too. – tchrist Nov 07 '10 at 20:27
  • 1
    The "best" right now does not work with java regex - even after properly escaping and converting the string. – Eric Chen Apr 17 '12 at 20:57
24

The HTML5 specification suggests a simple regex for validating email addresses:

/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

This intentionally doesn't comply with RFC 5322.

Note: This requirement is a wilful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the @ character), too vague (after the @ character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

The total length could also be limited to 254 characters, per RFC 3696 errata 1690.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Luna
  • 1,447
  • 1
  • 18
  • 32
  • 1
    Best answer! Here's a link to the w3 recommendation: https://www.w3.org/TR/html5/forms.html#valid-e-mail-address This regex is adopted by many browsers. – Ryan Taylor Nov 06 '17 at 22:13
  • 6
    This is SO not the best answer! This pattern matches this wholly invalid address: `invalid@emailaddress`. I would urge caution and much testing before you use it! – Sheridan Mar 21 '18 at 11:47
  • 1
    @Sheridan, if you think there is an issue with the HTML5 spec you can raise an issue here: https://github.com/w3c/html/issues – Luna Mar 21 '18 at 12:56
  • 1
    This doesn't add much over https://stackoverflow.com/a/8829363 and would IMHO be better as an edit of or comment on that. –  Apr 29 '18 at 21:50
  • 2
    example@localhost is valid, but for a real world application you may want to enforce a domain extension, all you need to do is change the final * to a + to achieve this (changing that part of the pattern from 0+ to 1+) – Mitch Satchwell May 16 '18 at 09:05
  • 4
    `invalid@emailaddress` is a valid email address... it just doesn't exist. (And actually, depending on your network configuration, it might even be deliverable inside your local network.) And the only way to tell if an address exists or not is to send it an email. – awwright Aug 31 '20 at 18:18
  • 1
    In case anybody wants to use the above in Perl, despite the HTML5 spec saying it is compatible, it actually needs the `$` to be escaped (make it `\$`), as `$%` is a special variable in perl, so addresses containing `$` or `%` will be rejected. – Ecuador Feb 22 '22 at 22:54
  • How can I make this to work in C#? It works fine in Javascript – Morgs May 18 '22 at 21:56
16

For a vivid demonstration, the following monster is pretty good, but it still does not correctly recognize all syntactically valid email addresses: it recognizes nested comments up to four levels deep.

This is a job for a parser, but even if an address is syntactically valid, it still may not be deliverable. Sometimes you have to resort to the hillbilly method of "Hey, y'all, watch ee-us!"

// derivative of work with the following copyright and license:
// Copyright (c) 2004 Casey West.  All rights reserved.
// This module is free software; you can redistribute it and/or
// modify it under the same terms as Perl itself.

// see http://search.cpan.org/~cwest/Email-Address-1.80/

private static string gibberish = @"
(?-xism:(?:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:\
s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^
\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))
|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+
|\s+)*[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s]+(?-xism:(?-xism:\
s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^
\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))
|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+
|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(
?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?
:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x
0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*<DQ>(?-xism:(?-xism:[
^\\<DQ>])|(?-xism:\\(?-xism:[^\x0A\x0D])))+<DQ>(?-xism:(?-xi
sm:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xis
m:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\
]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\
s*)+|\s+)*))+)?(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?
-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:
\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[
^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*<(?-xism:(?-xi
sm:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^(
)\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(
?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))
|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()<
>\[\]:;@\,.<DQ>\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s]
+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))
|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:
(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s
*\)\s*))+)*\s*\)\s*)+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?
:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x
0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xi
sm:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*
<DQ>(?-xism:(?-xism:[^\\<DQ>])|(?-xism:\\(?-xism:[^\x0A\x0D]
)))+<DQ>(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\
]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-x
ism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+
)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))\@(?-xism:(?-xism:(?-xism:(
?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?
-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^
()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s
*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s]+(
?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s]+)*)(?-xism:(?-xism:
\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[
^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)
)|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)
+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:
(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((
?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\
x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*\[(?:\s*(?-xism:(?-x
ism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D])))+)*\s*\](?-xi
sm:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:
\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(
?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+
)*\s*\)\s*)+|\s+)*)))>(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-
xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\
s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^
\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))|(?-xism:(?-x
ism:(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^
()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*
(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D])
)|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()
<>\[\]:;@\,.<DQ>\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s
]+)*)(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+)
)|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism
:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\
s*\)\s*))+)*\s*\)\s*)+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((
?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\
x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-x
ism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)
*<DQ>(?-xism:(?-xism:[^\\<DQ>])|(?-xism:\\(?-xism:[^\x0A\x0D
])))+<DQ>(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\
\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-
xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)
+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*))\@(?-xism:(?-xism:(?-xism:
(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(
?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[
^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\
s*\)\s*)+|\s+)*(?-xism:[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s]+
(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\,.<DQ>\s]+)*)(?-xism:(?-xism
:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:
[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+
))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*
)+|\s+)*)|(?-xism:(?-xism:(?-xism:\s*\((?:\s*(?-xism:(?-xism
:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\(
(?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A
\x0D]))|)+)*\s*\)\s*))+)*\s*\)\s*)+|\s+)*\[(?:\s*(?-xism:(?-
xism:[^\[\]\\])|(?-xism:\\(?-xism:[^\x0A\x0D])))+)*\s*\](?-x
ism:(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism
:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:\s*(?-xism:(?-xism:
(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|)+)*\s*\)\s*))
+)*\s*\)\s*)+|\s+)*))))(?-xism:\s*\((?:\s*(?-xism:(?-xism:(?
>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0D]))|(?-xism:\s*\((?:
\s*(?-xism:(?-xism:(?>[^()\\]+))|(?-xism:\\(?-xism:[^\x0A\x0
D]))|)+)*\s*\)\s*))+)*\s*\)\s*)*)"
  .Replace("<DQ>", "\"")
  .Replace("\t", "")
  .Replace(" ", "")
  .Replace("\r", "")
  .Replace("\n", "");

private static Regex mailbox =
  new Regex(gibberish, RegexOptions.ExplicitCapture);
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Greg Bacon
  • 134,834
  • 32
  • 188
  • 245
12

According to the official standard, RFC 2822, a valid email regex is:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

If you want to use it in Java, it's really very easy:

import java.util.regex.*;

class regexSample 
{
   public static void main(String args[]) 
   {
      //Input the string for validation
      String email = "xyz@hotmail.com";

      //Set the email pattern string
      Pattern p = Pattern.compile(" (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"
              +"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")"
                     + "@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\]");

      //Match the given string with the pattern
      Matcher m = p.matcher(email);

      //Check whether match is found 
      boolean matchFound = m.matches();

      if (matchFound)
        System.out.println("Valid Email Id.");
      else
        System.out.println("Invalid Email Id.");
   }
}
Community
  • 1
  • 1
AZ_
  • 21,688
  • 25
  • 143
  • 191
  • 2
    Your regex does not include first uppercase letter for example **Leonardo.davinci@gmail.com** which could be annoying for some users. Use this one instead: ``(?:[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]) `` – Kebab Krabby Jul 17 '19 at 15:07
  • @KebabKrabby Thanks, please edit the answer, I'll accept the change. – AZ_ Jul 31 '19 at 07:55
  • If i add that change to your answer it wont be RFC 2822 anymore so i dont know if thats correct. – Kebab Krabby Jul 31 '19 at 22:06
  • 2
    @KebabKrabby: I guess we would need to apply the pattern with case insensitivity somewhere in the matching options, not change the Regex itself. – Thomas Weller Sep 25 '20 at 11:46
11

RFC 5322 standard:

Allows dot-atom local-part, quoted-string local-part, obsolete (mixed dot-atom and quoted-string) local-part, domain name domain, (IPv4, IPv6, and IPv4-mapped IPv6 address) domain literal domain, and (nested) CFWS.

'/^(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){255,})(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){65,}@)((?>(?>(?>((?>(?>(?>\x0D\x0A)?[\t ])+|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?!(?1)[a-z0-9-]{64,})(?1)(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>(?1)\.(?!(?1)[a-z0-9-]{64,})(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f0-9]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?9)){3}))\])(?1)$/isD'

RFC 5321 standard:

Allows dot-atom local-part, quoted-string local-part, domain name domain, and (IPv4, IPv6, and IPv4-mapped IPv6 address) domain literal domain.

'/^(?!(?>"?(?>\\\[ -~]|[^"])"?){255,})(?!"?(?>\\\[ -~]|[^"]){65,}"?@)(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?!.*[^.]{64,})(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}))\])$/iD'

Basic:

Allows dot-atom local-part and domain name domain (requiring at least two domain name labels with the TLD limited to 2-6 alphabetic characters).

"/^(?!.{255,})(?!.{65,}@)([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*@(?!.*[^.]{64,})(?>[a-z0-9](?>[a-z0-9-]*[a-z0-9])?\.){1,126}[a-z]{2,6}$/iD"
Michael
  • 11,912
  • 6
  • 49
  • 64
  • 1
    What the devil language is that in?? I see a `/D` flag, and you’ve quoted it with single quotes yet also used slashes to delimit the pattern? It’s not Perl, and it can’t be PCRE. Is it therefore PHP? I believe those are the only three that allow recursion like `(?1)`. – tchrist Nov 07 '10 at 20:32
  • 1
    It's in PHP, which uses PCRE. The slashes are used only to delimit special characters like parentheses, square brackets, and of course slashes and single quotes. The /D flag, if you didn't know, is to prevent a newline being added to the end of the string, which would be allowed otherwise. – Michael Feb 19 '11 at 18:24
11

Here's the PHP code I use. I've chosen this solution in the spirit of "false positives are better than false negatives" as declared by another commenter here and with regards to keeping your response time up and server load down ... there's really no need to waste server resources with a regular expression when this will weed out most simple user errors. You can always follow this up by sending a test email if you want.

function validateEmail($email) {
  return (bool) stripos($email,'@');
}
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Mac
  • 3,352
  • 1
  • 17
  • 4
  • 2
    a) The "waste server resources" is infinitesimal, but if you are so inclined, you could do it client side with JS b) What is you need to send a registration mail and the user enters me@forgotthedotcom ? Your "solution" fails and you lose a user. – johnjohn Apr 03 '12 at 09:40
  • 2
    a) Relying on a JS validation that would fail when JavaScript is disabled doesn't sound like the best idea either (just btw) – auco Dec 06 '13 at 15:39
  • What do you mean by false positives? Rejecting email addresses that you shouldn't have rejected? Or accepting email addresses that you shouldn't have accepted? – Peter Mortensen Feb 13 '22 at 16:01
8

If you are fine with accepting empty values (which is not an invalid email) and are running PHP 5.2+, I would suggest:

static public function checkEmail($email, $ignore_empty = false) {
    if($ignore_empty && (is_null($email) || $email == ''))
        return true;
    return filter_var($email, FILTER_VALIDATE_EMAIL);
}
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Prasad
  • 1,822
  • 3
  • 23
  • 40
  • 1
    What would be ***an example*** of an address with empty values? Please respond by [editing (changing) your answer](https://stackoverflow.com/posts/10362389/edit), not here in comments (***without*** "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Feb 13 '22 at 16:19
8

I've been using this touched up version of the OP's regex for a while and it hasn't left me with too many surprises. I've never encountered an apostrophe in an email yet so it doesn't validate that. It does validate Jean+François@anydomain.museum and 试@例子.测试.مثال.آزمایشی, but not weird abuse of those non alphanumeric characters .+@you.com.

(?!^[.+&'_-]*@.*$)(^[_\w\d+&'-]+(\.[_\w\d+&'-]*)*@[\w\d-]+(\.[\w\d-]+)*\.(([\d]{1,3})|([\w]{2,}))$)

It does support IP addresses you@192.168.1.1, but I haven't refined it enough to deal with bogus IP address ranges such as 999.999.999.1.

It also supports all the TLDs over three characters which stops asdf@asdf.asdf which I think the original let through. I've been beat, there are too many TLDs now over 3 characters.

I know the OP has abandoned his regex, but this flavour lives on.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
TombMedia
  • 1,962
  • 2
  • 22
  • 27
  • 1
    For all: The regular expression in the question was removed in revision 10, in 2015 (about 7 years later). – Peter Mortensen Feb 13 '22 at 16:25
  • 1
    Strikeout shouldn't be necessary. That is what the revision history is for. If something is no longer valid, it should be removed. The answer should be as if it was written today. – Peter Mortensen Feb 13 '22 at 16:27
8

Strange that you "cannot" allow 4 characters TLDs. You are banning people from .info and .name, and the length limitation stop .travel and .museum, but yes, they are less common than 2 characters TLDs and 3 characters TLDs.

You should allow uppercase alphabets too. Email systems will normalize the local part and domain part.

For your regex of domain part, domain name cannot starts with '-' and cannot ends with '-'. Dash can only stays in between.

If you used the PEAR library, check out their mail function (I forgot the exact name/library). You can validate email address by calling one function, and it validates the email address according to definition in RFC 822.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
7

I'm still using:

^[A-Za-z0-9._+\-\']+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}$

But with IPv6 and Unicode coming up, perhaps this is best:

console.log(/^[\p{L}!#-'*+\-/\d=?^-~]+(.[\p{L}!#-'*+\-/\d=?^-~])*@[^@\s]{2,}$/u.test("תה.בועות@.fm"))

Gmail allows sequential dots, but Microsoft Exchange Server 2007 refuses them, which follows the most recent standard afaik.

Cees Timmerman
  • 17,623
  • 11
  • 91
  • 124
7

The regular expression for an email address is:

/^("(?:[!#-\[\]-\u{10FFFF}]|\\[\t -\u{10FFFF}])*"|[!#-'*+\-/-9=?A-Z\^-\u{10FFFF}](?:\.?[!#-'*+\-/-9=?A-Z\^-\u{10FFFF}])*)@([!#-'*+\-/-9=?A-Z\^-\u{10FFFF}](?:\.?[!#-'*+\-/-9=?A-Z\^-\u{10FFFF}])*|\[[!-Z\^-\u{10FFFF}]*\])$/u

This regular expression is 100% identical to the addr-spec ABNF for non-obsolete email addresses, as specified across RFC 5321, RFC 5322, and RFC 6532.

Additionally, you must verify:

  • The email address is well-formed UTF-8 (or ASCII, if you cannot send to internationalized email addresses)
  • The address is not more than 320 UTF-8 bytes
  • The user part (the first match group) is not more than 64 UTF-8 bytes
  • The domain part (the second match group) is not more than 255 UTF-8 bytes

The easiest way to do all of this is to use an existing function. In PHP, see the filter_var function using FILTER_VALIDATE_EMAIL and FILTER_FLAG_EMAIL_UNICODE (if you can send to internationalized email addresses):

$email_valid = filter_var($email_input, FILTER_VALIDATE_EMAIL, FILTER_FLAG_EMAIL_UNICODE);

However, maybe you're building such a function—indeed the easiest way to implement this is to use a regular expression.

Remember, this only verifies that the email address will not cause a syntax error. The only way to verify that the address can receive email is to actually send an email.

Next, I will treat how you generate this regular expression.


I write a new answer, because most of the answers here make the mistake of either specifying a pattern that is too restrictive (and so have not aged well); or they present a regular expression that's actually matching a header for a MIME message, and not the email address itself.

It is entirely possible to make a regular expression from an ABNF, so long as there are no recursive parts.

RFC 5322 specifies what is legal to send in a MIME message; consider this the upper bound on what is a legal email address.

However, to follow this ABNF exactly would be a mistake: this pattern technically represents how you encode an email address in a MIME message, and allows strings not part of the email address, like folding whitespace and comments; and it includes support for obsolete forms that are not legal to generate (but that servers read for historical reasons). An email address does not include these.

RFC 5322 explains:

Both atom and dot-atom are interpreted as a single unit, comprising the string of characters that make it up. Semantically, the optional comments and FWS surrounding the rest of the characters are not part of the atom; the atom is only the run of atext characters in an atom, or the atext and "." characters in a dot-atom.

In some of the definitions, there will be non-terminals whose names start with "obs-". These "obs-" elements refer to tokens defined in the obsolete syntax in section 4. In all cases, these productions are to be ignored for the purposes of generating legal Internet messages and MUST NOT be used as part of such a message.

If you remove CFWS, BWS, and obs-* rules from the addr-spec in RFC 5322, and perform some optimization on the result (I used "greenery"), you can produce this regular expression, quoted with slashes and anchored (suitable for use in ECMAScript and compatible dialects, with added newline for clarity):

/^("(?:[!#-\[\]-~]|\\[\t -~])*"|[!#-'*+\-/-9=?A-Z\^-~](?:\.?[!#-'*+\-/-9=?A-Z\^-~])*)
@([!#-'*+\-/-9=?A-Z\^-~](?:\.?[!#-'*+\-/-9=?A-Z\^-~])*|\[[!-Z\^-~]*\])$/

This only supports ASCII email addresses. To support RFC 6532 Internationalized Email Addresses, replace the ~ character with \u{10FFFF} (PHP, ECMAScript with the u flag), or \uFFFF (for UTF-16 implementations, like .NET and older ECMAScript/JavaScript):

/^("(?:[!#-\[\]-\u{10FFFF}]|\\[\t -\u{10FFFF}])*"|[!#-'*+\-/-9=?A-Z\^-\u{10FFFF}](?:\.?[!#-'*+\-/-9=?A-Z\^-\u{10FFFF}])*)@([!#-'*+\-/-9=?A-Z\^-\u{10FFFF}](?:\.?[!#-'*+\-/-9=?A-Z\^-\u{10FFFF}])*|\[[!-Z\^-\u{10FFFF}]*\])$/u

This works, because the ABNF we are using is not recursive, and so forms a non-recursive, regular grammar that can be converted into a regular expression.

It breaks down like so:

  • The user part (before the @) may be a dot-atom or a quoted-string
  • "([!#-\[\]-~]|\\[\t -~])*" specifies the quoted-string form of the user, e.g. "root@home"@example.com. It permits any non-control character inside double quotes; except that spaces, tabs, double quotes, and backslashes must be backslash-escaped.
  • [!#-'*+\-/-9=?A-Z\^-~] is the first character of the dot-atom of the user.
  • (\.?[!#-'*+\-/-9=?A-Z\^-~])* matches the rest of the dot-atom, allowing dots (except after another dot, or as the final character).
  • @ denotes the domain.
  • The domain part may be a dot-atom or a domain-literal.
  • [!#-'*+\-/-9=?A-Z\^-~](\.?[!#-'*+\-/-9=?A-Z\^-~])* is the same dot-atom form as above, but here it represents domain names and IPv4 addresses.
  • \[[!-Z\^-~]*\] will match IPv6 addresses and future definitions of host names.

This regular expression allows all specification-compliant email addresses, and can be used verbatim in a MIME message (except for line length limits, in which case folding whitespace must be added).

This also sets non-capturing groups such that match[1] will be the user, match[2] will be the host. (However if match[1] starts with a double quote, then filter out backslash escapes, and the start and end double quotes: "root"@example.com and root@example.com identify the same inbox.)

Finally, note that RFC 5321 sets limits on how long an email address may be. The user part may be up to 64 bytes, and the domain part may be up to 255 bytes. Including the @ character, the limit for the entire address is 320 bytes. This is measured in bytes after the address is UTF-8 encoded; not characters.

Note that RFC 5322 ABNF defines a permissive syntax for domain names, allowing names currently known to be invalid. This also allows for domain names that could become legal in the future. This should not be a problem, as this should be handled the same way a non-existent domain name is.

Always consider the possibility that a user typed in an email address that works, but that they do not have access to. The only foolproof way to verify an email address is to send an email.

This is adapted from my article E-Mail Addresses & Syntax.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
awwright
  • 565
  • 3
  • 8
  • 1
    I can use this in javascript, but can't get it formatted for C# use. I've tried putting it into regex101 website and it says its invalid – Post Impatica May 12 '21 at 21:52
  • 1
    @PostImpatica What is the error, exactly? Regex101 expects a regular expression that is slash-delimited. I don't know which dialect C# expects. If your dialect is slash-delimited, you'll need to escape the slashes with a backslash. – awwright May 17 '21 at 02:21
  • 1
    `"John Smith"@example.com` doesn't work with these on regexr.com – Cees Timmerman Nov 19 '22 at 00:32
  • 2
    @CeesTimmerman Spaces must be escaped in the quoted form, see https://www.rfc-editor.org/rfc/rfc5322#section-3.2.4 This is mentioned in my post: "It permits any non-control character inside double quotes; except that spaces, tabs, double quotes, and backslashes must be backslash-escaped." Note that whitespace is not considered "printing" in ASCII, see the VCHAR production in https://www.rfc-editor.org/rfc/rfc5234#appendix-B.1 – awwright Nov 20 '22 at 02:13
7
public bool ValidateEmail(string sEmail)
{
    if (sEmail == null)
    {
        return false;
    }

    int nFirstAT = sEmail.IndexOf('@');
    int nLastAT = sEmail.LastIndexOf('@');

    if ((nFirstAT > 0) && (nLastAT == nFirstAT) && (nFirstAT < (sEmail.Length - 1)))
    {
        return (Regex.IsMatch(sEmail, @"^[a-z|0-9|A-Z]*([_][a-z|0-9|A-Z]+)*([.][a-z|0-9|A-Z]+)*([.][a-z|0-9|A-Z]+)*(([_][a-z|0-9|A-Z]+)*)?@[a-z][a-z|0-9|A-Z]*\.([a-z][a-z|0-9|A-Z]*(\.[a-z][a-z|0-9|A-Z]*)?)$"));
    }
    else
    {
        return false;
    }
}
Jason Plank
  • 2,336
  • 5
  • 31
  • 40
  • This will sometimes fail; a user in an email address may contain "@" characters if they are inside a quoted-string. – awwright Sep 11 '20 at 05:18
6

I don't believe the claim made by bortzmeyer that "The grammar (specified in RFC 5322) is too complicated for that" (to be handled by a regular expression).

Here is the grammar (from 3.4.1. Addr-Spec Specification):

addr-spec       =   local-part "@" domain
local-part      =   dot-atom / quoted-string / obs-local-part
domain          =   dot-atom / domain-literal / obs-domain
domain-literal  =   [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
dtext           =   %d33-90 /          ; Printable US-ASCII
                    %d94-126 /         ;  characters not including
                    obs-dtext          ;  "[", "]", or "\"

Assuming that dot-atom, quoted-string, obs-local-part, obs-domain are themselves regular languages, this is a very simple grammar. Just replace the local-part and domain in the addr-spec production with their respective productions, and you have a regular language, directly translatable to a regular expression.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Dimitris Andreou
  • 8,806
  • 2
  • 33
  • 36
  • 5
    You should investigate CFWS before you start making assumptions here. It's a nightmare. – rjbs Dec 16 '09 at 19:07
  • CFWS = (1*([FWS] comment) [FWS]) / FWS. Still, I see no rule that makes the language not regular. It's complicated, for sure, but a complicated regular expression could handle it nevertheless. – Dimitris Andreou Jan 03 '10 at 21:53
  • 3
    This doesn't answer the question. It's in response to another answer. – Luna Dec 05 '16 at 20:17
  • CFWS is not part of the email address, it's part of the MIME syntax. See my answer https://stackoverflow.com/a/63841473/7117939 for why this is. – awwright Sep 12 '20 at 23:38
6

I know this question is about regular expressions, but I am guessing that 90% of all developers reading these solutions are trying to validate an email address in an HTML form displayed in a browser.

If this is the case, I'd suggest checking out the new HTML5 <input type="email"> form element:

HTML5:

 <input type="email" required />

CSS 3:

 input:required {
      background-color: rgba(255, 0, 0, 0.2);
 }

 input:focus:invalid {
     box-shadow: 0 0 1em red;
     border-color: red;
 }

 input:focus:valid {
     box-shadow: 0 0 1em green;
     border-color: green;
 }

It is at HTML5 Form Validation Without JS - JSFiddle - Code Playground.

This has a couple of advantages:

  1. Automatic validation and no custom solution needed: simple and easy to implement
  2. No JavaScript, and no problems if JavaScript has been disabled
  3. No server has to calculate anything for that
  4. The user has immediate feedback
  5. Old browsers should automatically fallback to input type "text"
  6. Mobile browsers can display a specialized keyboard (@-Keyboard)
  7. Form validation feedback is very easy with CSS 3

The apparent downside might be missing validation for old browsers, but that'll change over time. I'd prefer this over any of these insane regular expression masterpieces.

Also see:

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
auco
  • 9,329
  • 4
  • 47
  • 54
  • The other down side is that this is client-side only. Good for providing a smooth user experience, bad for validating data. – acrosman Jan 21 '14 at 21:44
  • The problem with the default email validation is that it has [lots of false positives](http://jsbin.com/wizop/1/). You'd need to use [my complete pattern](http://stackoverflow.com/a/24092435/1256925) to eliminate all false positives while preventing false negatives from sneaking in. That pattern can be added [via the `pattern` attribute](http://jsbin.com/wizop/2/). See [my post](http://stackoverflow.com/a/24092435/1256925) for more info. – Joeytje50 Jun 07 '14 at 01:51
6

I use multi-step validation. As there isn't any perfect way to validate an email address, a perfect one can't be made, but at least you can notify the user he/she is doing something wrong - here is my approach:

  1. I first validate with the very basic regex which just checks if the email contains exactly one @ sign and it is not blank before or after that sign. e.g. /^[^@\s]+@[^@\s]+$/

  2. if the first validator does not pass (and for most addresses it should although it is not perfect), then warn the user the email is invalid and do not allow him/her to continue with the input

  3. if it passes, then validate against a more strict regex - something which might disallow valid emails. If it does not pass, the user is warned about a possible error, but the user is allowed to continue. Unlike step (1) where the user is not allowed to continue because it is an obvious error.

So in other words, the first liberal validation is just to strip obvious errors and it is treated as "error". People type a blank address, address without @ sign and so on. This should be treated as an error. The second one is more strict, but it is treated as a "warning" and the user is allowed to continue with the input, but warned to at least examine if he/she entered a valid entry. The key here is in the error/warning approach - the error being something that can't under 99.99% circumstances be a valid email.

Of course, you can adjust what makes the first regex more liberal and the second one more strict.

Depending on what you need, the above approach might work for you.

Suhaib Janjua
  • 3,538
  • 16
  • 59
  • 73
Coder12345
  • 3,431
  • 3
  • 33
  • 73
  • Technically, email can contain more than 1 @. It's an astonishing weird discovery i made recently. EG: "very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com – Allan Deamon Apr 18 '22 at 18:57
  • 1
    Agreed, but I never claimed my method is 100% foolproof. It works in most cases. You gotta be realistic at some point and discard very unlikely cases. Most email addresses are something@something.something. If someone actually chooses to use an email address which is uses the most liberal syntax of all, he/she is in for a real treat of issues with various server/client programs not properly validating or allowing such email, or simply not working at all while sending/receiving. Where then such a user would be forced to use more "standard" syntax to ensure it works everywhere. – Coder12345 Apr 18 '22 at 23:16
5

For me the right way for checking email addresses is:

  1. Check that symbol @ exists, and before and after it there are some non-@ symbols: /^[^@]+@[^@]+$/
  2. Try to send an email to this address with some "activation code".
  3. When the user "activated" his/her email address, we will see that all is right.

Of course, you can show some warning or tooltip in front-end when the user typed a "strange" email to help him/her to avoid common mistakes, like no dot in the domain part or spaces in name without quoting and so on. But you must accept the address "hello@world" if user really want it.

Also, you must remember that the email address standard was and can evolve, so you can't just type some "standard-valid" regexp once and for all times. And you must remember that some concrete internet servers can fail some details of common standard and in fact work with own "modified standard".

So, just check @, hint user on frontend and send verification emails on the given address.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
FlameStorm
  • 944
  • 15
  • 20
5

This rule matches what our Postfix server could not send to.

Allow letters, numbers, -, _, +, ., &, /, and !

No -foo@bar.com

No asd@-bar.com

/^([a-z0-9\+\._\/&!][-a-z0-9\+\._\/&!]*)@(([a-z0-9][-a-z0-9]*\.)([-a-z0-9]+\.)*[a-z]{2,})$/i
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
grosser
  • 14,707
  • 7
  • 57
  • 61
4

This is one of the regexes for email:

^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Nazmul Hasan
  • 6,840
  • 13
  • 36
  • 37
4

No one mentioned the issue of localization (i18n). What if you have clients coming from all over the world?

You will need to then need to sub-categorize your regex per country/area, which I have seen developers ending up building a large dictionary or configuration. Detecting the users' browser language setting may be a good starting point.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Jay Zeng
  • 1,423
  • 10
  • 10
4

I always use the below regular expression to validate the email address. It covers all formats of email addresses based on English language characters.

"\A(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)\Z";

Given below is a C# example:

Add the assembly reference:

using System.Text.RegularExpressions;

and use the below method to pass the email address and get a boolean in return

private bool IsValidEmail(string email) {
    bool isValid = false;
    const string pattern = @"\A(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)\Z";

    isValid = email != "" && Regex.IsMatch(email, pattern);

    // Same above approach in multiple lines
    //
    //if (!email) {
    //    isValid = false;
    //} else {
    //    // email param contains a value; Pass it to the isMatch method
    //    isValid = Regex.IsMatch(email, pattern);
    //}
    return isValid;
}

This method validates the email string passed in the parameter. It will return false for all cases where param is null, empty string, undefined or the param value is not a valid email address. It will only return true when the param contains a valid email address string.

Suhaib Janjua
  • 3,538
  • 16
  • 59
  • 73
  • 2
    Does this code accept "Håkan.Söderström@malmö.se" or "试@例子.测试.مثال.آزمایشی" emails? – Ivan Z Mar 27 '14 at 23:07
  • 3
    It's for standard Email Servers with standard characters. In case of non English language one should have to make its own customized ReGex. – Suhaib Janjua Mar 28 '14 at 05:55
  • Regex and email spec includes UTF-8, hence illogical response. – rob2d Nov 30 '19 at 03:05
  • 1
    In what way is it the best regular expression? Most comprehensive? Simplest? Fewest false negatives? Fewest false positives? The fastest? Fewest number of user complaints in actual real-world use? Some combination of these properties? Something else? Please respond by [editing (changing) your answer](https://stackoverflow.com/posts/21595782/edit), not here in comments (***without*** "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Feb 14 '22 at 21:05
4

We have used http://www.aspnetmx.com/ with a degree of success for a few years now. You can choose the level you want to validate at (e.g. syntax check, check for the domain, MX records or the actual email).

For front-end forms we generally verify that the domain exists and the syntax is correct, and then we do stricter verification to clean out our database before doing bulk mail-outs.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
cbp
  • 25,252
  • 29
  • 125
  • 205
4

For PHP I'm using the email address validator from the Nette Framework:

/* public static */ function isEmail($value)
{
    $atom = "[-a-z0-9!#$%&'*+/=?^_`{|}~]"; // RFC 5322 unquoted characters in local-part
    $localPart = "(?:\"(?:[ !\\x23-\\x5B\\x5D-\\x7E]*|\\\\[ -~])+\"|$atom+(?:\\.$atom+)*)"; // Quoted or unquoted
    $alpha = "a-z\x80-\xFF"; // Superset of IDN
    $domain = "[0-9$alpha](?:[-0-9$alpha]{0,61}[0-9$alpha])?"; // RFC 1034 one domain component
    $topDomain = "[$alpha](?:[-0-9$alpha]{0,17}[$alpha])?";
    return (bool) preg_match("(^$localPart@(?:$domain\\.)+$topDomain\\z)i", $value);
}
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Ondřej Šotek
  • 1,793
  • 1
  • 15
  • 24
4

Just about every regular expression I've seen - including some used by Microsoft will not allow the following valid email to get through: simon-@hotmail.com

I just had a real customer with an email address in this format who couldn't place an order.

Here's what I settled on:

  • A minimal regular expression that won't have false negatives. Alternatively use the MailAddress constructor with some additional checks (see below):
  • Checking for common typos .cmo or .gmial.com and asking for confirmation "Are you sure this is your correct email address. It looks like there may be a mistake." Allow the user to accept what they typed if they are sure.
  • Handling bounces when the email is actually sent and manually verifying them to check for obvious mistakes.

try
{
    var email = new MailAddress(str);

    if (email.Host.EndsWith(".cmo"))
    {
        return EmailValidation.PossibleTypo;
    }

    if (!email.Host.EndsWith(".") && email.Host.Contains("."))
    {
        return EmailValidation.OK;
    }
}
catch
{
    return EmailValidation.Invalid;
}
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Simon_Weaver
  • 140,023
  • 84
  • 646
  • 689
4

According to RFC 2821 and RFC 2822, the local-part of an email addresses may use any of these ASCII characters:

  1. Uppercase and lowercase letters
  2. The digits 0 through 9
  3. The characters, !#$%&'*+-/=?^_`{|}~
  4. The character "." provided that it is not the first or last character in the local-part.

Matches:

  • a&d@somedomain.com
  • a*d@somedomain.com
  • a/d@somedomain.com

Non-Matches:

  • .abc@somedomain.com
  • abc.@somedomain.com
  • a>b@somedomain.com

For one that is RFC 2821 and 2822 compliant, you can use:

^((([!#$%&'*+\-/=?^_`{|}~\w])|([!#$%&'*+\-/=?^_`{|}~\w][!#$%&'*+\-/=?^_`{|}~\.\w]{0,}[!#$%&'*+\-/=?^_`{|}~\w]))[@]\w+([-.]\w+)*\.\w+([-.]\w+)*)$

Email - RFC 2821, 2822 Compliant

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Dave Black
  • 7,305
  • 2
  • 52
  • 41
4

Although very detailed answers are already added, I think those are complex enough for a developer who is just looking for a simple method to validate an email address or to get all email addresses from a string in Java.

public static boolean isEmailValid(@NonNull String email) {
    return android.util.Patterns.EMAIL_ADDRESS.matcher(email).matches();
}

As per the regular expression is concerned, I always use this regular expression, which works for my problems.

"[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}"

If you are looking to find all email addresses from a string by matching the email regular expression. You can find a method at this link.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Asad Ali Choudhry
  • 4,985
  • 4
  • 31
  • 36
  • Re *"which works for my problems."*: What would those problems be? What are some examples of false positives and false negatives? How do you handle those? – Peter Mortensen Feb 15 '22 at 00:12
  • What programming language? [Java](https://en.wikipedia.org/wiki/Java_%28programming_language%29)? This was comment number 2 and question number 2. – Peter Mortensen Feb 15 '22 at 00:14
3

Here is the one I've build. It is not a bulletproof version, but it is 'simple' and checks almost everything.

[\w+-]+(?:\.[\w+-]+)*@[\w+-]+(?:\.[\w+-]+)*(?:\.[a-zA-Z]{2,4})

I think an explanation is in place so you can modify it if you want:

(e) [\w+-]+ matches a-z, A-Z, _, +, - at least one time

(m) (?:\.[\w+-]+)* matches a-z, A-Z, _, +, - zero or more times but need to start with a . (dot)

@ = @

(i) [\w+-]+ matches a-z, A-Z, _, +, - at least one time

(l) (?:\.[\w+-]+)* matches a-z, A-Z, _, +, - zero or more times but need to start with a . (dot)

(com) (?:\.[a-zA-Z]{2,4}) matches a-z, A-Z for 2 to 4 times starting with a . (dot)

giving e(.m)@i(.l).com where (.m) and (.l) are optional but also can be repeated multiple times.

I think this validates all valid email addresses, but blocks potential invalid without using an overcomplex regular expression which won't be necessary in most cases.

Notice this will allow +@-.com, but that is the compromise for keeping it simple.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
FLY
  • 2,379
  • 5
  • 29
  • 40
  • Thanks! This worked for me. Here is a tested C/C++ escaped version used with Qt5: QRegExp rx("[\\w+-]+(?:\\.[\\w+-]+)*@[\\w+-]+(?:\\.[\\w+-]+)*(?:\\.[a-zA-Z]{2,})"); – Mr. Developerdude Jun 17 '13 at 13:15
3

I’ve had a similar desire: wanting a quick check for syntax in email addresses without going overboard (the Mail::RFC822::Address answer which is the obviously correct one) for an email send utility. I went with this (I’m a POSIX regular expression person, so I don’t normally use \d and such from PCRE, as they make things less legible to me):

preg_match("_^[-!#-'*+/-9=?A-Z^-~]+(\.[-!#-'*+/-9=?A-Z^-~]+)*@[0-9A-Za-z]([-0-9A-Za-z]{0,61}[0-9A-Za-z])?(\.[0-9A-Za-z]([-0-9A-Za-z]{0,61}[0-9A-Za-z])?)*\$_", $adr)

This is RFC-correct, but it explicitly excludes the obsolete forms as well as direct IP addresses (IP addresses and legacy IP addresses both), which someone in the target group of that utility (mostly: people who bother us in #sendmail on IRC) would not normally want or need anyway.

IDNs (internationalised domain names) are explicitly not in the scope of email: addresses like “foo@cäcilienchor-bonn.de” must be written “foo@xn--ccilienchor-bonn-vnb.de” on the wire instead (this includes mailto: links in HTML and such fun), only the GUI is allowed to display (and accept then convert) such names to (and from) the user.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
mirabilos
  • 5,123
  • 2
  • 46
  • 72
  • Re *"legacy IP addresses"*: Do you mean *[IPv4](https://en.wikipedia.org/wiki/IPv4) IP addresses*? – Peter Mortensen Feb 13 '22 at 18:14
  • @PeterMortensen: (thanks for the syntax highlighting and English fixes, but something seems to be broken now, it says community wiki with you as author?) yes, legacy IP addresses is what IPv4 addresses have been called for a couple of years now, IP addresses are IPv6 addresses. – mirabilos Feb 14 '22 at 14:46
3

If you want to improve on a regex that has been working reasonably well over several years, then the answer depends on what exactly you want to achieve - what kinds of email addresses have been failing. Fine-tuning email regexes is very difficult, and I have yet to see a perfect solution.

  • If your application involves something very technical in nature (or something internal to organizations), then maybe you need to support IP addresses instead of domain names, or comments in the "local" part of the email address.
  • If your application is multinational, I would consider focusing on Unicode and UTF-8 support.

The leading answer to your question currently links to a "fully RFC‑822–compliant regex". However, in spite of the complexity of that regex and its presumed attention to detail in RFC rules, it completely fails when it comes to Unicode support.

The regex that I've written for most of my applications focuses on Unicode support, as well as reasonably good overall adherence to RFC standards:

/^(?!\.)((?!.*\.{2})[a-zA-Z0-9\u0080-\u00FF\u0100-\u017F\u0180-\u024F\u0250-\u02AF\u0300-\u036F\u0370-\u03FF\u0400-\u04FF\u0500-\u052F\u0530-\u058F\u0590-\u05FF\u0600-\u06FF\u0700-\u074F\u0750-\u077F\u0780-\u07BF\u07C0-\u07FF\u0900-\u097F\u0980-\u09FF\u0A00-\u0A7F\u0A80-\u0AFF\u0B00-\u0B7F\u0B80-\u0BFF\u0C00-\u0C7F\u0C80-\u0CFF\u0D00-\u0D7F\u0D80-\u0DFF\u0E00-\u0E7F\u0E80-\u0EFF\u0F00-\u0FFF\u1000-\u109F\u10A0-\u10FF\u1100-\u11FF\u1200-\u137F\u1380-\u139F\u13A0-\u13FF\u1400-\u167F\u1680-\u169F\u16A0-\u16FF\u1700-\u171F\u1720-\u173F\u1740-\u175F\u1760-\u177F\u1780-\u17FF\u1800-\u18AF\u1900-\u194F\u1950-\u197F\u1980-\u19DF\u19E0-\u19FF\u1A00-\u1A1F\u1B00-\u1B7F\u1D00-\u1D7F\u1D80-\u1DBF\u1DC0-\u1DFF\u1E00-\u1EFF\u1F00-\u1FFFu20D0-\u20FF\u2100-\u214F\u2C00-\u2C5F\u2C60-\u2C7F\u2C80-\u2CFF\u2D00-\u2D2F\u2D30-\u2D7F\u2D80-\u2DDF\u2F00-\u2FDF\u2FF0-\u2FFF\u3040-\u309F\u30A0-\u30FF\u3100-\u312F\u3130-\u318F\u3190-\u319F\u31C0-\u31EF\u31F0-\u31FF\u3200-\u32FF\u3300-\u33FF\u3400-\u4DBF\u4DC0-\u4DFF\u4E00-\u9FFF\uA000-\uA48F\uA490-\uA4CF\uA700-\uA71F\uA800-\uA82F\uA840-\uA87F\uAC00-\uD7AF\uF900-\uFAFF\.!#$%&'*+-/=?^_`{|}~\-\d]+)@(?!\.)([a-zA-Z0-9\u0080-\u00FF\u0100-\u017F\u0180-\u024F\u0250-\u02AF\u0300-\u036F\u0370-\u03FF\u0400-\u04FF\u0500-\u052F\u0530-\u058F\u0590-\u05FF\u0600-\u06FF\u0700-\u074F\u0750-\u077F\u0780-\u07BF\u07C0-\u07FF\u0900-\u097F\u0980-\u09FF\u0A00-\u0A7F\u0A80-\u0AFF\u0B00-\u0B7F\u0B80-\u0BFF\u0C00-\u0C7F\u0C80-\u0CFF\u0D00-\u0D7F\u0D80-\u0DFF\u0E00-\u0E7F\u0E80-\u0EFF\u0F00-\u0FFF\u1000-\u109F\u10A0-\u10FF\u1100-\u11FF\u1200-\u137F\u1380-\u139F\u13A0-\u13FF\u1400-\u167F\u1680-\u169F\u16A0-\u16FF\u1700-\u171F\u1720-\u173F\u1740-\u175F\u1760-\u177F\u1780-\u17FF\u1800-\u18AF\u1900-\u194F\u1950-\u197F\u1980-\u19DF\u19E0-\u19FF\u1A00-\u1A1F\u1B00-\u1B7F\u1D00-\u1D7F\u1D80-\u1DBF\u1DC0-\u1DFF\u1E00-\u1EFF\u1F00-\u1FFF\u20D0-\u20FF\u2100-\u214F\u2C00-\u2C5F\u2C60-\u2C7F\u2C80-\u2CFF\u2D00-\u2D2F\u2D30-\u2D7F\u2D80-\u2DDF\u2F00-\u2FDF\u2FF0-\u2FFF\u3040-\u309F\u30A0-\u30FF\u3100-\u312F\u3130-\u318F\u3190-\u319F\u31C0-\u31EF\u31F0-\u31FF\u3200-\u32FF\u3300-\u33FF\u3400-\u4DBF\u4DC0-\u4DFF\u4E00-\u9FFF\uA000-\uA48F\uA490-\uA4CF\uA700-\uA71F\uA800-\uA82F\uA840-\uA87F\uAC00-\uD7AF\uF900-\uFAFF\-\.\d]+)((\.([a-zA-Z\u0080-\u00FF\u0100-\u017F\u0180-\u024F\u0250-\u02AF\u0300-\u036F\u0370-\u03FF\u0400-\u04FF\u0500-\u052F\u0530-\u058F\u0590-\u05FF\u0600-\u06FF\u0700-\u074F\u0750-\u077F\u0780-\u07BF\u07C0-\u07FF\u0900-\u097F\u0980-\u09FF\u0A00-\u0A7F\u0A80-\u0AFF\u0B00-\u0B7F\u0B80-\u0BFF\u0C00-\u0C7F\u0C80-\u0CFF\u0D00-\u0D7F\u0D80-\u0DFF\u0E00-\u0E7F\u0E80-\u0EFF\u0F00-\u0FFF\u1000-\u109F\u10A0-\u10FF\u1100-\u11FF\u1200-\u137F\u1380-\u139F\u13A0-\u13FF\u1400-\u167F\u1680-\u169F\u16A0-\u16FF\u1700-\u171F\u1720-\u173F\u1740-\u175F\u1760-\u177F\u1780-\u17FF\u1800-\u18AF\u1900-\u194F\u1950-\u197F\u1980-\u19DF\u19E0-\u19FF\u1A00-\u1A1F\u1B00-\u1B7F\u1D00-\u1D7F\u1D80-\u1DBF\u1DC0-\u1DFF\u1E00-\u1EFF\u1F00-\u1FFF\u20D0-\u20FF\u2100-\u214F\u2C00-\u2C5F\u2C60-\u2C7F\u2C80-\u2CFF\u2D00-\u2D2F\u2D30-\u2D7F\u2D80-\u2DDF\u2F00-\u2FDF\u2FF0-\u2FFF\u3040-\u309F\u30A0-\u30FF\u3100-\u312F\u3130-\u318F\u3190-\u319F\u31C0-\u31EF\u31F0-\u31FF\u3200-\u32FF\u3300-\u33FF\u3400-\u4DBF\u4DC0-\u4DFF\u4E00-\u9FFF\uA000-\uA48F\uA490-\uA4CF\uA700-\uA71F\uA800-\uA82F\uA840-\uA87F\uAC00-\uD7AF\uF900-\uFAFF]){2,63})+)$/i

I'll avoid copy-pasting complete answers, so I'll just link this to a similar answer I provided here: How to validate a unicode email?

There is also a live demo available for the regex above at: http://jsfiddle.net/aossikine/qCLVH/3/

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Alexey Ossikine
  • 103
  • 1
  • 5
3

The regular expressions posted for this question are out of date now, because of the new generic top-level domains (gTLDs) coming in (e.g. .london, .basketball, .通販). To validate an email address there are two answers (that would be relevant to the vast majority).

  1. As the main answer says - don't use a regular expression. Just validate it by sending an email to the address (catch exceptions for invalid addresses)
  2. Use a very generic regex to at least make sure that they are using an email structure like {something}@{something}.{something}. There's no point in going for a detailed regex, because you won't catch them all and there'll be a new batch in a few years and you'll have to update your regular expression again.

I have decided to use the regular expression because, unfortunately, some users don't read forms and put the wrong data in the wrong fields. This will at least alert them when they try to put something which isn't an email into the email input field and should save you some time supporting users on email issues.

(.+)@(.+){2,}\.(.+){2,}
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
McGaz
  • 1,354
  • 1
  • 13
  • 22
  • What is the difference between a gTLD and a [TLD](https://en.wikipedia.org/wiki/Top-level_domain)? – Peter Mortensen Feb 14 '22 at 21:59
  • They're all the same really, but just categorised differently. There are mainly Country Code TLDS (ccTLD), like .co.uk or .fr. These are assigned to each country and contribute as a factor for search engines understand the location/target audience. Sponsored TLDS (sTLD) are assigned to organisations or governments, e.g. .gov The generics (gTLD) cover the extensions which are generic, e.g. .com, .london, .mail, etc. There are some restrictions on which ones you can use, prices can be very different, but Google also says it doesn't matter too much whether you're on a .com or a .whatever. – McGaz Feb 16 '22 at 11:26
3

Following is the regular expression for validating an email address:

^.+@\w+(\.\w+)+$
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Prasad Bhosale
  • 682
  • 8
  • 20
  • Given all the previous answers, such a simple regular expression requires an explanation (e.g., why weren't the huge complexity in the previous answers necessary?). What are its properties? What does it fail for? What are some examples that it does work for? What are some examples that it doesn't work for? Please respond by [editing (changing) your answer](https://stackoverflow.com/posts/26800017/edit), not here in comments (***without*** "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Feb 14 '22 at 22:55
3

If you need a simple form to validate, you can use the answer of https://regexr.com/3e48o

^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$

let r = new RegExp(String.raw `^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$`);

//should be true
console.log(r.test('name@domain.tld'));
console.log(r.test('name@domain.co.tld'));
console.log(r.test('name@domain.co'));

//should be false
console.log(r.test('@domain.tld'));
console.log(r.test('name@.tld'));
console.log(r.test('name@domain.'));
console.log(r.test('namedomain.tld'));
console.log(r.test(''));

//now that basic client-side validation is done, send a token from the server side to validate the user actually has access to the email
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
  • 6
    This regex is too simple and rejects ordinary valid emails. It incorrectly rejects the plus particle in the local part (`foo+bar@example.com`) and incorrectly rejects generic top-level domains with more than four letters (`foo@example.dance`). – doppelgreener Apr 15 '21 at 13:50
  • This fails over validating `.academy` domains, for example – Pavindu Apr 23 '22 at 12:55
3

I would not suggest to use an regex at all - email addresses are way too complicated for that. This is a common problem so I would guess there are many libraries that contain a validator - if you use Java the EmailValidator of apache commons validator is a good one.

Dr. Hans-Peter Störr
  • 25,298
  • 30
  • 102
  • 139
2

As per my understanding, it will most probably be covered by...

/^([a-z0-9_-]+)(@[a-z0-9-]+)(\.[a-z]+|\.[a-z]+\.[a-z]+)?$/is
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Mohit Gupta
  • 11
  • 1
  • 1
  • improvement/suggestion always act as catalyst so pls be catalyzed and catalyzed me also. – Mohit Gupta Dec 31 '12 at 13:11
  • Gmail users often use . and + in their email nick, and some comments on this page mention ' and !. – Cees Timmerman Jan 18 '13 at 14:42
  • 1
    This is too restrictive, and does not permit numbers in domain names, characters in the user part. `o'hare@example.com`, `me+alpha@example.com`, and `me@mail.s2.example.com` are all valid email addresses that this does not validate. – awwright Sep 15 '20 at 02:48
2

I found a regular expression that is compliant with RFC 2822. The preceding standard to RFC 5322. This regular expression appears to perform fairly well and will cover most cases, however with RFC 5322 becoming the standard there may be some holes that ought to be plugged.

^(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])$

The documentation says you shouldn't use the above regular expression, but instead favour this flavour, which is a bit more manageable.

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

I noticed this is case-sensitive, so I actually made an alteration to this landing.

^[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?$
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
mrswadge
  • 1,659
  • 1
  • 20
  • 43
2

A regex that does exactly what the standards say is allowed, according to what I've seen about them, is this:

/^(?!(^[.-].*|.*[.-]@|.*\.{2,}.*)|^.{254}.+@)([a-z\xC0-\xFF0-9!#$%&'*+\/=?^_`{|}~.-]+@)(?!.{253}.+$)((?!-.*|.*-\.)([a-z0-9-]{1,63}\.)+[a-z]{2,63}|(([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])\.){3}([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9]))$/gim

Demo / Debuggex analysis (interactive)

Split up:

^(?!(^[.-].*|.*[.-]@|.*\.{2,}.*)|^.{254}.+@)
([a-z\xC0-\xFF0-9!#$%&'*+\/=?^_`{|}~.-]+@)
(?!.{253}.+$)
(
    (?!-.*|.*-\.)
    ([a-z0-9-]{1,63}\.)+
    [a-z]{2,63}
    |
    (([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])\.){3}
    ([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])
)$

Analysis:

(?!(^[.-].*|.*[.-]@|.*\.{2,}.*)|^.{254}.+@)

Negative lookahead for either an address starting with a ., ending with one, having .. in it, or exceeding the 254 character max length


([a-z\xC0-\xFF0-9!#$%&'*+\/=?^_`{|}~.-]+@)

matching 1 or more of the permitted characters, with the negative look applying to it


(?!.{253}.+$)

Negative lookahead for the domain name part, restricting it to 253 characters in total


(?!-.*|.*-\.)

Negative lookahead for each of the domain names, which are don't allow starting or ending with .


([a-z0-9-]{1,63}\.)+

simple group match for the allowed characters in a domain name, which are limited to 63 characters each


[a-zA-Z]{2,63}

simple group match for the allowed top-level domain, which currently still is restricted to letters only, but does include >4 letter TLDs.


(([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])\.){3}
([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])

the alternative for domain names: this matches the first 3 numbers in an IP address with a . behind it, and then the fourth number in the IP address without . behind it.

Community
  • 1
  • 1
Joeytje50
  • 18,636
  • 15
  • 63
  • 95
  • 3
    Don't use this. It's will reject international domains like "öåüñ". https://blog.cloudflare.com/non-latinutf8-domains-now-fully-supported/ – Albin Mar 04 '18 at 17:23
2

There has nearly been added a new domain, "yandex". Possible emails: test@job.yandex. And also uppercase letters are supported, so a bit modified version of acrosman's solution is:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*(\.[a-zA-Z]{2,6})$
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Fragment
  • 1,555
  • 1
  • 26
  • 33
2

Java Mail API does magic for us.

try
{
    InternetAddress internetAddress = new InternetAddress(email);
    internetAddress.validate();
    return true;
}
catch(Exception ex)
{
    return false;
}

I got this from here.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
sunleo
  • 10,589
  • 35
  • 116
  • 196
2

I did not find any that deals with a top-level domain name, but it should be considered.

So for me the following worked:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}AAA|AARP|ABB|ABBOTT|ABOGADO|AC|ACADEMY|ACCENTURE|ACCOUNTANT|ACCOUNTANTS|ACO|ACTIVE|ACTOR|AD|ADAC|ADS|ADULT|AE|AEG|AERO|AF|AFL|AG|AGENCY|AI|AIG|AIRFORCE|AIRTEL|AL|ALIBABA|ALIPAY|ALLFINANZ|ALSACE|AM|AMICA|AMSTERDAM|ANALYTICS|ANDROID|AO|APARTMENTS|APP|APPLE|AQ|AQUARELLE|AR|ARAMCO|ARCHI|ARMY|ARPA|ARTE|AS|ASIA|ASSOCIATES|AT|ATTORNEY|AU|AUCTION|AUDI|AUDIO|AUTHOR|AUTO|AUTOS|AW|AX|AXA|AZ|AZURE|BA|BAIDU|BAND|BANK|BAR|BARCELONA|BARCLAYCARD|BARCLAYS|BARGAINS|BAUHAUS|BAYERN|BB|BBC|BBVA|BCN|BD|BE|BEATS|BEER|BENTLEY|BERLIN|BEST|BET|BF|BG|BH|BHARTI|BI|BIBLE|BID|BIKE|BING|BINGO|BIO|BIZ|BJ|BLACK|BLACKFRIDAY|BLOOMBERG|BLUE|BM|BMS|BMW|BN|BNL|BNPPARIBAS|BO|BOATS|BOEHRINGER|BOM|BOND|BOO|BOOK|BOOTS|BOSCH|BOSTIK|BOT|BOUTIQUE|BR|BRADESCO|BRIDGESTONE|BROADWAY|BROKER|BROTHER|BRUSSELS|BS|BT|BUDAPEST|BUGATTI|BUILD|BUILDERS|BUSINESS|BUY|BUZZ|BV|BW|BY|BZ|BZH|CA|CAB|CAFE|CAL|CALL|CAMERA|CAMP|CANCERRESEARCH|CANON|CAPETOWN|CAPITAL|CAR|CARAVAN|CARDS|CARE|CAREER|CAREERS|CARS|CARTIER|CASA|CASH|CASINO|CAT|CATERING|CBA|CBN|CC|CD|CEB|CENTER|CEO|CERN|CF|CFA|CFD|CG|CH|CHANEL|CHANNEL|CHAT|CHEAP|CHLOE|CHRISTMAS|CHROME|CHURCH|CI|CIPRIANI|CIRCLE|CISCO|CITIC|CITY|CITYEATS|CK|CL|CLAIMS|CLEANING|CLICK|CLINIC|CLINIQUE|CLOTHING|CLOUD|CLUB|CLUBMED|CM|CN|CO|COACH|CODES|COFFEE|COLLEGE|COLOGNE|COM|COMMBANK|COMMUNITY|COMPANY|COMPARE|COMPUTER|COMSEC|CONDOS|CONSTRUCTION|CONSULTING|CONTACT|CONTRACTORS|COOKING|COOL|COOP|CORSICA|COUNTRY|COUPONS|COURSES|CR|CREDIT|CREDITCARD|CREDITUNION|CRICKET|CROWN|CRS|CRUISES|CSC|CU|CUISINELLA|CV|CW|CX|CY|CYMRU|CYOU|CZ|DABUR|DAD|DANCE|DATE|DATING|DATSUN|DAY|DCLK|DE|DEALER|DEALS|DEGREE|DELIVERY|DELL|DELTA|DEMOCRAT|DENTAL|DENTIST|DESI|DESIGN|DEV|DIAMONDS|DIET|DIGITAL|DIRECT|DIRECTORY|DISCOUNT|DJ|DK|DM|DNP|DO|DOCS|DOG|DOHA|DOMAINS|DOOSAN|DOWNLOAD|DRIVE|DUBAI|DURBAN|DVAG|DZ|EARTH|EAT|EC|EDEKA|EDU|EDUCATION|EE|EG|EMAIL|EMERCK|ENERGY|ENGINEER|ENGINEERING|ENTERPRISES|EPSON|EQUIPMENT|ER|ERNI|ES|ESQ|ESTATE|ET|EU|EUROVISION|EUS|EVENTS|EVERBANK|EXCHANGE|EXPERT|EXPOSED|EXPRESS|FAGE|FAIL|FAIRWINDS|FAITH|FAMILY|FAN|FANS|FARM|FASHION|FAST|FEEDBACK|FERRERO|FI|FILM|FINAL|FINANCE|FINANCIAL|FIRESTONE|FIRMDALE|FISH|FISHING|FIT|FITNESS|FJ|FK|FLIGHTS|FLORIST|FLOWERS|FLSMIDTH|FLY|FM|FO|FOO|FOOTBALL|FORD|FOREX|FORSALE|FORUM|FOUNDATION|FOX|FR|FRESENIUS|FRL|FROGANS|FUND|FURNITURE|FUTBOL|FYI|GA|GAL|GALLERY|GAME|GARDEN|GB|GBIZ|GD|GDN|GE|GEA|GENT|GENTING|GF|GG|GGEE|GH|GI|GIFT|GIFTS|GIVES|GIVING|GL|GLASS|GLE|GLOBAL|GLOBO|GM|GMAIL|GMO|GMX|GN|GOLD|GOLDPOINT|GOLF|GOO|GOOG|GOOGLE|GOP|GOT|GOV|GP|GQ|GR|GRAINGER|GRAPHICS|GRATIS|GREEN|GRIPE|GROUP|GS|GT|GU|GUCCI|GUGE|GUIDE|GUITARS|GURU|GW|GY|HAMBURG|HANGOUT|HAUS|HEALTH|HEALTHCARE|HELP|HELSINKI|HERE|HERMES|HIPHOP|HITACHI|HIV|HK|HM|HN|HOCKEY|HOLDINGS|HOLIDAY|HOMEDEPOT|HOMES|HONDA|HORSE|HOST|HOSTING|HOTELES|HOTMAIL|HOUSE|HOW|HR|HSBC|HT|HU|HYUNDAI|IBM|ICBC|ICE|ICU|ID|IE|IFM|IINET|IL|IM|IMMO|IMMOBILIEN|IN|INDUSTRIES|INFINITI|INFO|ING|INK|INSTITUTE|INSURANCE|INSURE|INT|INTERNATIONAL|INVESTMENTS|IO|IPIRANGA|IQ|IR|IRISH|IS|ISELECT|IST|ISTANBUL|IT|ITAU|IWC|JAGUAR|JAVA|JCB|JE|JETZT|JEWELRY|JLC|JLL|JM|JMP|JO|JOBS|JOBURG|JOT|JOY|JP|JPRS|JUEGOS|KAUFEN|KDDI|KE|KFH|KG|KH|KI|KIA|KIM|KINDER|KITCHEN|KIWI|KM|KN|KOELN|KOMATSU|KP|KPN|KR|KRD|KRED|KW|KY|KYOTO|KZ|LA|LACAIXA|LAMBORGHINI|LAMER|LANCASTER|LAND|LANDROVER|LANXESS|LASALLE|LAT|LATROBE|LAW|LAWYER|LB|LC|LDS|LEASE|LECLERC|LEGAL|LEXUS|LGBT|LI|LIAISON|LIDL|LIFE|LIFEINSURANCE|LIFESTYLE|LIGHTING|LIKE|LIMITED|LIMO|LINCOLN|LINDE|LINK|LIVE|LIVING|LIXIL|LK|LOAN|LOANS|LOL|LONDON|LOTTE|LOTTO|LOVE|LR|LS|LT|LTD|LTDA|LU|LUPIN|LUXE|LUXURY|LV|LY|MA|MADRID|MAIF|MAISON|MAKEUP|MAN|MANAGEMENT|MANGO|MARKET|MARKETING|MARKETS|MARRIOTT|MBA|MC|MD|ME|MED|MEDIA|MEET|MELBOURNE|MEME|MEMORIAL|MEN|MENU|MEO|MG|MH|MIAMI|MICROSOFT|MIL|MINI|MK|ML|MM|MMA|MN|MO|MOBI|MOBILY|MODA|MOE|MOI|MOM|MONASH|MONEY|MONTBLANC|MORMON|MORTGAGE|MOSCOW|MOTORCYCLES|MOV|MOVIE|MOVISTAR|MP|MQ|MR|MS|MT|MTN|MTPC|MTR|MU|MUSEUM|MUTUELLE|MV|MW|MX|MY|MZ|NA|NADEX|NAGOYA|NAME|NAVY|NC|NE|NEC|NET|NETBANK|NETWORK|NEUSTAR|NEW|NEWS|NEXUS|NF|NG|NGO|NHK|NI|NICO|NINJA|NISSAN|NL|NO|NOKIA|NORTON|NOWRUZ|NP|NR|NRA|NRW|NTT|NU|NYC|NZ|OBI|OFFICE|OKINAWA|OM|OMEGA|ONE|ONG|ONL|ONLINE|OOO|ORACLE|ORANGE|ORG|ORGANIC|ORIGINS|OSAKA|OTSUKA|OVH|PA|PAGE|PAMPEREDCHEF|PANERAI|PARIS|PARS|PARTNERS|PARTS|PARTY|PE|PET|PF|PG|PH|PHARMACY|PHILIPS|PHOTO|PHOTOGRAPHY|PHOTOS|PHYSIO|PIAGET|PICS|PICTET|PICTURES|PID|PIN|PING|PINK|PIZZA|PK|PL|PLACE|PLAY|PLAYSTATION|PLUMBING|PLUS|PM|PN|POHL|POKER|PORN|POST|PR|PRAXI|PRESS|PRO|PROD|PRODUCTIONS|PROF|PROMO|PROPERTIES|PROPERTY|PROTECTION|PS|PT|PUB|PW|PY|QA|QPON|QUEBEC|RACING|RE|READ|REALTOR|REALTY|RECIPES|RED|REDSTONE|REDUMBRELLA|REHAB|REISE|REISEN|REIT|REN|RENT|RENTALS|REPAIR|REPORT|REPUBLICAN|REST|RESTAURANT|REVIEW|REVIEWS|REXROTH|RICH|RICOH|RIO|RIP|RO|ROCHER|ROCKS|RODEO|ROOM|RS|RSVP|RU|RUHR|RUN|RW|RWE|RYUKYU|SA|SAARLAND|SAFE|SAFETY|SAKURA|SALE|SALON|SAMSUNG|SANDVIK|SANDVIKCOROMANT|SANOFI|SAP|SAPO|SARL|SAS|SAXO|SB|SBS|SC|SCA|SCB|SCHAEFFLER|SCHMIDT|SCHOLARSHIPS|SCHOOL|SCHULE|SCHWARZ|SCIENCE|SCOR|SCOT|SD|SE|SEAT|SECURITY|SEEK|SELECT|SENER|SERVICES|SEVEN|SEW|SEX|SEXY|SFR|SG|SH|SHARP|SHELL|SHIA|SHIKSHA|SHOES|SHOW|SHRIRAM|SI|SINGLES|SITE|SJ|SK|SKI|SKIN|SKY|SKYPE|SL|SM|SMILE|SN|SNCF|SO|SOCCER|SOCIAL|SOFTBANK|SOFTWARE|SOHU|SOLAR|SOLUTIONS|SONY|SOY|SPACE|SPIEGEL|SPREADBETTING|SR|SRL|ST|STADA|STAR|STARHUB|STATEFARM|STATOIL|STC|STCGROUP|STOCKHOLM|STORAGE|STUDIO|STUDY|STYLE|SU|SUCKS|SUPPLIES|SUPPLY|SUPPORT|SURF|SURGERY|SUZUKI|SV|SWATCH|SWISS|SX|SY|SYDNEY|SYMANTEC|SYSTEMS|SZ|TAB|TAIPEI|TAOBAO|TATAMOTORS|TATAR|TATTOO|TAX|TAXI|TC|TCI|TD|TEAM|TECH|TECHNOLOGY|TEL|TELEFONICA|TEMASEK|TENNIS|TF|TG|TH|THD|THEATER|THEATRE|TICKETS|TIENDA|TIFFANY|TIPS|TIRES|TIROL|TJ|TK|TL|TM|TMALL|TN|TO|TODAY|TOKYO|TOOLS|TOP|TORAY|TOSHIBA|TOURS|TOWN|TOYOTA|TOYS|TR|TRADE|TRADING|TRAINING|TRAVEL|TRAVELERS|TRAVELERSINSURANCE|TRUST|TRV|TT|TUBE|TUI|TUSHU|TV|TW|TZ|UA|UBS|UG|UK|UNIVERSITY|UNO|UOL|US|UY|UZ|VA|VACATIONS|VANA|VC|VE|VEGAS|VENTURES|VERISIGN|VERSICHERUNG|VET|VG|VI|VIAJES|VIDEO|VILLAS|VIN|VIP|VIRGIN|VISION|VISTA|VISTAPRINT|VIVA|VLAANDEREN|VN|VODKA|VOLKSWAGEN|VOTE|VOTING|VOTO|VOYAGE|VU|WALES|WALTER|WANG|WANGGOU|WATCH|WATCHES|WEATHER|WEBCAM|WEBER|WEBSITE|WED|WEDDING|WEIR|WF|WHOSWHO|WIEN|WIKI|WILLIAMHILL|WIN|WINDOWS|WINE|WME|WORK|WORKS|WORLD|WS|WTC|WTF|XBOX|XEROX|XIN|XN--11B4C3D|XN--1QQW23A|XN--30RR7Y|XN--3BST00M|XN--3DS443G|XN--3E0B707E|XN--3PXU8K|XN--42C2D9A|XN--45BRJ9C|XN--45Q11C|XN--4GBRIM|XN--55QW42G|XN--55QX5D|XN--6FRZ82G|XN--6QQ986B3XL|XN--80ADXHKS|XN--80AO21A|XN--80ASEHDB|XN--80ASWG|XN--90A3AC|XN--90AIS|XN--9DBQ2A|XN--9ET52U|XN--B4W605FERD|XN--C1AVG|XN--C2BR7G|XN--CG4BKI|XN--CLCHC0EA0B2G2A9GCD|XN--CZR694B|XN--CZRS0T|XN--CZRU2D|XN--D1ACJ3B|XN--D1ALF|XN--ECKVDTC9D|XN--EFVY88H|XN--ESTV75G|XN--FHBEI|XN--FIQ228C5HS|XN--FIQ64B|XN--FIQS8S|XN--FIQZ9S|XN--FJQ720A|XN--FLW351E|XN--FPCRJ9C3D|XN--FZC2C9E2C|XN--G2XX48C|XN--GECRJ9C|XN--H2BRJ9C|XN--HXT814E|XN--I1B6B1A6A2E|XN--IMR513N|XN--IO0A7I|XN--J1AEF|XN--J1AMH|XN--J6W193G|XN--JLQ61U9W7B|XN--KCRX77D1X4A|XN--KPRW13D|XN--KPRY57D|XN--KPU716F|XN--KPUT3I|XN--L1ACC|XN--LGBBAT1AD8J|XN--MGB9AWBF|XN--MGBA3A3EJT|XN--MGBA3A4F16A|XN--MGBAAM7A8H|XN--MGBAB2BD|XN--MGBAYH7GPA|XN--MGBB9FBPOB|XN--MGBBH1A71E|XN--MGBC0A9AZCG|XN--MGBERP4A5D4AR|XN--MGBPL2FH|XN--MGBT3DHD|XN--MGBTX2B|XN--MGBX4CD0AB|XN--MK1BU44C|XN--MXTQ1M|XN--NGBC5AZD|XN--NGBE9E0A|XN--NODE|XN--NQV7F|XN--NQV7FS00EMA|XN--NYQY26A|XN--O3CW4H|XN--OGBPF8FL|XN--P1ACF|XN--P1AI|XN--PBT977C|XN--PGBS0DH|XN--PSSY2U|XN--Q9JYB4C|XN--QCKA1PMC|XN--QXAM|XN--RHQV96G|XN--S9BRJ9C|XN--SES554G|XN--T60B56A|XN--TCKWE|XN--UNUP4Y|XN--VERMGENSBERATER-CTB|XN--VERMGENSBERATUNG-PWB|XN--VHQUV|XN--VUQ861B|XN--WGBH1C|XN--WGBL6A|XN--XHQ521B|XN--XKC2AL3HYE2A|XN--XKC2DL3A5EE0H|XN--Y9A3AQ|XN--YFRO4I67O|XN--YGBI2AMMX|XN--ZFR164B|XPERIA|XXX|XYZ|YACHTS|YAMAXUN|YANDEX|YE|YODOBASHI|YOGA|YOKOHAMA|YOUTUBE|YT|ZA|ZARA|ZERO|ZIP|ZM|ZONE|ZUERICH|ZW)\b

That easily discarded emails like 3c296rD3HNEE@d139c.a51, Sd@sd.dox, etc.

The domain name can be further edited if needed, e.g., specific country domain, etc.

Another list of top level domains that updates frequently.

Learner
  • 5,192
  • 1
  • 24
  • 36
  • Like pointed out in multiple comments to other answers here already, the list of valid TLDs is growing rapidly. Your "2-letter ccTLD or one of big-6, info, mobi, etc" would have been reasonable five years ago, but no longer works at all reliably. – tripleee Sep 15 '15 at 04:32
  • 1
    Even at time of original writing, this was already invalid by a couple hundred TLD's. As of currently, you're missing a little under 1200 possibilities (and growing at a pretty regular rate) Current list of valid domains: http://data.iana.org/TLD/tlds-alpha-by-domain.txt – user2366842 Jan 27 '16 at 19:38
  • 2
    Nearly 8,000 characters in a regular expression? There is something seriously wrong. Can't it be layered, refactored, split into several pieces, or similar instead of one big regular expression? There is presumably a lot of redundancy in it. – Peter Mortensen Feb 14 '22 at 23:33
2

List item

I use this function

function checkmail($value) {
    $value = trim($value);
    if (stristr($value,"@") &&
        stristr($value,".") &&
        (strrpos($value, ".") - stripos($value, "@") > 2) &&
        (stripos($value, "@") > 1) &&
        (strlen($value) - strrpos($value, ".") < 6) &&
        (strlen($value) - strrpos($value, ".") > 2) &&
        ($value == preg_replace('/[ ]/', '', $value)) &&
        ($value == preg_replace('/[^A-Za-z0-9\-_.@!*]/', '', $value))
    )
    {

    }
    else {
        return "Invalid Mail-Id";
    }
}
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Prassd Nidode
  • 302
  • 1
  • 9
2

Writing a regular expression for all the things will take a lot of effort. Instead, you can use pyIsEmail package.

Below text is taken from pyIsEmail website.

pyIsEmail is a no-nonsense approach for checking whether that user-supplied email address could be real.

Regular expressions are cheap to write, but often require maintenance when new top-level domains come out or don’t conform to email addressing features that come back into vogue. pyIsEmail allows you to validate an email address – and even check the domain, if you wish – with one simple call, making your code more readable and faster to write. When you want to know why an email address doesn’t validate, they even provide you with a diagnosis.

Usage

For the simplest usage, import and use the is_email function:

from pyisemail import is_email

address = "test@example.com"
bool_result = is_email(address)
detailed_result = is_email(address, diagnose=True)

You can also check whether the domain used in the email is a valid domain and whether or not it has a valid MX record:

from pyisemail import is_email

address = "test@example.com"
bool_result_with_dns = is_email(address, check_dns=True)
detailed_result_with_dns = is_email(address, check_dns=True, diagnose=True)

These are primary indicators of whether an email address can even be issued at that domain. However, a valid response here is not a guarantee that the email exists, merely that is can exist.

In addition to the base is_email functionality, you can also use the validators by themselves. Check the validator source doc to see how this works.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
  • Re *"...when new top-level domains come out"*: Aren't there literally thousands by now? – Peter Mortensen Feb 15 '22 at 00:00
  • This sounds more like an advert. What does it actually do? What is the gist? Does it go live over the Internet to do some lookups or checks (that involves some DNS stuff)? *Effectively* trying to send the email to see what happens? Or something else? – Peter Mortensen Feb 15 '22 at 00:05
1

The world's most popular blogging platform WordPress uses this function to validate email address...

But they are doing it with multiple steps.

You don't have to worry anymore when using the regex mentioned in this function...

Here is the function...

/**
 * Verifies that an email is valid.
 *
 * Does not grok i18n domains. Not RFC compliant.
 *
 * @since 0.71
 *
 * @param string $email Email address to verify.
 * @param boolean $deprecated Deprecated.
 * @return string|bool Either false or the valid email address.
 */
function is_email( $email, $deprecated = false ) {
    if ( ! empty( $deprecated ) )
        _deprecated_argument( __FUNCTION__, '3.0' );

    // Test for the minimum length the email can be
    if ( strlen( $email ) < 3 ) {
        return apply_filters( 'is_email', false, $email, 'email_too_short' );
    }

    // Test for an @ character after the first position
    if ( strpos( $email, '@', 1 ) === false ) {
        return apply_filters( 'is_email', false, $email, 'email_no_at' );
    }

    // Split out the local and domain parts
    list( $local, $domain ) = explode( '@', $email, 2 );

    // LOCAL PART
    // Test for invalid characters
    if ( !preg_match( '/^[a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~\.-]+$/', $local ) ) {
        return apply_filters( 'is_email', false, $email, 'local_invalid_chars' );
    }

    // DOMAIN PART
    // Test for sequences of periods
    if ( preg_match( '/\.{2,}/', $domain ) ) {
        return apply_filters( 'is_email', false, $email, 'domain_period_sequence' );
    }

    // Test for leading and trailing periods and whitespace
    if ( trim( $domain, " \t\n\r\0\x0B." ) !== $domain ) {
        return apply_filters( 'is_email', false, $email, 'domain_period_limits' );
    }

    // Split the domain into subs
    $subs = explode( '.', $domain );

    // Assume the domain will have at least two subs
    if ( 2 > count( $subs ) ) {
        return apply_filters( 'is_email', false, $email, 'domain_no_periods' );
    }

    // Loop through each sub
    foreach ( $subs as $sub ) {
        // Test for leading and trailing hyphens and whitespace
        if ( trim( $sub, " \t\n\r\0\x0B-" ) !== $sub ) {
            return apply_filters( 'is_email', false, $email, 'sub_hyphen_limits' );
        }

        // Test for invalid characters
        if ( !preg_match('/^[a-z0-9-]+$/i', $sub ) ) {
            return apply_filters( 'is_email', false, $email, 'sub_invalid_chars' );
        }
    }

    // Congratulations your email made it!
    return apply_filters( 'is_email', $email, $email, null );
}
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
PrivateUser
  • 4,474
  • 12
  • 61
  • 94
  • This is too restrictive, it does not permit a quoted user component, and it does not support internationalized email addresses. Also the `_deprecated_argument` function is not defined. – awwright Sep 12 '20 at 23:47
1

As mentioned already, you can't validate an email with a regex. However, here's what we currently use to make sure user-input isn't totally bogus (forgetting the TLD, etc.).

This regex will allow IDN domains and special characters (like Umlauts) before and after the @ sign.

/^[\w.+-_]+@[^.][\w.-]*\.[\w-]{2,63}$/iu
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
zıəs uɐɟəʇs
  • 1,728
  • 3
  • 14
  • 27
1

I found a nice article, which says that the best way to validate e-mail address is the regular expression /.+@.+\..+/i.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
syp_dino
  • 395
  • 3
  • 10
1

I converted the code into Java to match the compiler:

String pattern = "(?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Hany Sakr
  • 2,591
  • 28
  • 27
1

I'd like to propose my approach which is relatively simple while ensuring proper email structure and restricting forbidden characters. Valid for latin characters.

/^(?![\w\.@]*\.\.)(?![\w\.@]*\.@)(?![\w\.]*@\.)\w+[\w\.]*@[\w\.]+\.\w{2,}$/
Eggon
  • 2,032
  • 2
  • 19
  • 38
0
^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*\.(([0-9]{1,3})|([a-zA-Z]{2,3})|(aero|coop|info|museum|name))$

This matches 99.99% of email addresses, including some of the newer top-level-domain extensions, such as info, museum, name, etc. It also allows for emails tied directly to IP addresses.

Jamal
  • 763
  • 7
  • 22
  • 32
Francisco Costa
  • 6,713
  • 5
  • 34
  • 43
0

You can use following regular expression for any email address:

^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$

For PHP

function checkEmailValidation($email)
{
    $expression = '/^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/';
    if(preg_match($expression, $email))
    {
        return true;
    }
    else
    {
        return false;
    }
}

For JavaScript

function checkEmailValidation(email)
{
    var pattern = '/^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/';
    if(pattern.test(email))
    {
        return true;
    }
    else
    {
        return false;
    }
}
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Ramesh Kotkar
  • 747
  • 9
  • 11
  • 1
    `if(preg_match($expression, $email)) { return true; } else { return false; }` can be simplified to `return (bool) preg_match($expression, $email);` – Luna Aug 14 '15 at 12:33
0

Yet another option we have is to use DataAnnotations which has an EmailAddressAttribute. This can not only be applied to the property of a class but can also leveraged at runtime.

using System.ComponentModel.DataAnnotations;

Typical Usage

public class Person
{
    public int Id { get; set; }

    [EmailAddress]
    public string Email { get; set; }
}

At Runtime

var emailAddressAttribute = new EmailAddressAttribute();

if (emailAddressAttribute.IsValid("name@email.com"))
{
    //email is valid
}
else
{
    //email is invalid
}
Craig
  • 6,869
  • 3
  • 32
  • 52
0

For my purpose, I needed a way to also extract a display name if provided.
Thanks to the other answers and the regex provided on https://emailregex.com/ I came up with the following solution:

/^(?:([^<]*?)\s*<)?((?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]))>?$/gi

This matches Display name (=group 1) + email address (=group 2).

Examples of matches:

john.doe@example.com
john.o'doe@example.com
John <john@doe.com>
<john@doe.com>
This is <john@127.0.0.1>

Tested with https://regex101.com/

Of course, as also mentioned in other answers, additional validation of the length of display name and email address is required (shouldn't exceed 320 UTF-8 bytes).

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Tim Wißmann
  • 647
  • 6
  • 18
0

The question title is fairly generic, however the body of the question indicates that it is about the PHP based solution. Will try to address both.

Generically speaking, for all programming languages: Typically, validating" an e-mail address with a reg-ex is something that any internet based service provider should desist from. The possibilities of kinds of domain names and e-mail addresses have increased so much in terms of variety, any attempt at validation, which is not well thought may end up denying some valid users into your system. To avoid this, one of the best ways is to send an email to the user and verify it being received. The good folks at "Universal Acceptance Steering Group" have compiled a languagewise list of libraries which are found to be compliant/non-compliant with various parameters involving validations vis-a-vis Internationalized Domain Names and Internationalized Email addresses. Please find the links to those documents over here and here.

Speaking specifically of PHP:

There is one good library available in PHP i.e. EmailValidator. It is an email address validator that includes many validation methods such as DNS validation. The validator specifically recommended is called RFCValidator and validates email addresses against several RFCs. It has good compliance when it comes to being inclusive towards IDNs and Internationalized Email addresses.

ThinkTrans
  • 21
  • 3
-2

The regular expression that I use:

[\w-+]+([.][\w]+)?@[\w-+]+([.][a-z]{2,})+
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
GooDeeJAY
  • 1,681
  • 2
  • 20
  • 27
-3

This simple pattern works for me:

^(?<name>[^<>#()\.,;\s@\"]{1,})@(?<domain>[^<>#()\.,;\s@\"]{2,}\.(?<top>[^<>#()\.,;:\s@\"]{2,}))$
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Andriy B.
  • 421
  • 3
  • 14
  • 2
    Welcome to Stack Overflow. If you decide to answer an older question that has well established and correct answers, adding a new answer late in the day may not get you any credit. If you have some distinctive new information, or you're convinced the other answers are all wrong, by all means add a new answer, but 'yet another answer' giving the same basic information a long time after the question was asked usually won't earn you much credit. It also isn't entirely clear which dialect of regex you are using. – Jonathan Leffler Oct 01 '20 at 22:16