Connexer Ltd. Masthead

Name and Email Validation: You Are Doing It Wrong

Revision History

Revision  Date      Revised by  Comments
0.0       20170621  RCS         Initial draft

Copyright

This article is copyright © 2017 Connexer Ltd.


Licensing

This article is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

In case you don't want to read this entire article, here is the conclusion: You are not properly validating names and emails. Even if you think you are validating them correctly, you probably are not. Do yourself and the world a favor by throwing out what you think you know about validating names and emails (and also time and addresses). After that, read http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ and http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx/ (and perhaps the related articles out there on time and addresses; Google search is your friend here). Then, when you go to implement validation logic for these items, use a well-known third-party library that implements the validation properly. If you decide to implement the validation yourself, make sure you don't make improper assumptions.

Now, on to the long version:

I generally don't gripe in public about things that bother me. However, I am making an exception in this case because reasons. I continue to be amazed by the flawed assumptions that prevail amongst programmers when it comes to validating things like emails and names (and also time and addresses).

While I get annoyed when a website refuses my name (my last name contains an accented character, á), I generally consider whether the site in question is run by a large company or a small one. Big companies should hold themselves to a much higher standard, since they can afford it and often spend far more on development. Something similar happens with my email address. I use a + and some sort of descriptor for just about every site that requires an email, so that I have a unique email address for each site or application while all of the emails go to the same account (because that's how my mail server interprets the +). Any decent mail delivery agent will, or can be configured to, send emails addressed to foo@example.com and foo+bar@example.com to the same mailbox. The utility of this approach is two-fold: 1) it allows me to know where the email address was obtained (e.g., I used foo+bar@example.com to register on Company A's site but now I get emails at that same address from Company B even though I have never dealt with them, so A clearly sold or gave my information to B); and 2) it allows me to easily filter emails into different folders without having to do subject filtering or trying to find some other filter criterion that may change, like the sender address.
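To make the + convention concrete, here is a minimal Python sketch of how a receiving mail server might split such an address into a mailbox, a tag, and a domain. The function name `split_subaddress` and the configurable `separator` parameter are my own illustration, not any particular mail server's API, and real servers may use a different separator or ignore tags entirely.

```python
from typing import Tuple


def split_subaddress(address: str, separator: str = "+") -> Tuple[str, str, str]:
    """Split an address like foo+bar@example.com into (mailbox, tag, domain).

    Hypothetical helper for illustration only. The separator is an
    assumption; it is a per-server configuration choice.
    """
    # Split on the LAST "@" so that only the final part is treated as the domain.
    local, _, domain = address.rpartition("@")
    # Everything before the first separator is the mailbox; the rest is the tag.
    mailbox, _, tag = local.partition(separator)
    return mailbox, tag, domain


# Both addresses resolve to the same mailbox; only the tag differs.
print(split_subaddress("foo@example.com"))
print(split_subaddress("foo+bar@example.com"))
```

The tag is what makes the per-site filtering described above possible: the mailbox and domain decide where the message lands, and the tag records which site the address was given to.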

So, when a small company's website fails to allow these very basic things, it is annoying. When a large multinational company that retails all over North America or all over the world fails to allow them, it is inconceivable (yes, I recently watched The Princess Bride again, and yes, that word does mean what I think it means). A major corporation should not ignore its international audience by telling a non-trivial portion of the population that their name is invalid, or by telling someone using a perfectly valid and standards-compliant email address that it is invalid. I have had issues with both my name and email address at the sites of two major international brands (you would recognize them as top brands in their industries) in just the last few months. And don't even get me started on addresses (my street number and address add up to over 40 characters, and most systems limit addresses to 20 or 30 characters, arghh!).

The sad thing is that these problems have been common knowledge for over 10 years (note that the links I provide are from articles/blog postings from 2007 and 2010) and while every programmer and web developer should know and understand these issues, most don't.

There are many wrong assumptions about names. The best thing you can do in this regard is to read http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/. The one that seems to be a recurring problem for me is the one about "names are written in ASCII." Accented or otherwise decorated characters are common in much of the world (at least where Latin-based alphabets are used), and people use other non-Latin character sets in most of the rest of the world. How a site with customers in even as small an area as just North America (Canada/USA/Mexico) can get by without allowing accented characters in people's names is inconceivable (there's that word again). Go back and read the false assumptions programmers tend to make and ensure that any code you write does not rely on any of those assumptions for validating names.
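As a rough illustration of what "not relying on those assumptions" can look like, here is a deliberately permissive Python sketch. It only rejects input that is empty after trimming or that contains control characters, and it accepts accents, apostrophes, hyphens, and non-Latin scripts. The function name `is_acceptable_name` and the specific rules are my own assumptions for the example, not a standard; even these two rules may be too strict for some applications.

```python
import unicodedata


def is_acceptable_name(name: str) -> bool:
    """Permissive name check: hypothetical sketch, not a definitive rule set."""
    # Normalize so that "a" + combining accent and precomposed "á" compare the same.
    normalized = unicodedata.normalize("NFC", name).strip()
    if not normalized:
        return False
    # Reject only control characters (Unicode category "Cc"), such as NUL or newline.
    return not any(unicodedata.category(ch) == "Cc" for ch in normalized)


for candidate in ["José Álvarez", "O'Brien-Smith", "李小龍", "   ", "bad\x00name"]:
    print(repr(candidate), is_acceptable_name(candidate))
```

Note what the sketch does not do: it does not require a first and last name, does not restrict the input to ASCII letters, and does not impose a short length limit.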

Email is a little bit trickier. In particular, we tend to think that because email addresses are meant to be consumed by machines, we understand the rules much more clearly. We don't. Go read http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx/ and have a look at some of the email addresses that you almost certainly thought were invalid but that in fact meet the specifications of the RFC and are valid. For example, Gmail allows you to append "+anythingyouwant" to the local part of your email address and will deliver it to your inbox, allowing you to filter on the "+anythingyouwant". The + character is allowed by the standard. Any decent mail server software you run for yourself should let you set up the same behavior. The point is that if you implement email validation that limits characters in the local part to things like numbers and letters, then you are rejecting valid email addresses. That said, if you are an email provider, you are free to limit the addresses people have with your service in whatever way you like. The issue here is when I go to a site and I am asked to enter an email. If I enter an email that is valid according to the standard and supported by my domain, the site should not reject it. The standard is clear that the local part is only supposed to be interpreted by the host receiving mail for the domain. Again, it is inconceivable (there's that word again) that websites routinely disallow users from entering valid email addresses. If you have to validate email addresses in your application, please do yourself and the world a favor and use a third-party library that properly implements the validation rules.
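To show the kind of rule that causes the problem, here is a small Python sketch that contrasts a typical over-strict pattern with a deliberately permissive check. Both `NAIVE_PATTERN` and `looks_like_email` are hypothetical names of my own. The permissive check is not a full implementation of the RFC grammar and is not a substitute for a proper library; it only assumes the length limits from RFC 5321 (64 octets for the local part, 254 for the whole address) and otherwise leaves the local part alone.

```python
import re

# The kind of over-strict rule this article complains about: letters and
# digits only in the local part, so "+" (and many other valid characters) fail.
NAIVE_PATTERN = re.compile(r"^[A-Za-z0-9]+@[A-Za-z0-9.]+$")


def looks_like_email(address: str) -> bool:
    """Permissive pre-check: hypothetical sketch, not full RFC validation."""
    # Split on the LAST "@"; a quoted local part may itself contain "@".
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        return False
    # Length limits assumed from RFC 5321, measured in octets.
    if len(local.encode("utf-8")) > 64 or len(address.encode("utf-8")) > 254:
        return False
    # The local part is otherwise left for the receiving host to interpret.
    return True


address = "foo+bar@example.com"
print(bool(NAIVE_PATTERN.match(address)))  # the strict pattern rejects it
print(looks_like_email(address))           # the permissive check accepts it
```

The design choice here is to err on the side of accepting: the check refuses only input that cannot possibly be an address and does not second-guess what the receiving domain supports.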

That's it. If you were expecting the conclusion here, you will have to go back to the beginning and read it there because I don't expect that many people will trudge through everything I wrote.

If you managed to read this whole thing, please let me know what you thought, whether it was useful, or if you have any suggestions for improvement.
