Unicode: A Beginner's Primer
Believe it or not, there's an image format which is built right into your browser. It allows images to be downloaded even before you need them, renders them perfectly on Retina screens, and allows them to have CSS colours and effects applied to them. Ok, I'm not being entirely truthful there. It's not an image format as such – but the rest still applies. Using Unicode you can create icons that are resolution independent, have virtually no download time and can also be styled with CSS.
In this article, I'll run you through the basics, as well as some of the interesting things that you can do with Unicode.
So, What is Unicode?
Unicode is a way of allowing letters and punctuation marks from different languages to be correctly displayed in a single document. This is incredibly useful; it means that your site can be used around the globe and will show exactly what you wanted to share – whether that happens to include French accented characters or is entirely written in Kanji.
Unicode is also continually being added to; at the time of writing it's on version 6.3, which has just over 110,000 characters. Version 7 is due for release later this year and will add nearly 3,000 new characters.
Alongside letters and numbers, Unicode also specifies some symbols and icons. More recently these have expanded to include the Emoji icons that you may have seen in iOS messaging.
HTML pages are made up of sequences of Unicode characters, and when they're sent over a network, they're converted into bytes. Every letter or character for every language is given a unique code, and this can be encoded when the document is saved or shared.
Ideally, this encoding uses a system known as UTF-8, as this can encode any Unicode character, but even if that weren't the case, any character can be defined by a numeric character reference. For example, using &#9829; will produce a heart, and you can just type that code straight into your HTML to get ♥.
That number can either be a standard decimal number, or its hexadecimal equivalent. If it's hexadecimal, then the number needs an x in front of it, so &#x2665; will give the same heart (2665 is hex for 9829).
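As a minimal sketch, the page below writes the same heart three ways: typed directly, as a decimal reference and as a hexadecimal reference. The surrounding page markup is only illustrative.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Numeric character references</title>
</head>
<body>
  <!-- All three paragraphs should display the same heart (U+2665) -->
  <p>Typed directly: ♥</p>
  <p>Decimal reference: &#9829;</p>
  <p>Hexadecimal reference: &#x2665;</p>
</body>
</html>
```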
If you're adding the Unicode character with CSS, then you'll need to use the hexadecimal value.
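For instance, here is a rough sketch of adding the heart through CSS generated content. The class name favourite is made up for this example; the escape \2665 is the hexadecimal code for the heart.

```html
<style>
  /* "\2665" is the CSS escape for U+2665, the heart (decimal 9829) */
  .favourite:before {
    content: "\2665";
    margin-right: 0.25em;
  }
</style>

<p class="favourite">Add to favourites</p>
```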
Some of the most frequently used Unicode symbols have a more memorable name or abbreviation that can be used instead of those number codes; you've probably used &amp; (ampersand) or &lt; (less than), for example.
Why Would You Want to Use Unicode?
Good question, but there are several reasons that I can think of:
- To add the correct marks from a variety of languages
- To use as icons directly
- To use as the underlying character for an @font-face icon
- You could even use Unicode characters for your CSS class names.
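As a sketch of that last point, a Unicode character can serve as a class name directly, since CSS identifiers may contain non-ASCII characters. The class name and colour here are purely illustrative, and both the markup and the styles would need to be saved as UTF-8.

```html
<style>
  /* A non-ASCII character is a valid CSS class name */
  .★ {
    color: goldenrod;
  }
</style>

<span class="★">Featured</span>
```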
The first of these reasons shouldn't require any additional work. If your HTML file is saved as UTF-8, and is encoded when it's sent over a network as UTF-8, then everything should look great.
Should. Unfortunately, not all browsers or devices support all Unicode characters equally (you didn't expect something on the web to be that simple, did you?). Characters like the Emoji symbols aren't supported on all devices, but those 'named' characters are much more reliable.
To make sure you're using UTF-8 in an HTML5 page, add <meta charset=utf-8> to the <head> of your web pages. If you're not using HTML5 then you'll need <meta http-equiv="content-type" content="text/html; charset=UTF-8" /> instead.
Icons, Out of the Box
The second reason is that there are many very useful Unicode characters which can be used as icons on a web page. For example: ▶, ≡ and ♥.
What's great is that, where supported, there are no extra files to download to show these icons, which means your site is that bit faster. You can also add colour, or a drop shadow to them with CSS. Getting more creative, you could then add a transition to smoothly change the colour when someone hovers over the icon – and you can't do that with images.
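A minimal sketch of that styling might look like the following. The class name icon and the specific colours, shadow and timing are arbitrary choices for illustration.

```html
<style>
  .icon {
    color: #c00;
    font-size: 2em;
    text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.4);
    /* Fade between colours rather than switching instantly */
    transition: color 0.3s ease;
  }
  .icon:hover {
    color: #f60;
  }
</style>

<span class="icon">♥</span>
```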
Let's say, for example, that I wanted to include a little star rating indicator on my web page. I could do something like this:
<span>★ ★ ★ ☆ ☆</span>
This would display three filled stars followed by two outlined ones.
What you might occasionally see though, is a row of placeholder symbols instead.
This is what happens when these characters aren't supported by the device or browser being used. (Fortunately, these star shapes are very well supported, and I've only ever come across older BlackBerry phones that have trouble with them.)
The character that you see if the required Unicode character isn't supported will vary; you might see an empty rectangle, or a diamond with a question mark instead.
So how can you find the Unicode character that you'd like to use? Well, you could scroll through a site like Unicodinator to see what's available, but I love using Shapecatcher – this incredible site allows you to draw the icon, and it'll suggest the closest Unicode characters it can find for you to pick from.
Using Unicode With @font-face Icons
If you're using an @font-face icon, then you might want to consider using a similar Unicode character as the fallback. This way, on a browser or device that doesn't support @font-face (like Opera Mini or Windows Phone 7) the user would at least see a similar character.
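A rough sketch of the idea follows. The font name MyIcons and the file myicons.woff are hypothetical, and it assumes the icon font draws its heart glyph at U+2665, the same code point as the Unicode heart.

```html
<style>
  /* Hypothetical icon font whose heart glyph is mapped to U+2665 */
  @font-face {
    font-family: "MyIcons";
    src: url("myicons.woff") format("woff");
  }
  /* If the icon font can't be used, the browser falls back to the
     regular Unicode heart from the next font in the list */
  .icon {
    font-family: "MyIcons", sans-serif;
  }
</style>

<span class="icon">&#x2665;</span> Favourite
```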
@font-face tools default to using a range of Unicode characters which deliberately have no meaning or pre-determined shape (often referred to as the private use area or PUA characters). The downside of this approach is that where @font-face isn't supported, the user is left with a shape that has no meaning at all.
Using the PUA characters can also cause Internet Explorer 8 to go into Compatibility mode, and dark things lie down that path – see Jeremy Keith's article for more on the subject.
IcoMoon is great for creating @font-face icon sets, and it lets you choose any Unicode character as the basis for an icon.
Just be careful though – some browsers and devices don't like certain Unicode characters being used for @font-face, and won't render the icon. It might be worth running the suggested Unicode character through Unify – this will give you an indication of how safe it is to use that character in an @font-face icon set.
A Word on Accessibility
One problem with using Unicode characters as an @font-face fallback is that they're often poorly supported by screen readers (again, Unify has some data on this), so you'll need to think carefully about how the icon is being used.
If your icon is purely decorative and sits next to a text label that would be read by a screen reader, then I wouldn't worry too much. However, if your icon is standalone, then you may want to add a hidden text label to help screen reader users out. Even if the Unicode character is read out by the screen reader, the chances are it won't be anything like what you're using it for. For example, if you're using ≡ for the three horizontal line 'burger' navigation icon, VoiceOver on iOS will read it as “Identical to”.
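One common pattern, sketched here on the assumption that a visually hidden label suits your design, hides the icon from assistive technology and supplies a text label instead. The class name visually-hidden and the label text are just examples.

```html
<style>
  /* Keeps the label out of sight but still available to screen readers */
  .visually-hidden {
    position: absolute;
    width: 1px;
    height: 1px;
    overflow: hidden;
    clip: rect(0 0 0 0);
    white-space: nowrap;
  }
</style>

<button type="button">
  <!-- aria-hidden stops the icon being read out as "Identical to" -->
  <span aria-hidden="true">&#x2261;</span>
  <span class="visually-hidden">Menu</span>
</button>
```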
Very few fonts will have characters for the full Unicode range, so if you're choosing a font, make sure to try a few characters that you're likely to need. Consider Segoe UI Symbol or Arial Unicode MS for isolated icons. These fonts are reasonably likely to be on a PC, and on a Mac, Lucida Grande has a large number of Unicode characters. If you want to use these, then just add them to the relevant font-family CSS entry so the user will see the Unicode character in these fonts if they're installed.
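As a small sketch, a font stack along these lines would do it. The class name icon is an assumption, and the order of the fonts is only a suggestion.

```html
<style>
  /* The browser uses the first listed font that is installed
     and contains the character */
  .icon {
    font-family: "Segoe UI Symbol", "Arial Unicode MS", "Lucida Grande", sans-serif;
  }
</style>

<span class="icon">&#x2605; &#x2605; &#x2605; &#x2606; &#x2606;</span>
```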
Detecting Unicode Support
It would be handy if there were some way of detecting whether or not a Unicode character was supported before you used it, but there's no guaranteed way of doing so.
In short, test it. And make sure, if the character isn't supported, that the user can still understand what's going on.
Unicode in Emails
It's not just web pages that you can use Unicode on either – emails can be enhanced with it too.
It's the same story though; some email clients and devices support Unicode characters, some don't. Campaign Monitor has done some testing which could help you decide whether you should use them.
When they are supported, they can be very effective. For example, if an Emoji character is used in a subject line, that coloured icon could stand out nicely in an inbox.
That just about wraps up this introduction to Unicode. I hope it's been useful and helped you gain a clearer understanding of how Unicode works and how to use it.
If you have any questions, please just ask in the comments section.