Unicode: A Beginner’s Primer

Believe it or not, there's an image format which is built right into your browser. It allows images to be downloaded even before you need them, renders them perfectly on Retina screens, and allows them to have CSS colours and effects applied to them. OK, I'm not being entirely truthful there. It's not an image format as such – but the rest still applies. Using Unicode you can create icons that are resolution independent, have virtually no download time and can be styled with CSS.

In this article, I'll run you through the basics, as well as some of the interesting things that you can do with Unicode.


So, What is Unicode?

Unicode is a way of allowing letters and punctuation marks from different languages to be correctly displayed in a single document. This is incredibly useful; it means that your site can be used around the globe and will show exactly what you wanted to share – whether that happens to include French accented characters or is entirely written in Kanji.

Unicode is also being continually added to; at the time of writing it's on version 6.3, which has roughly 110,000 characters. Version 7 will be released later this year and will add nearly 3,000 new characters.

Alongside letters and numbers, Unicode also specifies some symbols and icons. More recently these have expanded to include the Emoji icons that you may have seen on iOS messaging:

[Image: Emoji in iOS messaging]

HTML pages are made up of sequences of Unicode characters, and when they're sent over a network, they're converted into bytes. Every letter or character in every language is given a unique number (its code point), and the encoding determines how that number is turned into bytes when the document is saved or shared.

Ideally, this encoding is UTF-8, as this can encode any Unicode character, but even if that weren't the case, any character can be written as a numeric character reference. For example, typing &#9829; straight into your HTML will produce a heart: ♥.

That number can either be decimal or its hexadecimal equivalent. If it's hexadecimal, then the number needs an x in front of it, so &#x2665; will give the same heart (2665 is hex for 9829).
If you're adding the Unicode character with CSS, then you'll need to use the hexadecimal value, written as a backslash escape such as \2665.
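
As a rough sketch of the three notations described above, the same heart can be written as a decimal reference, a hexadecimal reference, or a CSS escape. The class name heart is made up for this example.

```html
<!-- Decimal and hexadecimal numeric character references for U+2665 -->
<p>&#9829; and &#x2665; are the same character.</p>

<style>
  /* "heart" is a hypothetical class name used only for this example */
  .heart:before {
    /* In CSS, a backslash followed by the hexadecimal value inserts the character */
    content: "\2665";
  }
</style>

<!-- The heart is added in front of this text by the CSS rule above -->
<p class="heart"> added with CSS</p>
```

The single-colon :before is used here because older browsers, such as Internet Explorer 8, don't understand the double-colon form.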

Some of the most frequently used Unicode symbols have a more memorable name or abbreviation that can be used instead of those number codes - you've probably used &amp; (ampersand) or &lt; (less than), for example.


Why Would You Want to Use Unicode?

Good question, but there are several reasons that I can think of:

  1. To add the correct marks from a variety of languages
  2. To use as icons directly
  3. To use as the underlying character for a @font-face icon
  4. You could even use Unicode characters for your CSS class names.

Correct Marks

The first of these reasons shouldn't require any additional work. If your HTML file is saved as UTF-8, and is encoded when it's sent over a network as UTF-8, then everything should look great.

Should. Unfortunately, not all browsers or devices support all Unicode characters equally (you didn't expect something on the web to be that simple, did you?). Characters like the Emoji symbols aren't supported on all devices, but the 'named' characters are much more reliable.

To make sure you're using UTF-8 in an HTML5 page, add <meta charset=utf-8> to the <head> of your web pages. If you're not using HTML5 then you'll need <meta http-equiv="content-type" content="text/html; charset=UTF-8" /> instead.
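
For illustration, here is a minimal HTML5 page with the encoding declared. The title and body text are placeholders, and the sketch assumes the file itself really is saved as UTF-8; the declaration alone won't fix a file saved in another encoding.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Declare the encoding first, before any text that might contain non-ASCII characters -->
  <meta charset="utf-8">
  <title>Unicode test page</title>
</head>
<body>
  <!-- Placeholder content mixing accented Latin, Kanji and a symbol -->
  <p>Café, 漢字 and ♥ should all display correctly.</p>
</body>
</html>
```

Putting the meta element at the very top of the head is the safest choice, since browsers only scan the beginning of the file when looking for it.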

Icons, Out of the Box

The second reason is that there are many very useful Unicode characters which can be used as icons on a web page. For example: ▶, ≡ and ♥.

What's great is that, where supported, there are no extra files to download to show these icons, which means your site is that bit faster. You can also add colour, or a drop shadow to them with CSS. Getting more creative, you could then add a transition to smoothly change the colour when someone hovers over the icon – and you can't do that with images.
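
A small sketch of that kind of styling might look like the following. The class name icon and the particular colours are assumptions chosen for the example, and older browsers may need vendor-prefixed versions of transition.

```html
<style>
  /* "icon" is a hypothetical class name */
  .icon {
    color: #c00;
    /* A soft drop shadow behind the character */
    text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.4);
    /* Fade between colours instead of switching instantly */
    transition: color 0.3s ease;
  }

  .icon:hover {
    color: #f60;
  }
</style>

<span class="icon">&#x2665;</span>
```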

Let's say, for example, that I wanted to include a little star rating indicator on my web page. I could do something like this:

<span>&#x2605; &#x2605; &#x2605; &#x2606; &#x2606;</span>

This would give us something like the image below:

[Image: An example rating indicator viewed in Firefox]

What you might occasionally see though, is something like this:

[Image: The rating example viewed on a BlackBerry 9000]

This is what happens when these characters don't work on the device or browser being used. (Fortunately, these star shapes are very well supported, and I've only ever come across older BlackBerry phones that have trouble with them.)

The character that you see if the required Unicode character isn't supported will vary; you might see an empty rectangle, or a diamond with a question mark instead.

So how can you find the Unicode character that you'd like to use? Well, you could scroll through a site like Unicodinator to see what's available, but I love using Shapecatcher – this incredible site allows you to draw the icon, and it'll suggest the closest Unicode characters it can find for you to pick from.

Using Unicode With @font-face Icons

If you're using a @font-face icon, then you might want to consider using a similar Unicode character as the fallback. This way, on a browser or device that doesn't support @font-face (like Opera Mini or Windows Phone 7) the user would at least see a similar character:

[Image: Font Awesome icons in Chrome on the left, and on the right the underlying Unicode characters shown in Opera Mini]

Many @font-face tools default to using a range of Unicode characters which deliberately have no meaning or pre-determined shape (often referred to as the private use area or PUA characters). The downside of this approach is that where @font-face isn't supported, the user is left with a shape that has no meaning at all.

Using the PUA characters can also cause Internet Explorer 8 to go into Compatibility mode, and dark things lie down that path – see Jeremy Keith's article for more on the subject.
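
To make the fallback idea concrete, here is a hedged sketch. The font name MyIcons, the file my-icons.woff and the class name icon-star are all placeholders, and it assumes the star icon in that font has been mapped to U+2605 rather than to a private use area code point.

```html
<style>
  /* "MyIcons" and my-icons.woff stand in for your own icon font */
  @font-face {
    font-family: "MyIcons";
    src: url("my-icons.woff") format("woff");
  }

  .icon-star {
    /* If the icon font can't be used, the browser falls back to a system font */
    font-family: "MyIcons", sans-serif;
  }
</style>

<!-- Because the icon sits on U+2605, the fallback is still a recognisable star -->
<span class="icon-star">&#x2605;</span>
```

Had the icon been mapped to a PUA code point instead, the fallback would most likely be an empty box or nothing recognisable.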

IcoMoon is great for creating @font-face icon sets, and it lets you choose any Unicode character as the basis for an icon.

[Image: Fonts selected in IcoMoon showing Unicode base]

Just be careful though – some browsers and devices don't like certain Unicode characters being used for @font-face, and won't render the icon. It might be worth running the suggested Unicode character through Unify – this will give you an indication of how safe it is to use that character in a @font-face icon set.

A Word on Accessibility

One problem with using Unicode characters as a @font-face fallback is that they're often poorly supported by screen readers (again, Unify has some data on this), so you'll need to think carefully about how the icon is being used.

If your icon is purely decoration next to a text label that would be read by a screen reader, then I wouldn't worry too much. However, if your icon is standalone, then you may want to add a hidden text label to help screen reader users out. Even if the Unicode character is read out by the screen reader, the chances are it won't be anything like what you're using it for. For example, if you're using &#x2261; for the three horizontal line 'burger' navigation icon, VoiceOver on iOS will read it as “Identical to”.
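
One way to add such a hidden label is sketched below. The visually-hidden class is a common pattern rather than anything built into the browser, and the class name and the #menu link target are assumptions made for this example.

```html
<style>
  /* Hides the text visually while leaving it available to screen readers */
  .visually-hidden {
    position: absolute;
    width: 1px;
    height: 1px;
    overflow: hidden;
    clip: rect(0 0 0 0);
    white-space: nowrap;
  }
</style>

<a href="#menu">
  <!-- aria-hidden asks assistive technology to skip the symbol itself -->
  <span aria-hidden="true">&#x2261;</span>
  <!-- This label is what a screen reader should announce instead -->
  <span class="visually-hidden">Menu</span>
</a>
```

It's worth checking the result in the screen readers you care about, as support for aria-hidden can vary.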

Choosing Fonts

Very few fonts will have characters for the full Unicode range, so if you're choosing a font, make sure to try a few characters that you're likely to need.

Try Segoe UI Symbol or Arial Unicode MS for isolated icons. These fonts are reasonably likely to be on a PC, and on a Mac, Lucida Grande has a large number of Unicode characters. If you want to use these, then just add them to the relevant font-family CSS entry so the user will see the Unicode character in these fonts if they're installed.
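
As a sketch, the font-family entry could look like this. The class name symbol is hypothetical, and the order of the fonts is just one reasonable choice.

```html
<style>
  /* "symbol" is a hypothetical class name. For each character, the browser
     uses the first font in the list that is installed and contains it. */
  .symbol {
    font-family: "Segoe UI Symbol", "Arial Unicode MS", "Lucida Grande", sans-serif;
  }
</style>

<span class="symbol">&#x2605;</span>
```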


Detecting Unicode Support

It would be handy if there were some way of detecting whether or not a Unicode character was supported before you used it, but there's no guaranteed way of doing so.

Modernizr has a bit of JavaScript to try and test for Emoji support – but this works by checking a single pixel to see if anything is there. So if the character you want to test doesn't cover that space, even when it is displayed, then the test will give you the wrong result. And just because one Unicode character is displayed correctly, it doesn't mean the other 109,999 will be.
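
To show why a single-pixel check is fragile, here is a simplified sketch in the same spirit. It is not Modernizr's actual code; the function name paintsSampledPixel, the 32px canvas and the sampled position are all assumptions made for illustration.

```html
<script>
  // Hypothetical helper: draws one character on an off-screen canvas and
  // reports whether a single sampled pixel ended up with any ink on it.
  function paintsSampledPixel(character) {
    var canvas = document.createElement('canvas');
    canvas.width = 32;
    canvas.height = 32;

    var ctx = canvas.getContext && canvas.getContext('2d');
    if (!ctx) {
      return false; // No canvas support, so nothing can be measured
    }

    ctx.textBaseline = 'top';
    ctx.font = '32px sans-serif';
    ctx.fillStyle = '#000';
    ctx.fillText(character, 0, 0);

    // A new canvas is fully transparent, so a non-zero alpha value at the
    // sampled pixel means something was painted there.
    return ctx.getImageData(16, 16, 1, 1).data[3] !== 0;
  }

  // Example call for the black star used earlier (U+2605)
  console.log(paintsSampledPixel('\u2605'));
</script>
```

A thin or hollow character can miss the sampled pixel even when it renders properly, and a "missing glyph" box can happen to cover it, so a result like this should be treated as a hint at best.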

In short, test it. And make sure, if the character isn't supported, that the user can still understand what's going on.


Unicode in Emails

It's not just web pages where you can use Unicode characters – emails can be enhanced with them too.

This is the same story though; some email clients and devices support them, some don't. Campaign Monitor has done some testing which could help you decide whether you should use them.

When they are supported, they can be very effective. For example, if an Emoji character is used in a subject line, that coloured icon could stand out nicely in an inbox.


Conclusion

That just about wraps up this introduction to Unicode. I hope it's been useful and helped you gain a clearer understanding of how Unicode works and how to use it.

If you have any questions, please just ask in the comments section.

