Overview of Oracle Character Sets

I've been working with databases for a long time, and the topic of character sets took me a while to understand.

This article will answer some of the common questions, and questions I had about Oracle character sets and how they relate to the Oracle database.

What is a Character Set?

As defined by Webopedia:

A character set is a defined list of characters that is recognised by computers.

Each human language (such as English, French, Latin, or Japanese) has a range of characters that make up that language. A character set is a set of these characters.

An example of a character set is ASCII. It contains a range of numbers, symbols, letters, and some letters with accents on them.

Another example is Unicode, which contains a lot more characters than ASCII.

What's the Difference Between a Character Set and a Character Encoding?

A character set is a range of characters. An encoding is how those characters are stored.

An encoding method matches a code against a character for a character set.

For example in the ASCII encoding method, the code of 23 is used for a # symbol.

Other encoding methods will have different codes to represent the same character, and the code of 23 will represent a different character.

Find out more about different SQL terms in my SQL glossary.

How Can I Find the Character Set Being Used by my Oracle Database?

Character sets are defined for the database in Oracle. Unlike MySQL, they cannot be defined at the table or the column level.

The character set being used by your database can be found in the NLS_DATABASE_PARAMETERS table.

1SELECT value
2FROM nls_database_parameters
3WHERE parameter = 'NLS_CHARACTERSET';
VALUE
AL32UTF8

While you can change some of your session parameters (such as the language or the timezone), you can't change the character set.

What Oracle Character Sets are Most Useful?

I've seen that UTF8 is quite popular (specifically called AL32UTF8). This is a character encoding for the Unicode character set, which uses one byte per character.

UTF16 is similar but uses two bytes per character.

The actual character set you choose will depend on your requirements for your data. When choosing a character set, note that it cannot be changed unless you drop and recreate the database, or perform a migration (which is not easy).

So, there's my brief explanation of character sets and how you can see them in Oracle. I haven't had a lot to do with them but sometimes it's just good to know.

comments powered by Disqus