Character Data Types

Regular Character Data Types – Both the char and varchar data types are considered regular or non-Unicode character data types where each alphanumeric character uses only one byte of data. This is in contrast to the Unicode character data types (nchar and nvarchar) that uses two bytes of data for each alphanumeric character. We will talk more about Unicode data types later in this demonstration.

The difference between these two regular data types is that the char data type is fixed length while the varchar data type is variable length. In our example, the char(10) data type will hold up to 10 characters and will require us to fill all ten characters. If all ten characters are not used then spaces will be added to fill in the extra characters. In the case of the varchar(10) data type, we will only use the space needed and will not use what we don’t need.

An analogy of this would be if we reserved a table at a restaurant with 10 chairs. If only 5 people showed up, then the restaurant would have the option to either charge us for all 10 chairs, including those we did not use, or to take away 5 chairs and only charge us for the 5 seats we did use. In the case where we were charged for the extra 5 chairs, that would be very similar to the fixed length char(10) data type. In the case where we were not charged for the 5 extra chairs, that would be the same as the variable length varchar(10) data type. However, instead of chairs, we are concerned about bytes.

To demonstrate our regular character data type we will start with some sample code.

On lines 1 and 2, we are Declaring two variables. The first variable named @String1 will be a char(10) data type. The second variable will be named @String2 with a data type of varchar(10).

In SQL Server 2008 and later you can initialize or set a value to a variable on the same line that you declare the variable. In this case, we will assign the text value ‘Hello’ to the @String1 variable and then the text value of ‘World’ to the @String2 variable.

On line 4 of the sample code we are using concatenation to paste the two string variables together. As you will see below, since @String1 is char(10) data type, five spaces were added after the word ‘Hello’. However, in the case of @String2, which is a varchar(10), the extra 5 spaces were not added after the word ‘World’.

Since this may difficult to see, we will use two String functions to demonstrate this result. On line 5 of our sample code we used the LEN function to see how many characters are in each variable. As you see for both @String1 and @String2 it shows that each have five characters. (‘Hello’ for @String1 and ‘World’ for @String2)

However, On line 6 of our sample code we used the DATALENGTH function which instead of showing how many characters that are being used in each variable, the result set shows how many bytes are being used in each variable. As you see @String1 is using 10 bytes while @String2 is only using 5 bytes.

Unicode Character Data Types – Now let’s talk about the nchar (fixed length) and nvarchar (variable length) Unicode character data types. These are both similar to the Regular character data types except as previously mentioned they use 2 bytes per alphanumeric character.

You can think of Unicode as UNIversal code and the N in front of the data type names as Nationalized. The reason we would use this more expensive data type is for languages that have larger character sets. For example, the English language only uses about 76 values in its characters set. (Uppercase A-Z, lowercase a-z, 0-9, and then 16 special characters.) So for the English language only 1 byte is needed to represent its character set. This is why Regular or non-Unicode character data types are used.

However, many non-English languages have thousands of characters in their language so 2 bytes are needed to represent their character sets. So that is why the Unicode data types are needed. Now let’s use the same examples as above except we will add the ‘n’ in front of the data types.

We will also add a capital N in front of the text strings ‘Hello’ and ‘World’. This will explicitly convert the text data to Unicode data type. If the capital N is not included our text strings will implicitly convert to the correct data type as long as only English characters were used. Using the capital N is to ensure that if a non-English character was added in the text string it would be converted correctly.

So now that we have changed the data types of our variables to Unicode, Let us run our three SELECT statements to see what has changed. First the statement that concatenates the two variables. You should notice that there is no visible difference from Regular character data types and the Unicode character data types.

And if we look at the LEN function to see how many characters are being stored in the Unicode data types, again nothing changes in the results. Both variables still hold five character each.

Now when we run the DATALENGTH function we will see the difference in the number of bytes used. You will notice that the Unicode data types use twice as many bytes as the Regular data types.

Hopefully, this will help you when working with the different character data types. One final note, character data types are limited in the amount of characters that they can use, specifically they are limited to the size of a data page. So the limit for the Regular data type is varchar(8000) and for Unicode data type would be nvarchar(4000).

Prior to SQL Server 2008 you could use either the Text or Ntext data types if you need more characters than the limit. However, the text and ntext data types are being deprecated and either varchar(MAX) or nvarchar(MAX) should be used in those situations. Also, as a precaution, always validate your data to protect yourself from SQL Injection attacks. Especially when using the MAX data types.

Be the first to comment on "Character Data Types"

Leave a comment

Your email address will not be published.


*