MSSQL, SQLSRV, PHP and UTF-8

Published on 15.08.2011, by Lubos Dzurik

Recently I was trying to set up Microsoft SQL Server 2005 for a PHP application with UTF-8 support. It looked like a trivial task, and I hoped to solve it within 1-2 hours. In fact, I spent 2 nights figuring out the solution... So here is everything about setting up MS SQL Server 2005 and 2008 for PHP with UTF-8 support.

Chaos around PHP drivers for MS SQL

Part of the problem is that several drivers for the same server are available under different names and distributed via different channels. Currently, the following drivers are available for MS SQL Server:

| Driver for Microsoft SQL Server | Supplied with PHP version | Last update | Supports UTF-8 | Download link |
|---|---|---|---|---|
| MSSQL | < 5.3.0 | | no | http://php.net |
| PDO_MSSQL | < 5.3.0 | | no | http://php.net |
| DBLIB | < 5.3.0 | 2005 | no | http://pecl.php.net |
| DBLIB patched by MOODLE (also called FreeTDS) | none | 2009 | no | http://docs.moodle.org |
| PDO_DBLIB | < 5.3.0 | | no | http://php.net |
| SQLSRV | none | 06/2011 | yes | http://www.microsoft.com, http://pecl.php.net |
| PDO_SQLSRV | none | 06/2011 | yes | http://www.microsoft.com, http://pecl.php.net |

What's the right PHP driver for Microsoft SQL Server?

You should use only the driver maintained by Microsoft: SQLSRV or PDO_SQLSRV. It is the only driver capable of storing UTF-8 strings into Unicode columns (NVARCHAR instead of VARCHAR), and it is currently maintained and supported by Microsoft for PHP 5.2.x and 5.3.x.

Unfortunately, this driver is not included in the PHP package downloaded from the PHP.NET site. You can only get it from Microsoft's site (or the PECL site), which makes it harder for the community to find and causes confusion around the driver. Historically, all PHP-related stuff has always been available from PHP sites.

On the other hand, it is understandable: maintaining all driver builds (for PHP 5.2 and 5.3, non-PDO and PDO, VC6 and VC9, Apache and IIS) takes a lot of effort, which only a big vendor can sustain over time.

Nevertheless, I believe it would be worth including a file like "mssql.readme" in PHP distributions to make things clearer to the community.

Fig. 1: Loaded SQLSRV driver for MS SQL Server

SQLSRV DSN connect string

SQLSRV introduces some new features for the DSN connect string. The following is an example of a valid DSN connect string:

sqlsrv:server=MYHOMEPC\\SQLEXPRESS;database=drupal;MultipleActiveResultSets=false

MultipleActiveResultSets (MARS) is a SQL Server specific connection setting that controls whether a single connection may have more than one active result set at a time. I was not able to run standard transactions with MultipleActiveResultSets turned on; however, it is possible that I missed some configuration. With MultipleActiveResultSets=false I was able to run all scripts originally written for the old standard drivers PDO_MSSQL and MSSQL, this time with correctly stored UTF-8 strings.

There are more newly introduced and not that well-known DSN connect parameters, such as Encrypt, MultipleActiveResultSets (MARS), TransactionIsolation and TrustServerCertificate. You may also want to look at the SQL Server specific PDO attributes and constants, such as PDO::SQLSRV_ENCODING_UTF8 or PDO::SQLSRV_ATTR_DIRECT_QUERY.
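
As a rough illustration of how these parameters fit together, the sketch below assembles a DSN string and applies the UTF-8 encoding attribute after connecting. The helper names buildSqlsrvDsn() and connectUtf8() are made up for this example and are not part of the driver. The connection helper is only defined, not called, because it needs the pdo_sqlsrv extension, a reachable server and valid credentials.

```php
<?php
// Hypothetical helper (not part of the driver): joins DSN keywords with ";".
function buildSqlsrvDsn($server, $database, array $options = array())
{
    $parts = array('server=' . $server, 'database=' . $database);
    foreach ($options as $keyword => $value) {
        // DSN values are plain text, so booleans are spelled out.
        if (is_bool($value)) {
            $value = $value ? 'true' : 'false';
        }
        $parts[] = $keyword . '=' . $value;
    }
    return 'sqlsrv:' . implode(';', $parts);
}

// Hypothetical helper: opens a PDO_SQLSRV connection that exchanges UTF-8 strings.
// It is not called in this sketch because it needs a real server.
function connectUtf8($dsn, $user, $password)
{
    $pdo = new PDO($dsn, $user, $password);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // Setting the encoding explicitly makes the intent visible in the code.
    $pdo->setAttribute(PDO::SQLSRV_ATTR_ENCODING, PDO::SQLSRV_ENCODING_UTF8);
    return $pdo;
}

// In a single-quoted PHP string "\\" is one backslash, so the DSN itself
// contains MYHOMEPC\SQLEXPRESS.
$dsn = buildSqlsrvDsn('MYHOMEPC\\SQLEXPRESS', 'drupal', array(
    'MultipleActiveResultSets' => false,
));
echo $dsn, "\n";
```

Keeping the options in an array makes it easy to switch a single keyword, such as MultipleActiveResultSets, without editing the string by hand.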

Storing UTF-8 strings in MS SQL Server

Do the following to store UTF-8 strings via PHP in an MS SQL Server database:

  1. Download Microsoft Drivers for PHP for SQL Server.
  2. Unpack the files into a temporary directory. You will get a bunch of SQLSRV driver files for various PHP versions, along with complete API documentation and a manual.
  3. Identify the correct driver file. You need to know whether you need a thread-safe (ts) or non-thread-safe (nts) build, compiled with VC6 or VC9, the target PHP version, and PDO or non-PDO, e.g. "php_pdo_sqlsrv_52_ts_vc6.dll".
  4. Copy your driver file into the extensions directory and add a line to php.ini, e.g. extension=php_pdo_sqlsrv_52_ts_vc6.dll
  5. Restart the web server and check phpinfo(): the SQLSRV or PDO_SQLSRV driver(s) should be loaded.
  6. Define the database schema with Unicode column types (NVARCHAR instead of VARCHAR, see MSSQL data types).

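That's it. By default the PDO_SQLSRV driver handles all strings as UTF-8 and converts them internally to and from the native UCS-2 (Unicode) encoding. With the procedural SQLSRV API, set the CharacterSet connection option to UTF-8 to get the same behaviour.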
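
The file selection and the final check from the steps above can be sketched roughly as follows. Both function names are hypothetical. Only the file name "php_pdo_sqlsrv_52_ts_vc6.dll" comes from the text above; the other combinations are extrapolated from that pattern and should be compared against the files actually present in the download.

```php
<?php
// Hypothetical helper: builds a driver file name following the pattern of
// "php_pdo_sqlsrv_52_ts_vc6.dll". Other combinations are an assumption.
function sqlsrvDllName($usePdo, $major, $minor, $threadSafe, $compiler)
{
    return sprintf(
        'php_%ssqlsrv_%d%d_%s_%s.dll',
        $usePdo ? 'pdo_' : '',
        $major,
        $minor,
        $threadSafe ? 'ts' : 'nts',
        $compiler
    );
}

// Reports which of the two extensions are loaded, as an alternative to
// reading the phpinfo() page.
function loadedSqlsrvDrivers()
{
    return array(
        'sqlsrv'     => extension_loaded('sqlsrv'),
        'pdo_sqlsrv' => extension_loaded('pdo_sqlsrv'),
    );
}

echo sqlsrvDllName(true, 5, 2, true, 'vc6'), "\n";
foreach (loadedSqlsrvDrivers() as $name => $loaded) {
    echo $name, ': ', $loaded ? 'loaded' : 'not loaded', "\n";
}
```

The compiler (VC6 or VC9) is passed in by hand because it is easier to read it from the phpinfo() output than to detect it in code.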

Case sensitivity and accent sensitivity concerns

Stored UTF-8 strings are actually converted into the native UCS-2 encoding by the SQLSRV driver when they are written to SQL Server. This ensures that case-sensitive and accent-sensitive searches are handled correctly. Unlike with other drivers, you don't need any encoding workarounds.
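
To get a feeling for what such a conversion does to the bytes, here is a small stand-alone sketch using PHP's iconv(). It is only an illustration and not the driver's own code. It assumes the iconv extension is available and uses UTF-16LE as the target, which is identical to UCS-2 for the characters in this example.

```php
<?php
// "Peter Novák" written with explicit UTF-8 bytes for "á" (0xC3 0xA1),
// so the example does not depend on the encoding of this source file.
$utf8 = "Peter Nov\xC3\xA1k";

// Illustration only: convert to the 2-byte form and back again.
$utf16 = iconv('UTF-8', 'UTF-16LE', $utf8);
$roundTrip = iconv('UTF-16LE', 'UTF-8', $utf16);

echo strlen($utf8), "\n";   // 12 bytes: 10 one-byte characters plus 2 bytes for "á"
echo strlen($utf16), "\n";  // 22 bytes: 11 characters, 2 bytes each
var_dump($roundTrip === $utf8); // bool(true)
```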

Case sensitivity and accent sensitivity are determined by the collation, which you set when you create a new table. Choosing the right case and accent sensitivity may significantly influence search results.

Case sensitivity (CS = Case Sensitive, CI = Case Insensitive)
If you want "Computer" and "computer" to be treated as duplicates, use a CI collation. For Slovak, choose a Slovak_CI collation.
Accent sensitivity (AS = Accent Sensitive, AI = Accent Insensitive)
If you want "Peter Novák" and "Peter Novak" to be treated as duplicates, use an AI collation. For Slovak, choose a Slovak_AI collation.
Combinations AI, CI
You can combine case and accent sensitivity as you wish. A very common choice is CI together with AI to prevent duplicate records (e.g. Slovak_CI_AI).
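
The naming scheme can be sketched in a few lines. The helper collationName() and the table and column names are hypothetical and exist only for this example. The generated statement is printed rather than executed, so it has not been run against a server here.

```php
<?php
// Hypothetical helper: composes a collation name such as "Slovak_CI_AI".
function collationName($language, $caseSensitive, $accentSensitive)
{
    return $language
        . '_' . ($caseSensitive ? 'CS' : 'CI')
        . '_' . ($accentSensitive ? 'AS' : 'AI');
}

$collation = collationName('Slovak', false, false);

// Illustrative table: NVARCHAR keeps the text as Unicode, and the UNIQUE
// constraint compares values using the column's collation.
$ddl = "CREATE TABLE people (\n"
     . "    name NVARCHAR(100) COLLATE $collation NOT NULL UNIQUE\n"
     . ")";

echo $ddl, "\n";
```

With a CI, AI collation on the column, the UNIQUE constraint is what turns "Peter Novák" and "Peter Novak" into a duplicate-key conflict.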

Fig. 2: Correctly stored accented UTF8 characters in MS SQL database

Conclusion

Currently, the only maintained driver for Microsoft SQL Server (2005, 2008, Azure) is named SQLSRV and comes in two basic flavours: SQLSRV and PDO_SQLSRV. These have some server-specific features and built-in support for storing UTF-8 strings in the server's native UCS-2 encoding. If you are writing cross-database portable applications, you should use only PDO_SQLSRV. You can download the drivers from the Microsoft site and read the documentation at the PHP.NET site.

Possible issues ...

SQLSTATE[IMSSP]: This extension requires either the Microsoft SQL Server 2008 Native Client (SP1 or later) or the Microsoft SQL Server 2008 R2 Native Client ODBC Driver to communicate with SQL Server. Neither of those ODBC Drivers are currently installed. Access the following URL to download the Microsoft SQL Server 2008 R2 Native Client ODBC driver for x86: http://go.microsoft.com/fwlink/?LinkId=163712
On Windows, the SQLSRV driver requires the SQL Server Native Client to be installed. You can download it from the Microsoft site.
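
If you prefer to show a shorter hint instead of the raw message, the error can be recognised by its text. The sketch below is one possible approach under that assumption. Both function names are hypothetical, and connectOrExplain() is only defined, not called, because it needs the pdo_sqlsrv extension and a server.

```php
<?php
// Hypothetical helper: recognises the "missing Native Client" error by its text.
function isMissingNativeClient($message)
{
    return strpos($message, 'SQLSTATE[IMSSP]') !== false
        && strpos($message, 'Native Client') !== false;
}

// Hypothetical wrapper: connects and turns that one error into a short hint.
function connectOrExplain($dsn, $user, $password)
{
    try {
        return new PDO($dsn, $user, $password);
    } catch (PDOException $e) {
        if (isMissingNativeClient($e->getMessage())) {
            throw new RuntimeException(
                'The SQL Server Native Client is not installed.',
                0,
                $e
            );
        }
        throw $e;
    }
}

// Shortened copy of the message quoted above.
$sample = 'SQLSTATE[IMSSP]: This extension requires either the Microsoft SQL Server '
        . '2008 Native Client (SP1 or later) or the Microsoft SQL Server 2008 R2 '
        . 'Native Client ODBC Driver to communicate with SQL Server.';
var_dump(isMissingNativeClient($sample)); // bool(true)
```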

Comments...

vinigarcia87 AT gmail DOT com

31.05.2012 02:42
# 1
 

Thanks for the very explanatory post.
I did it on my Win 7 desktop and it worked perfectly. = )

BUT, I have a Debian 6 server with PHP 5.4 and I need to connect to MS SQL Server 2008.

The PDO_SQLSRV driver didn't work on Linux, so I used dblib with FreeTDS and mssql...
But I got a problem with the charsets.
My UTF-8 application didn't communicate properly with MS SQL Server...

Can you please help me?
I have been stuck on it for a month! =S

By the way, my application is a Zend Framework 1.11 website.

Thanks for everything and sorry for the bad English! ^^

lubosdz

31.05.2012 21:54
# 2
 

Hi,
sorry, I don't have a Linux environment set up. Please consult dedicated forums.

mato DOT matejov AT gmail DOT com

07.06.2012 20:37
# 3
 

Hi,

I am using PHP 5.3 (php.iis.net) on IIS in Windows Server 2008 R2 with MS SQL Server 2012. The application I am having difficulties with is Interspire Knowledge Manager (IKM). It supports 2 character sets: UTF-8 and Western Europe. I need to store Slovak texts in an MS SQL Server database (created automatically by IKM and set to the Slovak_CI_AS collation). The Slovak texts are stored incorrectly and I am confused about what should be done to make the application character set match my database.

Thank you for your help.

lubosdz

07.06.2012 21:51
# 4
 

Hi Mato,

I am sorry, but I do not have any experience with IKM. As far as I understand, it is commercial software with commercial support, so the guys from Interspire should help you. Even if I wanted to, I cannot download and install the product.

From what you wrote, you should at least check the following:

* Is the PHP application written in UTF-8?

* Does IIS7 send the correct character set in its headers? Check the request and response headers with Firebug. Check the default character set for IIS.

* The collation Slovak case insensitive, accent sensitive (Slovak_CI_AS) is correct, I believe. I use the same one (or Czech, it is almost the same) and it stores UTF-8 strings correctly. Check that the column types in your database schema support 2-byte storage. To store Unicode text (which SQL Server keeps in a 2-byte encoding) you must use the column type NVARCHAR rather than VARCHAR. The same applies to the NTEXT and TEXT column types. Please read about this in the MSSQL section at http://www.synet.sk/php/en/270-writing-portable-PHP-applications.

* Also make sure your PHP is using the official drivers supported by Microsoft and available at http://www.microsoft.com/en-us/download/details.aspx?id=20098. Download them, find the correct one, and set SESSION INIT flags if needed. The drivers from Microsoft are the only ones that fully support storing UTF-8 as 2-byte characters in MSSQL. Read about the connect options at http://www.php.net/manual/en/function.sqlsrv-connect.php.

If this does not help, please consult IKM support.
Regards,
Lubos

MValdez

23.06.2013 16:13
# 5
 

On Linux, using FreeTDS, it works with SQL Server 2008 if you call ini_set("mssql.charset", "UTF-8") before connecting.

Regards, MV.
