PHP: How To Copy Files That Have 16-bit File Names?


I have written a PHP script that I use to back up my internal hard drive to an external hard drive. I use Windows XP, and my internal hard drive is NTFS formatted.


Omitting the details, I'm basically using:


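$file = readdir($dir);                 // next entry name, as PHP sees it
$original = $C_drive_path.'\\'.$file;  // source path on the internal drive
$bkup = $E_drive_path.'\\'.$file;      // destination path on the external drive
copy($original, $bkup);                // fails when PHP cannot represent the name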


The backup script works well, except when it comes across the occasional file that has Chinese characters in the file name (some of my music files). I believe file names in NTFS are UTF-16 encoded (16-bit). PHP, which is designed to work with 8-bit character data, does not handle these file names well. It interprets these file names as gibberish (mostly question marks) and generates an error message that says the file does not exist. Then, of course, it is unable to copy these files.


Does anybody know of a way to have PHP read a 16-bit file name correctly and then copy the file to another drive?
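
The question-mark symptom described above can be reproduced on a plain string, without touching the file system. The following is only a sketch, assuming a modern PHP (7.0 or later) with the mbstring extension enabled. The helper name to_single_byte_name is hypothetical, and the sketch does not claim to show what readdir does internally. It only shows what happens when a name containing Chinese characters is forced into a single-byte encoding.

```php
<?php
// Assumption: the mbstring extension is available.
// Make the substitute character explicit so the result is deterministic.
mb_substitute_character(0x3F); // 0x3F is "?"

// Hypothetical helper: force a UTF-8 name into ISO-8859-1, one byte per character.
function to_single_byte_name(string $utf8Name): string
{
    // Characters with no ISO-8859-1 equivalent are replaced by the
    // substitute character set above.
    return mb_convert_encoding($utf8Name, 'ISO-8859-1', 'UTF-8');
}

$latin   = to_single_byte_name("caf\u{E9}.mp3");        // "café.mp3"
$chinese = to_single_byte_name("\u{97F3}\u{4E50}.mp3"); // two Chinese characters

echo bin2hex($latin), "\n"; // prints 636166e92e6d7033
echo $chinese, "\n";        // prints ??.mp3
```

The name that fits in ISO-8859-1 survives intact, while each Chinese character collapses to a question mark. A path built from the second result no longer matches any file on disk, which is consistent with the "file does not exist" error.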




Bruce and Madmanmcp,


Thanks for your excellent suggestions. xcopy and the Windows XP Backup executable are both included with the Windows XP operating system and are logical choices for doing efficient, regularly scheduled backups.


I decided to write a PHP script to back up files because I need to back up files, but also because I'm trying to use PHP wherever possible these days to improve my skills (and to see what the language has to offer). What I have discovered is that PHP has a history of weakness in the character-encoding area (see, for example, what Joel Spolsky has to say).


When I did not get an answer to my question about multibyte character encoding for file names at several different PHP user group forums, I contacted the PHP development team. From them, I learned that "readdir" is unable to read multibyte character encodings. The development team has gotten several requests for this feature and plans to add it in PHP 6.


So for now, PHP's directory functions can handle only file names whose characters can be represented in an ISO-8859-1 compatible encoding. This makes PHP unsuitable for backing up an NTFS-formatted drive, which stores every file name as UTF-16LE (16-bit code units).


(The exception is the special case where every file name on the NTFS drive uses only characters from the ISO-8859-1 repertoire; in this case, a PHP backup will work. For these characters, the first byte of the two-byte UTF-16LE encoding carries the whole character code and the second byte is zero, so nothing is lost when the name is reduced to one byte per character. But requiring that all files have ISO-8859-1 compatible names seems unnecessarily restrictive when comprehensive backup utilities like xcopy and the Windows XP Backup executable exist.)
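
The byte layout behind that special case can be checked directly. This is a minimal sketch, again assuming PHP 7.0 or later with the mbstring extension; the helper names utf16le_hex and fits_in_iso_8859_1 are made up for illustration and are not part of PHP.

```php
<?php
// Assumption: the mbstring extension is available.

// Hypothetical helper: hex dump of a name's UTF-16LE bytes,
// one group of four hex digits per 16-bit code unit.
function utf16le_hex(string $utf8Name): string
{
    $bytes = mb_convert_encoding($utf8Name, 'UTF-16LE', 'UTF-8');
    return implode(' ', str_split(bin2hex($bytes), 4));
}

// Hypothetical helper: true when the second byte of every 16-bit
// code unit is zero, i.e. every character is in U+0000..U+00FF.
function fits_in_iso_8859_1(string $utf8Name): bool
{
    $bytes = mb_convert_encoding($utf8Name, 'UTF-16LE', 'UTF-8');
    for ($i = 1; $i < strlen($bytes); $i += 2) {
        if ($bytes[$i] !== "\x00") {
            return false;
        }
    }
    return true;
}

echo utf16le_hex("a\u{E9}"), "\n";  // prints 6100 e900
echo utf16le_hex("\u{97F3}"), "\n"; // prints f397

var_dump(fits_in_iso_8859_1("caf\u{E9}.mp3"));        // bool(true)
var_dump(fits_in_iso_8859_1("\u{97F3}\u{4E50}.mp3")); // bool(false)
```

For "a" and "é" the second byte of each pair is 00, so the first byte alone identifies the character. For the Chinese character both bytes are significant, and no single-byte name can stand in for it.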


By the way, for anyone interested in digging into the details of character encoding, Jukka Korpela has published an informative primer.

Edited by dkotchen