crossma

Substr Function For Asian Characters?


Hi

 

I am trying to port my web page to Chinese. One of the problems is that I use substr() to generate page summaries, and this does not work well for Chinese characters. A call like substr("chinesecharacters", 0, 20) returns something like 3 Chinese characters and a broken one at the end.

 

Is there an alternative substr-type function I could use to return, for example, exactly 5 Chinese characters?

 

There was one last thing I wanted to say.. oh yeah:

 

Rock Sign

 

Cheers,

Roy


Hi Roy.

 

What programming language are you using?

 

I tried searching the web about your problem using PHP but I couldn't find anything relevant... :rolleyes:


Hi,

 

I haven't used PHP in a while, but as far as I know substr() counts bytes and does not support multibyte characters. Try using mb_substr() instead.

 

For mb_substr() to work, PHP has to be compiled with the "--enable-mbstring" option. I have no idea if that's the case for your server.
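A minimal sketch of what that could look like, assuming the mbstring extension is available and the page text is GB2312 stored in its usual EUC-CN byte form. The function name page_summary() and the sample bytes are only illustrative:

```php
<?php
// Hypothetical helper: return the first $chars characters of $text.
// With an explicit encoding, mb_substr() counts characters, not bytes.
function page_summary($text, $chars, $encoding = 'EUC-CN')
{
    return mb_substr($text, 0, $chars, $encoding);
}

// Three two-byte characters written as raw EUC-CN bytes (6 bytes in total).
$text = "\xD6\xD0\xCE\xC4\xD6\xD0";

if (function_exists('mb_substr')) {
    // Takes 2 characters, i.e. the first 4 bytes, without splitting a character.
    $summary = page_summary($text, 2);
    echo strlen($summary), " bytes\n";
}
```

If the pages are served in a different encoding (UTF-8, Big5, ...), the encoding argument would need to change accordingly.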

 

My best,

Tim


Hey guys,

 

I forgot to mention, I'm using PHP now as you guessed.

 

I tried the mb_substr, but it does not look like it is compiled on my server. Would you know how I could get it installed?
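For reference, a small sketch of how one might check what the server offers before deciding what to ask the host to install. The names multibyte_support() and char_substr() are made up for this example, and iconv_substr() only exists on PHP 5 and later, so it may not be an option on an older server:

```php
<?php
// Report which character-aware string extensions this PHP build provides.
function multibyte_support()
{
    return array(
        'mbstring' => extension_loaded('mbstring'),
        'iconv'    => function_exists('iconv_substr'),
    );
}

// Hypothetical wrapper: use whichever character-aware substr is available.
// Returns null when neither extension is present, so the caller can fall
// back to something else.
function char_substr($str, $start, $length, $encoding = 'EUC-CN')
{
    if (function_exists('mb_substr')) {
        return mb_substr($str, $start, $length, $encoding);
    }
    if (function_exists('iconv_substr')) {
        return iconv_substr($str, $start, $length, $encoding);
    }
    return null;
}

foreach (multibyte_support() as $name => $available) {
    echo $name, ': ', ($available ? 'yes' : 'no'), "\n";
}
```

The output of phpinfo() lists the same information, including the configure options the server's PHP was built with.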

 

Thanks for the help. I have tried researching this as well, but the answer still eludes me. I really wonder how programmers in East Asia handle this problem. I'm certain there is some everyday function that they can use.

 

Roy


Hello again,

 

Have you read the PHP manual about substr() and the comments posted by other users?

 

http://www.php.net/manual/en/function.substr.php

 

A user there named 'ken at wisers dot com' suggests the following replacement function:

 

function dbyte_substr($str, $start, $len = '') {
    if ($len == '') {
        $outstr = substr($str, $start);
    } else {
        $outstr = substr($str, $start, $len);
        // If the cut ends on a high byte (0x80-0xFF), treat it as a dangling
        // first byte of a double-byte character and drop it
        if (preg_match("/[\x80-\xFF]$/", $outstr)) {
            $outstr = substr($outstr, 0, -1);
        }
    }
    return $outstr;
}

 

I have never tried or tested this, so don't blame me if the server blows up or something like that! :)
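One caveat with the function above, assuming a double-byte encoding such as GB2312 (EUC-CN) in which both bytes of a Chinese character are 0x80 or higher: the pattern also matches the second byte of a complete character, so a cut that happens to end cleanly loses a byte and becomes broken. A rough alternative that needs no extensions is to walk the string and count characters instead of bytes. The sketch below uses a made-up name, gb2312_substr_chars(), and assumes valid EUC-CN input where every byte >= 0x80 starts a two-byte character:

```php
<?php
// Hypothetical helper, not from the PHP manual: character-based substr for
// EUC-CN (GB2312) text. $start and $length are counted in characters.
function gb2312_substr_chars($str, $start, $length)
{
    $chars = array();
    $i = 0;
    $n = strlen($str);
    while ($i < $n) {
        // ASCII bytes are single characters; a high byte starts a
        // two-byte character (a lone high byte at the very end is kept as is).
        $width = (ord($str[$i]) >= 0x80 && $i + 1 < $n) ? 2 : 1;
        $chars[] = substr($str, $i, $width);
        $i += $width;
    }
    return implode('', array_slice($chars, $start, $length));
}

// Two Chinese characters followed by "abc", written as raw EUC-CN bytes.
$text = "\xD6\xD0\xCE\xC4abc";
echo strlen(gb2312_substr_chars($text, 0, 2)), "\n"; // 2 characters = 4 bytes
```

This splits the whole string on every call, which should be fine for page summaries but would be wasteful on very long text.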

 

BTW, what encoding are you using for Chinese?

 

My best,

Tim


Thanks so much for your help Tim!

 

The encoding is Chinese Simplified GB2312

 

Unfortunately, the function does not seem to work.

I will keep looking also and post if I find anything.

 

<_<

 

Roy

Regarding Tim's point that mb_substr() only works if PHP was compiled with the "--enable-mbstring" option: this is one of the major weaknesses of PHP. Various internationalization and localization features are considered optional, and US-based servers tend not to have them compiled in.

 

BUT if you can afford to port your code to Perl (especially 5.8.x, which TCH has -- yeah!), the outlook is good.

 

Rock Sign
