#52 new

Problems with unicode strings

Reported by Martin Skinner | August 31st, 2009 @ 03:28 AM

Unicode JavaScript strings are not transferred to Ruby correctly. Here's an example: I create a JavaScript string consisting of a single Euro sign (see http://www.fileformat.info/info/unicode/char/20ac/index.htm)

irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'johnson'
=> true
irb(main):007:0> s = Johnson.evaluate("'\\u20AC'")
=> "\254"

In Ruby, we're getting a single byte with the value 254 (octal), which is 172 decimal, or 0xAC. So it looks like we're only getting the low byte of our 16-bit Unicode character. After scanning the Johnson code, I think I found the culprit: JS_GetStringBytes returns the bytes of a UTF-16 string by stripping off the high bytes. The SpiderMonkey documentation notes:

Note that for non-ASCII strings, if JS_CStringsAreUTF8 is false, these functions can return a corrupted copy of the contents of the string. Use JS_GetStringChars to access the 16-bit characters of a JavaScript string without conversions or copying.
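To illustrate the suspected truncation, here's a minimal Ruby sketch (not Johnson code): keeping only the low byte of the Euro sign's 16-bit code unit yields exactly the 0xAC byte shown in the irb session above.

```ruby
# The Euro sign is a single 16-bit code unit in UTF-16.
euro = 0x20AC

# Dropping the high byte (what JS_GetStringBytes appears to do
# when JS_CStringsAreUTF8 is false) leaves only the low byte.
low_byte = euro & 0xFF

puts format("0x%02X", low_byte)  # prints 0xAC (172 decimal, \254 octal)
```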

A similar problem probably exists in the other direction (Ruby -> JS) too.

I suggest trying JS_CStringsAreUTF8 (which may solve both problems). If this fails, then Johnson would have to extract the UTF-16 characters from SpiderMonkey and convert them to a Ruby-friendly encoding.
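The conversion itself is mechanical. Here's a Ruby sketch of turning the 16-bit code units that JS_GetStringChars would hand back into a UTF-8 string, including surrogate-pair handling for characters outside the BMP. (The real fix would live in Johnson's C layer; the function name here is illustrative.)

```ruby
# Hypothetical helper: convert an array of UTF-16 code units
# (as JS_GetStringChars would return them) to a UTF-8 string.
def utf16_units_to_utf8(units)
  codepoints = []
  i = 0
  while i < units.length
    u = units[i]
    if u.between?(0xD800, 0xDBFF) && i + 1 < units.length
      # High surrogate followed by low surrogate: combine into
      # a single code point above U+FFFF.
      lo = units[i + 1]
      codepoints << (0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00))
      i += 2
    else
      codepoints << u
      i += 1
    end
  end
  codepoints.pack("U*")  # "U" packs code points as UTF-8
end

puts utf16_units_to_utf8([0x20AC]).bytes.inspect  # => [226, 130, 172], i.e. UTF-8 for the Euro sign
```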
