Error saving content containing non-Basic Unicode characters

Edit: I changed the title of this problem because I've found it's much more widespread. The problem also occurs if the character is in the Title or the body of a Page, which probably means it can happen in any text entry field available to site admins or users. Beyond symbols and extended ideograph character sets, this issue could be encountered when someone enters or pastes in an emoji such as a smiley face. (I tried pasting one in here, but when I saved this post, not only did Get Satisfaction not show the Unicode character, it truncated the post at that point!)

Original message:
I was trying to add Unicode character 🖹 ("Document with text") to the Name of a taxonomy term by pasting the character itself in from another file, not by typing its HTML entity. When I clicked Save, I was briefly shown the text below, then redirected back to the form with the original Name reloaded.

After some testing with other Unicode characters, this seems to happen with any character outside the Basic Multilingual Plane, that is, any code point beyond U+FFFF, which takes four bytes to encode in UTF-8. If support for such characters can be added, great, but if not, I think the platform could handle the error condition better than by showing the "Temporarily Unavailable" page and silently discarding the change.
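For anyone who wants to check whether a piece of text contains one of these characters before pasting it in, here is a minimal Python sketch. It only illustrates the "beyond U+FFFF, four bytes in UTF-8" rule described above; I don't know what the platform actually does internally, and the function names `non_bmp_chars` and `utf8_len` are my own, not part of OpenScholar.

```python
def non_bmp_chars(text):
    """Return (index, character, code point label) for every character above U+FFFF."""
    return [
        (i, ch, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) > 0xFFFF  # outside the Basic Multilingual Plane
    ]


def utf8_len(ch):
    """Number of bytes needed to encode the given string in UTF-8."""
    return len(ch.encode("utf-8"))


# Hypothetical term name containing U+1F5B9 ("Document with text").
name = "Report \U0001F5B9"
print(non_bmp_chars(name))     # one hit, at index 7
print(utf8_len("\U0001F5B9"))  # 4 bytes: outside the BMP
print(utf8_len("\u2603"))      # 3 bytes: U+2603 is inside the BMP
```

A check along these lines could, in principle, run before saving so the form shows a validation message instead of failing, but that is only a suggestion, not a description of how the platform works.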

"OpenScholar
Temporarily Unavailable

We're sorry, the website that you're trying to reach is currently unavailable. Please try again in a few moments.

• You can track our status on the HUIT Service Status Dashboard.
• You can also contact the help desk at HUIT Service Desk.
Thank you for your patience."