Unicode Normalise a String in Rails
One strange issue I had on Scribbles, especially when creating a
page/post URL based on the title, was that sometimes someone would
use... wait for it... full-width characters. Huh? What?
Yes, I had never encountered this... EVER. Here is a full-width page title:
Ａｂｏｕｔ
That is actually just "About", but written in full-width
characters, and that poses a problem.
When Scribbles saves a post or page, it'll go ahead and try to parameterize
the page/post title. Here is an example of my method:
if self.title.present?
  url = self.title.parameterize
else
  url = self.published_date.strftime("%Y-%m-%d")
end
The problem with full-width characters is that they're not plain ASCII
letters, so parameterize throws them away and returns an empty string,
and thus just throws errors like there is no tomorrow. Not really, but
having an empty URL kinda defeats the purpose of creating a URL.
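To make the empty result a bit more concrete, here is a rough plain-Ruby sketch. Note that naive_slug is a hypothetical stand-in I'm using for illustration, not Rails's real parameterize; it only keeps ASCII letters and digits, which approximates what happens to characters that can't be turned into ASCII.

```ruby
# Hypothetical stand-in for parameterize (NOT the Rails implementation):
# keep ASCII letters and digits, turn everything else into hyphens,
# then trim any leading or trailing hyphens.
def naive_slug(text)
  text.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

# "About" written with full-width characters (U+FF21, U+FF42, ...)
full_width = "\uFF21\uFF42\uFF4F\uFF55\uFF54"

puts naive_slug("About").inspect    # ordinary ASCII title keeps its letters
puts naive_slug(full_width).inspect # full-width title loses every character
```

With the full-width title, nothing survives the filter, so there is nothing left to build a URL from.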
Thankfully there is a built-in string method called unicode_normalize
(it's actually part of Ruby itself, so it works in Rails out of the box).
You can see what it does in the docs.
So now I do the following:
url = self.title.unicode_normalize(:nfkc).parameterize.downcase
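You can see here I also added :nfkc, which applies compatibility
decomposition, followed by canonical composition. No idea what that
means exactly, but in practice it swaps compatibility characters such as
full-width letters for their standard equivalents. Rails suggested it as
it threw an error in my initial attempt. I also added a downcase just to
make sure everything is lower case, although parameterize already
lowercases by default.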
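If you want to see what :nfkc actually does to a full-width title, here is a small sketch in plain Ruby (no Rails needed). The variable names are just mine for illustration. It compares :nfc, which is the default form when you pass no argument, with :nfkc.

```ruby
# "About" written with full-width characters (U+FF21, U+FF42, ...)
full_width = "\uFF21\uFF42\uFF4F\uFF55\uFF54"

# :nfc only handles canonical equivalents, so full-width letters stay as they are.
nfc = full_width.unicode_normalize(:nfc)

# :nfkc also applies compatibility mappings, so full-width letters become ASCII.
nfkc = full_width.unicode_normalize(:nfkc)

puts nfc == full_width # true
puts nfkc              # About
puts nfkc.ascii_only?  # true
```

Once the title is plain ASCII, parameterize has real letters to work with.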
Anyway, this was interesting to figure out.