I was reading https://uncyclopedia.com/wiki/Rust_(programming_language) for some reason, and read let U = 0; U = U + 1;.
Suddenly my mind was awhirl with a Concept. I implemented it at once.
The problem
Unicode expresses its code points in syntax like U+1234 (full range U+0000–U+10FFFF).
But then when you want to transfer it to a programming language, you have to learn another syntax. Will it be '\u1234', '\u{1234}', "\xE1\x88\xB4", \341\210\264, something else?
And then astral plane characters make it even worse: "\U0001F631", '\u{1F631}', \xF0\x9F\x98\xB1, "\uD83D\uDE31" (with all the associated pain the abomination UTF-16 entails, especially that your char type may simply not be able to represent this), something else?
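To see that these spellings really are the same code points in different encodings, here is a small check using only the standard library. The helper names utf8_bytes and utf16_units are made up for this illustration.

```rust
/// Return the UTF-8 encoding of a single char as a byte vector.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4]; // a char needs at most 4 bytes in UTF-8
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// Return the UTF-16 encoding of a single char as a vector of code units.
fn utf16_units(c: char) -> Vec<u16> {
    let mut buf = [0u16; 2]; // a char needs at most 2 code units in UTF-16
    c.encode_utf16(&mut buf).to_vec()
}

fn main() {
    // U+1234 is three bytes in UTF-8 (octal 341 210 264).
    assert_eq!(utf8_bytes('\u{1234}'), [0xE1, 0x88, 0xB4]);
    // In UTF-16 it fits in a single code unit.
    assert_eq!(utf16_units('\u{1234}'), [0x1234]);

    // U+1F631 is in an astral plane: four UTF-8 bytes...
    assert_eq!(utf8_bytes('\u{1F631}'), [0xF0, 0x9F, 0x98, 0xB1]);
    // ...and a surrogate pair in UTF-16.
    assert_eq!(utf16_units('\u{1F631}'), [0xD83D, 0xDE31]);
}
```

Rust's char is a full Unicode scalar value, so it sidesteps the surrogate problem; languages whose char type is a single UTF-16 code unit cannot hold U+1F631 in one char.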
And so here is this crate that lets you use the True Unicode Syntax in Rust:
So forget about \u{…} syntax and use U+… literals!
(Caution: there are some limitations with this approach, see KNOWN_ISSUES.md for details.)
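For the curious, here is one way such a trick could be put together. This is a guess at the general idea, not the crate's actual code: the type U, the helper decimal_digits_as_hex, and the choice of u32 are all assumptions made for this sketch. The idea is that U+1234 parses as the expression U + 1234, so an Add impl can take the decimal literal and reinterpret its digits as hexadecimal.

```rust
use std::ops::Add;

/// Hypothetical marker type for this sketch; not necessarily the crate's API.
#[derive(Clone, Copy)]
pub struct U;

/// Reinterpret the decimal digits of `n` as hexadecimal digits,
/// so that 1234 (decimal) becomes 0x1234. Returns None if `n` has
/// more than eight digits and so cannot fit in a u32 as hex.
fn decimal_digits_as_hex(mut n: u32) -> Option<u32> {
    let mut value: u32 = 0;
    let mut shift: u32 = 0;
    while n > 0 {
        if shift >= 32 {
            return None;
        }
        value |= (n % 10) << shift; // each decimal digit becomes one hex digit
        shift += 4;
        n /= 10;
    }
    Some(value)
}

impl Add<u32> for U {
    type Output = char;

    /// `U + 1234` yields the char U+1234; panics if the result is
    /// not a valid Unicode scalar value.
    fn add(self, rhs: u32) -> char {
        decimal_digits_as_hex(rhs)
            .and_then(char::from_u32)
            .expect("not a valid Unicode scalar value")
    }
}

fn main() {
    assert_eq!(U+0041, 'A');
    assert_eq!(U+1234, '\u{1234}');
    println!("{}", U+1234);
}
```

A sketch like this has an obvious hole: a code point containing the hex digits A to F, such as U+1F631, is not a valid Rust integer literal and will not compile this way.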
Links
- The u-plus crate on crates.io (or see on lib.rs).
- The source code is on my Git server [Pity about that u-plus in the URL. I wanted to do U+, but gitweb wasn’t coping very well with plusses in the path, and I figured fixing that bug would take far too much effort, especially given the era that Perl code comes from. I’ve already patched it enough; I should just write something from scratch.]; it may be easiest to view the entire thing as a patch.