--- Unicode-String-2.09/README 2005-10-25 13:56:28.000000000 +0100
+++ Unicode-String-2.09/README.utf8 2010-02-18 09:11:45.235669975 +0000
 o Depreciation because of perl's own utf8 support.
 o Composition/decomposition support:
- $u->decomp; # will decomposite as much as possible: "å" --> "a°"
- $u->comp; # will composite as much as possible: "a°" --> "å"
+ $u->decomp; # will decomposite as much as possible: "å" --> "a°"
+ $u->comp; # will composite as much as possible: "a°" --> "å"
 Need separate routines or a special argument to distinguish
 between compatibility decomposition and canonical decomposition.
- print latin1("naïve\n")->utf8;
+ print utf8("naïve\n")->latin1;
 use Unicode::CharName qw(uname);
 print uname(ord('$')), "\n";
- © 1997-2000,2005 Gisle Aas. All rights reserved.
+ © 1997-2000,2005 Gisle Aas. All rights reserved.
 This library is free software; you can redistribute it and/or modify
 it under the same terms as Perl itself.
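Note on the composition/decomposition item in the README hunk above: the distinction it asks for, canonical versus compatibility decomposition, can be illustrated outside Perl. The following is a minimal sketch in Python using the standard `unicodedata` module, not the Unicode::String API. It assumes the README's "a°" stands in for "a" followed by U+030A COMBINING RING ABOVE, which is the canonical decomposition of "å"; the ligature U+FB01 is a sample chosen here and does not appear in the README.

```python
import unicodedata

# "å" as a single precomposed code point, U+00E5.
composed = "\u00e5"

# Canonical decomposition (NFD): base letter plus U+030A COMBINING RING ABOVE.
decomposed = unicodedata.normalize("NFD", composed)

# Canonical composition (NFC) puts the pair back together.
recomposed = unicodedata.normalize("NFC", decomposed)

# Compatibility decomposition (NFKD) also unfolds compatibility characters
# such as U+FB01 LATIN SMALL LIGATURE FI, which NFD leaves untouched.
ligature = "\ufb01"  # illustrative sample, not taken from the README
canonical = unicodedata.normalize("NFD", ligature)
compat = unicodedata.normalize("NFKD", ligature)

print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0061', 'U+030A']
print(recomposed == composed)                   # True
print(canonical == ligature, compat)            # True fi
```

The last line shows why a single routine is ambiguous: the same input gives different results depending on which of the two decompositions is meant.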
--- Unicode-String-2.09/String.pm 2005-10-26 09:13:10.000000000 +0100
+++ Unicode-String-2.09/String.pm.utf8 2010-02-18 09:11:45.234427359 +0000
 current value is returned.
 To illustrate the encodings we show how the 2 character sample string
-of "µm" (micro meter) is encoded for each one.
+of "µm" (micro meter) is encoded for each one.
 =item $us->utf32be( $newval )
 The string passed should be in the UTF-32 encoding with bytes in big
-endian order. The sample "µm" is "\0\0\0\xB5\0\0\0m" in this encoding.
+endian order. The sample "µm" is "\0\0\0\xB5\0\0\0m" in this encoding.
 Alternative names for this method are utf32() and ucs4().
 =item $us->utf32le( $newval )
 The string passed should be in the UTF-32 encoding with bytes in little
-endian order. The sample "µm" is is "\xB5\0\0\0m\0\0\0" in this encoding.
+endian order. The sample "µm" is is "\xB5\0\0\0m\0\0\0" in this encoding.
 =item $us->utf16be( $newval )
 The string passed should be in the UTF-16 encoding with bytes in big
-endian order. The sample "µm" is "\0\xB5\0m" in this encoding.
+endian order. The sample "µm" is "\0\xB5\0m" in this encoding.
 Alternative names for this method are utf16() and ucs2().
 =item $us->utf16le( $newval )
 The string passed should be in the UTF-16 encoding with bytes in
-little endian order. The sample "µm" is is "\xB5\0m\0" in this
+little endian order. The sample "µm" is is "\xB5\0m\0" in this
 encoding. This is the encoding used by the Microsoft Windows API.
 If the string passed to utf16le() starts with the Unicode byte order
 =item $us->utf8( $newval )
-The string passed should be in the UTF-8 encoding. The sample "µm" is
+The string passed should be in the UTF-8 encoding. The sample "µm" is
 "\xC2\xB5m" in this encoding.
 =item $us->utf7( $newval )
-The string passed should be in the UTF-7 encoding. The sample "µm" is
+The string passed should be in the UTF-7 encoding. The sample "µm" is
 "+ALU-m" in this encoding.
 =item $us->latin1( $newval )
-The string passed should be in the ISO-8859-1 encoding. The sample "µm" is
+The string passed should be in the ISO-8859-1 encoding. The sample "µm" is
 "\xB5m" in this encoding.
 Characters outside the "\x00" .. "\xFF" range are simply removed from
 The string passed should be plain ASCII where each Unicode character
 is represented by the "U+XXXX" string and separated by a single space
 character. The "U+" prefix is optional when setting the value. The
-sample "µm" is "U+00b5 U+006d" in this encoding.
+sample "µm" is "U+00b5 U+006d" in this encoding.
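Note on the sample strings quoted in the hunks above: the byte sequences given for "µm" can be cross-checked independently of the module. The sketch below uses Python's built-in codecs rather than Unicode::String itself. It assumes the sample's first character is U+00B5 MICRO SIGN (which the "\xB5" Latin-1 byte implies); the dictionary keys merely echo the method names from the POD and are labels, not calls into the Perl API.

```python
# U+00B5 MICRO SIGN followed by "m"; written with an escape so the source
# file's own encoding cannot change the result.
sample = "\u00b5m"

# Keys are labels borrowed from the POD method names; values come from
# Python's standard codecs.
encodings = {
    "utf32be": sample.encode("utf-32-be"),
    "utf32le": sample.encode("utf-32-le"),
    "utf16be": sample.encode("utf-16-be"),
    "utf16le": sample.encode("utf-16-le"),
    "utf8": sample.encode("utf-8"),
    "utf7": sample.encode("utf-7"),
    "latin1": sample.encode("latin-1"),
}

# The "U+XXXX" form described last: one entry per character, space separated.
hex_form = " ".join(f"U+{ord(c):04x}" for c in sample)

for name, raw in encodings.items():
    print(f"{name:8} {raw!r}")
print(f"{'U+XXXX':8} {hex_form}")
```

Each printed value should match the corresponding string in the POD once Perl's "\0" is read as Python's "\x00".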