The description of the convmap parameter of the function mb_encode_numericentity in the PHP manual is vague to me. Could somebody give a better explanation of it, or "dumb it down" for me? What is the meaning of the array elements used in this parameter? Example 1 in the manual page has
<?php
$convmap = array (
int start_code1, int end_code1, int offset1, int mask1,
int start_code2, int end_code2, int offset2, int mask2,
........
int start_codeN, int end_codeN, int offsetN, int maskN );
// Specify Unicode value for start_codeN and end_codeN
// Add offsetN to value and take bit-wise 'AND' with maskN, then
// it converts value to numeric string reference.
?>
which is helpful, but then I see a lot of usage examples like array(0x80, 0xffff, 0, 0xffff); which throws me off. Does that mean the offset would be 0 and the mask would be 0xffff? If so, does offset mean the number of characters into the string at which to start converting, and what does mask mean in this context?
Looking down the rabbit hole, it appears that the comments in the documentation for mb_encode_numericentity are accurate, though somewhat cryptic.
The four major parts of each convmap entry appear to be:
start_code: The map affects items starting from this character code.
end_code: The map affects items up to and including this character code.
offset: An amount (positive or negative) added to the character code. It is not a position in the string.
mask: The value used for the mask operation. The character code, after the offset has been added, is bitwise ANDed with the mask before it is written out as a numeric entity.
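The rule described by these four parts can be sketched for a single character code as follows. This is only an illustration of the rule as the manual comments state it ("add offset, then AND with mask"); the helper name convmap_entity is hypothetical and is not part of mbstring.

```php
<?php
// Hypothetical helper, not part of mbstring: mimics the per-character rule
// described in the manual comments for one character code.
function convmap_entity(int $codePoint, array $convmap): ?string
{
    // The convmap is a flat array that is read four values at a time.
    foreach (array_chunk($convmap, 4) as [$start, $end, $offset, $mask]) {
        if ($codePoint >= $start && $codePoint <= $end) {
            // Add the offset first, then apply the bitwise AND mask.
            return '&#' . (($codePoint + $offset) & $mask) . ';';
        }
    }
    return null; // outside every range: the character would be left as-is
}

echo convmap_entity(0xA2, [0x80, 0xffff, 0, 0xffff]), "\n"; // &#162;
var_dump(convmap_entity(0x41, [0x80, 0xffff, 0, 0xffff]));  // NULL
```

With array(0x80, 0xffff, 0, 0xffff), the offset of 0 and the mask of 0xffff leave every code in the range unchanged, so the map simply says "turn every character from 0x80 to 0xffff into its numeric entity".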
Character codes can be visualized via character tables such as this Codepage Layout example for ISO-8859-1 encoding. (ISO-8859-1 is the encoding used in the original PHP documentation Example #2.) Looking at this encoding table, we can see that the convmap is only meant to affect character codes from 0x80 (which appears to be blank for this particular encoding) up to the final character in this encoding, 0xff (which appears to be ÿ).
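As a small illustration of that range, the sketch below encodes a two-character string with a map covering 0x80 to 0xff. The input string and the use of UTF-8 here are my own choices for the example, and it assumes the mbstring extension is loaded.

```php
<?php
// "A" is U+0041 (below 0x80, so untouched); "\u{FF}" is ÿ (U+00FF, inside the range).
$convmap = [0x80, 0xff, 0, 0xff];
$encoded = mb_encode_numericentity("A\u{FF}", $convmap, 'UTF-8');
echo $encoded, "\n"; // A&#255;
```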
To better understand the offset and mask features of convmap, here are some examples of how offset and mask affect character codes. In the examples below, the character is ¢, whose character code is 162:
<?php
$original_str = "¢";
$convmap = array(0x00, 0xff, 0, 0xff);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original: $original_str\n";
echo "converted: $converted_str\n";
?>
Result:
original: ¢
converted: &#162;
<?php
$original_str = "¢";
$convmap = array(0x00, 0xff, 1, 0xff);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original: $original_str\n";
echo "converted: $converted_str\n";
?>
Result:
original: ¢
converted: &#163;
The offset seems to allow finer-grained control over the range defined by the current start_code and end_code. For example, you might have some particular reason to add an offset for one line of character codes in your convmap, but then need no offset for another line in the same convmap.
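A minimal sketch of such a two-line convmap, assuming the mbstring extension is loaded; the ranges and the test string are made up for illustration:

```php
<?php
// Line 1 shifts only U+00A2 (¢) by +1; line 2 covers U+00A3..U+00FF with no offset.
$convmap = [
    0xA2, 0xA2, 1, 0xff,
    0xA3, 0xff, 0, 0xff,
];
// "\u{A2}" is ¢ (162) and "\u{A9}" is © (169).
$encoded = mb_encode_numericentity("\u{A2}\u{A9}", $convmap, 'UTF-8');
echo $encoded, "\n"; // &#163;&#169;
```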
<?php
// Mask Example 1
$original_str = "¢";
$convmap = array(0x00, 0xff, 0, 0xf0);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original: $original_str\n";
echo "converted: $converted_str\n\n";
// Mask Example 2
$convmap = array(0x00, 0xff, 0, 0x0f);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original: $original_str\n";
echo "converted: $converted_str\n\n";
// Mask Example 3
$convmap = array(0x00, 0xff, 0, 0x00);
$converted_str = mb_encode_numericentity($original_str, $convmap, "UTF-8");
echo "original: $original_str\n";
echo "converted: $converted_str\n";
?>
Result:
original: ¢
converted: &#160;

original: ¢
converted: &#2;

original: ¢
converted: &#0;
This answer does not intend to cover masking in great detail, but masking can help keep or remove certain bits from a given value.
So in the first mask example, 0xf0, the f indicates that we want to keep the four bits on the left side of the binary value. Here, f has a binary value of 1111 and 0 has a binary value of 0000, together becoming a value of 11110000.
Then, when we do a bitwise AND operation with our character code (in this case, 162, which has a binary value of 10100010), the bitwise operation looks like this:
  11110000
& 10100010
----------
  10100000
And when converted back to its decimal value, 10100000 is 160.
Therefore, we've effectively kept the "left side" of the bits from the original character code value, and have gotten rid of the "right side" of the bits.
In the second mask example, the mask 0x0f (which has a binary value of 00001111) in the bitwise AND operation would have the following binary result:
  00001111
& 10100010
----------
  00000010
Which, when converted back to its decimal value, is 2.
Therefore, we've effectively kept the "right side" of the bits from the original character code value, and have gotten rid of the "left side" of the bits.
Finally, the third mask example shows what happens when using a mask of 0x00 (which is 00000000 in binary) in the bitwise AND operation:
  00000000
& 10100010
----------
  00000000
Which results in 0.
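The three AND operations above can be checked with plain PHP integer arithmetic, without mbstring. The variable names are just for this sketch:

```php
<?php
$code = 162; // character code of ¢ (0xA2)
$results = [];
foreach ([0xf0, 0x0f, 0x00] as $mask) {
    $results[] = $code & $mask;
    // %08b prints the value in binary, zero-padded to 8 digits.
    printf("%08b & %08b = %08b (%d)\n", $mask, $code, $code & $mask, $code & $mask);
}
// 11110000 & 10100010 = 10100000 (160)
// 00001111 & 10100010 = 00000010 (2)
// 00000000 & 10100010 = 00000000 (0)
```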