lithium\g11n\Multibyte::is()
Checks if a given string is UTF-8 encoded and is valid UTF-8.
In quick mode it only checks whether the string contains any byte outside tab, line feed, carriage return and printable ASCII, which indicates that some multibyte encoding may be in use. Don't use quick mode for integrity validation of UTF-8 encoded strings.
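To illustrate why quick mode is not a validity check, here is a minimal sketch. The function `quickCheck()` is a hypothetical stand-in that uses the same character class as the quick-mode branch of the source, without the `m` modifier, which has no effect on this pattern. It is not part of the library.

```php
<?php
// Hypothetical stand-in for the quick-mode branch; not the library method itself.
function quickCheck(string $string): bool {
    // Matches any single byte outside tab, LF, CR and printable ASCII.
    return (bool) preg_match('/[^\x09\x0A\x0D\x20-\x7E]/', $string);
}

var_dump(quickCheck('plain ASCII'));  // bool(false)
var_dump(quickCheck(''));             // bool(false): nothing to match
var_dump(quickCheck("caf\xC3\xA9"));  // bool(true): valid UTF-8
var_dump(quickCheck("caf\xE9"));      // bool(true): ISO-8859-1 byte, invalid as UTF-8
var_dump(quickCheck("a\x00z"));       // bool(true): NUL is outside the allowed range
```

Both the valid and the invalid byte sequence yield `true`, so quick mode only signals that something beyond plain ASCII text is present.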
Meaning of RegExp:
'[\x09\x0A\x0D\x20-\x7E]'; // ASCII
'|[\xC2-\xDF][\x80-\xBF]'; // non-overlong 2-byte
'|\xE0[\xA0-\xBF][\x80-\xBF]'; // excluding overlongs
'|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'; // straight 3-byte
'|\xED[\x80-\x9F][\x80-\xBF]'; // excluding surrogates
'|\xF0[\x90-\xBF][\x80-\xBF]{2}'; // planes 1-3
'|[\xF1-\xF3][\x80-\xBF]{3}'; // planes 4-15
'|\xF4[\x80-\x8F][\x80-\xBF]{2}'; // plane 16
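The alternatives above can be exercised one at a time. The following is a sketch only: `isValidUtf8()` is a hypothetical helper that joins the same alternatives into one anchored pattern, using a non-capturing group and no `m` modifier, which is an assumption of this example rather than the library's exact pattern.

```php
<?php
// Hypothetical helper: same alternatives as listed above, anchored to the whole string.
function isValidUtf8(string $string): bool {
    $regex = '/\A(?:'
        . '[\x09\x0A\x0D\x20-\x7E]'            // ASCII
        . '|[\xC2-\xDF][\x80-\xBF]'            // non-overlong 2-byte
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'        // excluding overlongs
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}' // straight 3-byte
        . '|\xED[\x80-\x9F][\x80-\xBF]'        // excluding surrogates
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'     // planes 1-3
        . '|[\xF1-\xF3][\x80-\xBF]{3}'         // planes 4-15
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'     // plane 16
        . ')*\z/';
    return (bool) preg_match($regex, $string);
}

var_dump(isValidUtf8("caf\xC3\xA9"));      // bool(true): 2-byte sequence
var_dump(isValidUtf8("\xE2\x82\xAC"));     // bool(true): 3-byte sequence (U+20AC)
var_dump(isValidUtf8("\xF0\x9F\x98\x80")); // bool(true): 4-byte sequence (U+1F600)
var_dump(isValidUtf8("\xC0\xAF"));         // bool(false): overlong 2-byte form
var_dump(isValidUtf8("\xE0\x80\xAF"));     // bool(false): overlong 3-byte form
var_dump(isValidUtf8("\xED\xA0\x80"));     // bool(false): surrogate U+D800
var_dump(isValidUtf8("\xF4\x90\x80\x80")); // bool(false): beyond U+10FFFF
var_dump(isValidUtf8("a\x00z"));           // bool(false): NUL is not in the ASCII branch
var_dump(isValidUtf8(''));                 // bool(true): zero repetitions match
```

Note that the ASCII branch admits only tab, LF, CR and printable characters, so strings containing other control bytes are rejected even though those bytes are well-formed UTF-8.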
Parameters
- string $string The string to analyze.
- array $options Allows toggling the mode via the 'quick' option, which defaults to false.
Returns
boolean Returns true if the string is valid UTF-8. In quick mode, returns true if the string contains any non-ASCII bytes.
Source
public static function is($string, array $options = []) {
    $defaults = ['quick' => false];
    $options += $defaults;

    if ($options['quick']) {
        // Quick mode: match any byte outside tab, LF, CR and printable ASCII.
        $regex = '/[^\x09\x0A\x0D\x20-\x7E]/m';
    } else {
        // Full mode: the whole string must consist of well-formed UTF-8 sequences.
        $regex = '/\A(';
        $regex .= '[\x09\x0A\x0D\x20-\x7E]';
        $regex .= '|[\xC2-\xDF][\x80-\xBF]';
        $regex .= '|\xE0[\xA0-\xBF][\x80-\xBF]';
        $regex .= '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}';
        $regex .= '|\xED[\x80-\x9F][\x80-\xBF]';
        $regex .= '|\xF0[\x90-\xBF][\x80-\xBF]{2}';
        $regex .= '|[\xF1-\xF3][\x80-\xBF]{3}';
        $regex .= '|\xF4[\x80-\x8F][\x80-\xBF]{2}';
        $regex .= ')*\z/m';
    }
    // preg_match() returns 1, 0, or false on error; the cast maps 0 and false to false.
    return (bool) preg_match($regex, $string);
}
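The `$options += $defaults` line in the source relies on PHP's array union operator: keys already present on the left-hand side are kept, and keys missing from it are taken from the right. The sketch below isolates that step; `withDefaults()` is a hypothetical helper name used only for illustration.

```php
<?php
// Hypothetical helper isolating the option-merging step from the source.
function withDefaults(array $options): array {
    $defaults = ['quick' => false];
    // Array union: left-hand values win, missing keys are filled from $defaults.
    return $options + $defaults;
}

var_dump(withDefaults([]));                 // 'quick' => false (default applied)
var_dump(withDefaults(['quick' => true]));  // 'quick' => true (caller's value kept)
```

Because the caller's keys take precedence, passing `['quick' => true]` switches modes, while omitting the key leaves full validation in effect.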