Utf8String
class Utf8String
String handling class for UTF-8 data. Wraps the phputf8 library. All functions assume the validity of UTF-8 strings.
This class is based on the Joomla String package.
Methods
Tests whether a string contains only 7-bit ASCII bytes.
UTF-8 aware alternative to strpos.
UTF-8 aware alternative to strrpos. Finds the position of the last occurrence of a string.
UTF-8 aware alternative to substr. Returns part of a string given a character offset (and, optionally, a length).
UTF-8 aware alternative to strtolower.
UTF-8 aware alternative to strtoupper. Makes a string uppercase.
UTF-8 aware alternative to strlen.
UTF-8 aware alternative to str_ireplace. Case-insensitive version of str_replace.
UTF-8 aware alternative to str_split. Converts a string to an array.
UTF-8/locale aware alternative to strcasecmp. A case-insensitive string comparison.
UTF-8/locale aware alternative to strcmp. A case-sensitive string comparison.
UTF-8 aware alternative to strcspn. Finds the length of the initial segment not matching the mask.
UTF-8 aware alternative to stristr. Returns all of haystack from the first occurrence of needle to the end.
UTF-8 aware alternative to strrev. Reverses a string.
UTF-8 aware alternative to strspn. Finds the length of the initial segment matching the mask.
UTF-8 aware alternative to substr_replace. Replaces text within a portion of a string.
UTF-8 aware replacement for ltrim().
UTF-8 aware replacement for rtrim(). Strips whitespace (or other characters) from the end of a string.
UTF-8 aware replacement for trim(). Strips whitespace (or other characters) from the beginning and end of a string.
UTF-8 aware alternative to ucfirst. Makes a string's first character uppercase, or the first character of every word uppercase.
UTF-8 aware alternative to ucwords. Uppercases the first character of each word in a string.
Transcode a string.
Tests whether a string is valid UTF-8 and supported by the Unicode standard.
Tests whether a string complies as UTF-8. Faster than utf8_is_valid, but less strict: it passes five- and six-octet sequences, which Unicode does not support.
Converts Unicode sequences to a UTF-8 string.
Converts Unicode sequences to a UTF-16 string.
Details
at line 82
static boolean
is_ascii(string $str)
Tests whether a string contains only 7-bit ASCII bytes.
You might use this to check whether a string needs handling as UTF-8 at all. If it is just ASCII, the native PHP equivalent can be used instead, which may offer a performance benefit, e.g.:
```php
if (Utf8String::is_ascii($someString))
{
    // It's just ASCII - use the native PHP version
    $someString = strtolower($someString);
}
else
{
    // Contains multibyte characters - use the UTF-8 aware version
    $someString = Utf8String::strtolower($someString);
}
```
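The ASCII test itself can be expressed as a single byte-range check. This is a minimal sketch, assuming "7-bit ASCII" means every byte falls in 0x00 to 0x7F; `sketch_is_ascii()` is a hypothetical name and not the class's own code.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for Utf8String::is_ascii(): true when every byte
// of $str is in the 7-bit range 0x00-0x7F.
function sketch_is_ascii(string $str): bool
{
    // No /u modifier: the pattern works on raw bytes. Any byte with the
    // high bit set means the string is not pure ASCII.
    return preg_match('/[^\x00-\x7F]/', $str) === 0;
}

var_dump(sketch_is_ascii('Hello, world')); // bool(true)
var_dump(sketch_is_ascii('Grüße'));        // bool(false)
```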
at line 102
static mixed
strpos(string $str, string $search, integer $offset = false)
UTF-8 aware alternative to strpos.
Finds the position of the first occurrence of a string.
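The practical difference from native strpos() is the unit: characters instead of bytes. A minimal sketch, assuming PHP 8+ and that both the offset and the result are counted in characters; `sketch_strpos()` is hypothetical and only handles non-negative offsets.

```php
<?php
declare(strict_types=1);

// Hypothetical character-based strpos(): $offset and the return value count
// UTF-8 characters, not bytes. Returns false if $search is not found.
// Assumes $offset >= 0.
function sketch_strpos(string $str, string $search, int $offset = 0): int|false
{
    // Split into an array of UTF-8 characters.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    // Translate the character offset into a byte offset.
    $byteOffset = strlen(implode('', array_slice($chars, 0, $offset)));
    $bytePos = strpos($str, $search, $byteOffset);
    if ($bytePos === false) {
        return false;
    }
    // Count the characters that precede the match.
    return preg_match_all('/./us', substr($str, 0, $bytePos));
}

echo strpos('Привет мир', 'мир'), "\n";        // 13 (bytes)
echo sketch_strpos('Привет мир', 'мир'), "\n"; // 7 (characters)
```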
at line 125
static mixed
strrpos(string $str, string $search, integer $offset)
UTF-8 aware alternative to strrpos. Finds the position of the last occurrence of a string.
at line 143
static mixed
substr(string $str, integer $offset, integer $length = false)
UTF-8 aware alternative to substr. Returns part of a string given a character offset (and, optionally, a length).
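A minimal sketch of character-based substr(), assuming the string is valid UTF-8. `sketch_substr()` is a hypothetical name, and it uses null rather than false as the "no length" default.

```php
<?php
declare(strict_types=1);

// Hypothetical character-based substr(). array_slice() interprets negative
// offsets and lengths the same way substr() does for bytes.
function sketch_substr(string $str, int $offset, ?int $length = null): string
{
    // Split into individual UTF-8 characters.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return implode('', array_slice($chars, $offset, $length));
}

echo sketch_substr('Grüße', 0, 3), "\n"; // Grü
echo sketch_substr('Grüße', -2), "\n";   // ße
```

Native substr('Grüße', 0, 3) would instead cut "ü" in half, because that character occupies two bytes.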
at line 169
static mixed
strtolower(string $str)
UTF-8 aware alternative to strtolower.
Makes a string lowercase. Note: the concept of a character's "case" only exists in some alphabets, such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does not exist in Chinese script, for example. See Unicode Standard Annex #21: Case Mappings.
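To illustrate the case note, here is a minimal sketch that delegates to mbstring's mb_strtolower(). The helper name `sketch_strtolower()` is hypothetical, and relying on the mbstring extension is an assumption (the class itself is described as wrapping phputf8). Chinese text passes through unchanged because it has no case.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch; requires the mbstring extension.
function sketch_strtolower(string $str): string
{
    // Apply Unicode lowercase mappings rather than ASCII-only ones.
    return mb_strtolower($str, 'UTF-8');
}

echo sketch_strtolower('ÄÖÜ ПРИВЕТ'), "\n"; // äöü привет
echo sketch_strtolower('中文'), "\n";        // 中文 (no case, so nothing changes)
```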
at line 189
static mixed
strtoupper(string $str)
UTF-8 aware alternative to strtoupper. Makes a string uppercase. Note: the concept of a character's "case" only exists in some alphabets, such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does not exist in Chinese script, for example. See Unicode Standard Annex #21: Case Mappings.
at line 206
static integer
strlen(string $str)
UTF-8 aware alternative to strlen.
Returns the number of characters in the string (NOT THE NUMBER OF BYTES).
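The character-versus-byte distinction can be seen with a one-line counter. This is a minimal sketch; `sketch_strlen()` is a hypothetical helper, not the class's implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: counts UTF-8 characters. With the u modifier '.'
// matches one whole character; s makes it match "\n" as well.
function sketch_strlen(string $str): int
{
    return (int) preg_match_all('/./us', $str);
}

echo strlen('naïve café'), "\n";        // 12 (bytes)
echo sketch_strlen('naïve café'), "\n"; // 10 (characters)
```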
at line 225
static string
str_ireplace(string $search, string $replace, string $str, integer $count = null)
UTF-8 aware alternative to str_ireplace. Case-insensitive version of str_replace.
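One way to get case-insensitive, UTF-8 aware replacement is PCRE with the i and u modifiers. This is a minimal sketch under that assumption; `sketch_str_ireplace()` is hypothetical and only accepts string (not array) arguments.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch, not Utf8String's implementation.
function sketch_str_ireplace(string $search, string $replace, string $str, ?int &$count = null): string
{
    if ($search === '') {
        // Native str_ireplace() also leaves the subject unchanged here.
        $count = 0;
        return $str;
    }
    // Quote the search text so it is matched literally; /iu = caseless + UTF-8.
    $pattern = '/' . preg_quote($search, '/') . '/iu';
    // A callback keeps "$1" or "\1" inside $replace from acting as backreferences.
    return preg_replace_callback($pattern, fn(array $m): string => $replace, $str, -1, $count);
}

echo sketch_str_ireplace('ÜBER', 'unter', 'über alles'), "\n"; // unter alles
```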
at line 252
static array
str_split(string $str, integer $split_len = 1)
UTF-8 aware alternative to str_split. Converts a string to an array.
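A minimal sketch of splitting by characters rather than bytes, assuming PHP 8+ (for ValueError); `sketch_str_split()` is a hypothetical name.

```php
<?php
declare(strict_types=1);

// Hypothetical character-based str_split().
function sketch_str_split(string $str, int $split_len = 1): array
{
    if ($split_len < 1) {
        throw new ValueError('split_len must be greater than 0');
    }
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    // Group characters into chunks and join each chunk back into a string.
    return array_map(fn(array $chunk): string => implode('', $chunk), array_chunk($chars, $split_len));
}

print_r(sketch_str_split('Grüße', 2)); // ['Gr', 'üß', 'e']
```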
at line 277
static integer
strcasecmp(string $str1, string $str2, mixed $locale = false)
UTF-8/locale aware alternative to strcasecmp. A case-insensitive string comparison.
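Without a locale, a case-insensitive comparison can be approximated by lowercasing both sides first. This minimal sketch ignores the `$locale` argument entirely and assumes the mbstring extension; `sketch_strcasecmp()` is hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch; the real method's $locale handling is not modeled.
function sketch_strcasecmp(string $str1, string $str2): int
{
    // Lowercase both strings with Unicode case rules, then compare byte-wise.
    return strcmp(mb_strtolower($str1, 'UTF-8'), mb_strtolower($str2, 'UTF-8'));
}

var_dump(sketch_strcasecmp('ÄPFEL', 'äpfel') === 0); // bool(true)
var_dump(strcasecmp('ÄPFEL', 'äpfel') === 0);        // bool(false): only ASCII letters are folded
```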
at line 333
static integer
strcmp(string $str1, string $str2, mixed $locale = false)
UTF-8/locale aware alternative to strcmp. A case-sensitive string comparison.
at line 385
static integer
strcspn(string $str, string $mask, integer $start = null, integer $length = null)
UTF-8 aware alternative to strcspn. Finds the length of the initial segment not matching the mask.
at line 419
static string
stristr(string $str, string $search)
UTF-8 aware alternative to stristr. Finds the first occurrence of a string using a case-insensitive comparison.
Returns all of haystack ($str) from the first occurrence of needle ($search) to the end; needle and haystack are examined in a case-insensitive manner.
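A minimal sketch of the same behavior using mbstring, assuming PHP 8+ and the mbstring extension; `sketch_stristr()` is a hypothetical name. Like native stristr(), it returns false when there is no match, which may differ from what the real method returns.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch, not the class's implementation.
function sketch_stristr(string $str, string $search): string|false
{
    // mb_stripos() returns a character position (or false), ignoring case.
    $pos = mb_stripos($str, $search, 0, 'UTF-8');
    if ($pos === false) {
        return false;
    }
    // Return everything from the match to the end of the string.
    return mb_substr($str, $pos, null, 'UTF-8');
}

echo sketch_stristr('Grüße aus BERLIN', 'berlin'), "\n"; // BERLIN
```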
at line 440
static string
strrev(string $str)
UTF-8 aware alternative to strrev. Reverses a string.
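Reversing by characters rather than bytes keeps multibyte sequences intact. A minimal sketch; `sketch_strrev()` is a hypothetical name.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: reverse by UTF-8 characters (code points).
function sketch_strrev(string $str): string
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return implode('', array_reverse($chars));
}

echo sketch_strrev('Grüße'), "\n"; // eßürG
```

Because it works on code points, a base letter followed by a combining mark (for example "e" plus U+0301) would end up in the wrong order; precomposed characters such as "ü" are unaffected.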
at line 464
static integer
strspn(string $str, string $mask, integer $start = null, integer $length = null)
UTF-8 aware alternative to strspn. Finds the length of the initial segment matching the mask.
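Both segment-length functions, strcspn and strspn, can be sketched with an anchored character class built from the mask. This minimal sketch omits the `$start` and `$length` parameters; `sketch_strspn()`, `sketch_strcspn()` and `sketch_char_count()` are hypothetical names.

```php
<?php
declare(strict_types=1);

// Counts UTF-8 characters; '.' with /u matches one character, /s includes newlines.
function sketch_char_count(string $str): int
{
    return (int) preg_match_all('/./us', $str);
}

// Characters in the leading run of $str made only of characters from $mask.
function sketch_strspn(string $str, string $mask): int
{
    if ($mask === '') {
        return 0;
    }
    // preg_quote() escapes ], ^, - and \ so the mask is safe inside [...].
    $class = preg_quote($mask, '/');
    return preg_match('/^[' . $class . ']+/u', $str, $m) === 1 ? sketch_char_count($m[0]) : 0;
}

// Characters in the leading run of $str containing no character from $mask.
function sketch_strcspn(string $str, string $mask): int
{
    if ($mask === '') {
        return sketch_char_count($str);
    }
    $class = preg_quote($mask, '/');
    return preg_match('/^[^' . $class . ']+/u', $str, $m) === 1 ? sketch_char_count($m[0]) : 0;
}

echo strspn('ääb', 'ä'), "\n";        // 4 (bytes)
echo sketch_strspn('ääb', 'ä'), "\n"; // 2 (characters)
```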
at line 498
static string
substr_replace(string $str, string $repl, integer $start, integer $length = null)
UTF-8 aware alternative to substr_replace. Replaces text within a portion of a string.
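A minimal sketch of character-based substr_replace(), working on an array of characters; `sketch_substr_replace()` is a hypothetical name and only accepts string arguments.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch. array_splice() treats negative offsets and lengths
// the way substr_replace() does for bytes.
function sketch_substr_replace(string $str, string $repl, int $start, ?int $length = null): string
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    // A null length means "replace through the end of the string".
    array_splice($chars, $start, $length ?? count($chars), [$repl]);
    return implode('', $chars);
}

echo sketch_substr_replace('Grüße', 'ss', 3, 1), "\n"; // Grüsse
```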
at line 525
static string
ltrim(string $str, string $charlist = null)
UTF-8 aware replacement for ltrim().
Strips whitespace (or other characters) from the beginning of a string. You only need to use this if you are supplying the optional charlist argument and it contains UTF-8 characters; otherwise ltrim() works normally on a UTF-8 string.
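Why the charlist matters: native ltrim() treats each byte of the charlist separately, so a multibyte character in the charlist can strip the first byte of a different character. A minimal sketch of a safe version; `sketch_ltrim()` is hypothetical and does not support the `a..z` range syntax of native ltrim().

```php
<?php
declare(strict_types=1);

// Hypothetical sketch, not the class's implementation.
function sketch_ltrim(string $str, ?string $charlist = null): string
{
    if ($charlist === null) {
        // No charlist: native ltrim() is already safe on UTF-8.
        return ltrim($str);
    }
    if ($charlist === '') {
        return $str;
    }
    // Build a UTF-8 character class from the charlist and strip it from the start.
    return preg_replace('/^[' . preg_quote($charlist, '/') . ']+/u', '', $str);
}

echo sketch_ltrim('èabc', 'é'), "\n"; // èabc (unchanged, as expected)
// ltrim('èabc', 'é') would strip the lead byte 0xC3 shared by "è" and "é",
// leaving invalid UTF-8.
```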
at line 560
static string
rtrim(string $str, string $charlist = null)
UTF-8 aware replacement for rtrim(). Strips whitespace (or other characters) from the end of a string. You only need to use this if you are supplying the optional charlist argument and it contains UTF-8 characters; otherwise rtrim() works normally on a UTF-8 string.
at line 595
static string
trim(string $str, string $charlist = null)
UTF-8 aware replacement for trim(). Strips whitespace (or other characters) from the beginning and end of a string. Note: you only need to use this if you are supplying the optional charlist argument and it contains UTF-8 characters; otherwise trim() works normally on a UTF-8 string.
at line 630
static string
ucfirst(string $str, string $delimiter = null, string $newDelimiter = null)
UTF-8 aware alternative to ucfirst. Makes a string's first character uppercase, or the first character of every word uppercase.
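A minimal sketch of the single-string case only, assuming the mbstring extension. The `$delimiter` and `$newDelimiter` parameters of the real method are not modeled, and `sketch_ucfirst()` is a hypothetical name.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch, not the class's implementation.
function sketch_ucfirst(string $str): string
{
    // Uppercase the first character and append the rest unchanged.
    return mb_strtoupper(mb_substr($str, 0, 1, 'UTF-8'), 'UTF-8') . mb_substr($str, 1, null, 'UTF-8');
}

echo sketch_ucfirst('élan'), "\n"; // Élan
```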
at line 661
static string
ucwords(string $str)
UTF-8 aware alternative to ucwords. Uppercases the first character of each word in a string.
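A minimal sketch, assuming words are separated by whitespace and the mbstring extension is available; `sketch_ucwords()` is a hypothetical name. Unlike mb_convert_case() with MB_CASE_TITLE, it leaves the rest of each word untouched, as native ucwords() does.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch, not the class's implementation.
function sketch_ucwords(string $str): string
{
    // Uppercase a non-whitespace character at the start or after whitespace.
    return preg_replace_callback(
        '/(^|\s)(\S)/u',
        fn(array $m): string => $m[1] . mb_strtoupper($m[2], 'UTF-8'),
        $str
    );
}

echo sketch_ucwords('élan vital'), "\n"; // Élan Vital
```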
at line 684
static mixed
transcode(string $source, string $from_encoding, string $to_encoding)
Transcode a string.
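A minimal sketch of transcoding with mbstring, assuming PHP 8+ and the mbstring extension. Whether the class uses mbstring, iconv or something else is not stated here, and `sketch_transcode()` is a hypothetical name.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch; note mb_convert_encoding() takes the target encoding first.
function sketch_transcode(string $source, string $from_encoding, string $to_encoding): string|false
{
    return mb_convert_encoding($source, $to_encoding, $from_encoding);
}

$latin1 = sketch_transcode('café', 'UTF-8', 'ISO-8859-1');
echo bin2hex($latin1), "\n"; // 636166e9 ("é" becomes the single byte 0xE9)
```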
at line 718
static boolean
valid(string $str)
Tests whether a string is valid UTF-8 and supported by the Unicode standard.
Note: this function has been modified to simply return true or false.
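A strict check of this kind can be sketched with PCRE, whose u modifier rejects malformed UTF-8, surrogates and five- or six-octet sequences. This is a minimal sketch, not the class's implementation; `sketch_valid()` is a hypothetical name.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: with /u, PCRE validates the whole subject as UTF-8
// first, so an empty pattern matches only well-formed strings.
function sketch_valid(string $str): bool
{
    return preg_match('//u', $str) === 1;
}

var_dump(sketch_valid('Grüße'));                // bool(true)
var_dump(sketch_valid("\xC3"));                 // bool(false): truncated sequence
var_dump(sketch_valid("\xF8\x88\x80\x80\x80")); // bool(false): five-octet sequence
```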
at line 744
static boolean
compliant(string $str)
Tests whether a string complies as UTF-8. This is much faster than utf8_is_valid, but it will pass five- and six-octet UTF-8 sequences, which are not supported by Unicode and so cannot be displayed correctly in a browser. In other words, it is not as strict as utf8_is_valid, but it is faster. If you use it to validate user input, you risk attackers being able to inject five- and six-byte sequences (which may or may not be a significant risk, depending on what you are doing).
at line 760
static string
unicode_to_utf8(string $str)
Converts Unicode sequences to a UTF-8 string.
at line 786
static string
unicode_to_utf16(string $str)
Converts Unicode sequences to a UTF-16 string.