class Utf8String

String handling class for UTF-8 data. Wraps the phputf8 library. All functions assume the validity of UTF-8 strings.

This class is based on the Joomla String package.

Methods

static boolean
is_ascii(string $str)

Tests whether a string contains only 7bit ASCII bytes.

static mixed
strpos(string $str, string $search, integer $offset = false)

UTF-8 aware alternative to strpos.

static mixed
strrpos(string $str, string $search, integer $offset)

UTF-8 aware alternative to strrpos. Finds the position of the last occurrence of a string.

static mixed
substr(string $str, integer $offset, integer $length = false)

UTF-8 aware alternative to substr. Returns part of a string given a character offset (and optionally a length).

static mixed
strtolower(string $str)

UTF-8 aware alternative to strtolower.

static mixed
strtoupper(string $str)

UTF-8 aware alternative to strtoupper. Makes a string uppercase. Note: the concept of a character's "case" only exists in some alphabets such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does not exist in the Chinese script, for example. See Unicode Standard Annex #21: Case Mappings.

static integer
strlen(string $str)

UTF-8 aware alternative to strlen.

static string
str_ireplace(string $search, string $replace, string $str, integer $count = null)

UTF-8 aware alternative to str_ireplace. Case-insensitive version of str_replace.

static array
str_split(string $str, integer $split_len = 1)

UTF-8 aware alternative to str_split. Converts a string to an array.

static integer
strcasecmp(string $str1, string $str2, mixed $locale = false)

UTF-8/locale aware alternative to strcasecmp. A case-insensitive string comparison.

static integer
strcmp(string $str1, string $str2, mixed $locale = false)

UTF-8/locale aware alternative to strcmp. A case-sensitive string comparison.

static integer
strcspn(string $str, string $mask, integer $start = null, integer $length = null)

UTF-8 aware alternative to strcspn. Finds the length of the initial segment not matching the mask.

static string
stristr(string $str, string $search)

UTF-8 aware alternative to stristr. Returns all of the haystack from the first occurrence of the needle to the end.

static string
strrev(string $str)

UTF-8 aware alternative to strrev. Reverses a string.

static integer
strspn(string $str, string $mask, integer $start = null, integer $length = null)

UTF-8 aware alternative to strspn. Finds the length of the initial segment matching the mask.

static string
substr_replace(string $str, string $repl, integer $start, integer $length = null)

UTF-8 aware alternative to substr_replace. Replaces text within a portion of a string.

static string
ltrim(string $str, string $charlist = null)

UTF-8 aware replacement for ltrim()

static string
rtrim(string $str, string $charlist = null)

UTF-8 aware replacement for rtrim(). Strips whitespace (or other characters) from the end of a string. You only need to use this if you are supplying the optional charlist argument and it contains UTF-8 characters. Otherwise rtrim will work normally on a UTF-8 string.

static string
trim(string $str, string $charlist = null)

UTF-8 aware replacement for trim(). Strips whitespace (or other characters) from the beginning and end of a string. Note: you only need to use this if you are supplying the optional charlist argument and it contains UTF-8 characters. Otherwise trim will work normally on a UTF-8 string.

static string
ucfirst(string $str, string $delimiter = null, string $newDelimiter = null)

UTF-8 aware alternative to ucfirst. Makes a string's first character uppercase, or the first character of every word uppercase.

static string
ucwords(string $str)

UTF-8 aware alternative to ucwords. Uppercases the first character of each word in a string.

static mixed
transcode(string $source, string $from_encoding, string $to_encoding)

Transcode a string.

static boolean
valid(string $str)

Tests a string as to whether it's valid UTF-8 and supported by the Unicode standard.

static boolean
compliant(string $str)

Tests whether a string complies as UTF-8. This will be much faster than valid() but will pass five and six octet UTF-8 sequences, which are not supported by Unicode and so cannot be displayed correctly in a browser. In other words it is not as strict as valid() but it is faster. If you use it to validate user input, you run the risk that attackers will be able to inject 5 and 6 byte sequences (which may or may not be a significant risk, depending on what you are doing).

static string
unicode_to_utf8(string $str)

Converts Unicode sequences to UTF-8 string

static string
unicode_to_utf16(string $str)

Converts Unicode sequences to UTF-16 string

Details

at line 82
static boolean is_ascii(string $str)

Tests whether a string contains only 7bit ASCII bytes.

You might use this to check whether a string needs handling as UTF-8 at all, potentially gaining performance by using the native PHP equivalent when the string is plain ASCII, e.g.:

```php
if (Utf8String::is_ascii($someString)) {
    // It's just ASCII - use the native PHP version
    $someString = strtolower($someString);
} else {
    $someString = Utf8String::strtolower($someString);
}
```

Parameters

string $str The string to test.

Return Value

boolean True if the string is all ASCII
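
The check itself can be sketched with a byte-level regular expression. This is a minimal illustration, not necessarily how the class implements it; `is_ascii_sketch` is a hypothetical name introduced here.

```php
<?php
// Hypothetical helper, not part of Utf8String.
// Returns true when no byte in the string is above 0x7F.
function is_ascii_sketch(string $str): bool
{
    // Without the /u modifier PCRE inspects the subject byte by byte,
    // so any byte outside 0x00-0x7F means the string is not pure ASCII.
    return preg_match('/[^\x00-\x7F]/', $str) !== 1;
}

var_dump(is_ascii_sketch('hello'));        // bool(true)
var_dump(is_ascii_sketch("h\xC3\xA9llo")); // bool(false), "é" is two non-ASCII bytes
```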

at line 102
static mixed strpos(string $str, string $search, integer $offset = false)

UTF-8 aware alternative to strpos.

Find position of first occurrence of a string.

Parameters

string $str String being examined
string $search String being searched for
integer $offset Optional, specifies the position from which the search should be performed

Return Value

mixed Number of characters before the first match or FALSE on failure

See also

http://www.php.net/strpos

at line 125
static mixed strrpos(string $str, string $search, integer $offset)

UTF-8 aware alternative to strrpos. Finds the position of the last occurrence of a string.

Parameters

string $str String being examined.
string $search String being searched for.
integer $offset Offset from the left of the string.

Return Value

mixed Number of characters before the last match or false on failure

See also

http://www.php.net/strrpos

at line 143
static mixed substr(string $str, integer $offset, integer $length = false)

UTF-8 aware alternative to substr. Returns part of a string given a character offset (and optionally a length).

Parameters

string $str String being processed
integer $offset Number of UTF-8 characters offset (from left)
integer $length Optional length in UTF-8 characters from offset

Return Value

mixed string or FALSE if failure

See also

http://www.php.net/substr
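
To illustrate what a character offset means for strpos and substr, the sketch below uses PHP's mbstring functions as a stand-in. Whether Utf8String delegates to mbstring is not stated in this reference, so treat these calls as an assumption rather than the class's implementation.

```php
<?php
// "naïve café": the "ï" and the "é" each occupy two bytes in UTF-8.
$str = "na\u{00EF}ve caf\u{00E9}";

// Character-based search and slice (stand-ins for the methods documented above).
$charPos = mb_strpos($str, 'caf', 0, 'UTF-8');    // 6 characters precede the match
$slice   = mb_substr($str, $charPos, 4, 'UTF-8'); // "café"

// The native functions count bytes instead.
$bytePos   = strpos($str, 'caf');       // 7, because "ï" is two bytes
$byteSlice = substr($str, $bytePos, 4); // "caf" plus only the first byte of "é"
```

Mixing the two conventions, for example feeding a byte offset from the native strpos into a character-based substr, is a common source of off-by-N errors.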

at line 169
static mixed strtolower(string $str)

UTF-8 aware alternative to strtolower.

Makes a string lowercase. Note: the concept of a character's "case" only exists in some alphabets such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does not exist in the Chinese script, for example. See Unicode Standard Annex #21: Case Mappings.

Parameters

string $str String being processed

Return Value

mixed Either the string in lowercase, or FALSE if the UTF-8 is invalid

See also

http://www.php.net/strtolower

at line 189
static mixed strtoupper(string $str)

UTF-8 aware alternative to strtoupper. Makes a string uppercase. Note: the concept of a character's "case" only exists in some alphabets such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does not exist in the Chinese script, for example. See Unicode Standard Annex #21: Case Mappings.

Parameters

string $str String being processed

Return Value

mixed Either the string in uppercase, or FALSE if the UTF-8 is invalid

See also

http://www.php.net/strtoupper
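
The behaviour described for strtolower and strtoupper (case mapping where a script has case, FALSE for invalid UTF-8) can be approximated as follows. This sketch assumes the mbstring extension; `strtoupper_sketch` is a hypothetical name, and the class itself may use a different mechanism.

```php
<?php
// Hypothetical helper, not part of Utf8String.
// Returns the uppercased string, or false when the input is not valid UTF-8.
function strtoupper_sketch(string $str)
{
    if (!mb_check_encoding($str, 'UTF-8')) {
        return false;
    }
    return mb_strtoupper($str, 'UTF-8');
}

$latin   = strtoupper_sketch("\u{00E9}lan");       // "ÉLAN"
$chinese = strtoupper_sketch("\u{4E2D}\u{6587}");  // unchanged: the script has no case
$broken  = strtoupper_sketch("\xFF");              // false: not valid UTF-8
```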

at line 206
static integer strlen(string $str)

UTF-8 aware alternative to strlen.

Returns the number of characters in the string (NOT THE NUMBER OF BYTES).

Parameters

string $str UTF-8 string.

Return Value

integer Number of UTF-8 characters in string.

See also

http://www.php.net/strlen
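
The difference between character count and byte count is easy to see with a single accented letter. The sketch uses mb_strlen from the mbstring extension as a stand-in for this method, which is an assumption about equivalent behaviour, not a statement about the implementation.

```php
<?php
// "café": four characters, but "é" is encoded as two bytes in UTF-8.
$str = "caf\u{00E9}";

$bytes = strlen($str);             // 5 (native strlen counts bytes)
$chars = mb_strlen($str, 'UTF-8'); // 4 (character count, as this method reports)
```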

at line 225
static string str_ireplace(string $search, string $replace, string $str, integer $count = null)

UTF-8 aware alternative to str_ireplace. Case-insensitive version of str_replace.

Parameters

string $search String to search for
string $replace Replacement string
string $str String in which to search and replace
integer $count Optional count value to be passed by reference

Return Value

string UTF-8 String

See also

http://www.php.net/str_ireplace

at line 252
static array str_split(string $str, integer $split_len = 1)

UTF-8 aware alternative to str_split. Converts a string to an array.

Parameters

string $str UTF-8 encoded string to process
integer $split_len Number of characters in each chunk

Return Value

array

See also

http://www.php.net/str_split
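
Splitting on character boundaries rather than byte boundaries can be sketched with a Unicode-mode regular expression. `str_split_sketch` is a hypothetical helper written for this illustration; like the class, it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper, not part of Utf8String.
// Splits a valid UTF-8 string into chunks of $splitLen characters.
function str_split_sketch(string $str, int $splitLen = 1): array
{
    if ($splitLen < 1) {
        throw new InvalidArgumentException('Chunk length must be at least 1');
    }
    // /u makes "." match one whole UTF-8 character; /s lets it match newlines too.
    preg_match_all('/.{1,' . $splitLen . '}/us', $str, $matches);
    return $matches[0];
}

$pairs = str_split_sketch("a\u{00F1}bc", 2); // ["añ", "bc"]
$chars = str_split_sketch("a\u{00F1}b");     // ["a", "ñ", "b"]
```

The native str_split would cut "ñ" between its two bytes when the chunk boundary falls inside it.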

at line 277
static integer strcasecmp(string $str1, string $str2, mixed $locale = false)

UTF-8/locale aware alternative to strcasecmp. A case-insensitive string comparison.

Parameters

string $str1 string 1 to compare
string $str2 string 2 to compare
mixed $locale The locale used by strcoll or false to use classical comparison

Return Value

integer < 0 if str1 is less than str2; > 0 if str1 is greater than str2, and 0 if they are equal.

See also

http://www.php.net/strcasecmp
http://www.php.net/strcoll
http://www.php.net/setlocale

at line 333
static integer strcmp(string $str1, string $str2, mixed $locale = false)

UTF-8/locale aware alternative to strcmp. A case-sensitive string comparison.

Parameters

string $str1 string 1 to compare
string $str2 string 2 to compare
mixed $locale The locale used by strcoll or false to use classical comparison

Return Value

integer < 0 if str1 is less than str2; > 0 if str1 is greater than str2, and 0 if they are equal.

See also

http://www.php.net/strcmp
http://www.php.net/strcoll
http://www.php.net/setlocale

at line 385
static integer strcspn(string $str, string $mask, integer $start = null, integer $length = null)

UTF-8 aware alternative to strcspn. Finds the length of the initial segment not matching the mask.

Parameters

string $str The string to process
string $mask The mask
integer $start Optional starting character position (in characters)
integer $length Optional length

Return Value

integer The length of the initial segment of $str which does not contain any of the characters in $mask

See also

http://www.php.net/strcspn

at line 419
static string stristr(string $str, string $search)

UTF-8 aware alternative to stristr. Returns all of the haystack from the first occurrence of the needle to the end.

Finds the first occurrence of a string using a case-insensitive comparison: both needle and haystack are examined in a case-insensitive manner.

Parameters

string $str The haystack
string $search The needle

Return Value

string the sub string

See also

http://www.php.net/stristr

at line 440
static string strrev(string $str)

UTF-8 aware alternative to strrev. Reverses a string.

Parameters

string $str String to be reversed

Return Value

string The string in reverse character order

See also

http://www.php.net/strrev
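
Reversing by character instead of by byte can be sketched as below. `strrev_sketch` is a hypothetical helper for illustration only; it reverses code points, so combining marks would end up detached from their base characters.

```php
<?php
// Hypothetical helper, not part of Utf8String.
// Reverses a valid UTF-8 string one code point at a time.
function strrev_sketch(string $str): string
{
    // With /u each "." match is one whole UTF-8 character.
    preg_match_all('/./us', $str, $matches);
    return implode('', array_reverse($matches[0]));
}

$good = strrev_sketch("a\u{00F1}b"); // "bña"
$bad  = strrev("a\u{00F1}b");        // native strrev swaps the two bytes of "ñ"
```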

at line 464
static integer strspn(string $str, string $mask, integer $start = null, integer $length = null)

UTF-8 aware alternative to strspn. Finds the length of the initial segment matching the mask.

Parameters

string $str The haystack
string $mask The mask
integer $start Optional start position
integer $length Optional length

Return Value

integer

See also

http://www.php.net/strspn

at line 498
static string substr_replace(string $str, string $repl, integer $start, integer $length = null)

UTF-8 aware alternative to substr_replace. Replaces text within a portion of a string.

Parameters

string $str The haystack
string $repl The replacement string
integer $start Start
integer $length Length (optional)

Return Value

string

See also

http://www.php.net/substr_replace

at line 525
static string ltrim(string $str, string $charlist = null)

UTF-8 aware replacement for ltrim()

Strips whitespace (or other characters) from the beginning of a string. You only need to use this if you are supplying the optional charlist argument and it contains UTF-8 characters. Otherwise ltrim will work normally on a UTF-8 string.

Parameters

string $str The string to be trimmed
string $charlist The optional charlist of additional characters to trim

Return Value

string The trimmed string

See also

http://www.php.net/ltrim

at line 560
static string rtrim(string $str, string $charlist = null)

UTF-8 aware replacement for rtrim(). Strips whitespace (or other characters) from the end of a string. You only need to use this if you are supplying the optional charlist argument and it contains UTF-8 characters. Otherwise rtrim will work normally on a UTF-8 string.

Parameters

string $str The string to be trimmed
string $charlist The optional charlist of additional characters to trim

Return Value

string The trimmed string

See also

http://www.php.net/rtrim

at line 595
static string trim(string $str, string $charlist = null)

UTF-8 aware replacement for trim(). Strips whitespace (or other characters) from the beginning and end of a string. Note: you only need to use this if you are supplying the optional charlist argument and it contains UTF-8 characters. Otherwise trim will work normally on a UTF-8 string.

Parameters

string $str The string to be trimmed
string $charlist The optional charlist of additional characters to trim

Return Value

string The trimmed string

See also

http://www.php.net/trim
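
The note about multibyte characters in the charlist applies to ltrim, rtrim and trim alike: the native functions treat the charlist as a set of bytes, so a two-byte character there can strip half of a different character. A minimal sketch of a character-aware trim follows; `trim_sketch` is a hypothetical helper, not the class's implementation.

```php
<?php
// Hypothetical helper, not part of Utf8String.
// Trims the characters in $charlist from both ends of a valid UTF-8 string.
function trim_sketch(string $str, string $charlist): string
{
    if ($charlist === '') {
        return $str;
    }
    // preg_quote escapes characters that are special inside the class,
    // and /u makes the class match whole characters rather than bytes.
    $class = preg_quote($charlist, '/');
    return preg_replace('/\A[' . $class . ']+|[' . $class . ']+\z/u', '', $str);
}

$ok      = trim_sketch("\u{00E9}\u{00E9}abc\u{00E9}", "\u{00E9}"); // "abc"
$kept    = trim_sketch("\u{00E0}bc", "\u{00E9}");                  // "àbc", untouched
$corrupt = trim("\u{00E0}bc", "\u{00E9}"); // native trim strips the first byte of "à"
```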

at line 630
static string ucfirst(string $str, string $delimiter = null, string $newDelimiter = null)

UTF-8 aware alternative to ucfirst. Makes a string's first character uppercase, or the first character of every word uppercase.

Parameters

string $str String to be processed
string $delimiter The words delimiter (null means do not split the string)
string $newDelimiter The new words delimiter (null means equal to $delimiter)

Return Value

string If $delimiter is null, the string with its first character in upper case (if applicable). Otherwise the string is treated as words separated by $delimiter: ucfirst is applied to each word and the words are rejoined with the new delimiter.

See also

http://www.php.net/ucfirst
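
The delimiter behaviour described in the return value can be sketched as follows. This assumes the mbstring extension and a non-empty delimiter; `ucfirst_sketch` is a hypothetical name and the real method may differ in edge cases.

```php
<?php
// Hypothetical helper, not part of Utf8String.
function ucfirst_sketch(string $str, ?string $delimiter = null, ?string $newDelimiter = null): string
{
    // Uppercase the first character and leave the rest of the string as is.
    $upperFirst = function (string $s): string {
        return mb_strtoupper(mb_substr($s, 0, 1, 'UTF-8'), 'UTF-8')
            . mb_substr($s, 1, null, 'UTF-8');
    };

    if ($delimiter === null) {
        return $upperFirst($str); // no splitting requested
    }
    if ($newDelimiter === null) {
        $newDelimiter = $delimiter; // rejoin with the same delimiter
    }
    $words = explode($delimiter, $str);
    return implode($newDelimiter, array_map($upperFirst, $words));
}

$first = ucfirst_sketch("\u{00E9}cole primaire");           // "École primaire"
$words = ucfirst_sketch("\u{00E9}cole primaire", ' ');      // "École Primaire"
$dash  = ucfirst_sketch("\u{00E9}cole primaire", ' ', '-'); // "École-Primaire"
```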

at line 661
static string ucwords(string $str)

UTF-8 aware alternative to ucwords. Uppercases the first character of each word in a string.

Parameters

string $str String to be processed

Return Value

string String with first char of each word uppercase

See also

http://www.php.net/ucwords

at line 684
static mixed transcode(string $source, string $from_encoding, string $to_encoding)

Transcode a string.

Parameters

string $source The string to transcode.
string $from_encoding The source encoding.
string $to_encoding The target encoding.

Return Value

mixed The transcoded string, or null if the source was not a string.
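
This reference does not say which conversion backend transcode uses. The sketch below shows the documented contract (converted string, or null for non-string input) using mb_convert_encoding from the mbstring extension, which is an assumption made for illustration; `transcode_sketch` is a hypothetical name.

```php
<?php
// Hypothetical helper, not part of Utf8String.
// Converts $source from one encoding to another; returns null for non-strings.
function transcode_sketch($source, string $fromEncoding, string $toEncoding)
{
    if (!is_string($source)) {
        return null;
    }
    // Note the argument order of mb_convert_encoding: target first, then source.
    return mb_convert_encoding($source, $toEncoding, $fromEncoding);
}

$utf8 = transcode_sketch("\xE9", 'ISO-8859-1', 'UTF-8'); // "é" as the two bytes C3 A9
$none = transcode_sketch(123, 'ISO-8859-1', 'UTF-8');    // null: input is not a string
```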

at line 718
static boolean valid(string $str)

Tests a string as to whether it's valid UTF-8 and supported by the Unicode standard.

Note: this function has been modified to simply return true or false.

Parameters

string $str UTF-8 encoded string.

Return Value

boolean true if valid

See also

http://hsivonen.iki.fi/php-utf8/
compliant

at line 744
static boolean compliant(string $str)

Tests whether a string complies as UTF-8. This will be much faster than valid() but will pass five and six octet UTF-8 sequences, which are not supported by Unicode and so cannot be displayed correctly in a browser. In other words it is not as strict as valid() but it is faster. If you use it to validate user input, you run the risk that attackers will be able to inject 5 and 6 byte sequences (which may or may not be a significant risk, depending on what you are doing).

Parameters

string $str UTF-8 string to check

Return Value

boolean TRUE if string is valid UTF-8

See also

valid
http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php#54805
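
The linked PCRE note suggests the check relies on the /u pattern modifier, which makes PCRE validate the subject as UTF-8 before matching. A minimal sketch of that idea follows; `compliant_sketch` is a hypothetical name, and the exact pattern used by the class is not shown in this reference. Recent PCRE versions also reject 5 and 6 byte sequences, so the looseness described above may not apply on current PHP builds.

```php
<?php
// Hypothetical helper, not part of Utf8String.
// Returns true when PCRE accepts the string as UTF-8.
function compliant_sketch(string $str): bool
{
    // With /u, preg_match returns false (not 0) if the subject is malformed UTF-8.
    // The empty pattern matches any well-formed subject, including ''.
    return preg_match('//u', $str) === 1;
}

var_dump(compliant_sketch("caf\xC3\xA9")); // bool(true)
var_dump(compliant_sketch("\xC3\x28"));    // bool(false): bad continuation byte
```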

at line 760
static string unicode_to_utf8(string $str)

Converts Unicode sequences to UTF-8 string

Parameters

string $str Unicode string to convert

Return Value

string UTF-8 string

at line 786
static string unicode_to_utf16(string $str)

Converts Unicode sequences to UTF-16 string

Parameters

string $str Unicode string to convert

Return Value

string UTF-16 string