PHPXRef 0.7.1 : Unnamed Project : Detail view of utf

Functions that are not part of a class:

utf8_encode($str) X-Ref

Implementation of PHP's native utf8_encode for people without XML support
This function exploits some nice things that ISO-8859-1 and UTF-8 have in common

param: string $str ISO-8859-1 encoded data
return: string UTF-8 encoded data
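
The property being exploited is that every ISO-8859-1 byte value equals its Unicode code point, so bytes below 0x80 pass through unchanged and every other byte becomes exactly two UTF-8 bytes. A minimal sketch of that idea follows; the name sketch_latin1_to_utf8 is hypothetical and this is not necessarily how the documented function is written.

```php
<?php
// Hypothetical helper illustrating the ISO-8859-1 -> UTF-8 mapping.
function sketch_latin1_to_utf8(string $str): string
{
    $out = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($str[$i]);
        if ($byte < 0x80) {
            // ASCII range: identical in both encodings
            $out .= $str[$i];
        } else {
            // 0x80..0xFF: code point equals the byte value, encoded as 110xxxxx 10xxxxxx
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $out;
}
```

For example, the ISO-8859-1 byte 0xE9 (e with acute accent) becomes the two bytes 0xC3 0xA9.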

utf8_decode($str) X-Ref

Implementation of PHP's native utf8_decode for people without XML support

param: string $str UTF-8 encoded data
return: string ISO-8859-1 encoded data
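
The reverse direction can be sketched the same way: two-byte sequences whose code point is at most 0xFF collapse to one byte. Replacing everything else with '?' is an assumption borrowed from the native utf8_decode behaviour, and sketch_utf8_to_latin1 is a hypothetical name.

```php
<?php
// Hypothetical helper; assumes $str is valid UTF-8.
function sketch_utf8_to_latin1(string $str): string
{
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            $char = $m[0];
            if (strlen($char) === 2) {
                $cp = ((ord($char[0]) & 0x1F) << 6) | (ord($char[1]) & 0x3F);
                if ($cp <= 0xFF) {
                    return chr($cp); // representable in ISO-8859-1
                }
            }
            return '?'; // assumption: no ISO-8859-1 equivalent, substitute '?'
        },
        $str
    );
    // preg_replace_callback() returns null on error (e.g. invalid UTF-8)
    return $result ?? $str;
}
```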

utf8_strrpos($str, $needle, $offset = null) X-Ref

UTF-8 aware alternative to strrpos

utf8_strrpos($str, $needle, $offset = null) X-Ref

UTF-8 aware alternative to strrpos

utf8_strpos($str, $needle, $offset = null) X-Ref

UTF-8 aware alternative to strpos

utf8_strtolower($str) X-Ref

UTF-8 aware alternative to strtolower

utf8_strtoupper($str) X-Ref

UTF-8 aware alternative to strtoupper

utf8_substr($str, $offset, $length = null) X-Ref

UTF-8 aware alternative to substr

utf8_strlen($text) X-Ref

Return the length (in characters) of a UTF-8 string

utf8_strrpos($str, $needle, $offset = null) X-Ref

UTF-8 aware alternative to strrpos
Find position of last occurrence of a char in a string

author: Harry Fuecks
param: string $str haystack
param: string $needle needle
param: integer $offset (optional) offset (from left)
return: mixed integer position or FALSE on failure

utf8_strpos($str, $needle, $offset = null) X-Ref

UTF-8 aware alternative to strpos
Find position of first occurrence of a string

author: Harry Fuecks
param: string $str haystack
param: string $needle needle
param: integer $offset offset in characters (from left)
return: mixed integer position or FALSE on failure
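
One way to get character positions without mbstring is to search at byte level and translate offsets between characters and bytes. The sketch below does that. The name sketch_utf8_strpos is hypothetical, the input is assumed to be valid UTF-8 with a non-empty needle and a non-negative offset, and it is not necessarily the documented implementation.

```php
<?php
// Hypothetical helper: character-based strpos built on byte-based strpos().
function sketch_utf8_strpos(string $str, string $needle, int $offset = 0)
{
    // Split into characters to translate the character offset into a byte offset
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return false; // invalid UTF-8
    }
    $byteOffset = strlen(implode('', array_slice($chars, 0, max(0, $offset))));

    $bytePos = strpos($str, $needle, $byteOffset);
    if ($bytePos === false) {
        return false;
    }
    // Count the characters that precede the match
    return count(preg_split('//u', substr($str, 0, $bytePos), -1, PREG_SPLIT_NO_EMPTY));
}
```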

utf8_strtolower($string) X-Ref

UTF-8 aware alternative to strtolower
Make a string lowercase
Note: The concept of a character's "case" only exists in some alphabets
such as Latin, Greek, Cyrillic, Armenian and archaic Georgian - it does
not exist in the Chinese script, for example. See Unicode Standard
Annex #21: Case Mappings

param: string $string
return: string string in lowercase

utf8_strtoupper($string) X-Ref

UTF-8 aware alternative to strtoupper
Make a string uppercase
Note: The concept of a character's "case" only exists in some alphabets
such as Latin, Greek, Cyrillic, Armenian and archaic Georgian - it does
not exist in the Chinese script, for example. See Unicode Standard
Annex #21: Case Mappings

param: string $string
return: string string in uppercase

utf8_substr($str, $offset, $length = NULL) X-Ref

UTF-8 aware alternative to substr
Return part of a string given character offset (and optionally length)

Note on arguments: compared to substr, if offset or length are
not integers, this version will not complain but will instead coerce them
into integers.

Note on returned values: the substr documentation states that false can be
returned in some cases (e.g. offset > string length).
mb_substr never returns false; it returns an empty string instead.
This function adopts the mb_substr approach.

Note on implementation: PCRE only supports repetitions of less than
65536, in order to accept up to MAXINT values for offset and length,
we'll repeat a group of 65535 characters when needed.

Note on implementation: calculating the number of characters in the
string is a relatively expensive operation, so we only carry it out when
necessary. It isn't necessary for positive offsets with no length specified.

author: Chris Smith <chris@jalakai.co.uk>
param: string $str
param: integer $offset number of UTF-8 characters offset (from left)
param: integer $length (optional) length in UTF-8 characters from offset
return: mixed string or FALSE if failure
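
The repetition workaround in the implementation notes can be sketched as follows: a count is split into blocks of 65535 characters plus a remainder, and each part gets its own quantifier. Both function names are hypothetical, only non-negative offsets and lengths are handled, and the empty-string return on failure follows the mb_substr approach described above.

```php
<?php
// Hypothetical helper: build a PCRE fragment matching exactly $count characters
// without ever using a quantifier of 65536 or more.
function sketch_repeat_pattern(int $count): string
{
    $blocks = intdiv($count, 65535);
    $rest = $count % 65535;
    $pattern = '';
    if ($blocks > 0) {
        $pattern .= '(?:.{65535}){' . $blocks . '}';
    }
    if ($rest > 0) {
        $pattern .= '.{' . $rest . '}';
    }
    return $pattern;
}

// Hypothetical helper: substr by characters, non-negative arguments only.
function sketch_utf8_substr(string $str, int $offset, ?int $length = null): string
{
    // Skip $offset characters and capture the remainder; no strlen needed
    if (!preg_match('/^' . sketch_repeat_pattern($offset) . '(.*)/us', $str, $m)) {
        return ''; // offset beyond the end (or invalid UTF-8)
    }
    $rest = $m[1];
    if ($length === null) {
        return $rest;
    }
    // Take exactly $length characters if available, otherwise the whole remainder
    if (preg_match('/^(' . sketch_repeat_pattern($length) . ')/us', $rest, $m)) {
        return $m[1];
    }
    return $rest;
}
```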

utf8_strlen($text) X-Ref

Return the length (in characters) of a UTF-8 string

param: string $text UTF-8 string
return: integer Length (in chars) of given string
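
Counting characters rather than bytes can be done by skipping UTF-8 continuation bytes, which always have the bit pattern 10xxxxxx. A minimal sketch, assuming valid UTF-8 input; sketch_utf8_strlen is a hypothetical name and not necessarily how the documented function works.

```php
<?php
// Hypothetical helper: every character contributes exactly one non-continuation byte.
function sketch_utf8_strlen(string $text): int
{
    $count = 0;
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        if ((ord($text[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}
```

For the string "h\xC3\xA9llo", strlen() reports 6 bytes while this sketch reports 5 characters.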

utf8_str_split($str, $split_len = 1) X-Ref

UTF-8 aware alternative to str_split
Convert a string to an array

author: Harry Fuecks
param: string $str UTF-8 encoded
param: int $split_len number of characters to split string by
return: array chunks of the string, each up to $split_len characters long
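
A character-aware split can be sketched with a single PCRE match in UTF-8 mode. The name sketch_utf8_str_split is hypothetical, and $split_len is assumed to be between 1 and 65535 (the PCRE quantifier limit).

```php
<?php
// Hypothetical helper: split into chunks of up to $split_len characters.
function sketch_utf8_str_split(string $str, int $split_len = 1): array
{
    // The /u modifier makes "." match a whole UTF-8 character, /s includes newlines
    if (!preg_match_all('/.{1,' . $split_len . '}/us', $str, $matches)) {
        return []; // empty input or invalid UTF-8
    }
    return $matches[0];
}
```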

utf8_strspn($str, $mask, $start = null, $length = null) X-Ref

UTF-8 aware alternative to strspn
Find length of initial segment matching the mask

author: Harry Fuecks

utf8_ucfirst($str) X-Ref

UTF-8 aware alternative to ucfirst
Make a string's first character uppercase

author: Harry Fuecks
param: string $str
return: string with first character as upper case (if applicable)

utf8_recode($string, $encoding) X-Ref

Recode a string to UTF-8

If the encoding is not supported, the string is returned as-is

param: string    $string        Original string
param: string    $encoding    Original encoding (lowercase)
return: string                The string, encoded in UTF-8

utf8_encode_ncr($text) X-Ref

Replace all UTF-8 chars that are not in ASCII with their NCR

param: string $text UTF-8 string in NFC
return: string ASCII string using NCRs for non-ASCII chars

utf8_encode_ncr_callback($m) X-Ref

Callback used in utf8_encode_ncr()

Takes a UTF-8 char and replaces it with its NCR. Attention, $m is an array

param: array $m 0-based numerically indexed array passed by preg_replace_callback()
return: string An HTML NCR if the character is valid, or the original string otherwise
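
The pairing of a replacement function with a per-character callback can be sketched as below. Both names are hypothetical, the input is assumed to be valid UTF-8, and decimal NCRs are an assumption about the output format.

```php
<?php
// Hypothetical helper: decode one UTF-8 character into its code point.
function sketch_code_point(string $chr): int
{
    $b0 = ord($chr[0]);
    if ($b0 < 0x80) {
        return $b0;
    }
    if (($b0 & 0xE0) === 0xC0) {
        return (($b0 & 0x1F) << 6) | (ord($chr[1]) & 0x3F);
    }
    if (($b0 & 0xF0) === 0xE0) {
        return (($b0 & 0x0F) << 12) | ((ord($chr[1]) & 0x3F) << 6) | (ord($chr[2]) & 0x3F);
    }
    return (($b0 & 0x07) << 18) | ((ord($chr[1]) & 0x3F) << 12)
        | ((ord($chr[2]) & 0x3F) << 6) | (ord($chr[3]) & 0x3F);
}

// Hypothetical helper: replace every non-ASCII character with a decimal NCR.
function sketch_encode_ncr(string $text): string
{
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            // $m is an array; $m[0] holds the matched character
            return '&#' . sketch_code_point($m[0]) . ';';
        },
        $text
    );
    return $result ?? $text; // null means a PCRE error such as invalid UTF-8
}
```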

utf8_ord($chr) X-Ref

Converts a UTF-8 char to its Unicode code point (the number used in an NCR)

param: string $chr UTF-8 char
return: integer Unicode code point

utf8_chr($cp) X-Ref

Converts a Unicode code point (the number used in an NCR) to a UTF-8 char

param: int $cp Unicode code point
return: string UTF-8 char

utf8_decode_ncr($text) X-Ref

Convert Numeric Character References to UTF-8 chars

Notes:
- we do not convert NCRs recursively: if you pass &#38;#38; it will return &#38;
- we DO NOT check for the existence of the Unicode characters, therefore an entity may be converted to a nonexistent code point

param: string $text String to convert, encoded in UTF-8 (no normal form required)
return: string UTF-8 string where NCRs have been replaced with the actual chars

utf8_decode_ncr_callback($m) X-Ref

Callback used in utf8_decode_ncr()

Takes an NCR (in decimal or hexadecimal) and returns a UTF-8 char. Attention, $m is an array.
It ignores most, but not all, invalid NCRs!

param: array $m 0-based numerically indexed array passed by preg_replace_callback()
return: string UTF-8 char
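
A single-pass decoder shows why NCRs are not converted recursively: each match is replaced once and the output is never rescanned. The names below are hypothetical, and leaving surrogates and values above 0x10FFFF untouched is an assumption about which invalid NCRs to ignore.

```php
<?php
// Hypothetical helper: encode a code point as UTF-8.
function sketch_chr(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: decode decimal and hexadecimal NCRs in one pass.
function sketch_decode_ncr(string $text): string
{
    return preg_replace_callback(
        '/&#(?:x([0-9a-f]+)|([0-9]+));/i',
        function (array $m): string {
            // $m[1] is the hex form (empty when the decimal branch matched)
            $cp = ($m[1] !== '') ? hexdec($m[1]) : (int) $m[2];
            if ($cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
                return $m[0]; // assumption: leave out-of-range NCRs untouched
            }
            return sketch_chr((int) $cp);
        },
        $text
    );
}
```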

utf8_case_fold($text, $option = 'full') X-Ref

Case folds a Unicode string as per Unicode 5.0, section 3.13

param: string    $text    text to be case folded
param: string    $option    determines how we will fold the cases
return: string            case folded text

utf8_case_fold_nfkc($text, $option = 'full') X-Ref

Takes the input and does a "special" case fold. It does minor normalization
and returns NFKC compatible text

param: string    $text    text to be case folded
param: string    $option    determines how we will fold the cases
return: string            case folded text

utf8_case_fold_nfc($text, $option = 'full') X-Ref

Assume the input is NFC:
Takes the input and does a "special" case fold. It does minor normalization as well.

param: string    $text    text to be case folded
param: string    $option    determines how we will fold the cases
return: string            case folded text

utf8_normalize_nfc($strings) X-Ref

A wrapper function for the normalizer which takes care of including the class if required and modifies the passed strings
to be in NFC (Normalization Form C, canonical composition).

param: mixed $strings a string or an array of strings to normalize
return: mixed the normalized content, preserving array keys if array given.

utf8_clean_string($text) X-Ref

This function is used to generate a "clean" version of a string.
Clean means that it is a case-insensitive form (case folding) and that it is normalized (NFC).
Additionally, homographs of a character are transformed into one specific character (preferably ASCII,
if an ASCII equivalent exists).

Please be aware that if you change something within this function or within
functions used here, you need to rebuild/update the username_clean column in the users table, as well as all other
columns that store a clean string; otherwise you will break this functionality.

param: string $text An unclean string, maybe user input (has to be valid UTF-8!)
return: string Cleaned up version of the input string

utf8_htmlspecialchars($value) X-Ref

A wrapper for htmlspecialchars($value, ENT_COMPAT, 'UTF-8')
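
Written out, such a wrapper is a one-liner. The name sketch_utf8_htmlspecialchars is hypothetical. Passing the flags and charset explicitly keeps the result independent of the PHP version's defaults.

```php
<?php
// Hypothetical wrapper: ENT_COMPAT escapes double quotes but leaves single quotes alone.
function sketch_utf8_htmlspecialchars(string $value): string
{
    return htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
}
```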

utf8_convert_message($message) X-Ref

Tries to convert a returned system message to UTF-8

PHP assumes such messages are ISO-8859-1 so we'll do that too
and if it breaks messages we'll blame it on them ;-)

utf8_wordwrap($string, $width = 75, $break = "\n", $cut = false) X-Ref

UTF8-compatible wordwrap replacement

param: string    $string    The input string
param: int        $width    The column width. Defaults to 75.
param: string    $break    The line is broken using the optional break parameter. Defaults to '\n'.
param: bool    $cut    If the cut is set to TRUE, the string is always wrapped at the specified width. So if you have a word that is larger than the given width, it is broken apart.
return: string            the given string wrapped at the specified column.
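
The difference from the native wordwrap is that width is measured in characters rather than bytes. A simplified sketch of a greedy wrap under that rule follows. The names are hypothetical, words are assumed to be separated by single spaces, $width is assumed to be between 1 and 65535, and existing line breaks in the input get no special treatment.

```php
<?php
// Hypothetical helper: length in characters, assuming valid UTF-8.
function sketch_char_len(string $s): int
{
    return (int) preg_match_all('/./us', $s);
}

// Hypothetical helper: greedy word wrap measured in characters.
function sketch_utf8_wordwrap(string $string, int $width = 75, string $break = "\n", bool $cut = false): string
{
    $lines = [];
    $line = '';
    foreach (explode(' ', $string) as $word) {
        $pieces = [$word];
        // With $cut, an overlong word is broken into width-sized pieces
        if ($cut && sketch_char_len($word) > $width) {
            preg_match_all('/.{1,' . $width . '}/us', $word, $m);
            $pieces = $m[0];
        }
        foreach ($pieces as $piece) {
            if ($line === '') {
                $line = $piece;
            } elseif (sketch_char_len($line) + 1 + sketch_char_len($piece) <= $width) {
                $line .= ' ' . $piece; // still fits on the current line
            } else {
                $lines[] = $line;      // start a new line
                $line = $piece;
            }
        }
    }
    $lines[] = $line;
    return implode($break, $lines);
}
```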

utf8_basename($filename) X-Ref

UTF8-safe basename() function

basename() has some limitations and is dependent on the locale setting
according to the PHP manual. Therefore we provide our own locale independent
basename function.

param: string $filename The filename basename() should be applied to
return: string The basenamed filename
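
A locale-independent variant can work at byte level, because the separators '/' and '\' are ASCII bytes that never occur inside a multi-byte UTF-8 sequence. A minimal sketch under that observation; sketch_utf8_basename is a hypothetical name, checking both separators is an assumption, and this is not necessarily the documented implementation.

```php
<?php
// Hypothetical helper: keep only what follows the last path separator.
function sketch_utf8_basename(string $filename): string
{
    // Check both separators, since they may be mixed in user-supplied paths
    foreach (['/', '\\'] as $separator) {
        $pos = strrpos($filename, $separator);
        if ($pos !== false) {
            $filename = (string) substr($filename, $pos + 1);
        }
    }
    return $filename;
}
```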

utf8_str_replace($search, $replace, $subject) X-Ref

UTF8-safe str_replace() function

param: string $search The value to search for
param: string $replace The replacement string
param: string $subject The target string
return: string The resultant string

Copyright:	(c) 2006 phpBB Group
License:	http://opensource.org/licenses/gpl-license.php GNU Public License
Version:	$Id$
File Size:	1995 lines (61 kb)
Included or required:	0 times
Referenced:	0 times
Includes or requires:	0 files

/includes/utf/ -> utf_tools.php (summary)

Defines 35 functions