PHP Cross Reference of Unnamed Project

/includes/utf/ -> utf_tools.php (summary)

(no description)

Copyright: (c) 2006 phpBB Group
License: http://opensource.org/licenses/gpl-license.php GNU Public License
Version: $Id$
File Size: 1995 lines (61 kb)
Included or required: 0 times
Referenced: 0 times
Includes or requires: 0 files

Defines 35 functions

  utf8_encode()
  utf8_decode()
  utf8_strrpos()
  utf8_strrpos()
  utf8_strpos()
  utf8_strtolower()
  utf8_strtoupper()
  utf8_substr()
  utf8_strlen()
  utf8_strrpos()
  utf8_strpos()
  utf8_strtolower()
  utf8_strtoupper()
  utf8_substr()
  utf8_strlen()
  utf8_str_split()
  utf8_strspn()
  utf8_ucfirst()
  utf8_recode()
  utf8_encode_ncr()
  utf8_encode_ncr_callback()
  utf8_ord()
  utf8_chr()
  utf8_decode_ncr()
  utf8_decode_ncr_callback()
  utf8_case_fold()
  utf8_case_fold_nfkc()
  utf8_case_fold_nfc()
  utf8_normalize_nfc()
  utf8_clean_string()
  utf8_htmlspecialchars()
  utf8_convert_message()
  utf8_wordwrap()
  utf8_basename()
  utf8_str_replace()

Functions
Functions that are not part of a class:

utf8_encode($str)   X-Ref
Implementation of PHP's native utf8_encode for people without XML support
This function exploits some nice things that ISO-8859-1 and UTF-8 have in common

param: string $str ISO-8859-1 encoded data
return: string UTF-8 encoded data

utf8_decode($str)   X-Ref
Implementation of PHP's native utf8_decode for people without XML support

param: string $str UTF-8 encoded data
return: string ISO-8859-1 encoded data
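
The property these two functions rely on can be sketched as follows: ISO-8859-1 byte values coincide with the first 256 Unicode code points, so bytes below 0x80 pass through unchanged and every other byte maps to a two-byte UTF-8 sequence. This is an illustrative sketch, not the code in utf_tools.php; the names sketch_latin1_to_utf8 and sketch_utf8_to_latin1 are hypothetical, and substituting '?' for characters outside ISO-8859-1 is an assumption modelled on PHP's native utf8_decode.

```php
<?php
// Hypothetical sketch: ISO-8859-1 -> UTF-8.
function sketch_latin1_to_utf8(string $str): string
{
    $out = '';
    for ($i = 0, $n = strlen($str); $i < $n; $i++) {
        $b = ord($str[$i]);
        if ($b < 0x80) {
            // ASCII bytes are identical in both encodings.
            $out .= $str[$i];
        } else {
            // 0x80-0xFF become a two-byte sequence: 110000xx 10xxxxxx.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// Hypothetical sketch: UTF-8 -> ISO-8859-1, '?' for anything above U+00FF.
function sketch_utf8_to_latin1(string $str): string
{
    $out = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function (array $m): string {
        $c = $m[0];
        // Only two-byte sequences led by 0xC2 or 0xC3 encode U+0080..U+00FF.
        if (strlen($c) === 2 && ord($c[0]) <= 0xC3) {
            return chr(((ord($c[0]) & 0x1F) << 6) | (ord($c[1]) & 0x3F));
        }
        return '?';
    }, $str);
    // preg_replace_callback() yields null on invalid UTF-8 input.
    return $out ?? '';
}
```

For example, the ISO-8859-1 byte 0xE9 (é) becomes the UTF-8 pair 0xC3 0xA9, and the reverse conversion restores the single byte.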

utf8_strrpos($str, $needle, $offset = null)   X-Ref
UTF-8 aware alternative to strrpos


utf8_strrpos($str, $needle, $offset = null)   X-Ref
UTF-8 aware alternative to strrpos


utf8_strpos($str, $needle, $offset = null)   X-Ref
UTF-8 aware alternative to strpos


utf8_strtolower($str)   X-Ref
UTF-8 aware alternative to strtolower


utf8_strtoupper($str)   X-Ref
UTF-8 aware alternative to strtoupper


utf8_substr($str, $offset, $length = null)   X-Ref
UTF-8 aware alternative to substr


utf8_strlen($text)   X-Ref
Return the length (in characters) of a UTF-8 string


utf8_strrpos($str, $needle, $offset = null)   X-Ref
UTF-8 aware alternative to strrpos
Find position of last occurrence of a char in a string

author: Harry Fuecks
param: string $str haystack
param: string $needle needle
param: integer $offset (optional) offset (from left)
return: mixed integer position or FALSE on failure

utf8_strpos($str, $needle, $offset = null)   X-Ref
UTF-8 aware alternative to strpos
Find position of first occurrence of a string

author: Harry Fuecks
param: string $str haystack
param: string $needle needle
param: integer $offset offset in characters (from left)
return: mixed integer position or FALSE on failure
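
One way to obtain a character position from PHP's byte-oriented strpos() is to locate the needle in bytes and then count the characters in the prefix before it. The sketch below assumes valid UTF-8 input and omits the $offset parameter for brevity; sketch_utf8_strpos is a hypothetical name and this is not necessarily how utf_tools.php implements it.

```php
<?php
// Hypothetical sketch: character-based strpos without the $offset parameter.
function sketch_utf8_strpos(string $str, string $needle)
{
    // Byte search is safe on valid UTF-8: a valid needle can only
    // match at a character boundary.
    $bytePos = strpos($str, $needle);
    if ($bytePos === false) {
        return false;
    }
    // Count the characters in the prefix that precedes the match.
    $prefix = substr($str, 0, $bytePos);
    return (int) preg_match_all('/./us', $prefix);
}
```

With the string "héllo", strpos() reports the first "l" at byte 3, while the sketch reports character position 2.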

utf8_strtolower($string)   X-Ref
UTF-8 aware alternative to strtolower
Make a string lowercase
Note: The concept of a character's "case" only exists in some alphabets
such as Latin, Greek, Cyrillic, Armenian and archaic Georgian - it does
not exist in Chinese script, for example. See Unicode Standard
Annex #21: Case Mappings

param: string
return: string string in lowercase

utf8_strtoupper($string)   X-Ref
UTF-8 aware alternative to strtoupper
Make a string uppercase
Note: The concept of a character's "case" only exists in some alphabets
such as Latin, Greek, Cyrillic, Armenian and archaic Georgian - it does
not exist in Chinese script, for example. See Unicode Standard
Annex #21: Case Mappings

param: string
return: string string in uppercase

utf8_substr($str, $offset, $length = NULL)   X-Ref
UTF-8 aware alternative to substr
Return part of a string given character offset (and optionally length)

Note on arguments: compared to substr - if offset or length are
not integers, this version will not complain but will instead coerce them
to integers.

Note on returned values: substr documentation states false can be
returned in some cases (e.g. offset > string length)
mb_substr never returns false, it will return an empty string instead.
This adopts the mb_substr approach

Note on implementation: PCRE only supports repetitions of less than
65536, in order to accept up to MAXINT values for offset and length,
we'll repeat a group of 65535 characters when needed.

Note on implementation: calculating the number of characters in the
string is a relatively expensive operation, so we only carry it out when
necessary. It isn't necessary for positive offsets when no length is specified.

author: Chris Smith<chris@jalakai.co.uk>
param: string $str
param: integer $offset number of UTF-8 characters offset (from left)
param: integer $length (optional) length in UTF-8 characters from offset
return: mixed string or FALSE if failure
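
The repetition note above can be illustrated with a small sketch. It builds a pattern that matches exactly N characters by repeating a group of 65535 characters, which keeps every quantifier within PCRE's limit. This is a simplified illustration rather than the code in utf_tools.php: it handles only non-negative $offset and $length, assumes valid UTF-8, and the names sketch_exactly_n_chars and sketch_utf8_substr are hypothetical.

```php
<?php
// Hypothetical helper: pattern fragment matching exactly $n characters.
// The count is split into groups of 65535 plus a remainder.
function sketch_exactly_n_chars(int $n): string
{
    $groups = intdiv($n, 65535);
    $rest = $n % 65535;
    $pattern = ($groups > 0) ? ('(?:.{65535}){' . $groups . '}') : '';
    return $pattern . '.{' . $rest . '}';
}

// Sketch only: non-negative $offset and $length.
function sketch_utf8_substr(string $str, int $offset, ?int $length = null): string
{
    // Skip the first $offset characters; if there are fewer, return ''
    // (the mb_substr convention described above).
    if (!preg_match('/^' . sketch_exactly_n_chars($offset) . '/us', $str, $m)) {
        return '';
    }
    $rest = substr($str, strlen($m[0]));
    if ($length === null) {
        // No length given: no need to count the remaining characters.
        return $rest;
    }
    // Take exactly $length characters, or everything left if fewer remain.
    if (preg_match('/^' . sketch_exactly_n_chars($length) . '/us', $rest, $m)) {
        return $m[0];
    }
    return $rest;
}
```

Note that the string length is never computed here, which matches the point about avoiding that cost for positive offsets.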

utf8_strlen($text)   X-Ref
Return the length (in characters) of a UTF-8 string

param: string    $text        UTF-8 string
return: integer                Length (in chars) of given string
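
A character count can be derived from the byte layout of UTF-8: every character contributes exactly one byte that is not a continuation byte (10xxxxxx). The sketch below counts those bytes. It assumes valid UTF-8, sketch_utf8_strlen is a hypothetical name, and utf_tools.php may use a different technique.

```php
<?php
// Hypothetical sketch: count characters by counting non-continuation bytes.
function sketch_utf8_strlen(string $text): int
{
    $count = 0;
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {
        // Continuation bytes have the bit pattern 10xxxxxx; skip them.
        if ((ord($text[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}
```

For "héllo", strlen() returns 6 bytes while the sketch returns 5 characters.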

utf8_str_split($str, $split_len = 1)   X-Ref
UTF-8 aware alternative to str_split
Convert a string to an array

author: Harry Fuecks
param: string $str UTF-8 encoded
param: int $split_len number of characters to split the string by
return: array chunks of the string, each up to $split_len characters long
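
A character-aware split can be sketched with preg_split() and the /u modifier, followed by regrouping into chunks. This is an illustration under the assumption of valid UTF-8 input; sketch_utf8_str_split is a hypothetical name, not the function defined in utf_tools.php.

```php
<?php
// Hypothetical sketch: split a UTF-8 string into chunks of $split_len characters.
function sketch_utf8_str_split(string $str, int $split_len = 1): array
{
    // An empty pattern with /u splits between every character.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split() fails on invalid UTF-8; the sketch returns no chunks.
        return [];
    }
    if ($split_len <= 1) {
        return $chars;
    }
    // Regroup single characters into chunks and glue each chunk together.
    return array_map(
        function (array $chunk): string {
            return implode('', $chunk);
        },
        array_chunk($chars, $split_len)
    );
}
```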

utf8_strspn($str, $mask, $start = null, $length = null)   X-Ref
UTF-8 aware alternative to strspn
Find length of initial segment matching the mask

author: Harry Fuecks

utf8_ucfirst($str)   X-Ref
UTF-8 aware alternative to ucfirst
Make a string's first character uppercase

author: Harry Fuecks
param: string
return: string with first character as upper case (if applicable)

utf8_recode($string, $encoding)   X-Ref
Recode a string to UTF-8

If the encoding is not supported, the string is returned as-is

param: string    $string        Original string
param: string    $encoding    Original encoding (lowercase)
return: string                The string, encoded in UTF-8

utf8_encode_ncr($text)   X-Ref
Replace all UTF-8 chars that are not in ASCII with their NCR

param: string    $text        UTF-8 string in NFC
return: string                ASCII string using NCRs for non-ASCII chars

utf8_encode_ncr_callback($m)   X-Ref
Callback used in utf8_encode_ncr()

Takes a UTF-8 char and replaces it with its NCR. Attention, $m is an array

param: array    $m            0-based numerically indexed array passed by preg_replace_callback()
return: string                A HTML NCR if the character is valid, or the original string otherwise
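
The callback pattern described above can be sketched as follows: preg_replace_callback() hands each non-ASCII character to a callback as $m[0], and the callback computes the code point and returns the decimal NCR. This is a minimal sketch, not phpBB's code; sketch_encode_ncr is a hypothetical name, and returning the input unchanged when it is not valid UTF-8 is an assumption of the sketch.

```php
<?php
// Hypothetical sketch: replace every non-ASCII character with a decimal NCR.
function sketch_encode_ncr(string $text): string
{
    $out = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function (array $m): string {
        // $m is an array; $m[0] holds the matched character.
        $bytes = array_values(unpack('C*', $m[0]));
        $n = count($bytes);
        // Payload bits of the lead byte depend on the sequence length.
        $masks = [2 => 0x1F, 3 => 0x0F, 4 => 0x07];
        $cp = $bytes[0] & $masks[$n];
        // Each continuation byte contributes its low six bits.
        for ($i = 1; $i < $n; $i++) {
            $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
        }
        return '&#' . $cp . ';';
    }, $text);
    // null means the input was not valid UTF-8; leave it untouched.
    return $out ?? $text;
}
```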

utf8_ord($chr)   X-Ref
Converts a UTF-8 char to its Unicode code point

param: string $chr UTF-8 char
return: integer UNICODE code point

utf8_chr($cp)   X-Ref
Converts a Unicode code point to a UTF-8 char

param: int        $cp    UNICODE code point
return: string        UTF-8 char
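
The conversion between a UTF-8 character and its code point is plain bit manipulation, sketched below for both directions. The names sketch_utf8_ord and sketch_utf8_chr are hypothetical; the sketch assumes a single, valid, non-empty UTF-8 character and a code point no larger than U+10FFFF, and does no validation.

```php
<?php
// Hypothetical sketch: UTF-8 character -> code point.
function sketch_utf8_ord(string $chr): int
{
    $b0 = ord($chr[0]);
    if ($b0 < 0x80) {
        return $b0; // one byte: 0xxxxxxx
    }
    if (($b0 & 0xE0) === 0xC0) {
        // two bytes: 110xxxxx 10xxxxxx
        return (($b0 & 0x1F) << 6) | (ord($chr[1]) & 0x3F);
    }
    if (($b0 & 0xF0) === 0xE0) {
        // three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return (($b0 & 0x0F) << 12) | ((ord($chr[1]) & 0x3F) << 6) | (ord($chr[2]) & 0x3F);
    }
    // four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return (($b0 & 0x07) << 18) | ((ord($chr[1]) & 0x3F) << 12)
        | ((ord($chr[2]) & 0x3F) << 6) | (ord($chr[3]) & 0x3F);
}

// Hypothetical sketch: code point -> UTF-8 character.
function sketch_utf8_chr(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}
```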

utf8_decode_ncr($text)   X-Ref
Convert Numeric Character References to UTF-8 chars

Notes:
- we do not convert NCRs recursively: if you pass &#38;#38; it will return &#38;
- we DO NOT check for the existence of the Unicode characters, therefore an entity may be converted to a nonexistent code point

param: string    $text        String to convert, encoded in UTF-8 (no normal form required)
return: string                UTF-8 string where NCRs have been replaced with the actual chars
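
The non-recursive behaviour noted above falls out naturally from a single regex pass, as the following sketch shows. It is an illustration only: sketch_decode_ncr and sketch_cp_to_utf8 are hypothetical names, and leaving NCRs above U+10FFFF untouched is a choice made for the sketch, not a statement about phpBB's behaviour.

```php
<?php
// Hypothetical helper: encode a code point (<= 0x10FFFF) as UTF-8.
function sketch_cp_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical sketch: decode decimal and hexadecimal NCRs in one pass.
function sketch_decode_ncr(string $text): string
{
    return preg_replace_callback('/&#(?:x([0-9a-f]+)|([0-9]+));/i', function (array $m): string {
        // Group 1 holds hex digits, group 2 decimal digits.
        $cp = ($m[1] !== '') ? hexdec($m[1]) : (int) $m[2];
        if ($cp > 0x10FFFF) {
            // Out of Unicode range: keep the reference as written.
            return $m[0];
        }
        return sketch_cp_to_utf8((int) $cp);
    }, $text);
}
```

Because the replacement text is never rescanned, `&#38;#38;` yields `&#38;` rather than `&`.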

utf8_decode_ncr_callback($m)   X-Ref
Callback used in utf8_decode_ncr()

Takes an NCR (in decimal or hexadecimal) and returns a UTF-8 char. Attention, $m is an array.
It ignores most invalid NCRs, but not all!

param: array    $m            0-based numerically indexed array passed by preg_replace_callback()
return: string                UTF-8 char

utf8_case_fold($text, $option = 'full')   X-Ref
Case folds a unicode string as per Unicode 5.0, section 3.13

param: string    $text    text to be case folded
param: string    $option    determines how we will fold the cases
return: string            case folded text

utf8_case_fold_nfkc($text, $option = 'full')   X-Ref
Takes the input and does a "special" case fold. It does minor normalization
and returns NFKC compatible text

param: string    $text    text to be case folded
param: string    $option    determines how we will fold the cases
return: string            case folded text

utf8_case_fold_nfc($text, $option = 'full')   X-Ref
Assume the input is NFC:
Takes the input and does a "special" case fold. It does minor normalization as well.

param: string    $text    text to be case folded
param: string    $option    determines how we will fold the cases
return: string            case folded text

utf8_normalize_nfc($strings)   X-Ref
A wrapper function for the normalizer which takes care of including the class if required and modifies the passed strings
to be in NFC (Normalization Form C, canonical composition).

param: mixed    $strings    a string or an array of strings to normalize
return: mixed                the normalized content, preserving array keys if array given.

utf8_clean_string($text)   X-Ref
This function is used to generate a "clean" version of a string.
Clean means that it is in a case-insensitive form (case folding) and that it is normalized (NFC).
Additionally, homographs of a character are transformed into one specific character (preferably ASCII
if it is an ASCII character).

Please be aware that if you change something within this function or within
functions used here, you need to rebuild/update the username_clean column in the users table and all other
columns that store a clean string; otherwise you will break this functionality.

param: string    $text    An unclean string, maybe user input (has to be valid UTF-8!)
return: string            Cleaned up version of the input string

utf8_htmlspecialchars($value)   X-Ref
A wrapper for htmlspecialchars($value, ENT_COMPAT, 'UTF-8')
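
A minimal sketch of such a wrapper, using exactly the call named above. The name sketch_htmlspecialchars is hypothetical. With ENT_COMPAT, double quotes are escaped and single quotes are left alone, while multibyte UTF-8 characters pass through untouched.

```php
<?php
// Hypothetical sketch of the wrapper described above.
function sketch_htmlspecialchars(string $value): string
{
    // ENT_COMPAT escapes & < > and double quotes, but not single quotes.
    return htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
}

echo sketch_htmlspecialchars('<a href="x">it\'s</a>'), "\n";
```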


utf8_convert_message($message)   X-Ref
Tries to convert a returned system message to UTF-8

PHP assumes such messages are ISO-8859-1 so we'll do that too
and if it breaks messages we'll blame it on them ;-)

utf8_wordwrap($string, $width = 75, $break = "\n", $cut = false)   X-Ref
UTF8-compatible wordwrap replacement

param: string    $string    The input string
param: int        $width    The column width. Defaults to 75.
param: string    $break    The line is broken using the optional break parameter. Defaults to '\n'.
param: bool    $cut    If $cut is set to TRUE, the string is always wrapped at the specified width, so a word that is longer than the given width is broken apart.
return: string            the given string wrapped at the specified column.
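
The difference from the native wordwrap() is that widths are measured in characters rather than bytes. The sketch below illustrates that idea in simplified form: it breaks only at spaces, ignores line breaks already present in the input, and assumes valid UTF-8 and $width >= 1. The names sketch_char_len and sketch_utf8_wordwrap are hypothetical, and this is not the implementation in utf_tools.php.

```php
<?php
// Hypothetical helper: length of a string in characters.
function sketch_char_len(string $s): int
{
    return (int) preg_match_all('/./us', $s);
}

// Hypothetical sketch: wrap at spaces, measuring width in characters.
function sketch_utf8_wordwrap(string $string, int $width = 75, string $break = "\n", bool $cut = false): string
{
    $lines = [];
    $current = '';
    foreach (explode(' ', $string) as $word) {
        $pieces = [$word];
        if ($cut && sketch_char_len($word) > $width) {
            // Break an over-long word into pieces of at most $width characters.
            $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
            $pieces = array_map(
                function (array $chunk): string {
                    return implode('', $chunk);
                },
                array_chunk($chars, $width)
            );
        }
        foreach ($pieces as $piece) {
            $candidate = ($current === '') ? $piece : ($current . ' ' . $piece);
            if ($current !== '' && sketch_char_len($candidate) > $width) {
                // The piece does not fit: close the current line, start a new one.
                $lines[] = $current;
                $current = $piece;
            } else {
                $current = $candidate;
            }
        }
    }
    $lines[] = $current;
    return implode($break, $lines);
}
```

For example, "héllo wörld" is 11 characters but 13 bytes, so with a width of 11 the sketch keeps both words on one line.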

utf8_basename($filename)   X-Ref
UTF8-safe basename() function

basename() has some limitations and is dependent on the locale setting
according to the PHP manual. Therefore we provide our own locale-independent
basename function.

param: string $filename The filename basename() should be applied to
return: string The basenamed filename
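
A locale-independent basename can work directly on bytes, because the separators "/" and "\" are ASCII and can never occur inside a multibyte UTF-8 sequence. The sketch below illustrates this. The name sketch_utf8_basename is hypothetical, and treating both separators alike on every platform is an assumption of the sketch.

```php
<?php
// Hypothetical sketch: take everything after the last path separator.
function sketch_utf8_basename(string $filename): string
{
    // Drop trailing separators so "dir/sub/" yields "sub".
    $filename = rtrim($filename, '/\\');
    // Split on either separator; neither byte appears inside a UTF-8 sequence.
    $parts = preg_split('#[/\\\\]#', $filename);
    return end($parts);
}
```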

utf8_str_replace($search, $replace, $subject)   X-Ref
UTF8-safe str_replace() function

param: string $search The value to search for
param: string $replace The replacement string
param: string $subject The target string
return: string The resultant string



Generated: Wed Oct 2 15:03:47 2013 Cross-referenced by PHPXref 0.7.1