HEX
Server: nginx/1.24.0
System: Linux localhost 5.15.0-46-generic #49-Ubuntu SMP Thu Aug 4 18:03:25 UTC 2022 x86_64
User: www (1000)
PHP: 8.3.27
Disabled: passthru,exec,system,putenv,chroot,chgrp,chown,shell_exec,popen,proc_open,pcntl_exec,ini_alter,ini_restore,dl,openlog,syslog,readlink,symlink,popepassthru,pcntl_alarm,pcntl_fork,pcntl_waitpid,pcntl_wait,pcntl_wifexited,pcntl_wifstopped,pcntl_wifsignaled,pcntl_wifcontinued,pcntl_wexitstatus,pcntl_wtermsig,pcntl_wstopsig,pcntl_signal,pcntl_signal_dispatch,pcntl_get_last_error,pcntl_strerror,pcntl_sigprocmask,pcntl_sigwaitinfo,pcntl_sigtimedwait,pcntl_exec,pcntl_getpriority,pcntl_setpriority,imap_open,apache_setenv
Upload Files
File: //www/server/mysql/share/groonga-normalizer-mysql/README.md
# README

## Name

groonga-normalizer-mysql

## Description

Groonga-normalizer-mysql is a Groonga plugin. It provides MySQL
compatible normalizers and a custom normalizers to Groonga.

Here are MySQL compatible normalizers:

* `NormalizerMySQLGeneralCI` for `utf8mb4_general_ci`
* `NormalizerMySQLUnicodeCI` for `utf8mb4_unicode_ci`
* `NormalizerMySQLUnicode520CI` for `utf8mb4_unicode_520_ci`

Here are custom normalizers:

* `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark`
   * It's based on `NormalizerMySQLUnicodeCI`
* `NormalizerMySQLUnicode520CIExceptKanaCIKanaWithVoicedSoundMark`
   * It's based on `NormalizerMySQLUnicode520CI`

They are self-descriptive name but long. They are variant normalizers
of `NormalizerMySQLUnicodeCI` and `NormalizerMySQLUnicode520CI`. They
have different behaviors. The followings are the different
behaviors. They describes with
`NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` but they
are true for
`NormalizerMySQLUnicode520CIExceptKanaCIKanaWithVoicedSoundMark`.

* `NormalizerMySQLUnicodeCI` normalizes all small Hiragana such as `ぁ`,
  `っ` to Hiragana such as `あ`, `つ`.
  `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark`
  doesn't normalize `ぁ` to `あ` nor `っ` to `つ`. `ぁ` and `あ` are
  different characters. `っ` and `つ` are also different characters.
  This behavior is described by `ExceptKanaCI` in the long name.  This
  following behaviors ared described by
  `ExceptKanaWithVoicedSoundMark` in the long name.
* `NormalizerMySQLUnicode` normalizes all Hiragana with voiced sound
  mark such as `が` to Hiragana without voiced sound mark such as `か`.
  `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` doesn't
  normalize `が` to `か`. `が` and `か` are different characters.
* `NormalizerMySQLUnicode` normalizes all Hiragana with semi-voiced sound
  mark such as `ぱ` to Hiragana without semi-voiced sound mark such as `は`.
  `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` doesn't
  normalize `ぱ` to `は`. `ぱ` and `は` are different characters.
* `NormalizerMySQLUnicode` normalizes all Katakana with voiced sound
  mark such as `ガ` to Katakana without voiced sound mark such as `カ`.
  `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` doesn't
  normalize `ガ` to `カ`. `ガ` and `カ` are different characters.
* `NormalizerMySQLUnicode` normalizes all Katakana with semi-voiced sound
  mark such as `パ` to Hiragana without semi-voiced sound mark such as `ハ`.
  `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` doesn't
  normalize `パ` to `ハ`. `パ` and `ハ` are different characters.
* `NormalizerMySQLUnicode` normalizes all halfwidth Katakana with
  voiced sound mark such as `ガ` to halfwidth Katakana without voiced
  sound mark such as `カ`.
  `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark`
  normalizes all halfwidth Katakana with voided sound mark such as `ガ`
  to fullwidth Katakana with voiced sound mark such as `ガ`.

`NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` and
`NormalizerMySQLUnicode520CIExceptKanaCIKanaWithVoicedSoundMark` and
are MySQL incompatible normalizers but they are useful for Japanese
text. For example, `ふらつく` and `ブラック` has different
means. `NormalizerMySQLUnicodeCI` identifies `ふらつく` with `ブラック`
but `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark`
doesn't identify them.

## Install

### Debian GNU/Linux

[Add apt-line for the Groonga deb package repository](http://groonga.org/docs/install/debian.html)
and install `groonga-normalizer-mysql` package:

    % sudo apt-get -y install groonga-normalizer-mysql

### Ubuntu

[Add apt-line for the Groonga deb package repository](http://groonga.org/docs/install/ubuntu.html)
and install `groonga-normalizer-mysql` package:

    % sudo apt-get -y install groonga-normalizer-mysql

### CentOS

Install `groonga-repository` package:

    % sudo rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
    % sudo yum makecache

Then install `groonga-normalizer-mysql` package:

    % sudo yum install -y groonga-normalizer-mysql

### Fedora

Install `groonga-repository` package:

    % sudo rpm -ivh http://packages.groonga.org/fedora/groonga-release-1.1.0-1.noarch.rpm
    % sudo yum makecache

Then install `groonga-normalizer-mysql` package:

    % sudo yum install -y groonga-normalizer-mysql

### OS X - Homebrew

Install `groonga-normalizer-mysql` package:

    % brew install groonga-normalizer-mysql

### Windows

You need to build from source. Here are build instructions.

#### Build system

Install the following build tools:

* [Microsoft Visual Studio 2010 Express](http://www.microsoft.com/japan/msdn/vstudio/express/): 2012 isn't tested yet.
* [CMake](http://www.cmake.org/)

#### Build Groonga

Download the latest Groonga source from [packages.groonga.org](http://packages.groonga.org/source/groonga/). Source file name is formatted as `groonga-X.Y.Z.zip`.

Extract the source and move to the source folder:

    > cd ...\groonga-X.Y.Z
    groonga-X.Y.Z>

Run CMake. Here is a command line to install Groonga to `C:\groonga` folder:

    groonga-X.Y.Z> cmake . -G "Visual Studio 12 Win64" -DCMAKE_INSTALL_PREFIX=C:\groonga

Build:

    groonga-X.Y.Z> cmake --build . --config Release

Install:

    groonga-X.Y.Z> cmake --build . --config Release --target Install

#### Build groonga-normalizer-mysql

Download the latest groonga-normalizer-mysql source from [packages.groonga.org](http://packages.groonga.org/source/groonga-normalizer-mysql/). Source file name is formatted as `groonga-normalizer-X.Y.Z.zip`.

Extract the source and move to the source folder:

    > cd ...\groonga-normalizer-mysql-X.Y.Z
    groonga-normalizer-mysql-X.Y.Z>

IMPORTANT!!!: Set `PKG_CONFIG_PATH` environment variable:

    groonga-normalizer-mysql-X.Y.Z> set PKG_CONFIG_PATH=C:\groongalocal\lib\pkgconfig

Run CMake. Here is a command line to install Groonga to `C:\groonga` folder:

    groonga-normalizer-mysql-X.Y.Z> cmake . -G "Visual Studio 12 Win64" -DCMAKE_INSTALL_PREFIX=C:\groonga

Build:

    groonga-normalizer-mysql-X.Y.Z> cmake --build . --config Release

Install:

    groonga-normalizer-mysql-X.Y.Z> cmake --build . --config Release --target Install

## Usage

First, you need to register `normalizers/mysql` plugin:

    groonga> register normalizers/mysql

Then, you can use `NormalizerMySQLGeneralCI` and
`NormalizerMySQLUnicodeCI` as normalizers:

    groonga> table_create Lexicon TABLE_PAT_KEY --default_tokenizer TokenBigram --normalizer NormalizerMySQLGeneralCI

## Dependencies

* Groonga >= 3.0.3

## Mailing list

* English: [groonga-talk@lists.sourceforge.net](https://lists.sourceforge.net/lists/listinfo/groonga-talk)
* Japanese: [groonga-dev@lists.sourceforge.jp](http://lists.sourceforge.jp/mailman/listinfo/groonga-dev)

## Thanks

* Alexander Barkov \<bar@udm.net\>: The author of
  `MYSQL_SOURCE/strings/ctype-utf8.c`.
* ...

## Authors

* Kouhei Sutou \<kou@clear-code.com\>

## License

LGPLv2 only. See doc/text/lgpl-2.0.txt for details.

This program uses normalization table defined in MySQL source code. So
this program is derived work of
`MYSQL_SOURCE/strings/ctype-utf8.c`. This program is the same license
as `MYSQL_SOURCE/strings/ctype-utf8.c` and it is licensed under LGPLv2
only.