[![Actions Status](https://github.com/kaz-utashiro/greple-subst/workflows/test/badge.svg)](https://github.com/kaz-utashiro/greple-subst/actions) [![MetaCPAN Release](https://badge.fury.io/pl/App-Greple-subst.svg)](https://metacpan.org/release/App-Greple-subst)
# NAME

subst - Greple module for text search and substitution

# VERSION

Version 2.36

# SYNOPSIS

greple -Msubst --dict _dictionary_ \[ options \]

    Dictionary:
      --dict      dictionary file
      --dictdata  dictionary data
      --dictpair  dictionary entry pair

    Check:
      --check=[ng,ok,any,outstand,all,none]
      --select=N
      --linefold
      --stat
      --with-stat
      --stat-style=[default,dict]
      --stat-item={match,expect,number,ok,ng,dict}=[0,1]
      --subst
      --[no-]warn-overlap
      --[no-]warn-include

    File Update:
      --diff
      --diffcmd command
      --create
      --replace
      --overwrite

# DESCRIPTION

This **greple** module checks and substitutes text in files based on
dictionary data.

The dictionary file is given by the **--dict** option, and each line
contains a pair of a matching pattern and an expected string.

    greple -Msubst --dict DICT

If the dictionary file contains the following data:

    colou?r      color
    cent(er|re)  center

the above command finds strings that match the first pattern but
differ from the second string, that is, "colour" and "centre" in this case.
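
The check can be pictured with a short Python sketch.  It is not the module's implementation (the module itself is written in Perl); the `find_unexpected` helper and the sample text are hypothetical and only illustrate which matches count as unexpected.

```python
import re

# Dictionary entries from the example above: (pattern, expected string).
DICT = [
    (r"colou?r", "color"),
    (r"cent(er|re)", "center"),
]

def find_unexpected(text, entries):
    """Return (matched, expected) for each match that differs from its expected string."""
    found = []
    for pattern, expected in entries:
        for m in re.finditer(pattern, text):
            if m.group(0) != expected:
                found.append((m.group(0), expected))
    return found

sample = "The colour of the centre differs from the color of the center."
unexpected = find_unexpected(sample, DICT)
for matched, expected in unexpected:
    print(f"{matched} -> {expected}")
```

In this sketch "color" and "center" also match their patterns, but they are not reported because they already equal the expected string.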

In practice, the last two elements of a space-separated string are
treated as a pattern and a replacement string, respectively.

Dictionary data can also be written separated by `//` as follows:

    colou?r      //  color
    cent(er|re)  //  center

There must be spaces before and after the `//`.  In this format, the
strings before and after it are treated as the pattern and the
replacement string, rather than the last two elements.  Leading spaces
and spaces before and after `//` are ignored, but all other whitespace
is preserved.
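
The two entry formats can be summarized in a small Python sketch.  The `parse_entry` function is a hypothetical helper written for illustration, not the module's actual parser.  It also assumes that blank lines are skipped, that an entry starting with `#` is a comment, and that a line holding a single element is a pattern without a replacement.

```python
import re

def parse_entry(line):
    """Split one dictionary line into (pattern, replacement).

    Returns None for blank lines and for comment lines starting with '#'.
    """
    line = line.lstrip().rstrip("\n")
    if not line or line.startswith("#"):
        return None
    # Format 1: "pattern // replacement", with whitespace on both sides of "//".
    m = re.match(r"(.*?)\s+//\s+(.*)", line)
    if m:
        return m.group(1), m.group(2)
    # Format 2: the last two whitespace-separated elements.
    fields = line.split()
    if len(fields) < 2:
        return fields[0], None  # assumption: a pattern without a replacement
    return fields[-2], fields[-1]

print(parse_entry("colou?r      color"))
print(parse_entry("cent(er|re)  //  center"))
```

Because the `//` form splits on the separator rather than on every space, it can carry a pattern or replacement that itself contains whitespace.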

You can use the same file with **greple**'s **-f** option; in that
case, the string after `//` is ignored as a comment.

    greple -f DICT ...

Option **--dictdata** can be used to provide dictionary data on the
command line.

    greple -Msubst \
           --dictdata $'colou?r color\ncent(er|re) center\n'

A dictionary entry starting with a sharp sign (`#`) is a comment and is
ignored.

Option **--dictpair** can be used to provide raw dictionary entries on
the command line.  In this case, no processing of whitespace or
comments is done.

    greple -Msubst \
           --dictpair 'colou?r' color \
           --dictpair 'cent(er|re)' center

## Overlapped pattern

When a matched string is the same as or shorter than a string
previously matched by another pattern, it is simply ignored
(**--no-warn-include** by default).  So, if you have to declare
conflicting patterns, place the longer pattern first.

If a matched string overlaps with a previously matched string, a
warning is issued (**--warn-overlap** by default) and the match is ignored.
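
The following Python sketch illustrates these two rules.  It is only a model of the behavior described above: the `resolve_matches` function, its parameters, and the exact conflict test are assumptions made for illustration, and the module's own logic may differ in detail.

```python
import re

def resolve_matches(text, patterns, warn_overlap=True, warn_include=False):
    """Collect matches in dictionary order, dropping included and overlapping ones.

    Returns (accepted, warnings); accepted is a list of (start, end, pattern).
    """
    accepted, warnings = [], []
    for pattern in patterns:
        for m in re.finditer(pattern, text):
            start, end = m.span()
            conflict = None
            for a_start, a_end, _ in accepted:
                if start < a_end and a_start < end:  # the two spans intersect
                    conflict = (a_start, a_end)
                    break
            if conflict is None:
                accepted.append((start, end, pattern))
                continue
            # Assumption: "included" means lying entirely inside the earlier match.
            included = conflict[0] <= start and end <= conflict[1]
            if included:
                if warn_include:
                    warnings.append(f"included: {m.group(0)!r}")
            elif warn_overlap:
                warnings.append(f"overlap: {m.group(0)!r}")
    return accepted, warnings

accepted, warnings = resolve_matches(
    "data center server",
    ["data center", "center", "center server"],
)
print(accepted)
print(warnings)
```

Here "center" lies inside the earlier match "data center" and is dropped silently, while "center server" only partly overlaps it and produces a warning.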

## Terminal color

This version uses the [Getopt::EX::termcolor](https://metacpan.org/pod/Getopt%3A%3AEX%3A%3Atermcolor) module.  It sets the option
**--light-screen** or **--dark-screen** depending on the terminal on
which the command runs, or on the **TERM\_BGCOLOR** environment variable.

Some terminals (e.g. "Apple\_Terminal" or "iTerm") are detected
automatically and no action is required.  Otherwise, set the
**TERM\_BGCOLOR** environment variable to a value between #000000
(black) and #FFFFFF (white) according to the terminal background color.

# OPTIONS

- **--dict**=_file_

    Specify dictionary file.

- **--dictdata**=_data_

    Specify dictionary data by text.

- **--dictpair** _pattern_ _replacement_

    Specify dictionary entry pair.  This option takes two parameters.  The
    first is a pattern and the second is a substitution string.

- **--check**=`outstand`|`ng`|`ok`|`any`|`all`|`none`

    Option **--check** takes one of `ng`, `ok`, `any`, `outstand`,
    `all` and `none` as its argument.

    With the default value `outstand`, the command shows information
    about both expected and unexpected words, but only when an unexpected
    word was found in the same file.

    With the value `ng`, the command shows information about unexpected
    words.  With the value `ok`, it shows information about expected
    words.  With the value `any`, it shows both.

    The values `all` and `none` make sense only when used with the
    **--stat** option, and display information about patterns that never
    matched.

- **--select**=_N_

    Select the _N_th entry from the dictionary.  The argument is interpreted
    by the [Getopt::EX::Numbers](https://metacpan.org/pod/Getopt%3A%3AEX%3A%3ANumbers) module.  A range can be specified like
    **--select**=`1:3,7:9`.  You can get the entry numbers with the **--stat** option.

- **--linefold**

    If the target data is folded in the middle of text, use the
    **--linefold** option.  It creates regex patterns that match strings
    spread across lines.  The substituted text does not include the
    newline, though.  Because it makes regex behavior somewhat confusing,
    avoid using it if possible.

- **--stat**
- **--with-stat**

    Print statistical information.  Works with the **--check** option.

    Option **--with-stat** prints statistics after the normal output, while
    **--stat** prints only statistics.

- **--stat-style**=`default`|`dict`

    Using the **--stat-style=dict** option with **--stat** and **--check=any**,
    you can get dictionary-style output for your working document.

- **--stat-item** _item_=\[0,1\]

    Specify which items are shown in the stat information.  Default values
    are:

        match=1
        expect=1
        number=1
        ng=1
        ok=1
        dict=0

    If you don't need to see the pattern field, use it like this:

        --stat-item match=0

    Multiple parameters can be set at once:

        --stat-item match=number=0,ng=1,ok=1

- **--subst**

    Substitute unexpected matched strings with the expected string.  Newline
    characters in the matched string are ignored.  A pattern without a
    replacement string is not changed.

- **--\[no-\]warn-overlap**

    Warn about overlapped patterns.
    Default on.

- **--\[no-\]warn-include**

    Warn about included patterns.
    Default off.
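
The range form accepted by **--select** (for example `1:3,7:9`) can be illustrated with a short Python sketch.  The `expand_ranges` function is hypothetical and covers only comma-separated numbers and `start:end` pairs; it assumes the end of a range is inclusive, and it does not attempt to reproduce everything [Getopt::EX::Numbers](https://metacpan.org/pod/Getopt%3A%3AEX%3A%3ANumbers) supports.

```python
def expand_ranges(spec):
    """Expand a spec such as "1:3,7:9" into a list of entry numbers."""
    numbers = []
    for part in spec.split(","):
        if ":" in part:
            start, end = (int(x) for x in part.split(":"))
            numbers.extend(range(start, end + 1))  # assumes the end is inclusive
        else:
            numbers.append(int(part))
    return numbers

selected = expand_ranges("1:3,7:9")
print(selected)
```

Under these assumptions, `1:3,7:9` selects entries 1, 2, 3, 7, 8 and 9.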

## FILE UPDATE OPTIONS

- **--diff**
- **--diffcmd**=_command_

    Option **--diff** produces diff output of the original and converted text.

    Option **--diffcmd** specifies the diff command used by the **--diff**
    option.  Default is "diff -u".

- **--create**

    Create a new file and write the result to it.  The suffix ".new" is
    appended to the original filename.

- **--replace**

    Replace the target file with the converted result.  The original file is
    renamed to a backup name with the ".bak" suffix.

- **--overwrite**

    Overwrite the target file with the converted result, without a backup.
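
The difference between the three update options can be modeled in a few lines of Python.  This is a sketch of the file handling described above, not the module's code; the `write_result` function and its `mode` argument are hypothetical names.

```python
import os
import tempfile

def write_result(path, converted, mode):
    """Write converted text in the manner of --create, --replace or --overwrite."""
    if mode == "create":
        target = path + ".new"           # the original file is left untouched
    elif mode == "replace":
        os.replace(path, path + ".bak")  # keep the original as a backup
        target = path
    elif mode == "overwrite":
        target = path                    # no backup is kept
    else:
        raise ValueError(f"unknown mode: {mode}")
    with open(target, "w", encoding="utf-8") as f:
        f.write(converted)
    return target

# Demonstration in a temporary directory.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "doc.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("colour\n")
created = write_result(path, "color\n", "create")
replaced = write_result(path, "color\n", "replace")
print(sorted(os.listdir(workdir)))
```

After both calls the directory holds the converted `doc.txt`, the backup `doc.txt.bak` with the original text, and `doc.txt.new`.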

# DICTIONARY

This module includes example dictionaries.  They are installed in the
share directory and accessed by the **--exdict** option.

    greple -Msubst --exdict jtca-katakana-guide-3.dict

- **--exdict** _dictionary_

    Use the _dictionary_ file in the distribution as a dictionary file.

- **--exdictdir**

    Show dictionary directory.

- **--exdict** jtca-katakana-guide-3.dict
- **--jtca-katakana-guide**

    Created from the following guideline document.

        外来語(カタカナ)表記ガイドライン 第3版
        Established: August 2015
        Published: September 2015
        一般財団法人テクニカルコミュニケーター協会
        Japan Technical Communicators Association
        https://www.jtca.org/standardization/katakana_guide_3_20171222.pdf

- **--jtca**

    Customized **--jtca-katakana-guide**.  The original dictionary is
    automatically generated from published data.  This dictionary is
    customized for practical use.

- **--exdict** jtf-style-guide-3.dict
- **--jtf-style-guide**

    Created from the following guideline document.

        JTF日本語標準スタイルガイド(翻訳用)
        Version 3.0
        August 20, 2019
        一般社団法人 日本翻訳連盟(JTF)
        翻訳品質委員会 (Translation Quality Committee)
        https://www.jtf.jp/jp/style_guide/pdf/jtf_style_guide.pdf

- **--jtf**

    Customized **--jtf-style-guide**.  The original dictionary is
    automatically generated from published data.  This dictionary is
    customized for practical use.

- **--exdict** sccc2.dict
- **--sccc2**

    Dictionary used for "C/C++ セキュアコーディング 第2版" published in
    2014.

        https://www.jpcert.or.jp/securecoding_book_2nd.html

- **--exdict** ms-style-guide.dict
- **--ms-style-guide**

    Dictionary generated from Microsoft localization style guide.

        https://www.microsoft.com/ja-jp/language/styleguides

    Data is generated from this article:

        https://www.atmarkit.co.jp/news/200807/25/microsoft.html

- **--microsoft**

    Customized **--ms-style-guide**.  The original dictionary is
    automatically generated from published data.  This dictionary is
    customized for practical use.

    The amendment dictionary can be found
    [here](https://github.com/kaz-utashiro/greple-subst/blob/master/share/ms-amend.dict).
    Please raise an issue or send a pull request if you have an update request.

# JAPANESE

This module was originally made to support Japanese text editing.

## KATAKANA

A Japanese KATAKANA word often has many variant spellings, so
unifying them is important, but it is quite tedious work.  In the next
example,

    イ[エー]ハトー?([ヴブボ]ォ?)  //  イーハトーヴォ

the pattern on the left matches all of the following words.

    イエハトブ
    イーハトヴ
    イーハトーヴ
    イーハトーヴォ
    イーハトーボ
    イーハトーブ

This module helps to detect and correct them.
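
The claim that the pattern covers every listed variant can be checked with a few lines of Python.  This only exercises the regular expression from the example above with Python's `re` module; it does not run the module itself, and the "ok"/"ng" labels simply mirror the terms used by **--check**.

```python
import re

pattern = re.compile(r"イ[エー]ハトー?([ヴブボ]ォ?)")
expected = "イーハトーヴォ"
variants = [
    "イエハトブ",
    "イーハトヴ",
    "イーハトーヴ",
    "イーハトーヴォ",
    "イーハトーボ",
    "イーハトーブ",
]

for word in variants:
    matched = pattern.fullmatch(word) is not None
    status = "ok" if word == expected else "ng"
    print(word, matched, status)

all_matched = all(pattern.fullmatch(w) for w in variants)
unexpected = [w for w in variants if w != expected]
```

All six words match, and the five that differ from the expected string are the ones a check would flag.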

# INSTALL

## CPANMINUS

    $ cpanm App::Greple::subst

# SEE ALSO

[https://github.com/kaz-utashiro/greple](https://github.com/kaz-utashiro/greple)

[https://github.com/kaz-utashiro/greple-subst](https://github.com/kaz-utashiro/greple-subst)

[https://github.com/kaz-utashiro/greple-update](https://github.com/kaz-utashiro/greple-update)

[https://www.jtca.org/standardization/katakana\_guide\_3\_20171222.pdf](https://www.jtca.org/standardization/katakana_guide_3_20171222.pdf)

[https://www.jtf.jp/jp/style\_guide/styleguide\_top.html](https://www.jtf.jp/jp/style_guide/styleguide_top.html),
[https://www.jtf.jp/jp/style\_guide/pdf/jtf\_style\_guide.pdf](https://www.jtf.jp/jp/style_guide/pdf/jtf_style_guide.pdf)

[https://www.microsoft.com/ja-jp/language/styleguides](https://www.microsoft.com/ja-jp/language/styleguides),
[https://www.atmarkit.co.jp/news/200807/25/microsoft.html](https://www.atmarkit.co.jp/news/200807/25/microsoft.html)

[文化庁 国語施策・日本語教育 国語施策情報 内閣告示・内閣訓令 外来語の表記](https://www.bunka.go.jp/kokugo_nihongo/sisaku/joho/joho/kijun/naikaku/gairai/index.html)

[https://qiita.com/kaz-utashiro/items/85add653a71a7e01c415](https://qiita.com/kaz-utashiro/items/85add653a71a7e01c415)

[イーハトーブ](https://ja.wikipedia.org/wiki/%E3%82%A4%E3%83%BC%E3%83%8F%E3%83%88%E3%83%BC%E3%83%96)

# AUTHOR

Kazumasa Utashiro

# LICENSE

Copyright 2017-2024 Kazumasa Utashiro.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.