[Python] Converting a CaboCha tree from XML to JSON

I want to work with CaboCha trees, but CaboCha does not appear to offer JSON output by default, so I use xmltodict to convert its XML output to JSON.

Output as XML

First, the XML output looks like this.

import CaboCha

c = CaboCha.Parser()

tree = c.parse('今日は天気がとても良いですね。')
xmltree = tree.toString(CaboCha.FORMAT_XML)
print(xmltree)

XML output

<sentence>
 <chunk id="0" link="3" rel="D" score="-1.359140" head="0" func="1">
  <tok id="0" feature="名詞,副詞可能,*,*,*,*,今日,キョウ,キョー">今日</tok>
  <tok id="1" feature="助詞,係助詞,*,*,*,*,は,ハ,ワ">は</tok>
 </chunk>
 <chunk id="1" link="3" rel="D" score="-1.359140" head="2" func="3">
  <tok id="2" feature="名詞,一般,*,*,*,*,天気,テンキ,テンキ">天気</tok>
  <tok id="3" feature="助詞,格助詞,一般,*,*,*,が,ガ,ガ">が</tok>
 </chunk>
 <chunk id="2" link="3" rel="D" score="-1.359140" head="4" func="4">
  <tok id="4" feature="副詞,助詞類接続,*,*,*,*,とても,トテモ,トテモ">とても</tok>
 </chunk>
 <chunk id="3" link="-1" rel="D" score="0.000000" head="5" func="7">
  <tok id="5" feature="形容詞,自立,*,*,形容詞・アウオ段,基本形,良い,ヨイ,ヨイ">良い</tok>
  <tok id="6" feature="助動詞,*,*,*,特殊・デス,基本形,です,デス,デス">です</tok>
  <tok id="7" feature="助詞,終助詞,*,*,*,*,ね,ネ,ネ">ね</tok>
  <tok id="8" feature="記号,句点,*,*,*,*,。,。,。">。</tok>
 </chunk>
</sentence>
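In this XML, each chunk's link attribute holds the id of the chunk it depends on, and -1 marks the root chunk. As a rough illustration, the sketch below reads the output above with Python's standard xml.etree.ElementTree and lists "dependent -> head" pairs. The function name dependency_pairs and the constant XML are names chosen here for illustration; they are not part of CaboCha.

```python
import xml.etree.ElementTree as ET

# The XML output shown above, pasted verbatim.
XML = """<sentence>
 <chunk id="0" link="3" rel="D" score="-1.359140" head="0" func="1">
  <tok id="0" feature="名詞,副詞可能,*,*,*,*,今日,キョウ,キョー">今日</tok>
  <tok id="1" feature="助詞,係助詞,*,*,*,*,は,ハ,ワ">は</tok>
 </chunk>
 <chunk id="1" link="3" rel="D" score="-1.359140" head="2" func="3">
  <tok id="2" feature="名詞,一般,*,*,*,*,天気,テンキ,テンキ">天気</tok>
  <tok id="3" feature="助詞,格助詞,一般,*,*,*,が,ガ,ガ">が</tok>
 </chunk>
 <chunk id="2" link="3" rel="D" score="-1.359140" head="4" func="4">
  <tok id="4" feature="副詞,助詞類接続,*,*,*,*,とても,トテモ,トテモ">とても</tok>
 </chunk>
 <chunk id="3" link="-1" rel="D" score="0.000000" head="5" func="7">
  <tok id="5" feature="形容詞,自立,*,*,形容詞・アウオ段,基本形,良い,ヨイ,ヨイ">良い</tok>
  <tok id="6" feature="助動詞,*,*,*,特殊・デス,基本形,です,デス,デス">です</tok>
  <tok id="7" feature="助詞,終助詞,*,*,*,*,ね,ネ,ネ">ね</tok>
  <tok id="8" feature="記号,句点,*,*,*,*,。,。,。">。</tok>
 </chunk>
</sentence>"""


def dependency_pairs(xml_text):
    """Return (dependent chunk text, head chunk text) pairs for one sentence."""
    root = ET.fromstring(xml_text)
    # Map each chunk id to the concatenated surface text of its tokens.
    text_by_id = {
        chunk.get("id"): "".join(tok.text or "" for tok in chunk.findall("tok"))
        for chunk in root.findall("chunk")
    }
    pairs = []
    for chunk in root.findall("chunk"):
        link = chunk.get("link")
        if link != "-1":  # -1 means the chunk has no head (root of the tree)
            pairs.append((text_by_id[chunk.get("id")], text_by_id[link]))
    return pairs


for dependent, head in dependency_pairs(XML):
    print(f"{dependent} -> {head}")
```

For this sentence, all three leading chunks point at the final chunk, which matches link="3" on chunks 0 to 2.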

Output as JSON

This uses xmltodict, so if it is not installed yet, install it with the command "pip install xmltodict".

import CaboCha
import xmltodict
import json

c = CaboCha.Parser()

tree = c.parse('今日は天気がとても良いですね。')
xmltree = tree.toString(CaboCha.FORMAT_XML)
jsonobj = xmltodict.parse(xmltree, attr_prefix='', cdata_key='surface', dict_constructor=dict)
print(json.dumps(jsonobj, indent=2, ensure_ascii=False))

JSON output

{
  "sentence": {
    "chunk": [
      {
        "id": "0",
        "link": "3",
        "rel": "D",
        "score": "-1.359140",
        "head": "0",
        "func": "1",
        "tok": [
          {
            "id": "0",
            "feature": "名詞,副詞可能,*,*,*,*,今日,キョウ,キョー",
            "surface": "今日"
          },
          {
            "id": "1",
            "feature": "助詞,係助詞,*,*,*,*,は,ハ,ワ",
            "surface": "は"
          }
        ]
      },
      {
        "id": "1",
        "link": "3",
        "rel": "D",
        "score": "-1.359140",
        "head": "2",
        "func": "3",
        "tok": [
          {
            "id": "2",
            "feature": "名詞,一般,*,*,*,*,天気,テンキ,テンキ",
            "surface": "天気"
          },
          {
            "id": "3",
            "feature": "助詞,格助詞,一般,*,*,*,が,ガ,ガ",
            "surface": "が"
          }
        ]
      },
      {
        "id": "2",
        "link": "3",
        "rel": "D",
        "score": "-1.359140",
        "head": "4",
        "func": "4",
        "tok": {
          "id": "4",
          "feature": "副詞,助詞類接続,*,*,*,*,とても,トテモ,トテモ",
          "surface": "とても"
        }
      },
      {
        "id": "3",
        "link": "-1",
        "rel": "D",
        "score": "0.000000",
        "head": "5",
        "func": "7",
        "tok": [
          {
            "id": "5",
            "feature": "形容詞,自立,*,*,形容詞・アウオ段,基本形,良い,ヨイ,ヨイ",
            "surface": "良い"
          },
          {
            "id": "6",
            "feature": "助動詞,*,*,*,特殊・デス,基本形,です,デス,デス",
            "surface": "です"
          },
          {
            "id": "7",
            "feature": "助詞,終助詞,*,*,*,*,ね,ネ,ネ",
            "surface": "ね"
          },
          {
            "id": "8",
            "feature": "記号,句点,*,*,*,*,。,。,。",
            "surface": "。"
          }
        ]
      }
    ]
  }
}

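Further improvements

The code above does return JSON, but it is slightly inconvenient: when there is only one chunk or tok element it is not wrapped in a list, and feature is a comma-separated string rather than a list.

Adding the processing below makes the format consistent.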

import CaboCha
import xmltodict
import json

c = CaboCha.Parser()

tree = c.parse('今日は天気がとても良いですね。')
xmltree = tree.toString(CaboCha.FORMAT_XML)
jsonobj = xmltodict.parse(xmltree, attr_prefix='', cdata_key='surface', dict_constructor=dict)

# Added code ↓
if jsonobj['sentence']:  # process only when sentence has content
    if not isinstance(jsonobj['sentence']['chunk'], list):  # always make chunk a list
        jsonobj['sentence']['chunk'] = [jsonobj['sentence']['chunk']]

    for chunk in jsonobj['sentence']['chunk']:
        if not isinstance(chunk['tok'], list):  # always make tok a list
            chunk['tok'] = [chunk['tok']]

        for tok in chunk['tok']:
            tok['feature'] = tok['feature'].split(',')  # convert feature to a list
# Added code ↑

print(json.dumps(jsonobj, indent=2, ensure_ascii=False))

JSON output ver 2

{
  "sentence": {
    "chunk": [
      {
        "id": "0",
        "link": "3",
        "rel": "D",
        "score": "-1.359140",
        "head": "0",
        "func": "1",
        "tok": [
          {
            "id": "0",
            "feature": [
              "名詞",
              "副詞可能",
              "*",
              "*",
              "*",
              "*",
              "今日",
              "キョウ",
              "キョー"
            ],
            "surface": "今日"
          },
          {
            "id": "1",
            "feature": [
              "助詞",
              "係助詞",
              "*",
              "*",
              "*",
              "*",
              "は",
              "ハ",
              "ワ"
            ],
            "surface": "は"
          }
        ]
      },
      {
        "id": "1",
        "link": "3",
        "rel": "D",
        "score": "-1.359140",
        "head": "2",
        "func": "3",
        "tok": [
          {
            "id": "2",
            "feature": [
              "名詞",
              "一般",
              "*",
              "*",
              "*",
              "*",
              "天気",
              "テンキ",
              "テンキ"
            ],
            "surface": "天気"
          },
          {
            "id": "3",
            "feature": [
              "助詞",
              "格助詞",
              "一般",
              "*",
              "*",
              "*",
              "が",
              "ガ",
              "ガ"
            ],
            "surface": "が"
          }
        ]
      },
      {
        "id": "2",
        "link": "3",
        "rel": "D",
        "score": "-1.359140",
        "head": "4",
        "func": "4",
        "tok": [
          {
            "id": "4",
            "feature": [
              "副詞",
              "助詞類接続",
              "*",
              "*",
              "*",
              "*",
              "とても",
              "トテモ",
              "トテモ"
            ],
            "surface": "とても"
          }
        ]
      },
      {
        "id": "3",
        "link": "-1",
        "rel": "D",
        "score": "0.000000",
        "head": "5",
        "func": "7",
        "tok": [
          {
            "id": "5",
            "feature": [
              "形容詞",
              "自立",
              "*",
              "*",
              "形容詞・アウオ段",
              "基本形",
              "良い",
              "ヨイ",
              "ヨイ"
            ],
            "surface": "良い"
          },
          {
            "id": "6",
            "feature": [
              "助動詞",
              "*",
              "*",
              "*",
              "特殊・デス",
              "基本形",
              "です",
              "デス",
              "デス"
            ],
            "surface": "です"
          },
          {
            "id": "7",
            "feature": [
              "助詞",
              "終助詞",
              "*",
              "*",
              "*",
              "*",
              "ね",
              "ネ",
              "ネ"
            ],
            "surface": "ね"
          },
          {
            "id": "8",
            "feature": [
              "記号",
              "句点",
              "*",
              "*",
              "*",
              "*",
              "。",
              "。",
              "。"
            ],
            "surface": "。"
          }
        ]
      }
    ]
  }
}
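One thing that remains in this output is that id, link, head, func, and score are all strings. If numbers are easier to work with downstream, a small follow-up step along these lines might help. This is only a sketch, assuming the normalized shape of "JSON output ver 2" (chunk and tok are always lists); the function name coerce_numbers and the INT_KEYS tuple are names made up for this example.

```python
import copy

# Chunk attributes that hold integers in the output above.
INT_KEYS = ("id", "link", "head", "func")


def coerce_numbers(jsonobj):
    """Return a copy of the normalized tree with numeric strings converted."""
    result = copy.deepcopy(jsonobj)  # leave the caller's object untouched
    sentence = result.get("sentence") or {}
    for chunk in sentence.get("chunk", []):
        for key in INT_KEYS:
            if key in chunk:
                chunk[key] = int(chunk[key])
        if "score" in chunk:
            chunk["score"] = float(chunk["score"])
        for tok in chunk.get("tok", []):
            tok["id"] = int(tok["id"])
    return result


# One chunk taken from "JSON output ver 2" above.
example = {
    "sentence": {
        "chunk": [
            {
                "id": "2",
                "link": "3",
                "rel": "D",
                "score": "-1.359140",
                "head": "4",
                "func": "4",
                "tok": [
                    {
                        "id": "4",
                        "feature": ["副詞", "助詞類接続", "*", "*", "*", "*",
                                    "とても", "トテモ", "トテモ"],
                        "surface": "とても",
                    }
                ],
            }
        ]
    }
}

converted = coerce_numbers(example)
print(converted["sentence"]["chunk"][0]["link"] + 1)
```

Working on a deep copy keeps the original string-valued tree available in case it is still needed for printing.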

[Log] macOS Big Sur 11.2.2: ./bin/install-mecab-ipadic-neologd -n -a

  • Command: ./bin/install-mecab-ipadic-neologd -n -a
  • Date: 2021/05/30
  • Environment: macOS Big Sur 11.2.2
% ./bin/install-mecab-ipadic-neologd -n -a
[install-mecab-ipadic-NEologd] : Start..
[install-mecab-ipadic-NEologd] : Check the existance of libraries
[install-mecab-ipadic-NEologd] :     find => ok
[install-mecab-ipadic-NEologd] :     sort => ok
[install-mecab-ipadic-NEologd] :     head => ok
[install-mecab-ipadic-NEologd] :     cut => ok
[install-mecab-ipadic-NEologd] :     egrep => ok
[install-mecab-ipadic-NEologd] :     mecab => ok
[install-mecab-ipadic-NEologd] :     mecab-config => ok
[install-mecab-ipadic-NEologd] :     make => ok
[install-mecab-ipadic-NEologd] :     curl => ok
[install-mecab-ipadic-NEologd] :     sed => ok
[install-mecab-ipadic-NEologd] :     cat => ok
[install-mecab-ipadic-NEologd] :     diff => ok
[install-mecab-ipadic-NEologd] :     tar => ok
[install-mecab-ipadic-NEologd] :     unxz => ok
[install-mecab-ipadic-NEologd] :     xargs => ok
[install-mecab-ipadic-NEologd] :     grep => ok
[install-mecab-ipadic-NEologd] :     iconv => ok
[install-mecab-ipadic-NEologd] :     patch => ok
[install-mecab-ipadic-NEologd] :     which => ok
[install-mecab-ipadic-NEologd] :     file => ok
[install-mecab-ipadic-NEologd] :     openssl => ok
[install-mecab-ipadic-NEologd] :     awk => ok

[install-mecab-ipadic-NEologd] : mecab-ipadic-NEologd is already up-to-date

[install-mecab-ipadic-NEologd] : mecab-ipadic-NEologd will be install to /usr/local/lib/mecab/dic/mecab-ipadic-neologd

[install-mecab-ipadic-NEologd] : Make mecab-ipadic-NEologd
[make-mecab-ipadic-NEologd] : Start..
[make-mecab-ipadic-NEologd] : Check local seed directory
[make-mecab-ipadic-NEologd] : Check local seed file
[make-mecab-ipadic-NEologd] : Check local build directory
[make-mecab-ipadic-NEologd] : create /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../build
[make-mecab-ipadic-NEologd] : Download original mecab-ipadic file
[make-mecab-ipadic-NEologd] : Try to access to https://ja.osdn.net
[make-mecab-ipadic-NEologd] : Try to download from https://ja.osdn.net/frs/g_redir.php?m=kent&f=mecab%2Fmecab-ipadic%2F2.7.0-20070801%2Fmecab-ipadic-2.7.0-20070801.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 11.6M  100 11.6M    0     0  4006k      0  0:00:02  0:00:02 --:--:-- 5855k
Hash value of /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../build/mecab-ipadic-2.7.0-20070801.tar.gz matched
[make-mecab-ipadic-NEologd] : Decompress original mecab-ipadic file
x mecab-ipadic-2.7.0-20070801/
x mecab-ipadic-2.7.0-20070801/README
x mecab-ipadic-2.7.0-20070801/AUTHORS
x mecab-ipadic-2.7.0-20070801/COPYING
x mecab-ipadic-2.7.0-20070801/ChangeLog
x mecab-ipadic-2.7.0-20070801/INSTALL
x mecab-ipadic-2.7.0-20070801/Makefile.am
x mecab-ipadic-2.7.0-20070801/Makefile.in
x mecab-ipadic-2.7.0-20070801/NEWS
x mecab-ipadic-2.7.0-20070801/aclocal.m4
x mecab-ipadic-2.7.0-20070801/config.guess
x mecab-ipadic-2.7.0-20070801/config.sub
x mecab-ipadic-2.7.0-20070801/configure
x mecab-ipadic-2.7.0-20070801/configure.in
x mecab-ipadic-2.7.0-20070801/install-sh
x mecab-ipadic-2.7.0-20070801/missing
x mecab-ipadic-2.7.0-20070801/mkinstalldirs
x mecab-ipadic-2.7.0-20070801/Adj.csv
x mecab-ipadic-2.7.0-20070801/Adnominal.csv
x mecab-ipadic-2.7.0-20070801/Adverb.csv
x mecab-ipadic-2.7.0-20070801/Auxil.csv
x mecab-ipadic-2.7.0-20070801/Conjunction.csv
x mecab-ipadic-2.7.0-20070801/Filler.csv
x mecab-ipadic-2.7.0-20070801/Interjection.csv
x mecab-ipadic-2.7.0-20070801/Noun.adjv.csv
x mecab-ipadic-2.7.0-20070801/Noun.adverbal.csv
x mecab-ipadic-2.7.0-20070801/Noun.csv
x mecab-ipadic-2.7.0-20070801/Noun.demonst.csv
x mecab-ipadic-2.7.0-20070801/Noun.nai.csv
x mecab-ipadic-2.7.0-20070801/Noun.name.csv
x mecab-ipadic-2.7.0-20070801/Noun.number.csv
x mecab-ipadic-2.7.0-20070801/Noun.org.csv
x mecab-ipadic-2.7.0-20070801/Noun.others.csv
x mecab-ipadic-2.7.0-20070801/Noun.place.csv
x mecab-ipadic-2.7.0-20070801/Noun.proper.csv
x mecab-ipadic-2.7.0-20070801/Noun.verbal.csv
x mecab-ipadic-2.7.0-20070801/Others.csv
x mecab-ipadic-2.7.0-20070801/Postp-col.csv
x mecab-ipadic-2.7.0-20070801/Postp.csv
x mecab-ipadic-2.7.0-20070801/Prefix.csv
x mecab-ipadic-2.7.0-20070801/Suffix.csv
x mecab-ipadic-2.7.0-20070801/Symbol.csv
x mecab-ipadic-2.7.0-20070801/Verb.csv
x mecab-ipadic-2.7.0-20070801/char.def
x mecab-ipadic-2.7.0-20070801/feature.def
x mecab-ipadic-2.7.0-20070801/left-id.def
x mecab-ipadic-2.7.0-20070801/matrix.def
x mecab-ipadic-2.7.0-20070801/pos-id.def
x mecab-ipadic-2.7.0-20070801/rewrite.def
x mecab-ipadic-2.7.0-20070801/right-id.def
x mecab-ipadic-2.7.0-20070801/unk.def
x mecab-ipadic-2.7.0-20070801/dicrc
x mecab-ipadic-2.7.0-20070801/RESULT
[make-mecab-ipadic-NEologd] : Configure custom system dictionary on /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../build/mecab-ipadic-2.7.0-20070801-neologd-20200910
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking whether make sets $(MAKE)... yes
checking for working aclocal-1.4... missing
checking for working autoconf... found
checking for working automake-1.4... missing
checking for working autoheader... found
checking for working makeinfo... found
checking for a BSD-compatible install... /usr/bin/install -c
checking for mecab-config... /usr/local/bin/mecab-config
configure: creating ./config.status
config.status: creating Makefile
[make-mecab-ipadic-NEologd] : Encode the character encoding of system dictionary resources from EUC_JP to UTF-8
./../../libexec/iconv_euc_to_utf8.sh ./Noun.place.csv
./../../libexec/iconv_euc_to_utf8.sh ./Auxil.csv
./../../libexec/iconv_euc_to_utf8.sh ./Noun.verbal.csv
./../../libexec/iconv_euc_to_utf8.sh ./Symbol.csv
./../../libexec/iconv_euc_to_utf8.sh ./Noun.org.csv
./../../libexec/iconv_euc_to_utf8.sh ./Noun.csv
./../../libexec/iconv_euc_to_utf8.sh ./Postp.csv
./../../libexec/iconv_euc_to_utf8.sh ./Adj.csv
./../../libexec/iconv_euc_to_utf8.sh ./Filler.csv
./../../libexec/iconv_euc_to_utf8.sh ./Noun.proper.csv
./../../libexec/iconv_euc_to_utf8.sh ./Noun.number.csv
./../../libexec/iconv_euc_to_utf8.sh ./Suffix.csv
./../../libexec/iconv_euc_to_utf8.sh ./Noun.others.csv
./../../libexec/iconv_euc_to_utf8.sh ./Interjection.csv
./../../libexec/iconv_euc_to_utf8.sh ./Noun.adjv.csv
./../../libexec/iconv_euc_to_utf8.sh ./Verb.csv
./../../libexec/iconv_euc_to_utf8.sh ./Others.csv
./../../libexec/iconv_euc_to_utf8.sh ./Adnominal.csv
./../../libexec/iconv_euc_to_utf8.sh ./Prefix.csv
./../../libexec/iconv_euc_to_utf8.sh ./Noun.demonst.csv
./../../libexec/iconv_euc_to_utf8.sh ./Adverb.csv
./../../libexec/iconv_euc_to_utf8.sh ./Noun.name.csv
./../../libexec/iconv_euc_to_utf8.sh ./Postp-col.csv
./../../libexec/iconv_euc_to_utf8.sh ./Conjunction.csv
./../../libexec/iconv_euc_to_utf8.sh ./Noun.nai.csv
./../../libexec/iconv_euc_to_utf8.sh ./Noun.adverbal.csv
rm ./Noun.place.csv
rm ./Auxil.csv
rm ./Noun.verbal.csv
rm ./Symbol.csv
rm ./Noun.org.csv
rm ./Noun.csv
rm ./Postp.csv
rm ./Adj.csv
rm ./Filler.csv
rm ./Noun.proper.csv
rm ./Noun.number.csv
rm ./Suffix.csv
rm ./Noun.others.csv
rm ./Interjection.csv
rm ./Noun.adjv.csv
rm ./Verb.csv
rm ./Others.csv
rm ./Adnominal.csv
rm ./Prefix.csv
rm ./Noun.demonst.csv
rm ./Adverb.csv
rm ./Noun.name.csv
rm ./Postp-col.csv
rm ./Conjunction.csv
rm ./Noun.nai.csv
rm ./Noun.adverbal.csv
./../../libexec/iconv_euc_to_utf8.sh ./rewrite.def
./../../libexec/iconv_euc_to_utf8.sh ./matrix.def
./../../libexec/iconv_euc_to_utf8.sh ./left-id.def
./../../libexec/iconv_euc_to_utf8.sh ./pos-id.def
./../../libexec/iconv_euc_to_utf8.sh ./unk.def
./../../libexec/iconv_euc_to_utf8.sh ./feature.def
./../../libexec/iconv_euc_to_utf8.sh ./right-id.def
./../../libexec/iconv_euc_to_utf8.sh ./char.def
rm ./rewrite.def
rm ./matrix.def
rm ./left-id.def
rm ./pos-id.def
rm ./unk.def
rm ./feature.def
rm ./right-id.def
rm ./char.def
mv ./Noun.others.csv.utf8 ./Noun.others.csv
mv ./Noun.number.csv.utf8 ./Noun.number.csv
mv ./Filler.csv.utf8 ./Filler.csv
mv ./Others.csv.utf8 ./Others.csv
mv ./unk.def.utf8 ./unk.def
mv ./Postp-col.csv.utf8 ./Postp-col.csv
mv ./Adnominal.csv.utf8 ./Adnominal.csv
mv ./Noun.verbal.csv.utf8 ./Noun.verbal.csv
mv ./matrix.def.utf8 ./matrix.def
mv ./Noun.csv.utf8 ./Noun.csv
mv ./Noun.demonst.csv.utf8 ./Noun.demonst.csv
mv ./char.def.utf8 ./char.def
mv ./Symbol.csv.utf8 ./Symbol.csv
mv ./Auxil.csv.utf8 ./Auxil.csv
mv ./Noun.name.csv.utf8 ./Noun.name.csv
mv ./feature.def.utf8 ./feature.def
mv ./Suffix.csv.utf8 ./Suffix.csv
mv ./Adverb.csv.utf8 ./Adverb.csv
mv ./Conjunction.csv.utf8 ./Conjunction.csv
mv ./pos-id.def.utf8 ./pos-id.def
mv ./Postp.csv.utf8 ./Postp.csv
mv ./right-id.def.utf8 ./right-id.def
mv ./Noun.nai.csv.utf8 ./Noun.nai.csv
mv ./Interjection.csv.utf8 ./Interjection.csv
mv ./Prefix.csv.utf8 ./Prefix.csv
mv ./Noun.place.csv.utf8 ./Noun.place.csv
mv ./Noun.adjv.csv.utf8 ./Noun.adjv.csv
mv ./rewrite.def.utf8 ./rewrite.def
mv ./Verb.csv.utf8 ./Verb.csv
mv ./left-id.def.utf8 ./left-id.def
mv ./Noun.proper.csv.utf8 ./Noun.proper.csv
mv ./Adj.csv.utf8 ./Adj.csv
mv ./Noun.adverbal.csv.utf8 ./Noun.adverbal.csv
mv ./Noun.org.csv.utf8 ./Noun.org.csv
[make-mecab-ipadic-NEologd] : Fix yomigana field of IPA dictionary
patching file Noun.csv
patching file Noun.place.csv
patching file Verb.csv
patching file Noun.verbal.csv
patching file Noun.name.csv
patching file Noun.adverbal.csv
patching file Noun.csv
patching file Noun.name.csv
patching file Noun.org.csv
patching file Noun.others.csv
patching file Noun.place.csv
patching file Noun.proper.csv
patching file Noun.verbal.csv
patching file Prefix.csv
patching file Suffix.csv
patching file Noun.proper.csv
patching file Noun.csv
patching file Noun.name.csv
patching file Noun.org.csv
patching file Noun.place.csv
patching file Noun.proper.csv
patching file Noun.verbal.csv
patching file Noun.name.csv
patching file Noun.org.csv
patching file Noun.place.csv
patching file Noun.proper.csv
patching file Suffix.csv
patching file Noun.demonst.csv
patching file Noun.csv
patching file Noun.name.csv
[make-mecab-ipadic-NEologd] : Copy user dictionary resource
[make-mecab-ipadic-NEologd] : Install adverb entries using /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../seed/neologd-adverb-dict-seed.20150623.csv.xz
[make-mecab-ipadic-NEologd] : Install interjection entries using /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../seed/neologd-interjection-dict-seed.20170216.csv.xz
[make-mecab-ipadic-NEologd] : Install noun orthographic variant entries using /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../seed/neologd-common-noun-ortho-variant-dict-seed.20170228.csv.xz
[make-mecab-ipadic-NEologd] : Install noun orthographic variant entries using /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../seed/neologd-proper-noun-ortho-variant-dict-seed.20161110.csv.xz
[make-mecab-ipadic-NEologd] : Install entries of orthographic variant of a noun used as verb form using /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../seed/neologd-noun-sahen-conn-ortho-variant-dict-seed.20160323.csv.xz
[make-mecab-ipadic-NEologd] : Install frequent adjective orthographic variant entries using /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../seed/neologd-adjective-std-dict-seed.20151126.csv.xz
[make-mecab-ipadic-NEologd] : Install infrequent adjective orthographic variant entries using /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../seed/neologd-adjective-exp-dict-seed.20151126.csv.xz
[make-mecab-ipadic-NEologd] : Install adjective verb orthographic variant entries using /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../seed/neologd-adjective-verb-dict-seed.20160324.csv.xz
[make-mecab-ipadic-NEologd] : Install infrequent datetime representation entries using /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../seed/neologd-date-time-infreq-dict-seed.20190415.csv.xz
[make-mecab-ipadic-NEologd] : Install infrequent quantity representation entries using /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../seed/neologd-quantity-infreq-dict-seed.20190415.csv.xz
[make-mecab-ipadic-NEologd] : Install entries of ill formed words using /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../seed/neologd-ill-formed-words-dict-seed.20170127.csv.xz
[make-mecab-ipadic-NEologd] : Re-Index system dictionary
reading ./unk.def ... 40
emitting double-array: 100% |###########################################| 
./model.def is not found. skipped.
reading ./neologd-adjective-verb-dict-seed.20160324.csv ... 20268
reading ./Noun.place.csv ... 73194
reading ./Auxil.csv ... 199
reading ./Noun.verbal.csv ... 12150
reading ./Symbol.csv ... 208
reading ./Noun.org.csv ... 17149
reading ./Noun.csv ... 60734
reading ./Postp.csv ... 146
reading ./neologd-ill-formed-words-dict-seed.20170127.csv ... 60616
reading ./Adj.csv ... 27210
reading ./Filler.csv ... 19
reading ./Noun.proper.csv ... 27493
reading ./Noun.number.csv ... 42
reading ./Suffix.csv ... 1448
reading ./mecab-user-dict-seed.20200910.csv ... 3224584
reading ./Noun.others.csv ... 153
reading ./Interjection.csv ... 252
reading ./Noun.adjv.csv ... 3328
reading ./Verb.csv ... 130750
reading ./neologd-date-time-infreq-dict-seed.20190415.csv ... 16866
reading ./neologd-proper-noun-ortho-variant-dict-seed.20161110.csv ... 138379
reading ./neologd-adjective-exp-dict-seed.20151126.csv ... 1051146
reading ./Others.csv ... 2
reading ./Adnominal.csv ... 135
reading ./neologd-common-noun-ortho-variant-dict-seed.20170228.csv ... 152869
reading ./neologd-quantity-infreq-dict-seed.20190415.csv ... 229216
reading ./neologd-noun-sahen-conn-ortho-variant-dict-seed.20160323.csv ... 26058
reading ./neologd-adjective-std-dict-seed.20151126.csv ... 507812
reading ./Prefix.csv ... 224
reading ./Noun.demonst.csv ... 120
reading ./Adverb.csv ... 3032
reading ./neologd-adverb-dict-seed.20150623.csv ... 139792
reading ./neologd-interjection-dict-seed.20170216.csv ... 4701
reading ./Noun.name.csv ... 34215
reading ./Postp-col.csv ... 91
reading ./Conjunction.csv ... 171
reading ./Noun.nai.csv ... 42
reading ./Noun.adverbal.csv ... 808
emitting double-array: 100% |###########################################| 
reading ./matrix.def ... 1316x1316
emitting matrix      : 100% |###########################################| 

done!
[make-mecab-ipadic-NEologd] : Make custom system dictionary on /usr/local/lib/mecab/dic/mecab-ipadic-neologd/libexec/../build/mecab-ipadic-2.7.0-20070801-neologd-20200910
make: Nothing to be done for `all'.
[make-mecab-ipadic-NEologd] : Finish..
[install-mecab-ipadic-NEologd] : Get results of tokenize test
[test-mecab-ipadic-NEologd] : Start..
[test-mecab-ipadic-NEologd] : Replace timestamp from 'git clone' date to 'git commit' date
[test-mecab-ipadic-NEologd] : Get buzz phrases
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 31978    0 31978    0     0   127k      0 --:--:-- --:--:-- --:--:--  127k
[test-mecab-ipadic-NEologd] : Get difference between default system dictionary and mecab-ipadic-NEologd
[test-mecab-ipadic-NEologd] : Something wrong. You shouldn't install mecab-ipadic-NEologd yet.
[test-mecab-ipadic-NEologd] : Finish..

[install-mecab-ipadic-NEologd] : Please check the list of differences in the upper part.

[install-mecab-ipadic-NEologd] : Do you want to install mecab-ipadic-NEologd? Type yes or no.
yes
[install-mecab-ipadic-NEologd] : OK. Let's install mecab-ipadic-NEologd.
[install-mecab-ipadic-NEologd] : Start..
[install-mecab-ipadic-NEologd] : /usr/local/lib/mecab/dic is current user's directory
[install-mecab-ipadic-NEologd] : Make install to /usr/local/lib/mecab/dic/mecab-ipadic-neologd
make[1]: Nothing to be done for `install-exec-am'.
/bin/sh ./mkinstalldirs /usr/local/lib/mecab/dic/mecab-ipadic-neologd
 /usr/bin/install -c -m 644 ./matrix.bin /usr/local/lib/mecab/dic/mecab-ipadic-neologd/matrix.bin
 /usr/bin/install -c -m 644 ./char.bin /usr/local/lib/mecab/dic/mecab-ipadic-neologd/char.bin
 /usr/bin/install -c -m 644 ./sys.dic /usr/local/lib/mecab/dic/mecab-ipadic-neologd/sys.dic
 /usr/bin/install -c -m 644 ./unk.dic /usr/local/lib/mecab/dic/mecab-ipadic-neologd/unk.dic
 /usr/bin/install -c -m 644 ./left-id.def /usr/local/lib/mecab/dic/mecab-ipadic-neologd/left-id.def
 /usr/bin/install -c -m 644 ./right-id.def /usr/local/lib/mecab/dic/mecab-ipadic-neologd/right-id.def
 /usr/bin/install -c -m 644 ./rewrite.def /usr/local/lib/mecab/dic/mecab-ipadic-neologd/rewrite.def
 /usr/bin/install -c -m 644 ./pos-id.def /usr/local/lib/mecab/dic/mecab-ipadic-neologd/pos-id.def
 /usr/bin/install -c -m 644 ./dicrc /usr/local/lib/mecab/dic/mecab-ipadic-neologd/dicrc

[install-mecab-ipadic-NEologd] : Install completed.
[install-mecab-ipadic-NEologd] : When you use MeCab, you can set '/usr/local/lib/mecab/dic/mecab-ipadic-neologd' as a value of '-d' option of MeCab.
[install-mecab-ipadic-NEologd] : Usage of mecab-ipadic-NEologd is here.
Usage:
    $ mecab -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd ...

[install-mecab-ipadic-NEologd] : Finish..
[install-mecab-ipadic-NEologd] : Finish..
% 

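[Mac] Installing CaboCha for Python and running dependency parsing

This post shows how to run dependency parsing with CaboCha from Python on a Mac.

  1. Installing MeCab, CRF++, and CaboCha
    • Installing MeCab
    • Installing CRF++ and CaboCha
  2. Trying CaboCha (without Python)
  3. CaboCha Python bindings
    • When you create an additional virtual environment
  4. Using CaboCha from Python
    • Printing dependency relations
    • Printing morphemes
  5. Handling new words with the NEologd dictionary
    • Installing the NEologd dictionary
    • Using the NEologd dictionary

1. Installing MeCab, CRF++, and CaboCha

First install MeCab, CRF++, and CaboCha by running the commands below with your Python virtual environment activated.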

Installing MeCab

% brew install mecab
% brew install mecab-ipadic
% pip install mecab-python3

Installing CRF++ and CaboCha

% brew install crf++
% brew install cabocha

2. Trying CaboCha (without Python)

Once the packages above are installed, CaboCha can be used directly from the terminal.

Run the command "cabocha" and then type 「今日は良い天気ですね。」 to get the output below.

% cabocha
今日は良い天気ですね。
      今日は---D
          良い-D
    天気ですね。
EOS

However, so far we have only run CaboCha directly from the shell, without starting Python.

3. CaboCha Python bindings

Download cabocha-0.69.tar.bz2 from its download link.

The archive should now be in the Downloads folder.

% cd /Users/<username>/Downloads
% ls
cabocha-0.69.tar.bz2

Extract the archive, then run configure, make, and make install.

% tar xfv cabocha-0.69.tar.bz2
% cd cabocha-0.69
% ./configure --prefix=/usr/local/cabocha/0_69 --with-charset=UTF8 --with-posset=IPA
% make
% make install

With your Python virtual environment activated, move into the "python" folder directly under "cabocha-0.69" and run "sudo python setup.py install".

% cd python
% sudo python setup.py install

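After this, import CaboCha works.

That said, I do not yet fully understand why it still works after deleting the cabocha-0.69 folder from Downloads. CaboCha.py was created in the virtual environment's site-packages, so perhaps that is enough. I will take a closer look at the log later.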
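One way to investigate this is to ask Python where it actually loads the module from. The setup.py log later in this document shows CaboCha.py being copied and a '_CaboCha' extension being built, so my guess (not verified here) is that import CaboCha relies only on those installed files and on the CaboCha library installed on the system, not on the extracted source folder. The sketch below uses only the standard importlib module; module_location is a helper name invented for this example.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the module is not importable in this environment
    return spec.origin


# CaboCha is the Python wrapper; _CaboCha is the compiled extension it loads.
for name in ("CaboCha", "_CaboCha"):
    print(name, "->", module_location(name))
```

If both paths point into the virtual environment's site-packages, that would support the idea that the Downloads folder is no longer needed once the install has finished.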

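When you create an additional virtual environment

If you have already gone through the steps above once and are using CaboCha, you can skip several steps when creating an additional virtual environment.

With the extracted "cabocha-0.69" still available, activate the newly created virtual environment, go into the "cabocha-0.69/python" directory, and run the following.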

% pip install mecab-python3
% cd cabocha-0.69/python
% sudo python setup.py install

4. Using CaboCha from Python

After starting Python, "import CaboCha" works and the code below runs as well.

>>> import CaboCha
>>> c = CaboCha.Parser()
>>> sentence = '今日は良い天気ですね。'
>>> print(c.parseToString(sentence))
      今日は---D
          良い-D
    天気ですね。
EOS

Printing dependency relations

>>> tree = c.parse(sentence)
>>> print(tree.toString(CaboCha.FORMAT_TREE))
      今日は---D
          良い-D
    天気ですね。
EOS

>>> print(tree.toString(CaboCha.FORMAT_LATTICE))
* 0 2D 0/1 -1.140323
今日	名詞,副詞可能,*,*,*,*,今日,キョウ,キョー
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
* 1 2D 0/0 -1.140323
良い	形容詞,自立,*,*,形容詞・アウオ段,基本形,良い,ヨイ,ヨイ
* 2 -1D 0/2 0.000000
天気	名詞,一般,*,*,*,*,天気,テンキ,テンキ
です	助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
ね	助詞,終助詞,*,*,*,*,ね,ネ,ネ
。	記号,句点,*,*,*,*,。,。,。
EOS
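The lattice format is easy to read back into a data structure. In the output above, each line starting with "* " is a chunk header (chunk id, the linked chunk id followed by the relation letter, two token positions written as head/func, and a score), and the following lines are tokens written as surface, a tab, then the feature string. The parser below is a minimal sketch that assumes exactly this layout; parse_lattice and the dictionary keys are names chosen for this example.

```python
def parse_lattice(text):
    """Parse CaboCha lattice-format text for one sentence into a list of chunks."""
    chunks = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue
        if line.startswith("* "):
            # Header such as "* 0 2D 0/1 -1.140323"
            _, chunk_id, dep, head_func, score = line.split(" ")
            head, func = head_func.split("/")
            chunks.append({
                "id": int(chunk_id),
                "link": int(dep[:-1]),  # "2D" -> 2, "-1D" -> -1
                "rel": dep[-1],         # "D"
                "head": int(head),
                "func": int(func),
                "score": float(score),
                "tok": [],
            })
        else:
            # Token line: surface, a tab, then the comma-separated feature.
            surface, feature = line.split("\t", 1)
            chunks[-1]["tok"].append(
                {"surface": surface, "feature": feature.split(",")}
            )
    return chunks


# The lattice output shown above, with tabs written as \t.
LATTICE = "\n".join([
    "* 0 2D 0/1 -1.140323",
    "今日\t名詞,副詞可能,*,*,*,*,今日,キョウ,キョー",
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ",
    "* 1 2D 0/0 -1.140323",
    "良い\t形容詞,自立,*,*,形容詞・アウオ段,基本形,良い,ヨイ,ヨイ",
    "* 2 -1D 0/2 0.000000",
    "天気\t名詞,一般,*,*,*,*,天気,テンキ,テンキ",
    "です\t助動詞,*,*,*,特殊・デス,基本形,です,デス,デス",
    "ね\t助詞,終助詞,*,*,*,*,ね,ネ,ネ",
    "。\t記号,句点,*,*,*,*,。,。,。",
    "EOS",
])

for chunk in parse_lattice(LATTICE):
    text = "".join(tok["surface"] for tok in chunk["tok"])
    print(chunk["id"], chunk["link"], text)
```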

Printing morphemes

Surface strings of the morphemes

>>> for i in range(tree.size()):
...     print(tree.token(i).surface)
... 
今日
は
良い
天気
です
ね
。
>>> 

Morpheme information (features)

>>> for i in range(tree.size()):
...     print(tree.token(i).feature)
... 
名詞,副詞可能,*,*,*,*,今日,キョウ,キョー
助詞,係助詞,*,*,*,*,は,ハ,ワ
形容詞,自立,*,*,形容詞・アウオ段,基本形,良い,ヨイ,ヨイ
名詞,一般,*,*,*,*,天気,テンキ,テンキ
助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
助詞,終助詞,*,*,*,*,ね,ネ,ネ
記号,句点,*,*,*,*,。,。,。
>>> 
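With the IPA dictionary, the nine comma-separated fields of feature are, in order: part of speech, three levels of part-of-speech subcategory, conjugation type, conjugation form, base form, reading, and pronunciation. The sketch below gives the fields names so they can be looked up by key. The English key names and the function parse_feature are my own choices for illustration, and the padding for short rows is a precaution for entries that carry fewer fields.

```python
# Field order of an IPA-dictionary feature string (key names are illustrative).
IPADIC_FIELDS = (
    "pos", "pos_detail1", "pos_detail2", "pos_detail3",
    "conj_type", "conj_form", "base_form", "reading", "pronunciation",
)


def parse_feature(feature):
    """Turn a feature string such as tree.token(i).feature into a dict."""
    values = feature.split(",")
    # Pad with "*" so that rows with fewer fields still get every key.
    values += ["*"] * (len(IPADIC_FIELDS) - len(values))
    return dict(zip(IPADIC_FIELDS, values))


info = parse_feature("形容詞,自立,*,*,形容詞・アウオ段,基本形,良い,ヨイ,ヨイ")
print(info["pos"], info["base_form"], info["reading"])
```

With a real tree this could be called as parse_feature(tree.token(i).feature) inside the loop above.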

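5. Handling new words with the NEologd dictionary

By default the "IPA dictionary" is used, but the "NEologd dictionary" seems to be widely used for handling new words.

Installing the NEologd dictionary

Move to the directory that contains the standard dictionary "ipadic". It is probably at "/usr/local/lib/mecab/dic" or somewhere similar.

% cd /usr/local/lib/mecab/dic
% ls
ipadic

Use git clone to create "mecab-ipadic-neologd".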

% git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
% ls
ipadic			mecab-ipadic-neologd

Move into the "mecab-ipadic-neologd" folder and run the command "./bin/install-mecab-ipadic-neologd -n -a".

% cd mecab-ipadic-neologd
% ./bin/install-mecab-ipadic-neologd -n -a

Partway through you will be asked "Do you want to install mecab-ipadic-NEologd? Type yes or no.", so type "yes".

That completes the installation.

Using the NEologd dictionary

CaboCha and MeCab use the IPA dictionary by default, so the NEologd dictionary has to be specified explicitly.

This is done by passing "-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd" at run time; in Python, writing "CaboCha.Parser('-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd')" as in the code below is enough.

import CaboCha

sentence = '霜降り明星(しもふりみょうじょう)は、2018年『M-1グランプリ』14代目王者。'

# IPA dictionary
c = CaboCha.Parser()
print('IPA 辞書:')
print(c.parseToString(sentence))

# NEologd dictionary
c = CaboCha.Parser('-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd')
print('NEologd 辞書:')
print(c.parseToString(sentence))

Running the code above returns the following output.

IPA 辞書:
      霜降り明星---D            
            (しも-D            
              ふりみ-D          
      ょうじょう)は、---------D
                  2018年---D   |
                      『M--D   |
               1グランプリ』-D |
                        14代目-D
                          王者。
EOS

NEologd 辞書:
        霜降り明星-----D      
              (しも-D |      
                  ふり-D      
      みょうじょう)は、-----D
                    2018年-D |
           『M-1グランプリ』-D
                  14代目王者。
EOS

The results differ slightly around 「しもふりみょうじょう」 and 「M-1グランプリ」.
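The dictionary path is hard-coded in the example above, and it can differ between machines (for example, Homebrew on Apple Silicon installs under /opt/homebrew rather than /usr/local). As a rough sketch, the "-d" option could be built from the directory reported by "mecab-config --dicdir", falling back to the path used in this post. The helper names find_mecab_dicdir and neologd_option are made up for this example, and the code assumes the NEologd dictionary was installed in a "mecab-ipadic-neologd" folder under that directory, as in the steps above.

```python
import shutil
import subprocess

# The location used in this post; used as a fallback.
DEFAULT_DICDIR = "/usr/local/lib/mecab/dic"


def find_mecab_dicdir(default=DEFAULT_DICDIR):
    """Ask mecab-config for the dictionary directory, or return the default."""
    if shutil.which("mecab-config") is None:
        return default
    try:
        result = subprocess.run(
            ["mecab-config", "--dicdir"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return default
    return result.stdout.strip() or default


def neologd_option(dicdir):
    """Build the '-d ...' option string for the NEologd dictionary."""
    return f"-d {dicdir.rstrip('/')}/mecab-ipadic-neologd"


print(neologd_option(find_mecab_dicdir()))
```

The resulting string can then be passed in place of the literal, as in CaboCha.Parser(neologd_option(find_mecab_dicdir())).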

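As a slightly more advanced example, I fetched the comments on a YouTube video and ran morphological analysis on them; that post is linked below.

▶︎ [Mac] Running morphological analysis on YouTube comments with MeCab in Python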

[Log] macOS Big Sur 11.2.2: sudo python setup.py install (after running configure, make, and make install for cabocha-0.69)

  • Command: sudo python setup.py install
    • Run after configure, make, and make install for cabocha-0.69
  • Date: 2021/05/29
  • Environment: macOS Big Sur 11.2.2
% sudo python3 setup.py install
running install
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.8
copying CaboCha.py -> build/lib.macosx-10.9-x86_64-3.8
running build_ext
building '_CaboCha' extension
creating build/temp.macosx-10.9-x86_64-3.8
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch x86_64 -g -I/usr/local/Cellar/cabocha/0.69/include -I/Users/<username>/<venv-dir>/include -I/Library/Frameworks/Python.framework/Versions/3.8/include/python3.8 -c CaboCha_wrap.cxx -o build/temp.macosx-10.9-x86_64-3.8/CaboCha_wrap.o
In file included from CaboCha_wrap.cxx:154:
In file included from /Library/Frameworks/Python.framework/Versions/3.8/include/python3.8/Python.h:85:
In file included from /Library/Frameworks/Python.framework/Versions/3.8/include/python3.8/pytime.h:6:
In file included from /Library/Frameworks/Python.framework/Versions/3.8/include/python3.8/object.h:746:
/Library/Frameworks/Python.framework/Versions/3.8/include/python3.8/cpython/object.h:177:16: warning: 'tp_print' is deprecated [-Wdeprecated-declarations]
typedef struct _typeobject {
               ^
CaboCha_wrap.cxx:1947:23: note: in implicit copy assignment operator for '_typeobject' first required here
    swigpyobject_type = tmp;
                      ^
/Library/Frameworks/Python.framework/Versions/3.8/include/python3.8/cpython/object.h:260:5: note: 'tp_print' has been explicitly marked deprecated here
    Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
    ^
/Library/Frameworks/Python.framework/Versions/3.8/include/python3.8/pyport.h:515:54: note: expanded from macro 'Py_DEPRECATED'
#define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
                                                     ^
In file included from CaboCha_wrap.cxx:154:
In file included from /Library/Frameworks/Python.framework/Versions/3.8/include/python3.8/Python.h:85:
In file included from /Library/Frameworks/Python.framework/Versions/3.8/include/python3.8/pytime.h:6:
In file included from /Library/Frameworks/Python.framework/Versions/3.8/include/python3.8/object.h:746:
/Library/Frameworks/Python.framework/Versions/3.8/include/python3.8/cpython/object.h:177:16: warning: 'tp_print' is deprecated [-Wdeprecated-declarations]
typedef struct _typeobject {
               ^
/Library/Frameworks/Python.framework/Versions/3.8/include/python3.8/cpython/object.h:260:5: note: 'tp_print' has been explicitly marked deprecated here
    Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);
    ^
/Library/Frameworks/Python.framework/Versions/3.8/include/python3.8/pyport.h:515:54: note: expanded from macro 'Py_DEPRECATED'
#define Py_DEPRECATED(VERSION_UNUSED) __attribute__((__deprecated__))
                                                     ^
CaboCha_wrap.cxx:3669:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:3666:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:3871:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:3868:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:3919:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:3916:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:3953:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:3950:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:3985:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:3982:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4026:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4023:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4067:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4064:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4118:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4115:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4152:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4149:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4183:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4180:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4214:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4211:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4246:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4243:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4278:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4275:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4310:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4307:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4351:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4348:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4383:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4380:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4423:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4420:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4455:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4452:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4495:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4492:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4527:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4524:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4567:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4564:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4599:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4596:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4622:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4619:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4653:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4650:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4702:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4699:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4746:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4743:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4789:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4786:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4868:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4865:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4891:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4888:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4922:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4919:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4955:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4952:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:4980:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:4977:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:5041:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:5038:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:5130:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:5127:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:5225:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:5222:12: note: for type 'char *'
    catch (char *e) {
           ^
CaboCha_wrap.cxx:5320:18: warning: exception of type 'const char *' will be caught by earlier handler [-Wexceptions]
    catch (const char *e) {
                 ^
CaboCha_wrap.cxx:5317:12: note: for type 'char *'
    catch (char *e) {
           ^
38 warnings generated.
warning: no library file corresponding to '-L/usr/local/Cellar/mecab/0.996/lib' found (skipping)
g++ -bundle -undefined dynamic_lookup -arch x86_64 -g build/temp.macosx-10.9-x86_64-3.8/CaboCha_wrap.o -L/usr/local/Cellar/cabocha/0.69/lib -lcabocha -lcrfpp -lmecab -liconv -lmecab -lstdc++ -o build/lib.macosx-10.9-x86_64-3.8/_CaboCha.cpython-38-darwin.so
ld: warning: dylib (/usr/local/Cellar/cabocha/0.69/lib/libcabocha.dylib) was built for newer macOS version (11.0) than being linked (10.9)
ld: warning: dylib (/usr/local/lib/libcrfpp.dylib) was built for newer macOS version (11.0) than being linked (10.9)
ld: warning: dylib (/usr/local/lib/libmecab.dylib) was built for newer macOS version (11.0) than being linked (10.9)
running install_lib
copying build/lib.macosx-10.9-x86_64-3.8/_CaboCha.cpython-38-darwin.so -> /Users/ユーザー名/仮想環境ディレクトリ/lib/python3.8/site-packages
copying build/lib.macosx-10.9-x86_64-3.8/CaboCha.py -> /Users/ユーザー名/仮想環境ディレクトリ/lib/python3.8/site-packages
byte-compiling /Users/ユーザー名/仮想環境ディレクトリ/lib/python3.8/site-packages/CaboCha.py to CaboCha.cpython-38.pyc
running install_egg_info
Writing /Users/ユーザー名/仮想環境ディレクトリ/lib/python3.8/site-packages/cabocha_python-0.69-py3.8.egg-info
%
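
The log above ends with CaboCha.py and the compiled _CaboCha extension being copied into site-packages. As a minimal sketch (not part of the original procedure), the following checks from Python whether a module can be found on the current import path. The helper name is_module_installed is hypothetical, and the check only confirms that the module can be located, not that the native library loads correctly.

```python
import importlib.util


def is_module_installed(name):
    """Return True if a top-level module called `name` can be located on the import path."""
    # find_spec returns None when no module with this name can be found.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Run this with the same interpreter (virtual environment) used for the install.
    print("CaboCha importable:", is_module_installed("CaboCha"))
```

If this reports False, one likely cause is that the install ran under a different Python interpreter than the one running the check.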

【Log】macOS Big Sur 11.2.2: tar xfv cabocha-0.69.tar.bz2

  • Command: tar xfv cabocha-0.69.tar.bz2
  • Date: 2021/05/29
  • Environment: macOS Big Sur 11.2.2
% tar xfv cabocha-0.69.tar.bz2
x cabocha-0.69/
x cabocha-0.69/cabocha-config.in
x cabocha-0.69/compile
x cabocha-0.69/swig/
x cabocha-0.69/swig/version.h.in
x cabocha-0.69/swig/Makefile
x cabocha-0.69/swig/version.h
x cabocha-0.69/swig/CaboCha.i
x cabocha-0.69/missing
x cabocha-0.69/java/
x cabocha-0.69/java/test.java
x cabocha-0.69/java/Makefile
x cabocha-0.69/java/org/
x cabocha-0.69/java/org/chasen/
x cabocha-0.69/java/org/chasen/cabocha/
x cabocha-0.69/java/org/chasen/cabocha/FormatType.java
x cabocha-0.69/java/org/chasen/cabocha/OutputLayerType.java
x cabocha-0.69/java/org/chasen/cabocha/Token.java
x cabocha-0.69/java/org/chasen/cabocha/CaboChaConstants.java
x cabocha-0.69/java/org/chasen/cabocha/ParserType.java
x cabocha-0.69/java/org/chasen/cabocha/ParsingAlgorithm.java
x cabocha-0.69/java/org/chasen/cabocha/Chunk.java
x cabocha-0.69/java/org/chasen/cabocha/InputLayerType.java
x cabocha-0.69/java/org/chasen/cabocha/CaboCha.java
x cabocha-0.69/java/org/chasen/cabocha/CaboChaJNI.java
x cabocha-0.69/java/org/chasen/cabocha/PossetType.java
x cabocha-0.69/java/org/chasen/cabocha/Tree.java
x cabocha-0.69/java/org/chasen/cabocha/CharsetType.java
x cabocha-0.69/java/org/chasen/cabocha/Parser.java
x cabocha-0.69/java/CaboCha_wrap.cxx
x cabocha-0.69/ltmain.sh
x cabocha-0.69/config.guess
x cabocha-0.69/man/
x cabocha-0.69/man/Makefile.in
x cabocha-0.69/man/cabocha.1
x cabocha-0.69/man/Makefile.am
x cabocha-0.69/BSD
x cabocha-0.69/python/
x cabocha-0.69/python/test.py
x cabocha-0.69/python/CaboCha.py
x cabocha-0.69/python/CaboCha_wrap.cxx
x cabocha-0.69/python/setup.py
x cabocha-0.69/AUTHORS
x cabocha-0.69/ruby/
x cabocha-0.69/ruby/CaboCha_wrap.cpp
x cabocha-0.69/ruby/extconf.rb
x cabocha-0.69/ruby/test.rb
x cabocha-0.69/Makefile.in
x cabocha-0.69/NEWS
x cabocha-0.69/install-sh
x cabocha-0.69/cabocha.iss.in
x cabocha-0.69/ChangeLog
x cabocha-0.69/configure
x cabocha-0.69/src/
x cabocha-0.69/src/string_buffer.cpp
x cabocha-0.69/src/tree_allocator.cpp
x cabocha-0.69/src/dep.h
x cabocha-0.69/src/dep_learner.cpp
x cabocha-0.69/src/tree_allocator.h
x cabocha-0.69/src/svm.h
x cabocha-0.69/src/svm.cpp
x cabocha-0.69/src/ucstable.h
x cabocha-0.69/src/utils.h
x cabocha-0.69/src/selector.cpp
x cabocha-0.69/src/chunk_learner.cpp
x cabocha-0.69/src/string_buffer.h
x cabocha-0.69/src/ucs.cpp
x cabocha-0.69/src/ne.cpp
x cabocha-0.69/src/eval.cpp
x cabocha-0.69/src/cabocha.cpp
x cabocha-0.69/src/Makefile.in
x cabocha-0.69/src/scoped_ptr.h
x cabocha-0.69/src/chunker.h
x cabocha-0.69/src/normalizer.rule
x cabocha-0.69/src/common.h
x cabocha-0.69/src/normalizer_rule.sh
x cabocha-0.69/src/darts.h
x cabocha-0.69/src/learner.cpp
x cabocha-0.69/src/cabocha.h
x cabocha-0.69/src/morph.h
x cabocha-0.69/src/svm_learn.cpp
x cabocha-0.69/src/Makefile.msvc.in
x cabocha-0.69/src/timer.h
x cabocha-0.69/src/chunker.cpp
x cabocha-0.69/src/utils.cpp
x cabocha-0.69/src/param.h
x cabocha-0.69/src/winmain.h
x cabocha-0.69/src/normalizer.h
x cabocha-0.69/src/param.cpp
x cabocha-0.69/src/parser.cpp
x cabocha-0.69/src/ne.h
x cabocha-0.69/src/normalizer_rule.h
x cabocha-0.69/src/svm_learn.h
x cabocha-0.69/src/ucs.h
x cabocha-0.69/src/cabocha-model-index.cpp
x cabocha-0.69/src/mmap.h
x cabocha-0.69/src/analyzer.h
x cabocha-0.69/src/make.bat
x cabocha-0.69/src/tree.cpp
x cabocha-0.69/src/char_category.h
x cabocha-0.69/src/Makefile.am
x cabocha-0.69/src/dep.cpp
x cabocha-0.69/src/morph.cpp
x cabocha-0.69/src/selector_pat.h
x cabocha-0.69/src/cabocha-system-eval.cpp
x cabocha-0.69/src/cabocha-learn.cpp
x cabocha-0.69/src/stream_wrapper.h
x cabocha-0.69/src/selector.h
x cabocha-0.69/src/libcabocha.cpp
x cabocha-0.69/src/normalizer.cpp
x cabocha-0.69/src/freelist.h
x cabocha-0.69/perl/
x cabocha-0.69/perl/test.pl
x cabocha-0.69/perl/Makefile.PL
x cabocha-0.69/perl/CaboCha_wrap.o
x cabocha-0.69/perl/CaboCha.bs
x cabocha-0.69/perl/blib/
x cabocha-0.69/perl/blib/bin/
x cabocha-0.69/perl/blib/bin/.exists
x cabocha-0.69/perl/blib/arch/
x cabocha-0.69/perl/blib/arch/.exists
x cabocha-0.69/perl/blib/arch/auto/
x cabocha-0.69/perl/blib/arch/auto/CaboCha/
x cabocha-0.69/perl/blib/arch/auto/CaboCha/.exists
x cabocha-0.69/perl/blib/arch/auto/CaboCha/CaboCha.so
x cabocha-0.69/perl/blib/arch/auto/CaboCha/CaboCha.bs
x cabocha-0.69/perl/blib/lib/
x cabocha-0.69/perl/blib/lib/.exists
x cabocha-0.69/perl/blib/lib/auto/
x cabocha-0.69/perl/blib/lib/auto/CaboCha/
x cabocha-0.69/perl/blib/lib/auto/CaboCha/.exists
x cabocha-0.69/perl/blib/lib/CaboCha.pm
x cabocha-0.69/perl/blib/man1/
x cabocha-0.69/perl/blib/man1/.exists
x cabocha-0.69/perl/blib/script/
x cabocha-0.69/perl/blib/script/.exists
x cabocha-0.69/perl/blib/man3/
x cabocha-0.69/perl/blib/man3/.exists
x cabocha-0.69/perl/CaboCha_wrap.cxx
x cabocha-0.69/perl/pm_to_blib
x cabocha-0.69/perl/CaboCha.pm
x cabocha-0.69/perl/MYMETA.yml
x cabocha-0.69/config.rpath
x cabocha-0.69/TODO
x cabocha-0.69/configure.in
x cabocha-0.69/config.sub
x cabocha-0.69/LGPL
x cabocha-0.69/tools/
x cabocha-0.69/tools/kc2cabocha.pl
x cabocha-0.69/tools/irex2cabocha.pl
x cabocha-0.69/tools/chasen2mecab.pl
x cabocha-0.69/tools/kc2juman.pl
x cabocha-0.69/tools/KyotoCorpus.pm
x cabocha-0.69/tools/KNBC2KC.pl
x cabocha-0.69/cabocharc.in
x cabocha-0.69/INSTALL
x cabocha-0.69/aclocal.m4
x cabocha-0.69/README
x cabocha-0.69/config.h.in
x cabocha-0.69/COPYING
x cabocha-0.69/example/
x cabocha-0.69/example/example2.cpp
x cabocha-0.69/example/example.c
x cabocha-0.69/Makefile.am
x cabocha-0.69/model/
x cabocha-0.69/model/dep.ipa.txt
x cabocha-0.69/model/ne.juman.txt
x cabocha-0.69/model/dep.juman.txt
x cabocha-0.69/model/Makefile.in
x cabocha-0.69/model/dep.unidic.txt
x cabocha-0.69/model/chunk.ipa.txt
x cabocha-0.69/model/chunk.unidic.txt
x cabocha-0.69/model/ne.ipa.txt
x cabocha-0.69/model/ne.unidic.txt
x cabocha-0.69/model/chunk.juman.txt
x cabocha-0.69/model/Makefile.am
x cabocha-0.69/doc/
x cabocha-0.69/doc/README.txt
x cabocha-0.69/doc/doxygen/
x cabocha-0.69/doc/doxygen/classes.html
x cabocha-0.69/doc/doxygen/ftv2plastnode.png
x cabocha-0.69/doc/doxygen/nav_g.png
x cabocha-0.69/doc/doxygen/files.html
x cabocha-0.69/doc/doxygen/tab_b.gif
x cabocha-0.69/doc/doxygen/nav_h.png
x cabocha-0.69/doc/doxygen/namespaceCaboCha.html
x cabocha-0.69/doc/doxygen/functions_vars.html
x cabocha-0.69/doc/doxygen/tab_s.png
x cabocha-0.69/doc/doxygen/namespacemembers_eval.html
x cabocha-0.69/doc/doxygen/ftv2pnode.png
x cabocha-0.69/doc/doxygen/cabocha_8h.html
x cabocha-0.69/doc/doxygen/open.png
x cabocha-0.69/doc/doxygen/globals_func.html
x cabocha-0.69/doc/doxygen/structcabocha__token__t.html
x cabocha-0.69/doc/doxygen/doxygen.css
x cabocha-0.69/doc/doxygen/ftv2node.png
x cabocha-0.69/doc/doxygen/functions_func.html
x cabocha-0.69/doc/doxygen/ftv2mnode.png
x cabocha-0.69/doc/doxygen/ftv2doc.png
x cabocha-0.69/doc/doxygen/globals_enum.html
x cabocha-0.69/doc/doxygen/classCaboCha_1_1Tree.html
x cabocha-0.69/doc/doxygen/functions.html
x cabocha-0.69/doc/doxygen/ftv2folderopen.png
x cabocha-0.69/doc/doxygen/namespacemembers.html
x cabocha-0.69/doc/doxygen/globals.html
x cabocha-0.69/doc/doxygen/ftv2link.png
x cabocha-0.69/doc/doxygen/ftv2folderclosed.png
x cabocha-0.69/doc/doxygen/structcabocha__token__t-members.html
x cabocha-0.69/doc/doxygen/bdwn.png
x cabocha-0.69/doc/doxygen/namespacemembers_func.html
x cabocha-0.69/doc/doxygen/structcabocha__chunk__t.html
x cabocha-0.69/doc/doxygen/bc_s.png
x cabocha-0.69/doc/doxygen/cabocha_8h_source.html
x cabocha-0.69/doc/doxygen/globals_eval.html
x cabocha-0.69/doc/doxygen/ftv2mo.png
x cabocha-0.69/doc/doxygen/doxygen.png
x cabocha-0.69/doc/doxygen/index.html
x cabocha-0.69/doc/doxygen/tab_b.png
x cabocha-0.69/doc/doxygen/closed.png
x cabocha-0.69/doc/doxygen/nav_f.png
x cabocha-0.69/doc/doxygen/ftv2lastnode.png
x cabocha-0.69/doc/doxygen/classCaboCha_1_1Tree-members.html
x cabocha-0.69/doc/doxygen/tabs.css
x cabocha-0.69/doc/doxygen/ftv2vertline.png
x cabocha-0.69/doc/doxygen/ftv2cl.png
x cabocha-0.69/doc/doxygen/tab_h.png
x cabocha-0.69/doc/doxygen/globals_type.html
x cabocha-0.69/doc/doxygen/structcabocha__chunk__t-members.html
x cabocha-0.69/doc/doxygen/globals_defs.html
x cabocha-0.69/doc/doxygen/annotated.html
x cabocha-0.69/doc/doxygen/namespacemembers_type.html
x cabocha-0.69/doc/doxygen/tab_l.gif
x cabocha-0.69/doc/doxygen/tab_a.png
x cabocha-0.69/doc/doxygen/sync_off.png
x cabocha-0.69/doc/doxygen/ftv2ns.png
x cabocha-0.69/doc/doxygen/tab_r.gif
x cabocha-0.69/doc/doxygen/classCaboCha_1_1Parser-members.html
x cabocha-0.69/doc/doxygen/ftv2splitbar.png
x cabocha-0.69/doc/doxygen/ftv2mlastnode.png
x cabocha-0.69/doc/doxygen/classCaboCha_1_1Parser.html
x cabocha-0.69/doc/doxygen/namespaces.html
x cabocha-0.69/doc/doxygen/sync_on.png
x cabocha-0.69/doc/doxygen/namespacemembers_enum.html
x cabocha-0.69/doc/doxygen/dir_68267d1309a1af8e8297ef4c3efbcdba.html
x cabocha-0.69/doc/doxygen/dynsections.js
x cabocha-0.69/doc/doxygen/ftv2blank.png
x cabocha-0.69/doc/cabocha.cfg
% 

For reference, here are the contents of the extracted directory:

% cd cabocha-0.69
cabocha-0.69 % tree
.
├── AUTHORS
├── BSD
├── COPYING
├── ChangeLog
├── INSTALL
├── LGPL
├── Makefile.am
├── Makefile.in
├── NEWS
├── README
├── TODO
├── aclocal.m4
├── cabocha-config.in
├── cabocha.iss.in
├── cabocharc.in
├── compile
├── config.guess
├── config.h.in
├── config.rpath
├── config.sub
├── configure
├── configure.in
├── doc
│   ├── README.txt
│   ├── cabocha.cfg
│   └── doxygen
│       ├── annotated.html
│       ├── bc_s.png
│       ├── bdwn.png
│       ├── cabocha_8h.html
│       ├── cabocha_8h_source.html
│       ├── classCaboCha_1_1Parser-members.html
│       ├── classCaboCha_1_1Parser.html
│       ├── classCaboCha_1_1Tree-members.html
│       ├── classCaboCha_1_1Tree.html
│       ├── classes.html
│       ├── closed.png
│       ├── dir_68267d1309a1af8e8297ef4c3efbcdba.html
│       ├── doxygen.css
│       ├── doxygen.png
│       ├── dynsections.js
│       ├── files.html
│       ├── ftv2blank.png
│       ├── ftv2cl.png
│       ├── ftv2doc.png
│       ├── ftv2folderclosed.png
│       ├── ftv2folderopen.png
│       ├── ftv2lastnode.png
│       ├── ftv2link.png
│       ├── ftv2mlastnode.png
│       ├── ftv2mnode.png
│       ├── ftv2mo.png
│       ├── ftv2node.png
│       ├── ftv2ns.png
│       ├── ftv2plastnode.png
│       ├── ftv2pnode.png
│       ├── ftv2splitbar.png
│       ├── ftv2vertline.png
│       ├── functions.html
│       ├── functions_func.html
│       ├── functions_vars.html
│       ├── globals.html
│       ├── globals_defs.html
│       ├── globals_enum.html
│       ├── globals_eval.html
│       ├── globals_func.html
│       ├── globals_type.html
│       ├── index.html
│       ├── namespaceCaboCha.html
│       ├── namespacemembers.html
│       ├── namespacemembers_enum.html
│       ├── namespacemembers_eval.html
│       ├── namespacemembers_func.html
│       ├── namespacemembers_type.html
│       ├── namespaces.html
│       ├── nav_f.png
│       ├── nav_g.png
│       ├── nav_h.png
│       ├── open.png
│       ├── structcabocha__chunk__t-members.html
│       ├── structcabocha__chunk__t.html
│       ├── structcabocha__token__t-members.html
│       ├── structcabocha__token__t.html
│       ├── sync_off.png
│       ├── sync_on.png
│       ├── tab_a.png
│       ├── tab_b.gif
│       ├── tab_b.png
│       ├── tab_h.png
│       ├── tab_l.gif
│       ├── tab_r.gif
│       ├── tab_s.png
│       └── tabs.css
├── example
│   ├── example.c
│   └── example2.cpp
├── install-sh
├── java
│   ├── CaboCha_wrap.cxx
│   ├── Makefile
│   ├── org
│   │   └── chasen
│   │       └── cabocha
│   │           ├── CaboCha.java
│   │           ├── CaboChaConstants.java
│   │           ├── CaboChaJNI.java
│   │           ├── CharsetType.java
│   │           ├── Chunk.java
│   │           ├── FormatType.java
│   │           ├── InputLayerType.java
│   │           ├── OutputLayerType.java
│   │           ├── Parser.java
│   │           ├── ParserType.java
│   │           ├── ParsingAlgorithm.java
│   │           ├── PossetType.java
│   │           ├── Token.java
│   │           └── Tree.java
│   └── test.java
├── ltmain.sh
├── man
│   ├── Makefile.am
│   ├── Makefile.in
│   └── cabocha.1
├── missing
├── model
│   ├── Makefile.am
│   ├── Makefile.in
│   ├── chunk.ipa.txt
│   ├── chunk.juman.txt
│   ├── chunk.unidic.txt
│   ├── dep.ipa.txt
│   ├── dep.juman.txt
│   ├── dep.unidic.txt
│   ├── ne.ipa.txt
│   ├── ne.juman.txt
│   └── ne.unidic.txt
├── perl
│   ├── CaboCha.bs
│   ├── CaboCha.pm
│   ├── CaboCha_wrap.cxx
│   ├── CaboCha_wrap.o
│   ├── MYMETA.yml
│   ├── Makefile.PL
│   ├── blib
│   │   ├── arch
│   │   │   └── auto
│   │   │       └── CaboCha
│   │   │           ├── CaboCha.bs
│   │   │           └── CaboCha.so
│   │   ├── bin
│   │   ├── lib
│   │   │   ├── CaboCha.pm
│   │   │   └── auto
│   │   │       └── CaboCha
│   │   ├── man1
│   │   ├── man3
│   │   └── script
│   ├── pm_to_blib
│   └── test.pl
├── python
│   ├── CaboCha.py
│   ├── CaboCha_wrap.cxx
│   ├── setup.py
│   └── test.py
├── ruby
│   ├── CaboCha_wrap.cpp
│   ├── extconf.rb
│   └── test.rb
├── src
│   ├── Makefile.am
│   ├── Makefile.in
│   ├── Makefile.msvc.in
│   ├── analyzer.h
│   ├── cabocha-learn.cpp
│   ├── cabocha-model-index.cpp
│   ├── cabocha-system-eval.cpp
│   ├── cabocha.cpp
│   ├── cabocha.h
│   ├── char_category.h
│   ├── chunk_learner.cpp
│   ├── chunker.cpp
│   ├── chunker.h
│   ├── common.h
│   ├── darts.h
│   ├── dep.cpp
│   ├── dep.h
│   ├── dep_learner.cpp
│   ├── eval.cpp
│   ├── freelist.h
│   ├── learner.cpp
│   ├── libcabocha.cpp
│   ├── make.bat
│   ├── mmap.h
│   ├── morph.cpp
│   ├── morph.h
│   ├── ne.cpp
│   ├── ne.h
│   ├── normalizer.cpp
│   ├── normalizer.h
│   ├── normalizer.rule
│   ├── normalizer_rule.h
│   ├── normalizer_rule.sh
│   ├── param.cpp
│   ├── param.h
│   ├── parser.cpp
│   ├── scoped_ptr.h
│   ├── selector.cpp
│   ├── selector.h
│   ├── selector_pat.h
│   ├── stream_wrapper.h
│   ├── string_buffer.cpp
│   ├── string_buffer.h
│   ├── svm.cpp
│   ├── svm.h
│   ├── svm_learn.cpp
│   ├── svm_learn.h
│   ├── timer.h
│   ├── tree.cpp
│   ├── tree_allocator.cpp
│   ├── tree_allocator.h
│   ├── ucs.cpp
│   ├── ucs.h
│   ├── ucstable.h
│   ├── utils.cpp
│   ├── utils.h
│   └── winmain.h
├── swig
│   ├── CaboCha.i
│   ├── Makefile
│   ├── version.h
│   └── version.h.in
└── tools
    ├── KNBC2KC.pl
    ├── KyotoCorpus.pm
    ├── chasen2mecab.pl
    ├── irex2cabocha.pl
    ├── kc2cabocha.pl
    └── kc2juman.pl

26 directories, 212 files
cabocha-0.69 %

【Log】macOS Big Sur 11.2.2: brew install cabocha

  • Command: brew install cabocha
  • Date: 2021/05/29
  • Environment: macOS Big Sur 11.2.2
% brew install cabocha
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (homebrew/core).
==> Updated Formulae
Updated 1 formula.

==> Downloading https://ghcr.io/v2/homebrew/core/cabocha/manifests/0.69-1
######################################################################## 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/cabocha/blobs/sha256:1dd5c1474946aaab675326323c8f7e3d101687b50d5542464558f54a8c477cc8
==> Downloading from https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:1dd5c1474946aaab675326323c8f7e3d101687b50d5542464558f54a8c477cc8?se=2021-05-28T21%3A35%3A00Z&sig=LU2t3QBPVMTA
######################################################################## 100.0%
==> Pouring cabocha--0.69.big_sur.bottle.1.tar.gz
🍺  /usr/local/Cellar/cabocha/0.69: 28 files, 236.2MB
% 

【Log】macOS Big Sur 11.2.2: brew install crf++

  • Command: brew install crf++
  • Date: 2021/05/29
  • Environment: macOS Big Sur 11.2.2
% brew install crf++
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 2 taps (homebrew/core and homebrew/cask).
==> New Formulae
caire                                  cidr2range                             qthreads                               range2cidr                             universal-ctags
==> Updated Formulae
Updated 344 formulae.
==> Renamed Formulae
badtouch -> authoscope
==> New Casks
assinador-serpro                                 dmidiplayer                                      futurerestore-gui                                hightop
==> Updated Casks
Updated 197 casks.

==> Downloading https://ghcr.io/v2/homebrew/core/crfxx/manifests/0.58-3
######################################################################## 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/crfxx/blobs/sha256:fcf0862271c392bc7b69a4e02a74dd9bd85615b6be0273009e7611bb78298f61
==> Downloading from https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:fcf0862271c392bc7b69a4e02a74dd9bd85615b6be0273009e7611bb78298f61?se=2021-05-28T21%3A25%3A00Z&sig=fBS3Lw84FQ6O
######################################################################## 100.0%
==> Pouring crf++--0.58.big_sur.bottle.3.tar.gz
🍺  /usr/local/Cellar/crf++/0.58: 13 files, 765.2KB
% 

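【Mac】Morphological Analysis of YouTube Comments with MeCab in Python

Since the YouTube Data API can extract comments, this post shows how to fetch YouTube comments and then runs them through MeCab in Python to see which words appear in the comment section.

For now, this is only a quick first pass in a local Mac environment.

  1. Extract comment data with the YouTube Data API
  2. Process YouTube video comments with MeCab

If you have not yet obtained an API key or installed the libraries, see the article below.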

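1. Extract comment data with the YouTube Data API

We start by extracting the comment data. The API has two comment-related resources, CommentThreads and Comments, and both return their data as JSON.

CommentThreads

With CommentThreads, you can fetch the comments associated with a given video ID or channel ID.

For example, fetching five comments for the video ID "fdsaZ8EMR2U" looks like this.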

# -*- coding: utf-8 -*-

# Sample Python code for youtube.commentThreads.list
# See instructions for running these code samples locally:
# https://developers.google.com/explorer-help/guides/code_samples#python

import os

import googleapiclient.discovery

def main():
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

    api_service_name = "youtube"
    api_version = "v3"
    DEVELOPER_KEY = "YOUR_API_KEY"

    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, developerKey = DEVELOPER_KEY)

    request = youtube.commentThreads().list(
        part="id,replies,snippet",
        maxResults=5,
        videoId="fdsaZ8EMR2U"
    )
    response = request.execute()

    print(response)

if __name__ == "__main__":
    main()
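
For the MeCab step later on, only the comment text is needed rather than the whole response. The following is a minimal sketch of pulling the text out of a commentThreads.list response, assuming the nesting items → snippet → topLevelComment → snippet → textOriginal. The function name extract_comment_texts and the trimmed-down sample_response are hypothetical and for illustration only; the two comment strings are taken from the response shown in this post.

```python
def extract_comment_texts(response):
    """Return the original text of each top-level comment in a commentThreads.list response."""
    texts = []
    # "items" holds one comment thread per element; fall back to an empty list if it is missing.
    for item in response.get("items", []):
        comment = item["snippet"]["topLevelComment"]["snippet"]
        texts.append(comment["textOriginal"])
    return texts


# Trimmed-down stand-in for a real API response (assumption: only the keys used above are kept).
sample_response = {
    "items": [
        {"snippet": {"topLevelComment": {"snippet": {"textOriginal": "SO gooood"}}}},
        {"snippet": {"topLevelComment": {"snippet": {
            "textOriginal": "so what if really yuffie have met johnny hehe"}}}},
    ]
}

if __name__ == "__main__":
    for text in extract_comment_texts(sample_response):
        print(text)
```

In the script above, the same function could be called on the result of request.execute() in place of print(response). It reads textOriginal rather than textDisplay on the assumption that the raw text is the better input for morphological analysis.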

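The commentThreads.list request returns JSON like the following. It includes the comment text, the name of the poster's channel, and so on. Replies to a comment are included under "replies".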

{
  "kind": "youtube#commentThreadListResponse",
  "etag": "tSw5WSiFS4IMMytcgoYXJ9zpu6I",
  "nextPageToken": "QURTSl9pMDFKVnZtTFl0NFZOdnhaZFpXaFBOcWU5aDA0QWM5bDVpYk5oVTd1WDQwSDY1cU11OVBOZHNnWFNOTmNJby1Db1JpWno2Qnd5bw==",
  "pageInfo": {
    "totalResults": 5,
    "resultsPerPage": 5
  },
  "items": [
    {
      "kind": "youtube#commentThread",
      "etag": "An3zz04lgE7jUVO7VXXmqfdwFjk",
      "id": "UgwDY44NUll4uiZXqqx4AaABAg",
      "snippet": {
        "videoId": "fdsaZ8EMR2U",
        "topLevelComment": {
          "kind": "youtube#comment",
          "etag": "Kty1w2F4dTbXmakl-ywdK28vLEg",
          "id": "UgwDY44NUll4uiZXqqx4AaABAg",
          "snippet": {
            "videoId": "fdsaZ8EMR2U",
            "textDisplay": "SO gooood",
            "textOriginal": "SO gooood",
            "authorDisplayName": "fatt musiek",
            "authorProfileImageUrl": "https://yt3.ggpht.com/ytc/AAUvwngpQ-20jVq0c-9aC-wDJ87aTKi2QvPLTRN2GXGRaw=s48-c-k-c0x00ffffff-no-rj",
            "authorChannelUrl": "http://www.youtube.com/channel/UCl3ha3zwY9p6CemIZZXIdXQ",
            "authorChannelId": {
              "value": "UCl3ha3zwY9p6CemIZZXIdXQ"
            },
            "canRate": true,
            "viewerRating": "none",
            "likeCount": 0,
            "publishedAt": "2021-05-22T18:48:34Z",
            "updatedAt": "2021-05-22T18:48:34Z"
          }
        },
        "canReply": true,
        "totalReplyCount": 0,
        "isPublic": true
      }
    },
    {
      "kind": "youtube#commentThread",
      "etag": "QVJH5RHTNij1fN5jRj_mNcDscHA",
      "id": "Ugx8sUuwqKqPVG9eSuJ4AaABAg",
      "snippet": {
        "videoId": "fdsaZ8EMR2U",
        "topLevelComment": {
          "kind": "youtube#comment",
          "etag": "Y_SsBrGGxQoLpcztQqND9wGarUc",
          "id": "Ugx8sUuwqKqPVG9eSuJ4AaABAg",
          "snippet": {
            "videoId": "fdsaZ8EMR2U",
            "textDisplay": "so what if really yuffie have met johnny hehe",
            "textOriginal": "so what if really yuffie have met johnny hehe",
            "authorDisplayName": "GregOrio Barachina",
            "authorProfileImageUrl": "https://yt3.ggpht.com/ytc/AAUvwnjgJE6zBYksYQWt8TmKlMDYOyG0t-BHPNWWmvUUPQ=s48-c-k-c0x00ffffff-no-rj",
            "authorChannelUrl": "http://www.youtube.com/channel/UCUs2OJ4-KqYGS2EPJCDj7tQ",
            "authorChannelId": {
              "value": "UCUs2OJ4-KqYGS2EPJCDj7tQ"
            },
            "canRate": true,
            "viewerRating": "none",
            "likeCount": 0,
            "publishedAt": "2021-05-21T13:42:45Z",
            "updatedAt": "2021-05-21T13:42:45Z"
          }
        },
        "canReply": true,
        "totalReplyCount": 0,
        "isPublic": true
      }
    },
    {
      "kind": "youtube#commentThread",
      "etag": "ggLtp9jtNyrqb3JSvzkvUDon7gg",
      "id": "UgwP-4ucsrWh_iXJQMN4AaABAg",
      "snippet": {
        "videoId": "fdsaZ8EMR2U",
        "topLevelComment": {
          "kind": "youtube#comment",
          "etag": "Raxyf3_Zw3ksZyGeWDt7_HHW8SA",
          "id": "UgwP-4ucsrWh_iXJQMN4AaABAg",
          "snippet": {
            "videoId": "fdsaZ8EMR2U",
            "textDisplay": "The Aerith and Cloud scene is much more meaningful than the one with Tifa. Still a good scene but come on...Aerith just appearing amongst the flowers and getting to see her again...priceless",
            "textOriginal": "The Aerith and Cloud scene is much more meaningful than the one with Tifa. Still a good scene but come on...Aerith just appearing amongst the flowers and getting to see her again...priceless",
            "authorDisplayName": "Maxx Doran",
            "authorProfileImageUrl": "https://yt3.ggpht.com/ytc/AAUvwnixfMDBxLt_TfUEjlpHhU-OvwE1vjCgpFBAVIMxjg=s48-c-k-c0x00ffffff-no-rj",
            "authorChannelUrl": "http://www.youtube.com/channel/UCXcLTX_9fNHLVAMr_plxeqQ",
            "authorChannelId": {
              "value": "UCXcLTX_9fNHLVAMr_plxeqQ"
            },
            "canRate": true,
            "viewerRating": "none",
            "likeCount": 0,
            "publishedAt": "2021-05-18T01:50:04Z",
            "updatedAt": "2021-05-18T01:50:04Z"
          }
        },
        "canReply": true,
        "totalReplyCount": 0,
        "isPublic": true
      }
    },
    {
      "kind": "youtube#commentThread",
      "etag": "ptUVfOGBkZDnUXKFoDaGnQ9Y-gw",
      "id": "UgzTN_4ek7syWNNbCrB4AaABAg",
      "snippet": {
        "videoId": "fdsaZ8EMR2U",
        "topLevelComment": {
          "kind": "youtube#comment",
          "etag": "L3vDroBKOAklgIqKSkcX_JvLn_g",
          "id": "UgzTN_4ek7syWNNbCrB4AaABAg",
          "snippet": {
            "videoId": "fdsaZ8EMR2U",
            "textDisplay": "You caught on to the magnify barrier idea so early. I was part way into hard mode before I thought of that.",
            "textOriginal": "You caught on to the magnify barrier idea so early. I was part way into hard mode before I thought of that.",
            "authorDisplayName": "Justin Edwards",
            "authorProfileImageUrl": "https://yt3.ggpht.com/ytc/AAUvwngz1mU5zD3QHSRVU3jXTEZApnkYsmAzCKFXxUyD1w=s48-c-k-c0x00ffffff-no-rj",
            "authorChannelUrl": "http://www.youtube.com/channel/UCO-oPQJCpNw87M6YbcuuFMw",
            "authorChannelId": {
              "value": "UCO-oPQJCpNw87M6YbcuuFMw"
            },
            "canRate": true,
            "viewerRating": "none",
            "likeCount": 0,
            "publishedAt": "2021-05-10T07:25:36Z",
            "updatedAt": "2021-05-10T07:25:36Z"
          }
        },
        "canReply": true,
        "totalReplyCount": 0,
        "isPublic": true
      }
    },
    {
      "kind": "youtube#commentThread",
      "etag": "vfaqu09YbjpC_akz6riq0_XpSCw",
      "id": "UgygOOysmSAraKnx81h4AaABAg",
      "snippet": {
        "videoId": "fdsaZ8EMR2U",
        "topLevelComment": {
          "kind": "youtube#comment",
          "etag": "k8vIS0anrGkqCcFfyj0gnrUwXQI",
          "id": "UgygOOysmSAraKnx81h4AaABAg",
          "snippet": {
            "videoId": "fdsaZ8EMR2U",
            "textDisplay": "Y'know Max, you COULD have just run 5k steps in Aerith's garden, checked what the Materia did, and moved on. Or, maybe, look up an online guide, since by now I'm sure SOMEONE has posted one.",
            "textOriginal": "Y'know Max, you COULD have just run 5k steps in Aerith's garden, checked what the Materia did, and moved on. Or, maybe, look up an online guide, since by now I'm sure SOMEONE has posted one.",
            "authorDisplayName": "Soma Cruz the Demigod of Balance",
            "authorProfileImageUrl": "https://yt3.ggpht.com/ytc/AAUvwnilg2dkOBvJqeTbW34CBoxURHLWv78fnbCRkArv=s48-c-k-c0x00ffffff-no-rj",
            "authorChannelUrl": "http://www.youtube.com/channel/UCNaiemmWvNbzfaQm5e3hyqA",
            "authorChannelId": {
              "value": "UCNaiemmWvNbzfaQm5e3hyqA"
            },
            "canRate": true,
            "viewerRating": "none",
            "likeCount": 0,
            "publishedAt": "2021-05-07T02:21:29Z",
            "updatedAt": "2021-05-07T02:21:29Z"
          }
        },
        "canReply": true,
        "totalReplyCount": 1,
        "isPublic": true
      },
      "replies": {
        "comments": [
          {
            "kind": "youtube#comment",
            "etag": "oMSJ1drDJreTmguX72NWydzfbcY",
            "id": "UgygOOysmSAraKnx81h4AaABAg.9N10GKEX1209N2h_vQGc9Y",
            "snippet": {
              "videoId": "fdsaZ8EMR2U",
              "textDisplay": "\u003ca href=\"https://www.youtube.com/watch?v=fdsaZ8EMR2U&t=38m03s\"\u003e38:03\u003c/a\u003e These things don't stagger? But each time they clone, they lose health, and the clones are much weaker. Damn, I see why you had trouble with these, Max.",
              "textOriginal": "38:03 These things don't stagger? But each time they clone, they lose health, and the clones are much weaker. Damn, I see why you had trouble with these, Max.",
              "parentId": "UgygOOysmSAraKnx81h4AaABAg",
              "authorDisplayName": "Soma Cruz the Demigod of Balance",
              "authorProfileImageUrl": "https://yt3.ggpht.com/ytc/AAUvwnilg2dkOBvJqeTbW34CBoxURHLWv78fnbCRkArv=s48-c-k-c0x00ffffff-no-rj",
              "authorChannelUrl": "http://www.youtube.com/channel/UCNaiemmWvNbzfaQm5e3hyqA",
              "authorChannelId": {
                "value": "UCNaiemmWvNbzfaQm5e3hyqA"
              },
              "canRate": true,
              "viewerRating": "none",
              "likeCount": 0,
              "publishedAt": "2021-05-07T18:08:01Z",
              "updatedAt": "2021-05-07T18:08:01Z"
            }
          }
        ]
      }
    }
  ]
}

Comments

The Comments resource retrieves data for comment IDs that you specify directly. Here we pass the five comment IDs obtained in the example above to the "id" parameter.

# -*- coding: utf-8 -*-

# Sample Python code for youtube.comments.list
# See instructions for running these code samples locally:
# https://developers.google.com/explorer-help/guides/code_samples#python

import os

import googleapiclient.discovery

def main():
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

    api_service_name = "youtube"
    api_version = "v3"
    DEVELOPER_KEY = "YOUR_API_KEY"

    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, developerKey = DEVELOPER_KEY)

    request = youtube.comments().list(
        part="id,snippet",
        id="UgwDY44NUll4uiZXqqx4AaABAg,Ugx8sUuwqKqPVG9eSuJ4AaABAg,UgwP-4ucsrWh_iXJQMN4AaABAg,UgzTN_4ek7syWNNbCrB4AaABAg,UgygOOysmSAraKnx81h4AaABAg"
    )
    response = request.execute()

    print(response)

if __name__ == "__main__":
    main()

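A JSON response like the following is returned. The CommentThreads response included the ID of the video each comment was posted on, as well as replies to the comment, but the Comments response includes neither.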

{
  "kind": "youtube#commentListResponse",
  "etag": "vwaB3KAa_Snb_GuTkkMYrlL7Jrg",
  "items": [
    {
      "kind": "youtube#comment",
      "etag": "E9ovRZPTGOUQzHb0AEiKA26EJxY",
      "id": "UgwDY44NUll4uiZXqqx4AaABAg",
      "snippet": {
        "textDisplay": "SO gooood",
        "textOriginal": "SO gooood",
        "authorDisplayName": "fatt musiek",
        "authorProfileImageUrl": "https://yt3.ggpht.com/ytc/AAUvwngpQ-20jVq0c-9aC-wDJ87aTKi2QvPLTRN2GXGRaw=s48-c-k-c0x00ffffff-no-rj",
        "authorChannelUrl": "http://www.youtube.com/channel/UCl3ha3zwY9p6CemIZZXIdXQ",
        "authorChannelId": {
          "value": "UCl3ha3zwY9p6CemIZZXIdXQ"
        },
        "canRate": true,
        "viewerRating": "none",
        "likeCount": 0,
        "publishedAt": "2021-05-22T18:48:34Z",
        "updatedAt": "2021-05-22T18:48:34Z"
      }
    },
    {
      "kind": "youtube#comment",
      "etag": "FG5oWvmMF39kDNl_rlnzb0bsSWM",
      "id": "Ugx8sUuwqKqPVG9eSuJ4AaABAg",
      "snippet": {
        "textDisplay": "so what if really yuffie have met johnny hehe",
        "textOriginal": "so what if really yuffie have met johnny hehe",
        "authorDisplayName": "GregOrio Barachina",
        "authorProfileImageUrl": "https://yt3.ggpht.com/ytc/AAUvwnjgJE6zBYksYQWt8TmKlMDYOyG0t-BHPNWWmvUUPQ=s48-c-k-c0x00ffffff-no-rj",
        "authorChannelUrl": "http://www.youtube.com/channel/UCUs2OJ4-KqYGS2EPJCDj7tQ",
        "authorChannelId": {
          "value": "UCUs2OJ4-KqYGS2EPJCDj7tQ"
        },
        "canRate": true,
        "viewerRating": "none",
        "likeCount": 0,
        "publishedAt": "2021-05-21T13:42:45Z",
        "updatedAt": "2021-05-21T13:42:45Z"
      }
    },
    {
      "kind": "youtube#comment",
      "etag": "WQw2UyILXAhkYOAl-AKScZi1pCY",
      "id": "UgwP-4ucsrWh_iXJQMN4AaABAg",
      "snippet": {
        "textDisplay": "The Aerith and Cloud scene is much more meaningful than the one with Tifa. Still a good scene but come on...Aerith just appearing amongst the flowers and getting to see her again...priceless",
        "textOriginal": "The Aerith and Cloud scene is much more meaningful than the one with Tifa. Still a good scene but come on...Aerith just appearing amongst the flowers and getting to see her again...priceless",
        "authorDisplayName": "Maxx Doran",
        "authorProfileImageUrl": "https://yt3.ggpht.com/ytc/AAUvwnixfMDBxLt_TfUEjlpHhU-OvwE1vjCgpFBAVIMxjg=s48-c-k-c0x00ffffff-no-rj",
        "authorChannelUrl": "http://www.youtube.com/channel/UCXcLTX_9fNHLVAMr_plxeqQ",
        "authorChannelId": {
          "value": "UCXcLTX_9fNHLVAMr_plxeqQ"
        },
        "canRate": true,
        "viewerRating": "none",
        "likeCount": 0,
        "publishedAt": "2021-05-18T01:50:04Z",
        "updatedAt": "2021-05-18T01:50:04Z"
      }
    },
    {
      "kind": "youtube#comment",
      "etag": "Vchl7kutnRgZb-uKYjMNMrJQ2qQ",
      "id": "UgzTN_4ek7syWNNbCrB4AaABAg",
      "snippet": {
        "textDisplay": "You caught on to the magnify barrier idea so early. I was part way into hard mode before I thought of that.",
        "textOriginal": "You caught on to the magnify barrier idea so early. I was part way into hard mode before I thought of that.",
        "authorDisplayName": "Justin Edwards",
        "authorProfileImageUrl": "https://yt3.ggpht.com/ytc/AAUvwngz1mU5zD3QHSRVU3jXTEZApnkYsmAzCKFXxUyD1w=s48-c-k-c0x00ffffff-no-rj",
        "authorChannelUrl": "http://www.youtube.com/channel/UCO-oPQJCpNw87M6YbcuuFMw",
        "authorChannelId": {
          "value": "UCO-oPQJCpNw87M6YbcuuFMw"
        },
        "canRate": true,
        "viewerRating": "none",
        "likeCount": 0,
        "publishedAt": "2021-05-10T07:25:36Z",
        "updatedAt": "2021-05-10T07:25:36Z"
      }
    },
    {
      "kind": "youtube#comment",
      "etag": "pbfhdpIgB5QCKnr4Inkm_U2wbjQ",
      "id": "UgygOOysmSAraKnx81h4AaABAg",
      "snippet": {
        "textDisplay": "Y'know Max, you COULD have just run 5k steps in Aerith's garden, checked what the Materia did, and moved on. Or, maybe, look up an online guide, since by now I'm sure SOMEONE has posted one.",
        "textOriginal": "Y'know Max, you COULD have just run 5k steps in Aerith's garden, checked what the Materia did, and moved on. Or, maybe, look up an online guide, since by now I'm sure SOMEONE has posted one.",
        "authorDisplayName": "Soma Cruz the Demigod of Balance",
        "authorProfileImageUrl": "https://yt3.ggpht.com/ytc/AAUvwnilg2dkOBvJqeTbW34CBoxURHLWv78fnbCRkArv=s48-c-k-c0x00ffffff-no-rj",
        "authorChannelUrl": "http://www.youtube.com/channel/UCNaiemmWvNbzfaQm5e3hyqA",
        "authorChannelId": {
          "value": "UCNaiemmWvNbzfaQm5e3hyqA"
        },
        "canRate": true,
        "viewerRating": "none",
        "likeCount": 0,
        "publishedAt": "2021-05-07T02:21:29Z",
        "updatedAt": "2021-05-07T02:21:29Z"
      }
    }
  ]
}
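
As a minimal sketch of working with this response, the helper below pulls the comment ID, author name, and original text out of each item. The function name summarize_comments and the trimmed-down sample_response are assumptions made for illustration; only keys that appear in the JSON above are used.

```python
def summarize_comments(response):
    """Return (id, author, text) tuples from a comments.list-style response dict."""
    rows = []
    for item in response.get("items", []):
        snippet = item["snippet"]
        rows.append((item["id"], snippet["authorDisplayName"], snippet["textOriginal"]))
    return rows


# Trimmed-down sample shaped like the JSON above (hypothetical subset of fields).
sample_response = {
    "kind": "youtube#commentListResponse",
    "items": [
        {
            "kind": "youtube#comment",
            "id": "UgwDY44NUll4uiZXqqx4AaABAg",
            "snippet": {
                "textOriginal": "SO gooood",
                "authorDisplayName": "fatt musiek",
            },
        },
    ],
}

for comment_id, author, text in summarize_comments(sample_response):
    print(f"{comment_id}\t{author}\t{text}")
```

In a real script, the dict returned by request.execute() would be passed in place of sample_response.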

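2. Processing YouTube video comments with MeCab

Next, we run morphological analysis on the extracted comments.

First, run the following (with your virtual environment activated) to get MeCab ready to use.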

$ brew install mecab
$ brew install mecab-ipadic
$ pip install mecab-python3

Basics: fetching a single comment and running morphological analysis

Once MeCab is set up, fetch a comment and tokenize it as follows.

import os
import googleapiclient.discovery
import MeCab

def main():
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

    api_service_name = "youtube"
    api_version = "v3"
    DEVELOPER_KEY = "YOUR_API_KEY"

    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, developerKey = DEVELOPER_KEY)

    # Fetch the comment by specifying its comment ID
    request = youtube.comments().list(
        part="id,snippet",
        id="UgzKZhtc3fX6JgX6p5p4AaABAg"  # comment ID
    )
    response = request.execute()

    # Take the comment text from the returned JSON and store it in text
    text = response['items'][0]['snippet']['textOriginal']

    m = MeCab.Tagger()

    # Walk the node list and collect the surface form of each node
    node = m.parseToNode(text)
    text_after = []
    while node:
        text_after.append(node.surface)
        node = node.next

    print('処理前: '+str(text))  # text before MeCab processing
    print('処理後: '+str(text_after))  # token list after MeCab processing
    
if __name__ == "__main__":
    main()

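Running the code above on a comment from one of Eiko Kano's (狩野英孝) videos returns the following result. 処理前 is the text before processing and 処理後 is the token list after processing.

処理前: ここ最近英孝ちゃんの動画みてたらいつの間にか日付け変わってるんだけど笑
処理後: ['', 'ここ', '最近', '英孝', 'ちゃん', 'の', '動画', 'み', 'て', 'たら', 'いつの間にか', '日', '付け', '変わっ', 'てる', 'ん', 'だ', 'けど', '笑', '']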
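
The token list above begins and ends with an empty string; these correspond to MeCab's sentence-boundary (BOS/EOS) nodes, which have an empty surface. As a small sketch, they can be dropped after the fact with a helper like the one below. The name drop_empty_surfaces is hypothetical, and the sample list is a shortened version of the output above.

```python
def drop_empty_surfaces(surfaces):
    """Remove empty strings (such as the sentence-boundary entries) from a token list."""
    return [s for s in surfaces if s]


# Shortened version of the token list printed above.
tokens = ['', 'ここ', '最近', '英孝', 'ちゃん', 'の', '動画', '']
print(drop_empty_surfaces(tokens))
```

The same effect can be had inside the while loop by appending only when node.surface is non-empty.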

Advanced: fetching multiple comments and listing frequent words

We fetch comment threads with the YouTube Data API, run them through MeCab, and extract the words that appear most often.

import os
import googleapiclient.discovery
import MeCab
import collections

def main():
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

    api_service_name = "youtube"
    api_version = "v3"
    DEVELOPER_KEY = "YOUR_API_KEY"

    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, developerKey = DEVELOPER_KEY)

    # Fetch the comment threads
    request = youtube.commentThreads().list(
        part="id,replies,snippet",
        maxResults=100,  # maximum number of comment threads to fetch
        videoId="jsRR_ZimvAo",  # video ID
        order="relevance"  # fetch comment threads in order of relevance
    )
    response = request.execute()

    comment_list = []
    for item in response['items']:

        comment_list.append(item['snippet']['topLevelComment']['snippet']['textOriginal'])
        if 'replies' in item.keys():
            for reply in item['replies']['comments']:

                comment_list.append(reply['snippet']['textOriginal'])
    
    # MeCab processing
    m = MeCab.Tagger()  # create the tagger once and reuse it for every comment
    comment_particles = []
    for comment in comment_list:
        node = m.parseToNode(comment)
        while node:
            if len(node.surface) > 0:  # skip nodes with an empty surface
                hinshi = node.feature.split(',')[0]  # part of speech
                if hinshi in ['名詞','形容詞']:  # keep only nouns and adjectives
                    comment_particles.append(node.surface)

            node = node.next

    c = collections.Counter(comment_particles)

    # Print in descending order of frequency
    for i in c.most_common(30):
        print(i)


if __name__ == "__main__":
    main()

Since the video features the comedy duos さまぁ〜ず (Summers) and ぺこぱ (Pekopa), tuples like the following are returned.

('シュウ', 23)
('ペイ', 23)
('さん', 21)
('笑', 17)
('好き', 11)
('ちゃん', 10)
('シュウペイ', 9)
('の', 9)
('ん', 9)
('ぺこぱ', 9)
('企画', 9)
('お前', 9)
('ツッコミ', 8)
('~', 8)
('寺', 7)
('かわいい', 7)
('最高', 7)
('松陰', 6)
('ショック', 6)
('回', 6)
('純粋', 6)
('人', 6)
('俺', 6)
('いい', 5)
('www', 5)
('w', 5)
('面白い', 5)
('ww', 5)
('❤', 5)
('ずさん', 4)

Words sometimes get split at unnatural points, which is a concern, but this at least shows the kind of thing that is possible!
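
The list above also contains tokens such as の, ん, and www that say little about the video. One possible way to reduce that noise, sketched here under assumptions, is to filter the Counter through a stop-word set. STOP_WORDS and filter_counts are hypothetical names, the stop-word list is only an example to be tuned, and this does not address words being split at unnatural points.

```python
import collections

# Hypothetical stop-word list; tune it for the comments being analyzed.
STOP_WORDS = {"の", "ん", "さん", "ちゃん", "w", "ww", "www"}


def filter_counts(counter, stop_words=STOP_WORDS):
    """Return a new Counter that omits every word found in stop_words."""
    return collections.Counter({
        word: count
        for word, count in counter.items()
        if word not in stop_words
    })


# A few of the counts from the output above.
counts = collections.Counter(
    {"シュウ": 23, "ペイ": 23, "さん": 21, "笑": 17, "の": 9, "www": 5}
)
for pair in filter_counts(counts).most_common(3):
    print(pair)
```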

Resolving Search Console's "URL is unknown to Google" by creating robots.txt

Error: "Indexing request failed"

The problem this time: clicking "Request Indexing" in Google Search Console shows "Indexing request failed", along with the following symptoms.

  • The status is shown as "URL is not on Google".
  • Coverage is shown as "URL is unknown to Google", which gives no specific cause.

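Solution

After some digging, it seems that a robots.txt file needs to be created.

robots.txt is usually set up to refuse crawling of a site, but at least as far as Google Search Console is concerned, it seems you need to create one and explicitly allow crawling even when you do want the site crawled.

Creating robots.txt

For the contents of robots.txt, the following seems to be enough for now.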

User-agent: *
Allow: /
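
As a rough local check of what these two lines mean, the sketch below feeds robots.txt text to Python's standard urllib.robotparser and asks whether a given user agent may fetch a URL. The helper name is_allowed and the example.com URL are assumptions for illustration, and Python's parser is not Google's own, so treat the result only as a sanity check.

```python
from urllib import robotparser


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allow_all = "User-agent: *\nAllow: /\n"     # the file shown above
block_all = "User-agent: *\nDisallow: /\n"  # a file that refuses all crawling

print(is_allowed(allow_all, "Googlebot", "https://example.com/some/page"))
print(is_allowed(block_all, "Googlebot", "https://example.com/some/page"))
```

The first call reports that crawling is allowed, and the second that it is refused.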

Configuring robots.txt in Nginx

From here on is how to serve robots.txt with Nginx: place robots.txt in the root directory and add the following to the configuration file.

location = /robots.txt {
        alias /path/to/root/directory/robots.txt;
}

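This means that when the request URI is exactly "/robots.txt" (the "=" makes it an exact match), Nginx serves the robots.txt file on the server.

If opening "your-domain/robots.txt" in a browser displays the contents of the file, everything should be fine.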

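Applying file changes

If you have been tweaking the Nginx configuration and the changes do not seem to take effect even after restarting Nginx, trying again in Chrome's incognito mode or after clearing the browser cache sometimes makes them show up.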

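Submitting the robots.txt file to Google with the robots.txt Tester

If Google has previously picked up a robots.txt that blocks its crawler, the error keeps appearing when you request indexing, even after you edit the file.

In that case, you need to submit the edited robots.txt to Google from the robots.txt Tester so that Google picks up the new version.

Google Search Console - robots.txt Tester

Caveat

If the site is registered in Google Search Console only as a Domain property, it does not appear when you select the target property, so the robots.txt Tester cannot be used. In that case you need to register a URL-prefix property as well.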

[Log] Ubuntu 20.04: sudo certbot --nginx -d example.com -d www.example.com

  • Command: sudo certbot --nginx -d example.com -d www.example.com
  • Date run: 2021/05/09
  • Environment: Ubuntu 20.04

$ sudo certbot --nginx -d example.com -d www.example.com
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator nginx, Installer nginx
Enter email address (used for urgent renewal and security notices) (Enter 'c' to
cancel): example@gmail.com

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Please read the Terms of Service at
https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf. You must
agree in order to register with the ACME server at
https://acme-v02.api.letsencrypt.org/directory
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(A)gree/(C)ancel: A

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Would you be willing to share your email address with the Electronic Frontier
Foundation, a founding partner of the Let's Encrypt project and the non-profit
organization that develops Certbot? We'd like to send you email about our work
encrypting the web, EFF news, campaigns, and ways to support digital freedom.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(Y)es/(N)o: N
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for graffuhs.com
http-01 challenge for www.graffuhs.com
Waiting for verification...
Cleaning up challenges
Deploying Certificate to VirtualHost /etc/nginx/sites-enabled/graffuhs.com
Deploying Certificate to VirtualHost /etc/nginx/sites-enabled/graffuhs.com

Please choose whether or not to redirect HTTP traffic to HTTPS, removing HTTP access.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1: No redirect - Make no further changes to the webserver configuration.
2: Redirect - Make all requests redirect to secure HTTPS access. Choose this for
new sites, or if you're confident your site works on HTTPS. You can undo this
change by editing your web server's configuration.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Select the appropriate number [1-2] then [enter] (press 'c' to cancel): 2
Redirecting all traffic on port 80 to ssl in /etc/nginx/sites-enabled/graffuhs.com
Redirecting all traffic on port 80 to ssl in /etc/nginx/sites-enabled/graffuhs.com

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Congratulations! You have successfully enabled https://graffuhs.com and
https://www.graffuhs.com

You should test your configuration at:
https://www.ssllabs.com/ssltest/analyze.html?d=graffuhs.com
https://www.ssllabs.com/ssltest/analyze.html?d=www.graffuhs.com
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at:
   /etc/letsencrypt/live/graffuhs.com/fullchain.pem
   Your key file has been saved at:
   /etc/letsencrypt/live/graffuhs.com/privkey.pem
   Your cert will expire on 2021-08-07. To obtain a new or tweaked
   version of this certificate in the future, simply run certbot again
   with the "certonly" option. To non-interactively renew *all* of
   your certificates, run "certbot renew"
 - Your account credentials have been saved in your Certbot
   configuration directory at /etc/letsencrypt. You should make a
   secure backup of this folder now. This configuration directory will
   also contain certificates and private keys obtained by Certbot so
   making regular backups of this folder is ideal.
 - If you like Certbot, please consider supporting our work by:

   Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
   Donating to EFF:                    https://eff.org/donate-le