Semi-automated handling of variant readings #32

Open
lotem opened this issue Feb 27, 2021 · 0 comments

lotem commented Feb 27, 2021

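The variant readings of single characters are essentially complete. The problems lie mainly in phrases that contain those characters.
A script is needed to help find the existing cases of this kind so that they can then be examined by hand.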
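A minimal sketch of such a helper script follows. It assumes the dictionary can be loaded as `(text, reading)` pairs; that layout, the function names, and the sample data are illustrative assumptions, not the project's actual format. It only flags phrases for manual review and does not decide which reading is correct.

```python
from collections import defaultdict


def collect_char_readings(entries):
    """Map each single character to the set of readings listed for it."""
    readings = defaultdict(set)
    for text, reading in entries:
        if len(text) == 1:
            readings[text].add(reading)
    return readings


def phrases_to_review(entries):
    """Return phrases that contain at least one character with several readings.

    The result maps each flagged phrase to the list of its characters
    that have more than one reading among the single-character entries.
    """
    readings = collect_char_readings(entries)
    flagged = {}
    for text, _reading in entries:
        if len(text) < 2:
            continue
        multi = [ch for ch in text if len(readings.get(ch, ())) > 1]
        if multi:
            flagged[text] = multi
    return flagged


# Hypothetical sample data, for illustration only.
sample_entries = [
    ("擊", "jī"),
    ("擊", "jí"),
    ("長", "cháng"),
    ("長", "zhǎng"),
    ("人", "rén"),
    ("長擊", "cháng jī"),
    ("人人", "rén rén"),
]

for phrase, chars in phrases_to_review(sample_entries).items():
    print(phrase, "contains characters with several readings:", chars)
```

The flagged list is the input to the manual step: a person still has to decide which readings actually apply to each phrase.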

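Every time a batch of data is added, the entries containing polyphonic characters, especially those with variant readings, should be reviewed and corrected. The existing problems arose because this step was skipped when data was imported earlier.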

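Importing other dictionaries without manual intervention is a problem: there is no guarantee that the entries in every data source are fully "aligned", that is, that each source contains exactly the same set of entries, no more and no fewer. When they are not aligned, the words that one source has beyond the others will still lack any variant readings that this source does not record, which produces the problem found in this thread.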
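The alignment check described above can be sketched as a set comparison. The source names and the representation of each source as a set of entry texts are assumptions made for illustration.

```python
def unaligned_entries(sources):
    """Report entries that are not present in every data source.

    `sources` maps a source name to the set of entry texts it contains.
    The result maps each unaligned entry to the sorted list of sources
    that do not contain it.
    """
    all_words = set().union(*sources.values())
    report = {}
    for word in all_words:
        absent = sorted(name for name, words in sources.items() if word not in words)
        if absent:
            report[word] = absent
    return report


# Hypothetical sources, for illustration only.
sample_sources = {
    "source_a": {"長擊", "攻擊"},
    "source_b": {"攻擊"},
}

for word, absent in sorted(unaligned_entries(sample_sources).items()):
    print(word, "is missing from:", absent)
```

Entries reported here are the ones whose readings come from only some of the sources, so they are the first candidates for manual review after an import.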

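It is not enough to review the imported entries; the existing entries must also be checked for whether they need to be annotated with newly introduced variant readings. For example, once the variant reading jí has been added for 「擊」, the existing entry 「長擊」, which also contains another polyphonic character, needs its annotation supplemented.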
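The 「長擊」 case can be sketched as follows: when a new reading is added for one character, propose the corresponding phrase readings for existing entries that contain it. The function name, the space-separated reading format, and the sample data are assumptions; the output is a list of candidates for a human to accept or reject, not a final annotation.

```python
def candidates_for_new_reading(char, new_reading, phrase_readings):
    """Propose phrase readings that use `new_reading` for `char`.

    `phrase_readings` maps a phrase to the set of readings already
    annotated for it, one space-separated syllable per character.
    Only readings not yet annotated are proposed.
    """
    suggestions = {}
    for phrase, known in phrase_readings.items():
        if char not in phrase:
            continue
        proposed = set()
        for reading in known:
            syllables = reading.split()
            if len(syllables) != len(phrase):
                continue  # skip annotations that do not match the phrase length
            for i, ch in enumerate(phrase):
                if ch == char:
                    variant = syllables.copy()
                    variant[i] = new_reading
                    proposed.add(" ".join(variant))
        proposed -= known
        if proposed:
            suggestions[phrase] = sorted(proposed)
    return suggestions


# Hypothetical existing annotations, for illustration only.
existing = {
    "長擊": {"cháng jī"},
    "長城": {"cháng chéng"},
}

print(candidates_for_new_reading("擊", "jí", existing))
```

Substituting one syllable at a time keeps the other characters' annotated readings intact, which matters when the phrase contains a second polyphonic character such as 「長」.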

Originally posted by @lotem in #29 (comment)
