Add characters whose readings differ across the Taiwan Strait #38

Closed
wants to merge 5 commits into from

dqwyy commented Sep 11, 2021

See issue #37.

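dqwyy changed the title from "Add characters whose readings differ across the Taiwan Strait #37" to "Add characters whose readings differ across the Taiwan Strait" on Sep 11, 2021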
dqwyy (author) commented Sep 11, 2021

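My second commit, "小修改" (minor fix), was meant to delete an extra line I had typed by mistake (危 wei2) and to update the date, but afterwards the diff between the old and new files no longer displayed properly. I don't know how to use git; could a maintainer please help?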

LEOYoon-Tsaw (member) commented:
You can revert the second commit.

This reverts commit c11a61e.
and update date
dqwyy (author) commented Sep 11, 2021

Thanks. I found that the web interface can't do this, so I downloaded the Windows version and did it there. It seems to have worked.

lotem (member) left a review comment:

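Thanks for chipping in.
Attention is needed to how weights are handled for characters with multiple readings.
There is also a blind spot: looking only at the diff is not enough; the whole dictionary has to be checked.
If adding a reading turns a character into a polyphonic one, and that character also appears in a multi-character word whose reading is spelled out explicitly, then that word needs an additional annotated entry as well.

For example, suppose the dictionary contains

僞作	wei3 zuo4

Because 作 is polyphonic, with readings zuo1, zuo2, zuo4 and so on, the existing entry spells out the reading to disambiguate 作, so that the program does not automatically derive combinations from the other single-character readings.
Now that 僞 has gained a common reading, wei4, both entries need to be defined:

僞作	wei3 zuo4
僞作	wei4 zuo4

Unfortunately, for now this situation still requires manual review to make sure the change is complete.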
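A minimal sketch of how such candidate entries could at least be flagged automatically, assuming the phrase entries are available as (text, code) pairs with one space-separated syllable per character, and that the newly added single-character readings are known (e.g. 僞 → wei4). The function name `candidate_entries` and its inputs are hypothetical, not part of Rime or this repository. It only proposes entries; deciding which combinations are actually valid for a given word still needs the manual review described above.

```python
from __future__ import annotations

from itertools import product


def candidate_entries(
    phrases: list[tuple[str, str]],
    new_readings: dict[str, list[str]],
) -> list[tuple[str, str]]:
    """Propose extra phrase entries after new single-character readings are added.

    phrases: (text, code) pairs, where code is space-separated syllables,
             assumed to be one syllable per character.
    new_readings: hypothetical mapping from a character to its newly added readings.
    Returns (text, code) pairs not already in `phrases`, for a human to review.
    """
    existing = set(phrases)
    proposals: list[tuple[str, str]] = []
    for text, code in phrases:
        syllables = code.split()
        if len(syllables) != len(text):
            # Characters and syllables cannot be aligned; leave for manual review.
            continue
        # For each position: keep the annotated syllable, or try each new reading.
        options = [
            [syl] + [r for r in new_readings.get(ch, []) if r != syl]
            for ch, syl in zip(text, syllables)
        ]
        for combo in product(*options):
            candidate = (text, " ".join(combo))
            if candidate not in existing and candidate not in proposals:
                proposals.append(candidate)
    return proposals
```

With the example above, `candidate_entries([("僞作", "wei3 zuo4"), ("工作", "gong1 zuo4")], {"僞": ["wei4"]})` proposes only ("僞作", "wei4 zuo4") and leaves 工作 untouched, since it does not contain 僞.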

@@ -21399,6 +21433,7 @@ min_phrase_weight: 100
緛 ruan4
緜 mian2
緝 ji1 50%
Review comment (member):

Both ji1 and qi4 are common readings, so the weight should be marked as 100%, or none of the readings should carry a weight.

@@ -14420,7 +14442,9 @@ min_phrase_weight: 100
昑 qin3
昒 hu1
易 yi4
昔 cuo4
Review comment (member):

This reading should be marked as rare: set the third column to 0%.
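For reference, a small sketch of reading entry lines laid out in the three columns mentioned here, assuming tab-separated lines in the order text, code, optional weight, and only the entry lines after the YAML header. `parse_entry`, `readings_by_char` and `rare_readings` are illustrative helper names, not an existing API; weights are kept as raw strings such as "0%" rather than interpreted.

```python
from __future__ import annotations

from collections import defaultdict


def parse_entry(line: str) -> tuple[str, str, str | None]:
    """Split one tab-separated entry line into (text, code, weight).

    The weight column is optional; it is returned as the raw string (e.g. "0%").
    """
    fields = line.rstrip("\n").split("\t")
    text, code = fields[0], fields[1]
    weight = fields[2] if len(fields) > 2 and fields[2] else None
    return text, code, weight


def readings_by_char(lines: list[str]) -> dict[str, list[tuple[str, str | None]]]:
    """Group the readings (code, weight) of single-character entries by character."""
    table: dict[str, list[tuple[str, str | None]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        text, code, weight = parse_entry(line)
        if len(text) == 1:
            table[text].append((code, weight))
    return dict(table)


def rare_readings(table: dict[str, list[tuple[str, str | None]]]) -> list[tuple[str, str]]:
    """List (character, code) pairs whose weight column is marked 0%."""
    return [
        (char, code)
        for char, readings in table.items()
        for code, weight in readings
        if weight == "0%"
    ]
```

With hypothetical sample lines `["昔\txi1", "昔\tcuo4\t0%"]`, `rare_readings(readings_by_char(...))` picks out ("昔", "cuo4") as the reading marked rare.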

Reply (author):

I've just fixed the weights. I really don't know much about this area, so thanks for the guidance.

dqwyy (author) commented Sep 20, 2021

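As for disambiguating polyphonic characters, I'm afraid that's beyond me: handling the entries one by one is a lot of work. See #32; I wonder whether this could be automated.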

lotem (member) commented Sep 23, 2021

Thanks.
Let me think of a way.

ShikiSuen commented Nov 20, 2021

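I've just added support for readings from 全字庫 (CNS 11643):
#42

However, I only covered single-character readings. Phrase readings will probably need long-term adjustment.
Typing formulaic, stock prose (八股文) makes any problematic entries much easier to expose.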

dqwyy closed this on Oct 3, 2023
4 participants