
This tool makes learning to read Chinese easier by automatically
marking up the words in a simplified Chinese text with their pronunciations and
dictionary definitions. You can type or paste in GB-encoded text or the address
of a Chinese web page. You have several choices of how the text will be
annotated:
- Segment Only: In this option, the program
will add spaces between the words in the text. No other information is
added.
- Add Dictionary Entries at status line:
After segmenting the text, the program provides two ways to look up each
word. First, when the user holds the mouse over an underlined word, its pronunciation
and definition appear in the status line at the bottom of the browser.
Second, clicking the underlined word takes the user to its
pronunciation and English definition in a footnote further down the page.
- Add Dictionary Entries as footnotes: This
option is for users with older browsers that do not support JavaScript.
Holding the mouse over a word does nothing, but clicking on it still
takes the user to the definition in the footnotes. The file size is also
smaller than with the status-line option.
- Convert to Pinyin: The program segments
the text and uses that information to convert the Chinese characters to
pinyin. Only the pinyin is shown in the results.
- Add pinyin next to characters: The pronunciation
of each character is indicated by adding pinyin by its side. No other
information is added.
- Add Pinyin above Characters: Add the pinyin above
each character, in an annotation style called ruby. This currently works
only in Internet Explorer 5; in other browsers the pinyin is placed next to
the character.
- Add Pinyin/Defs in Margins: Type a list of
words into the "Words to Annotate" box, one word per line with no
spaces. The annotator will add definitions of these words to the right of the
paragraph in which they occur. If a word is not in the dictionary, you
can also include its definition beside the word, using the format described
below.
The user can add their own definition in this format:
Chinese [pinyin] /English definition/
That is, the Chinese (no internal spaces), followed by one
space (not a wide Chinese space), followed by the pinyin surrounded by square
brackets (with a space between each pinyin syllable), followed by another space,
followed by the English definition/explanation surrounded by slashes (this is the
CEDICT format). One word or definition per line.
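As a rough illustration of the format just described, the following Python sketch parses one entry line. The function name parse_entry and the regular expression are hypothetical and are not taken from the tool itself. Splitting the definition on slashes follows the general CEDICT convention of listing several senses; whether the annotator accepts more than one sense per line is an assumption.

```python
import re

# One entry: word, one ASCII space, [pinyin], one ASCII space, /definition/.
# \S+ rejects a wide Chinese space (U+3000) inside or right after the word.
ENTRY_RE = re.compile(r"^(\S+) \[([^\]]*)\] /(.*)/$")

def parse_entry(line):
    """Return (word, pinyin_syllables, definitions), or None if the line is malformed."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    word, pinyin, definitions = match.groups()
    # Pinyin syllables are separated by spaces; senses by slashes (assumed).
    return word, pinyin.split(), definitions.split("/")

print(parse_entry("中文 [zhong1 wen2] /Chinese language/"))
```

A line that uses a wide Chinese space after the word does not match the pattern, so parse_entry returns None for it.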
Users can use this to override the CEDICT definitions in the
other modes if the definitions or romanizations have mistakes.
When using "Add to Margins", the first occurrence of each word
is shown in bold. Its definition will appear more or less to its right.
The tool currently tries to match words to paragraphs, so be sure to leave at
least one blank line between paragraphs.
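The paragraph matching described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function annotate_margins is hypothetical, HTML b tags stand in for whatever markup the tool actually emits, and "first time the word occurs" is read as first occurrence in the whole text.

```python
import re

def annotate_margins(text, entries):
    """Pair each paragraph with margin notes for the words first seen in it."""
    seen = set()
    result = []
    # Paragraphs are separated by at least one blank line.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        notes = []
        for word, definition in entries.items():
            if word in paragraph and word not in seen:
                seen.add(word)
                # Bold only the first occurrence (HTML <b> is an assumed stand-in).
                paragraph = paragraph.replace(word, "<b>" + word + "</b>", 1)
                notes.append(word + ": " + definition)
        result.append((paragraph, notes))
    return result

sample = "我学习中文。\n\n中文很有意思。"
for paragraph, notes in annotate_margins(sample, {"中文": "Chinese language"}):
    print(paragraph, "|", "; ".join(notes))
```

Without the blank line between the two paragraphs, the whole sample would be treated as a single paragraph, which is why the blank line matters.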
Users can add words or definitions to the "Words to
Annotate" section to supplement or override the existing dictionary.
Enter them in the CEDICT format.
The tool currently only handles GB-encoded text. Dictionary
definitions are drawn from Paul Denisowski's CEDICT
Chinese-English dictionary. If you find a word that does not have a definition,
consider contributing it to the CEDICT project. The segmentation algorithm is
still under development. Just what constitutes a "proper" Chinese word
is also a good research topic. You can download
the segmenter code (in Perl) and run it yourself.
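This page does not say which segmentation algorithm the tool uses. As a minimal sketch of the general idea, the Python below uses greedy forward maximum matching, a common baseline, and then reuses the result for the "Segment Only" and "Convert to Pinyin" outputs. The function segment, the max_len parameter, and the three-word lexicon are all illustrative assumptions, not the actual segmenter code.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching: take the longest lexicon word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words

# Toy lexicon mapping words to pinyin (made up for this example).
lexicon = {"中文": "zhong1 wen2", "学习": "xue2 xi2", "我": "wo3"}
words = segment("我学习中文", lexicon)
print(" ".join(words))                             # Segment Only style output
print(" ".join(lexicon.get(w, w) for w in words))  # Convert to Pinyin style output
```

Greedy matching picks the wrong split whenever a longer dictionary word overlaps the intended one, which is one reason the question of what counts as a "proper" word is hard.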

This page is a mirror of the annotator formerly
available at Erik Peterson's On-line
Chinese Tools site.