Commit | Line | Data |
---|---|---|
49dd4b93 | 1 | File README.DIC\r |
2 | To accompany the GNU version of the set of files (cide.*) containing \r | |
3 | the electronic version of the\r | |
4 | Collaborative International Dictionary of English.\r | |
5 | (called also GCIDE)\r | |
6 | These files contain Version 0.46 (January 2002)\r | |
7 | * * * * * * * * * * * * * * * * * * * * * * * * * * * *\r | |
8 | \r | |
9 | The dictionary was derived from the\r | |
10 | Webster's Revised Unabridged Dictionary\r | |
11 | Version published 1913\r | |
12 | by the C. & G. Merriam Co.\r | |
13 | Springfield, Mass.\r | |
14 | Under the direction of\r | |
15 | Noah Porter, D.D., LL.D.\r | |
16 | \r | |
17 | and has been supplemented with some of the definitions from\r | |
18 | WordNet, a semantic network created by\r | |
19 | the Cognitive Science Department\r | |
20 | of Princeton University\r | |
21 | under the direction of\r | |
22 | Prof. George Miller\r | |
23 | \r | |
24 | and is being proof-read and supplemented by volunteers from\r | |
25 | around the world. This is an unfunded project, and future\r | |
26 | enhancement of this dictionary will depend on the efforts of\r | |
27 | volunteers willing to help build this free resource into a\r | |
28 | comprehensive body of general information. New definitions\r | |
29 | for missing words or word senses and longer explanatory notes, \r | |
30 | as well as images to accompany the articles, are needed. More\r | |
31 | modern illustrative quotations giving recent examples of\r | |
32 | usage of the words in their various senses will be very\r | |
33 | helpful, since most quotations in the original 1913 dictionary\r | |
34 | are now well over 100 years old.\r | |
35 | \r | |
36 | This electronic version is being maintained by World Soul,\r | |
37 | a non-profit organization in Plainfield, NJ. For additional\r | |
38 | information, or if you are willing to assist in the construction of this\r | |
39 | data source, contact:\r | |
40 | \r | |
41 | =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=\r | |
42 | Patrick J. Cassidy | TEL: (908) 561-3416\r | |
43 | World Soul | if no answer, (908) 668-5252\r | |
44 | 735 Belvidere Ave. | FAX: (908) 668-5904\r | |
45 | Plainfield, NJ 07062-2054\r | |
46 | pc@worldsoul.org or cassidy@micra.com\r | |
47 | =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=\r | |
48 | \r | |
49 | * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * \r | |
50 | \r | |
51 | GCIDE is free software; you can redistribute it and/or modify\r | |
52 | it under the terms of the GNU General Public License as published by\r | |
53 | the Free Software Foundation; either version 2, or (at your option)\r | |
54 | any later version.\r | |
55 | \r | |
56 | GCIDE is distributed in the hope that it will be useful,\r | |
57 | but WITHOUT ANY WARRANTY; without even the implied warranty of\r | |
58 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\r | |
59 | GNU General Public License for more details.\r | |
60 | \r | |
61 | You should have received a copy of the GNU General Public License\r | |
62 | along with this copy of GCIDE; see the file COPYING. If not, write \r | |
63 | to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,\r | |
64 | Boston, MA 02111-1307, USA.\r | |
65 | * * * * * * * * * * * * * * * * * * * * * \r | |
66 | \r | |
67 | STRUCTURE OF THE DICTIONARY\r | |
68 | ---------------------------\r | |
69 | When the archives are unpacked, the main dictionary text of \r | |
70 | the GCIDE will be found in 26 files named "cide.*", where the\r | |
71 | asterisk indicates which letter of the alphabet begins the\r | |
72 | words in each file. For example, file "cide.b" contains words \r | |
73 | beginning with the letter "B". Additional information about the \r | |
74 | tagging conventions and special character symbols is contained in \r | |
75 | ancillary files in this directory (more information below). The main \r | |
76 | body of the 1913 dictionary was essentially identical to the edition\r | |
77 | published in 1890, and was republished in 1913 with an appendix \r | |
78 | containing "New Words". The new words of that appendix have been\r | |
79 | integrated into the main file in this version. However, it is important \r | |
80 | to keep in mind that the definitions in this dictionary are in most \r | |
81 | cases over 100 years old. Use them with caution!\r | |
82 | At the end of each paragraph in this dictionary there is a\r | |
83 | bracketed and tagged "source". This indicates where the\r | |
84 | definition or other text in that paragraph came from, as follows:\r | |
85 | \r | |
86 | [<source>1913 Webster</source>]\r | |
87 | = From the original 1890 dictionary.\r | |
88 | [<source>Webster 1913 Suppl.</source>]\r | |
89 | = From the 1913 "New Words" supplement to the Webster.\r | |
90 | [<source>WordNet 1.5</source>]\r | |
91 | = From the WordNet on-line semantic network.\r | |
92 | [<source>Century Dict. 1906.</source>]\r | |
93 | = From the Century Dictionary published in 1906, especially from\r | |
94 | the "proper Names" supplement (volume IX).\r | |
96 | [<source>XXX</source>]\r | |
97 | = Added by one of the volunteers.\r | |
98 | \r | |
99 | The original definitions have been tagged and in some cases \r | |
100 | reformatted or slightly rearranged. If substantive information\r | |
101 | is added from a second source, usually the additional source is\r | |
102 | also noted, as in:\r | |
103 | [<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>]\r | |
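The bracketed source notes described above can be pulled out of a paragraph with a pair of regular expressions. This is only a minimal sketch: the function name `paragraph_sources` and the sample paragraph are invented for illustration, and it assumes every note has exactly the form shown above (one or more `<source>` fields joined by `+` inside square brackets).

```python
import re

# Matches a bracketed source note, e.g.
# [<source>Webster 1913 Suppl.</source> + <source>WordNet 1.5</source>]
SOURCE_NOTE = re.compile(
    r"\[(<source>.*?</source>(?:\s*\+\s*<source>.*?</source>)*)\]"
)
SOURCE_NAME = re.compile(r"<source>(.*?)</source>")


def paragraph_sources(paragraph):
    """Return the source names listed in a paragraph's bracketed note(s)."""
    names = []
    for note in SOURCE_NOTE.finditer(paragraph):
        names.extend(SOURCE_NAME.findall(note.group(1)))
    return names


# Hypothetical paragraph, not taken from the dictionary files.
sample = ("<p><def>A made-up definition.</def> "
          "[<source>Webster 1913 Suppl.</source> + "
          "<source>WordNet 1.5</source>]</p>")
print(paragraph_sources(sample))
```

Because the files contain typing errors in the field-marks, a real run should expect some paragraphs for which this returns an empty list.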
104 | \r | |
105 | A list of the ancillary files related to the GCIDE is appended at \r | |
106 | the bottom of this "readme.dic" file.\r | |
107 | This version is tagged with SGML-like tags of the form <pos>...</pos>\r | |
108 | so that the original typography (italics, bold, block quotes) can be\r | |
109 | reproduced. A list of the most important tags for fields in the \r | |
110 | dictionary is given below. The tags also serve the more important \r | |
111 | function of allowing the information content to be conveniently imported\r | |
112 | into computer programs or databases. The set of tags used is described \r | |
113 | in the accompanying file "tagset.web". ***NOTE*** the paragraph tags\r | |
114 | <p>...</p> do *not* always nest properly with certain other tags, such \r | |
115 | as <note> and <cs> ("collocation section"), which in some cases span\r | |
116 | multiple paragraphs. If you are using a tag parser which detects\r | |
117 | improper nesting, you should first either delete the paragraph\r | |
118 | tags or convert them to non-tag symbols, or, if possible, set the \r | |
119 | parser to ignore the <p>...</p> tags.\r | |
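The two workarounds mentioned above (deleting the paragraph tags, or converting them to non-tag symbols) might look like the following. This is a sketch, not part of the GCIDE distribution: the function names and the `{{p}}` / `{{/p}}` placeholders are arbitrary choices made for this example, and the sample text is invented.

```python
import re

# Matches <p> and </p> only; tags such as <pos> or <pr> are not affected.
P_TAG = re.compile(r"</?p>")


def drop_paragraph_tags(text):
    """Delete every <p> and </p> tag, leaving all other tags untouched."""
    return P_TAG.sub("", text)


def mask_paragraph_tags(text):
    """Convert <p> and </p> to non-tag symbols that can be restored later."""
    return text.replace("</p>", "{{/p}}").replace("<p>", "{{p}}")


# Hypothetical text in which a <note> spans two paragraphs.
sample = "<p><note>first part</p>\n\n<p>second part</note></p>"
print(drop_paragraph_tags(sample))
print(mask_paragraph_tags(sample))
```

Masking rather than deleting keeps the paragraph boundaries recoverable after the remaining tags have been parsed.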
120 | The unusual characters (such as Greek or the European accented\r | |
121 | characters, as well as special characters used in the pronunciations)\r | |
122 | are described in the accompanying file "webfont.asc". Some information\r | |
123 | on the pronunciation system used may be found by viewing the files\r | |
124 | "wxxvii.jpg" and "pronunc.jpg" with a JPEG image viewer (or any web browser),\r | |
125 | and additional explanations of pronunciation are in the file \r | |
126 | "pronunc.web".\r | |
127 | Each paragraph of the original text is enclosed within tags of \r | |
128 | the form <p> . . . </p>. Within these paragraphs are no line\r | |
129 | breaks, and some of the paragraphs are over 12,000 characters long.\r | |
130 | These lines are too long to be handled by the vi editor, and probably\r | |
131 | by some other text editors. At some points, embedded line breaks within \r | |
132 | a "paragraph" are marked by a <br/ "entity". The file can therefore\r | |
133 | be converted, if necessary, to a form with shorter lines, and subsequently\r | |
134 | reconverted back to the form having one line per paragraph.\r | |
135 | \r | |
136 | If additional line breaks are added, then in order to remove the \r | |
137 | line breaks and reconstruct the original paragraphs, so that the \r | |
138 | page width can be adjusted, perform the following manipulations:\r | |
139 | (1) convert each line break (cr-lf combination) to a space.\r | |
140 | (2) convert the string "</p>  " (</p> followed by two spaces)\r | |
141 | to </p> followed by two line breaks (cr-lf combinations)\r | |
142 | (3) convert the string "<br/ " (<br/ followed by one space)\r | |
143 | to <br/ followed by one line break (cr-lf).\r | |
144 | There will be some "lines" (paragraphs) with over 12,000 characters,\r | |
145 | which may give trouble to some simple text editors.\r | |
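Steps (1) to (3) above can be sketched as three string replacements. The function name `rejoin_paragraphs` and the sample text are invented for this example, and the sketch assumes the file uses cr-lf line breaks throughout and that no wrapped line ends in a trailing space.

```python
def rejoin_paragraphs(text):
    """Apply steps (1)-(3) to text whose paragraphs were wrapped with cr-lf."""
    # (1) every line break becomes a single space
    text = text.replace("\r\n", " ")
    # (2) </p> followed by two spaces becomes </p> plus two line breaks
    text = text.replace("</p>  ", "</p>\r\n\r\n")
    # (3) <br/ followed by one space becomes <br/ plus one line break
    text = text.replace("<br/ ", "<br/\r\n")
    return text


# Hypothetical wrapped input: two paragraphs, the first with an embedded <br/.
wrapped = ("<p>first line<br/\r\nsecond line, wrapped\r\nby an editor</p>\r\n\r\n"
           "<p>next paragraph</p>\r\n\r\n")
print(repr(rejoin_paragraphs(wrapped)))
```

The order matters: step (2) depends on the two spaces produced by step (1) from the blank line between paragraphs.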
146 | A more sophisticated formatting of spaces within paragraphs may\r | |
147 | require the use of the fully-tagged master files. If you have\r | |
148 | a need for these files, contact Patrick Cassidy: cassidy@micra.com.\r | |
149 | The approximate beginning of each page is marked by an SGML\r | |
150 | comment of the form <-- p. 345 -->. (The exact beginning was in some\r | |
151 | cases in the middle of a paragraph, which we decided was not a\r | |
152 | good location for these page-number comments, so the page number\r | |
153 | was usually moved to the next paragraph break). Pages which have \r | |
154 | been proofread by volunteers (e.g., with initials VOL) will have a \r | |
155 | note within that page comment: <-- p. 345 pr=VOL -->. Pages which have \r | |
156 | not been proofread yet (most of them) will have varying numbers of \r | |
157 | typographical errors in them. We still (January 2002) need \r | |
158 | proofreaders to get the errors out of these dictionary files.\r | |
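The page comments described above can be located with a regular expression, for example to list which pages have been proofread. This is a sketch under assumptions: the name `page_markers` is invented, page numbers are assumed to be plain digits, and the proofreader's initials are assumed to contain no spaces.

```python
import re

# Matches <-- p. 345 -->  and  <-- p. 345 pr=VOL -->
PAGE_COMMENT = re.compile(r"<--\s*p\.\s*(\d+)(?:\s+pr=(\S+))?\s*-->")


def page_markers(text):
    """Yield (page_number, proofreader_or_None) for each page comment."""
    for match in PAGE_COMMENT.finditer(text):
        yield int(match.group(1)), match.group(2)


# Hypothetical text containing one proofread and one unproofread page.
sample = "<-- p. 345 pr=VOL --> some text <-- p. 346 -->"
print(list(page_markers(sample)))
```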
159 | \r | |
160 | ***********************************************************************\r | |
161 | ** WARNING!!! **\r | |
162 | ***********************************************************************\r | |
163 | \r | |
164 | This version is only a first typing, and has numerous typographic\r | |
165 | errors, including errors in the field-marks. In addition, the user must\r | |
166 | keep in mind that this text is very old and will contain numerous \r | |
167 | obsolete, inaccurate, and perhaps offensive statements, which are \r | |
168 | included solely because this work is intended to reproduce accurately\r | |
169 | this historically interesting classic reference work. This text should \r | |
170 | not be relied upon as an accurate source of information, as in many\r | |
171 | cases it represents the state of knowledge around 1890. The text is\r | |
172 | provided "as is", and the user must accept responsibility for all\r | |
173 | consequences of its use. Please refer to the header of each file and\r | |
174 | the GNU General Public License. If these conditions of use are unacceptable,\r | |
175 | please do not use these texts.\r | |
176 | ************************************************************************\r | |
177 | ************************************************************************\r | |
178 | This electronic dictionary is also made available as a potential\r | |
179 | starting point for development of a modern comprehensive encyclopedic\r | |
180 | dictionary, to be accessible freely on the internet, and developed by the\r | |
181 | efforts of all individuals willing to help build a large and freely\r | |
182 | available knowledge base. A large number of collaborators are needed to\r | |
183 | bring this dictionary to a more accurate, more modern, and more useful\r | |
184 | state. Anyone willing to assist in any way in constructing such a \r | |
185 | knowledge base should contact Patrick Cassidy (see above). All reports \r | |
186 | of errors will be gratefully received, and should also be transmitted to \r | |
187 | PC at: pc@worldsoul.org.\r | |
188 | \r | |
189 | In addition to the main text of the dictionary, additional\r | |
190 | explanatory material about this version of the dictionary is available\r | |
191 | in the ancillary files:\r | |
192 | \r | |
193 | =====================================================================\r | |
194 | COPYING 18,321 11-03-99 1:13a COPYING\r | |
195 | README DIC 13,775 01-17-02 11:48p readme.dic\r | |
196 | WEBFONT ASC 35,234 12-12-01 3:27p WEBFONT.ASC\r | |
197 | TAGSET WEB 55,843 08-16-01 1:16p TAGSET.WEB\r | |
198 | PRONUNC WEB 14,312 06-18-00 3:02p PRONUNC.WEB\r | |
199 | PRONUNC JPG 2,569,796 06-18-00 3:11p PRONUNC.JPG\r | |
200 | SYMBOLS JPG 144,716 06-18-00 3:13p SYMBOLS.JPG\r | |
201 | WXXVII JPG 1,188,380 06-18-00 3:19p WXXVII.JPG\r | |
202 | ==================================================================\r | |
203 | \r | |
204 | \r | |
205 | Most important tags used in the GCIDE:\r | |
206 | <hw> tags the headword\r | |
207 | <pr> pronunciation\r | |
208 | <pos> part of speech\r | |
209 | <ety> etymology\r | |
210 | <ets> "source" word within an <ety> field, usually foreign words\r | |
211 | <fld> field of knowledge (e.g. Med. = medicine)\r | |
212 | <def> definition\r | |
213 | <cs> collocation section (containing word combinations)\r | |
214 | <col> collocation entry (word combination)\r | |
215 | <cd> collocation definition\r | |
216 | <as> illustrations of usage (within a <def>. . . </def> field)\r | |
217 | <au> authority for a definition, or author of a quotation\r | |
218 | <q> illustrative quotation -- in block quote format\r | |
219 | <au> author of an illustrative <q> quotation\r | |
220 | <altname> alternative name for the headword -- essentially a synonym\r | |
221 | <asp> alternative spelling of the headword\r | |
222 | <syn> list of synonyms for the headword\r | |
223 | <p> paragraph\r | |
224 | <b> bold type\r | |
225 | <it> italic type\r | |
226 | \r | |
227 | For other tags, see the file "tagset.web"\r | |
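As a rough illustration of importing tagged fields into a program, the sketch below collects the contents of a few of the tags listed above from one paragraph. The helper name `field` and the sample entry are invented for this example; the pattern does not handle a tag nested inside another tag of the same name, and it does not allow for the tagging errors present in the files.

```python
import re


def field(tag, paragraph):
    """Return the contents of every <tag>...</tag> field in a paragraph."""
    pattern = re.compile(r"<{0}>(.*?)</{0}>".format(re.escape(tag)), re.DOTALL)
    return pattern.findall(paragraph)


# Hypothetical entry, not taken from the dictionary files.
sample = ("<p><hw>Example</hw> <pos>n.</pos> <def>A made-up entry used only "
          "for illustration.</def></p>")
entry = {tag: field(tag, sample) for tag in ("hw", "pos", "def")}
print(entry)
```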
228 | \r | |
229 | \r | |
230 | ============================================================\r | |
231 | OTHER VERSIONS OF THE DICTIONARY\r | |
232 | =============================================================\r | |
233 | \r | |
234 | There are several other derivative versions of this dictionary \r | |
235 | on the internet, in some cases reformatted or provided with an \r | |
236 | interface. Those that I am aware of are:\r | |
237 | \r | |
238 | (1) Project Gutenberg\r | |
239 | ---------------------\r | |
240 | In the etext96 directory of Project Gutenberg (www.prairienet.org)\r | |
241 | there is a version of the original 1913 dictionary, which is in\r | |
242 | the **public domain**. The main files are in the directory etext96,\r | |
243 | and are labeled pgw050**.***. The tags for that version are a subset\r | |
244 | of those used in this GNU version.\r | |
245 | \r | |
246 | (2) The DICT development group\r | |
247 | ------------------------------\r | |
248 | This group has created a program to index and search this dictionary.\r | |
249 | The program can be downloaded and used locally, but at present\r | |
250 | is available only in a Unix-compatible executable version.\r | |
251 | See their web site at http://www.dict.org.\r | |
252 | \r | |
253 | (3) The University of Chicago ARTFL project\r | |
254 | ---------------------------------------------\r | |
255 | Mark Olsen and Gavin LaRowe at the University of Chicago have \r | |
256 | converted the original 1913 dictionary to HTML and have provided an\r | |
257 | interface allowing search of the headwords. When the supplemented\r | |
258 | version has developed sufficiently to warrant the effort, a \r | |
259 | similar searchable version may be posted there as well. The\r | |
260 | search page is at:\r | |
261 | http://humanities.uchicago.edu/forms_unrest/webster.form.html\r | |
262 | \r | |
263 | That page will provide links to other ARTFL projects and contact\r | |
264 | information for the ARTFL group, who alone can provide information \r | |
265 | about the HTML version or interface.\r | |
266 | \r | |
267 | \r | |
268 | -- PJC\r |