                  Draft Documentation on the layout of 
                        Aspell dicts packages

The overall goal of Aspell dicts is to provide a uniform method to
distribute dictionaries for Aspell for any language that Aspell
supports.

This documentation is still at an early stage and rather incomplete.
It is meant to give you enough of an overview to understand what is
going on, but it probably does not contain enough information for you
to actually create a distribution.

Layout of the Distribution:

An Aspell Word List Package contains several types of files, many of
them generated by the proc script.  The following must be provided:

info: the main file, which contains all of the important information
*.cwl: compressed word list files
Copyright: the copyright notice

Several optional ones:

??_phonet.dat: The optional phonet data file

README: A readme file.  If one is provided, the line "readme-file
  README" must be specified in the info file (see below).  If one is
  not provided, a generic one will be created.
COPYING: The actual license agreement.  Automatically provided for some
  licenses.
doc/*: Additional documentation.

And finally, some automatically generated or provided ones:

configure: the configure script, which finds the appropriate paths
  and generates the actual makefile.  This file needs to be
  copied from the aspell-gen package.
??.dat: the data file for the language.
*.multi: the dictionary files
Makefile.pre: the makefile which configure uses.

*** Format of the Info File

(Note: For a better idea of how this file is laid out, see the sample
info files included.)

The info file is the main file, which contains most of the information.
It has two types of entries: single value settings and group
settings.  Single value settings have the form:
  <key> <value>
Group settings have the form:
  <group key>:
    <key> <value>
    <key> <value>
    ...
If there is ANY whitespace before a key, it is assumed to belong to a
group entry.
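The leading-whitespace rule is easy to get wrong, so here is a minimal Python sketch of a reader for this layout.  It is not the parser proc uses.  The function name parse_info is hypothetical, and the sketch makes two assumptions: blank lines are ignored, and a group key line ("<group key>:") contains no spaces.

```python
def parse_info(text):
    """Hypothetical reader for the info file layout (not proc's own code).

    Returns (settings, groups):
      settings: list of (key, value) pairs for single value settings
      groups:   list of (group_key, [(key, value), ...]) in file order
    Lists are used instead of dicts because keys such as "alias" and
    group keys such as "author" and "dict" may be repeated.
    """
    settings = []
    groups = []
    current = None  # entry list of the group being filled, if any
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # assumption: blank lines carry no meaning
        if raw[0].isspace():
            # ANY leading whitespace: the entry belongs to the open group.
            if current is None:
                raise ValueError(f"line {lineno}: indented entry outside a group")
            key, _, value = raw.strip().partition(" ")
            current.append((key, value.strip()))
        elif raw.rstrip().endswith(":") and " " not in raw.strip():
            # "<group key>:" opens a new group.
            current = []
            groups.append((raw.strip()[:-1], current))
        else:
            # "<key> <value>" is a single value setting; it ends any open group.
            key, _, value = raw.strip().partition(" ")
            settings.append((key, value.strip()))
            current = None
    return settings, groups


# Made-up sample, only to exercise both kinds of entries.
sample = "lang fr\nauthor:\n  name Jane Doe\n  maintainer true\n"
print(parse_info(sample))
```

In this sketch, an indented line that follows a single value setting is treated as an error rather than silently attached to an earlier group.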

The following single value settings are mandatory:

name_english: The English name of the language
lang: The two-letter language code
copyright: The copyright status, one of:
  LGPL
  GPL
  FDL
  Artistic
  Copyrighted (Copyright message must remain)
  Open Source (Meets OSI definition)
  Public Domain (i.e. none)
  Other
  Unknown
version: A version string
charset: The character set to use
soundslike: One of:
  none
  generic
  phonet
If it is phonet, the file <code>_phonet.dat is expected to be present.
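As a rough illustration of the rules above, the following Python sketch checks a set of single value settings before proc is run.  The function check_settings and its messages are hypothetical, and it is not a description of what proc itself checks.  The sketch assumes the copyright value is the bare name (e.g. "Public Domain") without the parenthesized explanation.

```python
import os

# Mandatory single value settings and their allowed values, from the list above.
# Assumption: the copyright value is the bare name, without the explanation.
MANDATORY = ("name_english", "lang", "copyright", "version",
             "charset", "soundslike")
COPYRIGHT_VALUES = {"LGPL", "GPL", "FDL", "Artistic", "Copyrighted",
                    "Open Source", "Public Domain", "Other", "Unknown"}
SOUNDSLIKE_VALUES = {"none", "generic", "phonet"}


def check_settings(settings, directory="."):
    """Hypothetical sanity check; settings maps key -> value.

    Returns a list of human-readable problems (empty if none found).
    """
    problems = [f"missing mandatory setting: {key}"
                for key in MANDATORY if key not in settings]
    lang = settings.get("lang")
    if lang is not None and not (len(lang) == 2 and lang.isalpha()):
        problems.append(f"lang is not a two letter code: {lang!r}")
    copyright_value = settings.get("copyright")
    if copyright_value is not None and copyright_value not in COPYRIGHT_VALUES:
        problems.append(f"unknown copyright value: {copyright_value!r}")
    soundslike = settings.get("soundslike")
    if soundslike is not None and soundslike not in SOUNDSLIKE_VALUES:
        problems.append(f"unknown soundslike value: {soundslike!r}")
    if soundslike == "phonet" and lang:
        # soundslike phonet requires <code>_phonet.dat in the package directory.
        phonet_file = f"{lang}_phonet.dat"
        if not os.path.isfile(os.path.join(directory, phonet_file)):
            problems.append(f"missing phonet data file: {phonet_file}")
    return problems
```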

In addition, there must be at least one of each of the following group
entries:

author:
  name: The name of the author
  email: The email address of the author.  The email needs to be
    translated into an anti-spam form: each '.' is replaced with a
    space and '@' is replaced with ' at '.  For example,
    "kevina@gnu.org" becomes "kevina at gnu org".
  maintainer: Set to 'true' if this person actively maintains the
    Aspell version of the word list.  Otherwise, set it to 'false' or
    leave it out.

Multiple author groups may be specified.
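The anti-spam translation of the email field is simple text substitution.  A small Python sketch follows; the function name antispam_email is hypothetical.

```python
def antispam_email(address):
    """Translate an email address into the anti-spam form used in the
    info file: '@' becomes ' at ' and every '.' becomes a space."""
    return address.replace("@", " at ").replace(".", " ")


print(antispam_email("kevina@gnu.org"))  # prints: kevina at gnu org
```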

dict:  The defining entry for a dictionary
  name: The name of this dict
  alias: An alternate name (may be repeated)
  add: A word list to add (may be repeated)

Multiple dictionaries may be defined.  If a particular dictionary
should not have an awli entry associated with it, add "awli false".

Dictionary names should be of the form
  <code>[_<country>][-<jargon>][-<size>]

where <country> is the two-letter ISO 3166 country code, in all upper
case; <jargon> is any extra information needed to distinguish the
dictionary from other dictionaries; and <size> is the dictionary size,
a two-digit number that should roughly follow these guidelines:

10: tiny
20: really small
30: small
40: med-small
50: med
60: med-large (the default size)
70: large
80: huge
90: insane

See SCOWL (http://wordlist.sourceforge.net) for an example of how
these sizes are used.
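To make the naming scheme concrete, here is a Python sketch that splits a name into its parts with a regular expression and maps the size onto the table above.  The sketch makes two assumptions that this document does not state: <code> is two lower-case letters, and <jargon> contains no '-'.  It also treats a bare two-digit final part as the size rather than as jargon.  The name parse_dict_name and the example name en_GB-ise-60 are hypothetical.

```python
import re

# Size guidelines from the table above.
SIZES = {10: "tiny", 20: "really small", 30: "small", 40: "med-small",
         50: "med", 60: "med-large", 70: "large", 80: "huge", 90: "insane"}

# <code>[_<country>][-<jargon>][-<size>]
# Assumptions: <code> is two lower-case letters, <jargon> contains no '-'.
NAME_RE = re.compile(
    r"(?P<code>[a-z]{2})"
    r"(?:_(?P<country>[A-Z]{2}))?"         # ISO 3166 code, upper case
    r"(?:-(?P<jargon>(?!\d\d$)[^-]+))?"    # anything except a bare size
    r"(?:-(?P<size>\d\d))?"                # two digit size
)


def parse_dict_name(name):
    """Split a dictionary name into its parts; raise ValueError if malformed."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid dictionary name: {name!r}")
    parts = match.groupdict()
    parts["size_hint"] = None
    if parts["size"] is not None:
        size = int(parts["size"])
        # The table is only a rough guide, so report the nearest tier.
        nearest = min(SIZES, key=lambda tier: abs(tier - size))
        parts["size_hint"] = SIZES[nearest]
    return parts


print(parse_dict_name("en_GB-ise-60"))
```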

Aliases for individual dictionaries can be created automatically if a
global alias line is defined.  Each global alias represents a part of
a dictionary name.  For example, for a dictionary named fr-40, the lines
  alias fr francais french
  alias 40 sml small
will cause the following aliases to be generated automatically:
  francais-40
  francais-sml
  francais-small
  french-40
  french-sml
  french-small
  fr-sml
  fr-small
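The expansion in this example can be reproduced in Python as the Cartesian product of the choices for each name part, minus the dictionary name itself.  This is a sketch, not proc's implementation.  It assumes a dictionary name is split into parts at '-', and the name expand_aliases is hypothetical.

```python
from itertools import product


def expand_aliases(dict_name, global_aliases):
    """Generate the automatic aliases for one dictionary.

    global_aliases maps a name part to its alternatives, so the info
    line "alias fr francais french" becomes {"fr": ["francais", "french"]}.
    Assumption: a dictionary name is split into parts at '-'.
    """
    parts = dict_name.split("-")
    # Each part may stay as it is or be replaced by one of its aliases.
    choices = [[part] + global_aliases.get(part, []) for part in parts]
    names = ("-".join(combo) for combo in product(*choices))
    # The unchanged name is the dictionary itself, not an alias.
    return [name for name in names if name != dict_name]


aliases = {"fr": ["francais", "french"], "40": ["sml", "small"]}
print(expand_aliases("fr-40", aliases))  # the same eight names as listed above
```

With three choices for each of the two parts, this gives 3 x 3 - 1 = 8 aliases.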

Aliases normally do not have awli entries associated with them.  If you
want a particular alias to have an awli entry, simply append ":awli" to
the alias.  For example:

  alias en_GB en:awli

If an alias has an awli entry associated with it, the final alias must
be of the proper form, i.e. it must follow the dictionary name form
described above.
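Here is a Python sketch of how the ":awli" tag could be read from an alias line, together with a check that a final alias follows the dictionary name form.  Both function names are hypothetical.  The pattern assumes, as a guess, that <code> is two lower-case letters and <jargon> contains no '-'.

```python
import re

# Assumed dictionary-name pattern: <code>[_<country>][-<jargon>][-<size>]
NAME_RE = re.compile(r"[a-z]{2}(?:_[A-Z]{2})?(?:-(?!\d\d$)[^-]+)?(?:-\d\d)?")


def parse_alias_line(line):
    """Parse 'alias <part> <alt>[:awli] ...' into (part, [(alt, awli), ...])."""
    keyword, part, *alts = line.split()
    if keyword != "alias" or not alts:
        raise ValueError(f"not an alias line: {line!r}")
    parsed = []
    for alt in alts:
        name, sep, tag = alt.partition(":")
        if sep and tag != "awli":
            raise ValueError(f"unknown alias tag: {tag!r}")
        parsed.append((name, sep == ":"))  # True when ":awli" was given
    return part, parsed


def check_awli_alias(final_alias):
    """An alias with an awli entry must itself be a valid dictionary name."""
    return NAME_RE.fullmatch(final_alias) is not None


print(parse_alias_line("alias en_GB en:awli"))
```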

In addition to the above, the info file can also contain the following
optional entries:

url: URL of the official version of the dictionary for Aspell
source_url: URL of the original word list
source_version: Version of the original word list used
name_ascii: The name of the language, spelled in that language using
  only ASCII characters
name_native: Like name_ascii, but not limited to ASCII characters

There are also a number of other entries, which I will document later.

*** The *.cwl Files

For each add entry in the dict entry, there should in general be one
word list.  Each of these word lists will be compiled into a separate
hash file, so you should keep their number to a minimum.  Each file
name is expected to have the following form:
  <code>[-...].cwl
These files are expected to be compressed with word-list-compress.  To
compress a file, do something like the following:
  export LC_COLLATE=C
  cat <word list> | sort -u | word-list-compress c > <code>...cwl
The LC_COLLATE=C setting is important; otherwise, the file will not be
compressed optimally.
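The sort step produces plain byte order, and the following rough Python equivalent of the pipeline makes that explicit.  Comparing raw bytes gives the same order as sort -u under LC_COLLATE=C; a locale-aware sort might interleave upper and lower case instead.  The helpers c_sort_unique and compress_word_list are hypothetical, and the sketch assumes the Aspell word-list-compress program is installed and on PATH.

```python
import subprocess


def c_sort_unique(data: bytes) -> bytes:
    """Equivalent of 'LC_COLLATE=C sort -u': unique lines in byte order."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # a trailing newline does not start another line
    return b"".join(line + b"\n" for line in sorted(set(lines)))


def compress_word_list(src_path, dest_path):
    """Sort a word list, then pipe it through 'word-list-compress c'."""
    with open(src_path, "rb") as src:
        sorted_words = c_sort_unique(src.read())
    with open(dest_path, "wb") as out:
        subprocess.run(["word-list-compress", "c"], input=sorted_words,
                       stdout=out, check=True)
```

The file is read as bytes so that words in any 8-bit charset are sorted by their raw byte values, as sort does in the C locale.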

*** Copyright file

The copyright file simply states the terms in which this word list is
available.  If the license is a standard one or is more than a
paragraph or so the actual license should be included in a separate
file "COPYING".  If you are using one of the GNU licenses the COPYING
file will automatically be generated for you.

*** Running proc

Once the info and *.cwl files are created, you are ready to run the
proc script.  The proc script needs to be copied or linked into the
current directory for things to work correctly.  Once that is done,
simply type:
  perl proc create
If there are no errors, you should now have the generated files listed
above.

To try building the word list, run configure:
  ./configure

Then build and install it:
  make
  make install

To create a distribution, run:
  make dist
