Linguistic Development Environment
NooJ
NooJ, the latest version of INTEX, has been under development since 2002.
Click here for more information on NooJ.
Events
The 9th INTEX/NooJ Workshop,
The 8th INTEX/NooJ Workshop,
The 7th INTEX/NooJ Workshop,
The 6th INTEX Workshop,
The 5th INTEX Workshop,
INTEX Session at ACH,
4th INTEX Workshop:
3rd INTEX Workshop:
If you have developed other INTEX-related projects and wish to be listed here, please send an email to max.silberztein@univ-fcomte.fr.
An INTEX site at the Maison des Sciences de l’Homme Ledoux, Université de Franche-Comté: http://intex.univ-fcomte.fr
An INTEX site at NYU, http://www.nyu.edu/pages/linguistics/intex
An INTEX friend site at the
An INTEX friend site at the Bulgarian Association for Computational Linguistics, Sofia: http://www.bacl.org/intex_for_bulgarian.html
The PROLEX project, from the Université de Tours: www.li.univ-tours.fr/Fichiers/Fichiers_HTML/Themes/BdTln_Projet_Prolex.htm
Andrew Gordon at the
Analysis of schizophrenic discourses: Reboul A., Sabatier P., Noël-Jorand M-C. Le discours des schizophrènes : une étude de cas. Revue française de Psychiatrie et de Psychologie Médicale. 2001, 49, pp 6-11.
-- The Bulgarian Association for Computational Linguistics has developed a Bulgarian module for INTEX, see http://www.bacl.org/intex_for_bulgarian.html
-- Tita Kiriacopoulou’s team at the
-- Annibale Elia’s team at the
-- Elisabete Ranchhod’s team at the LabEL/CAUTL laboratory,
INTEX is a linguistic development
environment that includes large-coverage dictionaries and grammars, and parses
texts of several million words in real time. INTEX includes tools to create
and maintain large-coverage lexical resources, as well as morphological and
syntactic grammars. Dictionaries and grammars are applied to texts in order to
locate morphological, lexical and syntactic patterns, remove ambiguities, and
tag simple and compound words. Several research centers use INTEX to
rapidly build extractors that identify semantic units in large texts, such as
proper names of persons, locations, technical expressions of finance, etc. INTEX
can build lemmatized concordances and indices of large texts with respect to
all types of
Texts and Reference
REFERENCE:
The
Silberztein, 1993. Dictionnaires électroniques et analyse automatique de textes : le système INTEX. 240 p., Masson Ed.:
This book describes the DELA system (and in particular the construction of the
DELAC-DELACF system of dictionaries), the use of finite-state technology in the various processes that make up the lexical
analysis of natural languages, and the first implementation of INTEX.
The French manual for v 4.12 (PDF document).
A tutorial for
v4.12 (PDF).
The latest, updated English manual for v 4.33 (PDF).
Silberztein, 1999a. Text Indexing with INTEX, in Computers
and the Humanities #33:3, Kluwer Academic Publishers.
Silberztein, 1999b. INTEX: a Finite State Transducer toolbox, in
Theoretical Computer Science #231:1, Elsevier Science.
The first DELA system of
large-coverage morphological electronic dictionaries was described in:
Courtois, Silberztein Eds, 1990. Dictionnaires électroniques du français. Langue française. Larousse:
This is
the special issue dedicated to the French DELA system of electronic
dictionaries; in particular Blandine Courtois describes the DELAS dictionary, and
Leclère, Christian. 1998. "Travaux récents en Lexique-grammaire". In "Le Lexique-grammaire", Béatrice Lamiroy (ed.), Travaux de Linguistique n° 37, Louvain-la-Neuve : Duculot, pp. 155-186.
This bibliography
includes many references on the linguistic data included in INTEX:
-- Many
researchers have participated in the construction of the French and English
DELAS and DELAC dictionaries, at the LLI laboratory, at the LADL laboratory,
and at the GRELIS laboratory. DELA-type dictionaries are also available for a
dozen languages.
-- The Lexicon-Grammar
series of syntactic dictionaries was designed and built by Maurice Gross and
his team at the LADL laboratory. Lexicon-grammars are also available for a
dozen languages.
PROCEEDINGS OF INTEX WORKSHOPS:
Dister Ed., 2000. Actes des Troisièmes Journées INTEX. In Informatique et Statistique dans les Sciences Humaines. Université
de Liège, n° 36.
Fairon Ed., 1999. Analyse lexicale et syntaxique: le système INTEX, Actes des Premières et Secondes Journées INTEX. Lingvisticae Investigationes vol. XXII: 1998-1999.
Muller, Royauté, Silberztein Eds, 2004. INTEX
pour
The Unitex software was developed by the Linguistics group (Prof.
Eric Laporte) of the Institut Gaspard Monge, Université de Marne-La-Vallée,
without the consent, or even the knowledge, of INTEX’s author.
Unitex, its interface, its methodology, its standalone programs, its file formats, some
of its linguistic data, as well as its documentation, are copies of INTEX’s.
For months, INTEX and its author were mentioned neither in the Unitex documentation
nor on the various websites associated with Unitex.
See the statement of the Dean of Arts and Humanities of the Université de
Franche-Comté concerning this sad state of affairs.
See the analysis report of the “similarities” between Unitex and INTEX.
Please do not encourage this way of conducting public research.
The latest version is 4.33.
This version contains
the French and English modules that were built at the LADL (Université
Paris 7, CNRS), the Spanish module that was built at the
Distribution Policy
This
software can be freely downloaded and used by individuals (researchers and
students) affiliated with a University, for their individual needs, and
non-commercial purposes only.
Private
and public organizations, laboratories and departments who wish to use INTEX in
Research & Development or Education projects, should contact the LASELDI
laboratory.
None of
the programs and linguistic resources included in the INTEX package should be
copied, redistributed, incorporated into other software, or published without
their author’s consent and proper citation (shouldn’t this be
obvious?).
1. If you agree with these terms, download the installation file Intex.zip (zip file, 26 MB).
2. Launch the SETUP.EXE installation program to install INTEX on your system (e.g. in C:\Program files).
3. Launch INTEX; it will display a Machine Identification Number and ask for an Installation Key. There are two possibilities:
(3a) To get a personal (DEMO) version of INTEX, type in the following information:
License number: 1
Contact: PERSO
Institution: DEMO
Installation Key: pxN9pINF8
(3b) To get a licensed version of INTEX, you need to:
Contact: John Smith
Institution:
Machine ID: 12345
License number: 123
Contact: John Smith
Institution:
Machine ID: 12345
=> Installation key: ab1234cde
4. Launch INTEX again; enter the above information to register, and enjoy! Check out the
INTEX documentation and reference above.
5. Remember that you can get extra language modules from their authors; check the links above for additional modules.
The INTEX package contains the
French and English modules, which contain the latest version of the DELAF and
DELACF dictionaries, offered to the INTEX community by Blandine
Courtois, author and co-author of these dictionaries,
while working at the LADL (Université Paris 7-CNRS).
For information about the DELA dictionaries, see the Reference.
Other DELAF and DELACF dictionaries compatible with versions 4.3x of INTEX
are available, see below.
Before you download anything: Limitations Of Use
The following specialized linguistic resources and tools are offered by
their authors to the INTEX community. None of the linguistic resources below can be
redistributed, incorporated into other software, or published without their
author’s consent and proper citation (shouldn’t this be obvious?).
-- Annibale Elia’s team at the
-- Xavier Blanco at the Autonomous
University of Barcelona has built a new Spanish module for INTEX, using the 4.33 morphological parser, see Spanish_demo.zip
-- Dictionary of compound nominal determiners
-- Latest version of Prolintex: dictionary of French proper names for INTEX
-- A text fully tagged with INTEX (the first tagged text that takes compounds into account!): Du côté de chez Swann (French)
-- An example of a nice student project for INTEX: Various local grammars for French expressions (French)
-- Grammar for Time and Date
-- A tool to help users remove remaining ambiguities in partially tagged texts:
Local Grammar for the English lemmatization of compound tenses
Here
is a rather large set of local grammars for recognition of auxiliaries, modals,
aspectuals, etc. (more info.)
Dictionary of French compound nominal determiners
This dictionary includes over 3,000 compound nominal determiners classified in 15 classes (cf. the Info.txt file of the archive). Further information is available in Les déterminants nominaux quantifieurs, 1993 (PhD thesis, LLI-Paris XIII). Ref.: P.-A. Buvet, 1994, "Déterminants : les noms", Lingvisticae Investigationes XVIII:1, Amsterdam: John Benjamins B.V.
abondance de ce,abondance de.NDET+Dnom14
années-lumières de son,année-lumière de.NDET+Dnom2
billion d',billion de.NDET+Dnom1
billion de ces,billion de.NDET+Dnom1
pouce métrique du,pouce métrique de.NDET+Dnom2
[...]
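The sample entries above follow a "form,lemma.CODE+feature" layout. As a minimal sketch (not part of INTEX), the Python below splits such a line into its parts and builds a lookup table keyed on the listed form. The names Entry, parse_entry and build_lookup are hypothetical. The sketch assumes that the first comma and the last period are the separators, and it does not handle commas or periods protected by a backslash.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    form: str        # form as it appears in texts
    lemma: str       # canonical form
    category: str    # first code after the period, e.g. NDET
    features: tuple  # remaining "+"-separated codes, e.g. ("Dnom14",)


def parse_entry(line):
    """Split one 'form,lemma.CODE+feature' line (assumed layout)."""
    form, rest = line.strip().split(",", 1)
    lemma, codes = rest.rsplit(".", 1)
    category, *features = codes.split("+")
    return Entry(form, lemma, category, tuple(features))


def build_lookup(lines):
    """Map each form to the list of entries that share it."""
    table = {}
    for line in lines:
        entry = parse_entry(line)
        table.setdefault(entry.form, []).append(entry)
    return table


SAMPLE = [
    "abondance de ce,abondance de.NDET+Dnom14",
    "années-lumières de son,année-lumière de.NDET+Dnom2",
    "billion d',billion de.NDET+Dnom1",
    "billion de ces,billion de.NDET+Dnom1",
    "pouce métrique du,pouce métrique de.NDET+Dnom2",
]

lookup = build_lookup(SAMPLE)
for entry in lookup["billion d'"]:
    print(entry.lemma, entry.category, entry.features)
```

Keeping a list per form leaves room for ambiguous forms, which a real dictionary of this kind would contain.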
The Laboratoire d'Informatique de l'Université de Tours
leads a project on the automatic processing of proper names, the Prolex project. The Prolintex dictionaries
were built within this project in order to make available to
To improve the search for proper names, we suggest using, for INTEX preprocessing, a graph that is more complete than the standard graph; it is also available at the same address. This graph was presented at the INTEX workshop in 2000. Please send us your comments and suggestions for improvement...
Tools for the statistical analysis
The statistical
module is now a standalone program that can be launched more than once, in
parallel to an INTEX session.
One advantage
is that it is now possible to compare different queries in the same text, one
query in different texts, different queries in
different texts, even in different languages.
WARNING: this program is compatible only
with INTEX versions 4.23e and above.
DiaTag allows users to tag manually the compound and simple words that were
left ambiguous after the INTEX Disambiguation process. DiaTag
uses the snt file of the text, as well as the two
dictionary files DLC and DLF (vocabulary of the text). For each ambiguous
simple or compound word, DiaTag displays all possible
lexical solutions, as well as a concordance sorted on the left or right
context, and allows users to choose the right lexical entry for each occurrence
of the word in the text. Results are saved incrementally, so that users can
work in several sessions, and correct previous choices. DiaTag
is particularly well adapted to the tagging of texts of up to 2 MB.
WARNING: this program is
compatible only with INTEX versions 4.3x.
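The idea of a concordance sorted on the left or right context, as mentioned in the DiaTag description above, can be illustrated with a small Python sketch. This is not DiaTag's code: the function names, the whitespace tokenization and the context width are assumptions made for the example. The left-context sort compares the word nearest to the keyword first.

```python
def concordance(tokens, keyword, width=2):
    """Return (left context, match, right context) for each occurrence."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            rows.append((left, tok, right))
    return rows


def sort_on_right(rows):
    """Sort rows alphabetically on the words that follow the match."""
    return sorted(rows, key=lambda r: [t.lower() for t in r[2]])


def sort_on_left(rows):
    """Sort rows on the preceding words, nearest word first."""
    return sorted(rows, key=lambda r: [t.lower() for t in reversed(r[0])])


text = "the bank of the river and the bank of England near the old bank"
rows = concordance(text.split(), "bank")

for left, match, right in sort_on_left(rows):
    print(f"{' '.join(left):>12} [{match}] {' '.join(right)}")
```

Sorting on the nearest neighbour groups occurrences that share an immediate context, which is what makes it easier to choose one lexical entry for a whole group at once.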
Du côté de chez Swann,
Marcel Proust
This text has
been fully tagged with INTEX:
-- I first
removed from the DLF and DLC vocabulary files all the lexical entries that
never occur in this text;
-- Then I
built two sets of local grammars: one set of general disambiguation rules (i.e.
rules that can be used with other texts), and one set of “good” rules
that work perfectly for this particular text, and would be good enough for most
French texts;
-- Finally, I
removed the remaining ambiguities on compounds (compound vs.
sequence of simple words), and on simple words, with DiaTag.
Various local grammars for French expressions
This .zip file contains
various local grammars for French expressions of time, date, length, height,
width and altitude.
Latest version 4.33:
-- A few bugs have been fixed,
related to: synchronization of the concordance and the text, the EDIT DLM
button, morphological ambiguities associated with more than one lexical
constraint, unsolved variable $XL in the output tag of a morphological when
lemma was implicit, Alphabet file with an invalid format, recognition of the
characters “/” and “>” in texts, processing of
accented letters during dictionary format check.
-- The 4.3x morphological module
has been further enhanced; it is now fully compatible with the DELAS-DELAF
module, so that certain phenomena can be described equally well either with
inflectional FSTs or with morphological FSTs. Highly inflected languages, such as Hungarian,
Korean or Russian, can now be processed with INTEX without having to construct
ridiculously large DELAFs. Even for English and
Romance languages, the ability to formalize prefixation
and suffixation without introducing redundancies in a DELAF-type dictionary is
more natural. The new module is more tightly integrated with the transformational
analyzer (see its documentation, chap. 12).
-- The disambiguation program
(interg.exe) has been modified so that irrelevant ambiguities between compound
words or frozen expressions and sequences of simple words do not interfere with
local grammars. More precisely, the disambiguation process no longer
accidentally destroys lexical hypotheses that do not follow an explicit path of
the local grammar.
-- The lexicon-grammar compiler has
been seriously optimized (over 1,000 times faster on a Pentium 4, 512 MB RAM, to
compile the C1d table)
-- Towards NooJ: the
lexicon-grammar compiler’s outputs have been renamed as “.cfg” files (context-free grammar), and can now be
reused by the INTEX syntactic parsers (including the “Locate
Pattern” window). This allows users to describe free syntactic structures
in lexicon-grammar tables, and apply them to parse texts.
Version 4.32:
-- Enhanced transducers are now
associated with the inflectional and derivational modules, so that INTEX can now
perform automatic transformational analysis and generation. For
instance:
From the parsed text: (N0 John)
(V eats) (N1 an apple), the rule: $N1 is $V_K by $N0 produces the
result: an apple is eaten by John
From the parsed text: (N0 cette affaire) (ETRE est)
(ABLE risible), the rule On peut $Able_V de $N0 produces the text: On peut rire de cette
affaire
Morphological operations can be
cascaded, so for instance: émission_V_N0_p = émettre_N0_p = émetteur_p = émetteurs
-- A user-defined concordance
program, that allows users to sort concordances according to any word or token
inside matching sequences
-- Towards NooJ: the new
Text-FST now represents the result of the tokenizer
& morphological parser
Version 4.31:
-- variables
in enhanced transducers are now named and can be embedded, e.g. “$(NP
… $(DET … $) … $)”
-- a
Finite-State tokenizer capable of analyzing complex
and ambiguous compounds in Germanic languages and tokenizing Asian languages
-- a
morphological parser fully compatible with the DELA system capable of handling
derivational morphology
-- the
inflectional module can process accents independently from letters
-- a much better installation key
and encryption system that offers authors of binary dictionaries (DELAFs or DELACFs) better
protection against reverse engineering
Version 4.30:
-- functionally
identical to 4.24, but the architecture of the system and several file formats
have been modified
-- latest
versions of the English and French DELA dictionaries (thanks to Blandine Courtois)
List of enhancements and most
important bug fixes in version 4.24:
-- A hierarchical view that
displays the organization of a grammar
-- Added the option to
automatically remove lexical entries associated with Xxx codes (e.g. .XERR and
.XIN) from the text FST
-- Added a special delimiter
character to the alphabet that can be used to tokenize Asian languages
(kosawat@univ-mlv.fr)
--
Added @i variables to the FST outputs built with the
lexicon-grammar compiler to lemmatize frozen (and not so frozen) expressions (simona.vietri@tiscalinet.it)
--
Eliminated a bug that occurs when running grammars that contain a subgraph in which the terminal node is unreachable (marchand@tedm.ucl.ac.be)
List of enhancements and most
important bug fixes in version 4.23e:
-- the
text font specified in the Alphabet file is now used when displaying the MFT
text
-- fixed the “1
character-length” bug in the concordance when entering words in quotes
-- added a first version of a
grammar for French determiners (PLEASE HELP ME TO IMPROVE IT!)
-- a new,
right to left sort command for dictionaries and lists of tokens
-- the
statistical module is now an independent tool (several instances can run at the
same time: compare how
different graphs behave in
different texts, even in different languages!)
-- added a new licensing system
that can produce one key for multiple installations
-- a
memory leak during the construction of large text FSTs
has been fixed
-- an I/O
synchronization problem that occurred when the user scanned the text FST very rapidly has been fixed
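The right-to-left sort mentioned in the list above can be pictured as an ordinary alphabetical sort applied to each entry read backwards, so that forms sharing an ending end up next to each other. The Python lines below are a minimal sketch of that idea, not the INTEX command; the function name is hypothetical and plain code-point order stands in for whatever collation the Alphabet file defines.

```python
def right_to_left_sort(words):
    """Sort words by comparing their characters from the last to the first."""
    return sorted(words, key=lambda w: w[::-1])


words = ["eaten", "walking", "taken", "eating", "apple"]
print(right_to_left_sort(words))
# ['apple', 'walking', 'eating', 'taken', 'eaten']
```

Forms ending in "-ing" and in "-en" are grouped together, which is handy when checking inflectional endings in a dictionary or a token list.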
List of enhancements and most
important bug fixes in version 4.23d:
-- tokens
are up to 512 characters long; this is enough to tag expressions such as {je vous
prie d’accepter l’expression de mes
sentiments les plus chaleureux,Sincerely.EXP}
-- the
syntactic parser is more robust, and can deal with larger texts and grammars
-- the
syntactic tree has a better look
List of enhancements and most
important bug fixes in version 4.23c:
-- try the brand new debugger
engine & interface! (rebuilt from scratch)
-- a new,
more stable & faster RTF driver for those nasty large concordances
-- right click in graph windows to
get to a contextual menu
-- the
missing NEWLINE/CARRIAGE RETURN bug has been fixed
-- Compounds in dictionaries and
grammars can include digits, ‘,’ and ‘.’ (make sure you protect the ‘,’ and
‘.’ with a backslash ‘\’).
-- color
incompatibilities when the background of a graph is not white have been solved
-- No more confusion
between similar syntactic/semantic codes, e.g. ‘N+NA’ and ‘
-- Syntactic/semantic codes are now
handled properly in disambiguation rules
-- Fixed: synchronization problems between
Concordance, Text and Text FST could occur when the text includes Carriage
Return / New line sequences
List of enhancements and most
important bug fixes in version 4.23b:
SERIOUS OPTIMIZATION:
-- RECON, RECOR, RECORIND and
RECONIND (Text > Locate pattern),
-- DICOE (Text > Apply Lexical
Resources with some frozen expressions),
-- FST2TXT (Text > Preprocess
text),
-- ETIQG and VERIFG (Text >
Disambiguation)
-- GR2FST (FSGraph
> Tools > Compile)
should now run at least 30
times faster than in 4.22;
Bug fixes & Enhancements:
-- The ‘Locate pattern’ programs
would crash in certain configurations if the Alphabet file contained some
accented uppercase letters
-- Syntactic or semantic features
incorrectly matched against certain complex lexical entries
-- gr2fst
crashed when one of the embedded graphs was recognizing <E> and only
<E>
-- the
'$(' and '$)' were not properly processed in certain circumstances (bug
introduced in 4.23)
-- Applying an FST in Merge mode
did not always produce a correct result when matching sequences were over 2048
bytes long
List of enhancements and bug fixes
in version 4.23:
-- programs
that apply grammars are no longer limited to handling only 64 graphs
-- some
strings in double quotes were not retrieved by reconind
& recorind (index search)
-- table2fst
did not add blanks between words in different columns
-- some
unambiguous compounds with apostrophes were blocking dicoc
-- indexer
no longer crashes if it finds tokens of over 512 characters
-- the
graph editor and the inflection program now manage graphs stored in c:\
directory
-- fst2txt
has been optimized; preprocessing runs up to 4 times faster
-- recor and recorind run up
to 3 times faster
-- it is
now possible to tag texts by applying only dictionaries for compounds
-- simple
word tags do not prevent frozen expressions from being recognized anymore
-- a
Windows 95/98/ME bug was preventing INTEX from detecting its installation on some
large disks; the new test
avoids the problem.
List of enhancements and bug fixes
in version 4.22:
-- a new debugger
is now fully integrated in the INTEX environment and uses all lexical resources
associated with the currently loaded text.
-- the bug
related to the beginning and end of sentence match in the disambiguation
process has been fixed.
-- the
protection system in the FSGraph editor that would
prevent users from entering labels ending with spaces or “+”
characters has been more of a pain than a help; it has been removed.
-- some
problems related to the use of embedded graphs stored in other directories have
been fixed.
-- matching any string protected by
double quotes is now compatible with the standard INTEX policy on spaces. Thus
for instance, <MOT>":" matches “table:” as well
as “table :”. At the
same time, <MOT>" :" matches only
“table :” (not “table:” nor
“table :”).
-- One can compile a
non-deterministic FST into a C++ transition table.
-- the
verbose mode of the indexer program has been removed.
List of the major bugs that have
been corrected in version 4.21:
-- some
combinations of inflectional codes did not correctly match a disambiguation
rule (visible for some English conjugated verbs),
-- some
short lexical symbols did not match lexical entries that have unambiguous
inflectional codes (bug introduced on January 4, when
correcting the previous bug...),
-- gr2fst
crashed if the initial node of an embedded graph was referring
to a non-existing graph file, or if an embedded graph recognized the empty
string and nothing else,
-- for
some recursive grammars, the FST compiled with grf2fst
would not always produce the same matches as recorind
(the CF parser that uses the text index), even for small depths of recursion,
-- genere crashed if the generated output was
longer than 4,096 characters (typically when an FST
output is in a loop).
-- recorind (apply a GRF file, and check 'Use
Text Index') is up to 3 times faster;
-- one can
specify lemmas *and* categories in symbols or tags, e.g. <can.V> or {can.V:C};
-- the
built-in symbol <MIX> matches forms in which an uppercase letter follows
a lowercase letter (e.g. 'McCarthy');
-- the
built-in symbols <U> (Uppercase letter) and <W> (loWercase letter) can be used in the morphological module;
-- Empty deterministic finite state
transducers were represented by one state (initial & terminal), no
transition. The new representation for empty transducers is: two states, one
transition (1)-<E>->(2), (1) is initial, (1)
& (2) are terminal. This representation is consistent with all INTEX
programs (including flexion).
-- an environment variable INTEXVRB ('verbose').
If set to
'YES', programs called in a shell will display intermediary results; if
undefined or set to 'NO', they will not display intermediary results;
-- the
command line for indexer.exe was too long. The new usage is:
indexer.exe {cdls} Text ResultsDirectory
ResultsDirectory is where the five
resulting files idx, ida, frq, fr0 and stt are stored.
ResultsDirectory must exist and the user
must have read/write/execute permissions for it;
-- a concordance with both left and
right context lengths set to 0 now produces the list of all matching sequences.
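The built-in symbols <MIX>, <U> and <W> described in the list above are defined by simple letter-case conditions. The Python predicates below are only an illustration of those conditions as worded here, not INTEX's implementation: the function names are hypothetical, and Python's Unicode case tests stand in for the letter classes that an INTEX Alphabet file would define.

```python
def is_mix(form):
    """True if some uppercase letter directly follows a lowercase letter."""
    return any(a.islower() and b.isupper() for a, b in zip(form, form[1:]))


def is_u(char):
    """True for a single uppercase letter."""
    return len(char) == 1 and char.isupper()


def is_w(char):
    """True for a single lowercase letter."""
    return len(char) == 1 and char.islower()


for form in ["McCarthy", "Paris", "NASA"]:
    print(form, is_mix(form))
```

With these definitions 'McCarthy' satisfies the <MIX> condition, while 'Paris' and 'NASA' do not, since neither contains a lowercase letter immediately followed by an uppercase one.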
Old milestones...
The "LADL, Université Paris 7" versions (v1.0 to v3.5,
1993-1997): The first integrated version, running under the NextStep
operating system (a UNIX-like OS similar to Mac OS X), was released in 1993.
It was written in Objective-C (a mix of Smalltalk and C) and Display
PostScript. It was then adapted to the OpenStep OS
and could run on NeXT boxes
as well as SUN workstations and PCs:
INTEX 1.x :
a graph editor plus a set of UNIX commands to process finite-state automata and
apply them to texts.
INTEX 2.x :
an integrated GUI for text analysis and concordances
INTEX 3.x :
from NextStep to OpenStep
for PCs, NeXT, SUN and HP workstations.
The "GRELIS"
versions (v4.x, since 1997): While at the Université
de Franche-Comté, in the GRELIS laboratory, I first tried to adapt the OpenStep/PostScript code to OS/2, then to DOS/Windows 95,
without any success. I finally decided to rewrite the whole system from scratch
(v4.x) in C (for the linguistic engine) and in C++ (for the GUI) using the
Windows API and the Borland C++ Builder development environment. I took
advantage of the opportunity to write a new linguistic engine based completely
on finite-state transducers, to represent both DELAF and DELAC-type dictionaries
as FSTs, to add a morphological module, and to integrate
the lexical parser with a new syntactic parser thanks to the Text-MFT
representation.
INTEX 4.0x :
INTEX for DOS/Windows 95; an inflectional module
INTEX 4.1x :
A brand new 32-bit engine: all linguistic data is represented by finite-state
transducers
INTEX 4.2x, 4.3x
: see above
Most
users subscribe to the mailing list info-intex,
hosted by New York University (NYU) Information Technology Services. info-intex users
post on a regular basis:
-- discussions related to linguistic representations,
-- discussions related to the programming interface,
-- various announcements of interest for INTEX users,
-- descriptions of technical problems and upgrades,
-- examples of fun uses of INTEX, tips about INTEX, etc.
To contribute to the list by email, send a message to: info-intex@forums.nyu.edu
To access the list via an Internet browser, go to: http://forums.nyu.edu and enter the keyword:
info-intex
To access the list via a News browser, go to: news://forums.nyu.edu/info-intex
For any questions related to your subscription, send an email to: owner-info-intex@forums.nyu.edu
The Author: Max Silberztein
I constructed the first
package of Finite State tools for Natural Language Processing, as well as the
French DELAC-DELACF dictionaries for compound words, for my PhD research from
1986 to 1989 at the LADL (University of Paris 7-CNRS), under the supervision of
Prof. Maurice Gross. The thesis
was later published as:
A few
“first-time” specifics about my PhD thesis:
-- a morphological parser
programmed by a mere lookup procedure of DELAF-type dictionaries automatically
expanded from DELAS-type dictionaries, as opposed to parsers (often programmed
in PROLOG) that generated word lemmas by splitting complex utterances.
-- the construction of the
DELAC electronic dictionary for compounds, formalized from several lists of
“frozen nouns”, “idioms”, “collocations”
and “complex terms” listed by several teams of linguists, under the
direction of Maurice Gross (Université Paris 7)
and
-- the
thesis dealt with the full complexity of lexical parsers of natural languages:
recognition, representation and processing of affixes and morphemes, simple
words, compound words and frozen expressions
-- a representation of
texts that formalizes all types of lexical ambiguities: between simple words,
simple and complex words, complex words, etc., as an Acyclic Finite-State
Automaton, and then later, in the Université de
Franche-Comté (4.x) versions, as a Finite-State Transducer.
-- the
first disambiguation module that uses local grammars represented by
Finite-State Transducers to represent left and right contexts of ambiguous
words. This optional module uses an original “intersection” algorithm to lighten the Finite-State Transducer of
the text. The resulting rules can take compounds
and frozen expressions into account, and are not limited in length, as opposed
to most current disambiguation programs that consider contexts in fixed-size
windows (often two or three tokens).
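The representation of a text as an acyclic finite-state automaton, mentioned in the list above, can be pictured as a graph whose nodes are positions between tokens and whose edges carry lexical entries. A compound then appears as one edge that spans several tokens, in parallel with the path of simple words. The Python sketch below only illustrates that idea under stated assumptions: the edge format, the function name and the sample tags are hypothetical and are not INTEX's internal format.

```python
from collections import defaultdict


def readings(edges, start, end):
    """Enumerate every path of lexical entries from start to end.

    edges: (source position, target position, entry) triples, with
    target > source so that the graph is acyclic.
    """
    outgoing = defaultdict(list)
    for src, dst, entry in edges:
        outgoing[src].append((dst, entry))

    def walk(node):
        if node == end:
            yield []
            return
        for dst, entry in outgoing[node]:
            for rest in walk(dst):
                yield [entry] + rest

    return list(walk(start))


# Positions: 0 "hot" 1 "dog" 2. Entries are illustrative only.
edges = [
    (0, 1, "hot,hot.A"),
    (1, 2, "dog,dog.N"),
    (0, 2, "hot dog,hot dog.N"),  # compound reading spanning both tokens
]

all_readings = readings(edges, 0, 2)
for reading in all_readings:
    print(reading)
```

Disambiguation can then be seen as removing edges from this graph, so that a rule of any length, including one that mentions a compound, simply deletes the paths it rules out.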
I wish to express many
thanks to my colleagues and students, as well as to all the INTEX users who
have contributed (and continue to do so) to help enhance INTEX with their
patience, criticisms, creative ideas and ambitious expectations.
Comments or questions about this website or about INTEX?