Overview


INTEX is a linguistic development environment based on Finite State Automata and Transducers (FSTs), enhanced transducers and, more generally, Recursive Transition Networks (RTNs). INTEX is the only available integrated, user-friendly platform that allows linguists to describe a natural language from its alphabet up to the syntactic level, comes with built-in, large-coverage dictionaries and grammars, and can parse texts of several million words in real time. The following are some INTEX functionalities:

(a) INTEX includes tools that format texts to prepare them for linguistic analysis. For instance, INTEX includes grammars that recognize sentences with very high precision (above 99% accuracy on French journalistic texts), tag unambiguous compounds and frozen expressions (so that irrelevant ambiguities are not considered), and resolve contractions and elisions (e.g., don't = do not, cannot = can not):

The Finite State Transducer that identifies sentences in English journalistic texts
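The contraction step in (a) can be illustrated outside INTEX with a few lines of Python. This is only a minimal sketch: INTEX performs this step with graphs, whereas the table CONTRACTIONS and the function expand_contractions below are hypothetical names introduced for the example, and the table holds only the two cases quoted above.

```python
import re

# Hypothetical contraction table (illustration only; INTEX uses graphs for this).
CONTRACTIONS = {
    "don't": "do not",
    "cannot": "can not",
}

# One alternation over all known contractions, matched on word boundaries.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)

def expand_contractions(text):
    """Replace each known contraction with its expanded form."""
    def repl(match):
        found = match.group(0)
        expanded = CONTRACTIONS[found.lower()]
        # Keep a leading capital so sentence-initial forms stay capitalized.
        if found[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded
    return _PATTERN.sub(repl, text)

print(expand_contractions("Don't worry, you cannot lose."))
```

A lookup table like this only covers contractions whose expansion does not depend on context; ambiguous cases would need the kind of grammar described in (a).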

(b) INTEX includes several handcrafted, large-coverage, built-in dictionaries, and allows users to create and maintain their own. Users describe the inflectional morphology of a language; INTEX applies this description to DELAS-type dictionaries in order to inflect them automatically; the resulting DELAF-type dictionaries and graphs can then be applied to texts in linear time:

Dictionaries and graphs are applied to texts to identify simple words, compounds and frozen expressions
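The automatic inflection described in (b) can be pictured as follows. This is a rough sketch under stated assumptions: the line format ("lemma,code" in, "form,lemma.category:features" out) is a simplified stand-in, since this overview does not give the exact DELAS/DELAF syntax, and the inflection classes N1 and N2 are hypothetical suffix rules rather than INTEX's own morphological descriptions.

```python
# Hypothetical inflection classes: each rule is
# (suffix to strip from the lemma, suffix to add, feature code).
INFLECTION_CLASSES = {
    "N1": [("", "", "s"), ("", "s", "p")],      # table -> table, tables
    "N2": [("", "", "s"), ("y", "ies", "p")],   # city  -> city, cities
}

def inflect(delas_line):
    """Expand one 'lemma,code' entry into its inflected-form entries."""
    lemma, code = delas_line.split(",")
    # The category is the inflection code without its numeric suffix (N1 -> N).
    category = code.rstrip("0123456789")
    entries = []
    for strip, add, features in INFLECTION_CLASSES[code]:
        stem = lemma[: len(lemma) - len(strip)]
        entries.append(f"{stem}{add},{lemma}.{category}:{features}")
    return entries

for line in ["table,N1", "city,N2"]:
    print(inflect(line))
```

Each lemma is expanded once, ahead of time, so that looking up a form in the text later is a plain dictionary access.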

(c) INTEX applies a set of selected lexical resources (in the form of dictionaries or morphological grammars) to texts. Lexical entries are simple words (sequences of letters, e.g. table), morphemes (affixes of simple words, e.g. -ation), compounds (sequences of simple words, e.g. washing machine) or frozen expressions (non-contiguous compounds, e.g. to take ... into account):

All the words that have been identified during the consultation of the selected dictionaries and graphs

(d) In the general case, looking up words in dictionaries produces several solutions; the result of the consultation represents ambiguities between simple words, and between simple words and compounds. This result, which can be reduced with disambiguation grammars, is the input of the INTEX syntactic parser:

Text is represented by a Finite State Transducer
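The dictionary consultation in (c) and (d) can be sketched as building a list of labelled edges over token positions, which is one simple way to picture the text automaton. Everything here is an assumption made for illustration: the tiny LEXICON, its tag strings, and the function lookup are hypothetical and do not reproduce INTEX's internal representation.

```python
# Hypothetical lexicon: keys are token sequences, values are candidate analyses.
LEXICON = {
    ("washing",): ["wash.V:G", "washing.N:s"],
    ("machine",): ["machine.N:s", "machine.V:W"],
    ("washing", "machine"): ["washing machine.N:s"],
}

def lookup(tokens, lexicon=LEXICON, max_len=3):
    """Return every (start, end, analysis) edge found in the token list."""
    edges = []
    for start in range(len(tokens)):
        last = min(start + max_len, len(tokens))
        for end in range(start + 1, last + 1):
            key = tuple(t.lower() for t in tokens[start:end])
            for analysis in lexicon.get(key, []):
                edges.append((start, end, analysis))
    return edges

for edge in lookup(["the", "washing", "machine"]):
    print(edge)
```

The edge spanning positions 1 to 3 (the compound) coexists with the edges for the two simple words, which is the kind of ambiguity that disambiguation grammars are meant to reduce.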

(e) Users apply local grammars to remove word ambiguities in texts; INTEX includes several tools to edit, maintain and debug these local grammars:

Apply local grammars to texts

In "Linear Tag" mode, disambiguated forms are replaced with the corresponding lexical entry

(f) The INTEX syntactic parser uses Recursive Transition Networks to build the trees that may represent the structure of each sentence of a text. RTNs give full control over the structure of each tree, which can be independent of the structure of the grammar:

Derivation tree produced when applying Recursive Transition Networks to a text

(g) INTEX indexes in texts all occurrences of a given word (grouping its inflected forms), of a list of words (listed in a dictionary), of a given category (e.g. all feminine plural adjectives) or, more generally, of any syntactic pattern given in the form of a regular expression or a Finite State Automaton; the resulting index can be used to extract corpora from the text, to build concordances, or as input to the INTEX statistical tools. For example, in the following screenshot, the user has indexed the regular expression: (<be> (<ADV>+<E>) going to + will) <V:W> to get all the expressions in the future tense (<be> matches any conjugated form of the verb to be, <ADV> matches any simple or compound adverb, <E> stands for the empty string, <V:W> matches any verb in the infinitive):

Index a regular expression

Build complex concordances
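To make the pattern in (g) concrete, the sketch below matches the same expression against a list of already tagged tokens. It is an illustration under assumptions, not INTEX's matcher: the Token type, the predicate helpers and the hand-expanded ALTERNATIVES list are hypothetical, each token is assumed to carry a single analysis, and the union (+) and the optional adverb (<ADV>+<E>) are unfolded by hand into three plain sequences.

```python
from typing import NamedTuple

class Token(NamedTuple):
    form: str
    lemma: str
    cat: str
    feats: str = ""

# Predicate builders standing in for <be>, <ADV>, <V:W> and literal words.
def has_lemma(lemma):
    return lambda t: t.lemma == lemma

def has_cat(cat, feats=""):
    return lambda t: t.cat == cat and feats in t.feats

def is_word(word):
    return lambda t: t.form.lower() == word

BE, ADV, INF = has_lemma("be"), has_cat("ADV"), has_cat("V", "W")

# (<be> (<ADV>+<E>) going to + will) <V:W>, unfolded into its alternatives.
ALTERNATIVES = [
    [BE, ADV, is_word("going"), is_word("to"), INF],
    [BE, is_word("going"), is_word("to"), INF],
    [is_word("will"), INF],
]

def find_matches(tokens):
    """Return the matched word sequences, in text order."""
    hits = []
    for i in range(len(tokens)):
        for seq in ALTERNATIVES:
            window = tokens[i:i + len(seq)]
            if len(window) == len(seq) and all(p(t) for p, t in zip(seq, window)):
                hits.append(" ".join(t.form for t in window))
    return hits

SENTENCE = [
    Token("She", "she", "PRO"), Token("is", "be", "V", "P3s"),
    Token("probably", "probably", "ADV"), Token("going", "go", "V", "G"),
    Token("to", "to", "PREP"), Token("leave", "leave", "V", "W"),
    Token("and", "and", "CONJ"), Token("he", "he", "PRO"),
    Token("will", "will", "V"), Token("stay", "stay", "V", "W"),
]
print(find_matches(SENTENCE))
```

Because <be> tests the lemma rather than the surface form, the same pattern would also catch "was going to leave" or "are going to leave" without listing each conjugated form.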

(h) INTEX can also apply enhanced transducers (transducers with variables, à la sed) to texts to perform search & replace, or search & insert operations. Applying enhanced transducers in cascades or in loops allows users to perform powerful operations on texts:

Moving a sequence of adverbs to the right of the past participle
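The idea of variables in (h) is close to named groups in a regular-expression substitution. The following sketch moves a run of adverbs to the right of a past participle using Python's re module. It only approximates what an enhanced transducer does: it works on raw strings rather than on tagged text, and the word lists ADVERBS and PARTICIPLES are hypothetical stand-ins for dictionary lookups.

```python
import re

# Hypothetical word lists standing in for <ADV> and past-participle lookups.
ADVERBS = ["always", "probably", "never"]
PARTICIPLES = ["eaten", "gone", "seen"]

adv_alt = "|".join(map(re.escape, ADVERBS))
pp_alt = "|".join(map(re.escape, PARTICIPLES))

# 'adv' captures one or more adverbs, 'pp' the participle that follows them.
PATTERN = re.compile(rf"\b(?P<adv>(?:(?:{adv_alt})\s+)+)(?P<pp>{pp_alt})\b")

def move_adverbs(text):
    """Rewrite 'ADV+ PP' as 'PP ADV+', reusing the captured variables."""
    return PATTERN.sub(
        lambda m: f"{m.group('pp')} {m.group('adv').strip()}",
        text,
    )

print(move_adverbs("He has probably never eaten fish."))
```

Several such substitutions can be chained one after another, or repeated until the text stops changing, which mirrors the cascades and loops mentioned in (h).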

(i) Several tools are included to help edit, maintain and debug grammars, which are represented graphically. When possible, sets of graphs can be compiled into minimal deterministic Finite State Automata; they then become immediately reusable in other graphs:

Generate a language represented by a library of Finite-State graphs
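The compilation into deterministic automata mentioned in (i) relies on a standard construction that can be sketched briefly. The code below shows the classical subset construction on a small word-level automaton; minimization is not shown, and nothing here reflects how INTEX itself stores or compiles graphs. The function names and the example automaton are hypothetical.

```python
from collections import deque

def determinize(start, finals, delta):
    """Subset construction. delta maps (state, symbol) -> set of next states."""
    symbols = sorted({symbol for (_, symbol) in delta})
    start_set = frozenset([start])
    dfa_delta, dfa_finals = {}, set()
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        # A subset is final if it contains at least one original final state.
        if current & finals:
            dfa_finals.add(current)
        for symbol in symbols:
            target = frozenset(
                nxt for state in current for nxt in delta.get((state, symbol), ())
            )
            if not target:
                continue
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return start_set, dfa_finals, dfa_delta

def accepts(dfa, words):
    """Run the deterministic automaton over a sequence of words."""
    state, finals, delta = dfa[0], dfa[1], dfa[2]
    for word in words:
        state = delta.get((state, word))
        if state is None:
            return False
    return state in finals

# Non-deterministic example: two 'will' transitions leave state 0.
NFA_DELTA = {
    (0, "will"): {1, 2},
    (1, "go"): {3},
    (2, "not"): {4},
    (4, "go"): {3},
}
DFA = determinize(0, {3}, NFA_DELTA)
print(accepts(DFA, ["will", "go"]), accepts(DFA, ["will", "not", "go"]))
```

After determinization, each word of the input is consumed with a single table lookup, which is what makes a compiled graph cheap to reuse inside other graphs.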