Friday, 02.02.2007
23.01.2007 Computational Lexicography
Introduction:
Today we talked about lexicography, for example about concordances.
Learner's Diary:
Criteria for good lexicography
Quantity:
- completeness of coverage:
- extensional coverage: number of entries
- intensional coverage: number of types of lexical information
Quality:
- correctness of information
- types of lexical information
- consistency of structure:
- macrostructure
- microstructure
- mesostructure
From corpus to lexicon
Corpus:
primary data (audio, video, recording)
secondary data (text, transcription)
Lexicon:
Layer 1: Corpus lexicon (a simple list of the words in the corpus)
Layer 2: Lexicon matrix (columns in a table)
Layer 3: Lexicon with selected generalisations (words classified and ordered, e.g. alphabetically)
Layer 4: Lexicon with generalisation hierarchies
Concordance
- A KWIC (KeyWord In Context) concordance is a special kind of preliminary, corpus-based dictionary:
- each word in a text corpus is paired with its contexts of occurrence in this corpus
- Note: Google is a special form of KWIC concordance (www.google.co.uk): you enter a keyword and find it in context
www.googlefight.com helps you out in case you don't know how to spell a word.
Simplest KWIC procedure
1. Corpus creation: make a corpus of texts in electronic format
2. Tokenisation (re-process each text):
   a. process punctuation marks
   b. break the text into context units (lines/sentences)
3. Keyword list extraction (all words in text)
4. Context collation (for each keyword)
5. Search for KWIC in corpus
6. Store output and format
   - for printing, hypertext (CD, web)
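The six steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used in class: the function names (tokenise, kwic, format_kwic), the context width of two words and the tiny corpus are assumptions made for the example.

```python
import re
from collections import defaultdict

def tokenise(text):
    """Step 2: remove punctuation, lower-case, split into context units (sentences)."""
    units = []
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        if words:
            units.append(words)
    return units

def kwic(text, width=2):
    """Steps 3-5: for every keyword, collect its left and right context."""
    concordance = defaultdict(list)
    for words in tokenise(text):
        for i, word in enumerate(words):
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            concordance[word].append((left, word, right))
    return concordance

def format_kwic(concordance, keyword, column=20):
    """Step 6: one line per occurrence, keyword aligned in a column."""
    return [f"{left:>{column}} | {kw} | {right}"
            for left, kw, right in concordance[keyword]]

# Step 1: a (very small) corpus in electronic format
corpus = "Poodle hybrids have become very popular as pets. A Poodle hybrid is a cross."
conc = kwic(corpus)
for line in format_kwic(conc, "poodle"):
    print(line)
```

Each keyword ends up with all its contexts in one place, which is the point of a concordance.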
QUIZ:
What is a KWIC concordance?
A KWIC concordance lists every keyword of a corpus together with the contexts in which it occurs. A search engine such as Google works in a similar way.
Which are the two main components of lexicon construction based on empirical data?
Corpus and Lexicon!
Which layers of abstraction are involved in corpus acquisition?
> Layer 1: Primary Data (audio / video recording)
> Layer 2: Secondary Data (transcription, annotation, metadata)
Which layers of abstraction are involved in lexicon
construction? Describe them.
> Layer 1: Corpus lexicon (wordlist, concordance,..)
> Layer 2: Lexicon matrix (entries x data categories, no generalisations)
> Layer 3: Lexicon with selected generalisations (procedurally optimised: semasiological, onomasiological)
> Layer 4: Lexicon with generalisation hierarchies (general, type, default inheritance)
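To make the first three layers more concrete, here is a small Python sketch. The example corpus, the data categories ("form", "pos", "gloss") and the glosses are invented for illustration only.

```python
corpus = "the poodle is a dog the dog is a pet".split()

# Layer 1: corpus lexicon, a plain sorted word list without duplicates
corpus_lexicon = sorted(set(corpus))

# Layer 2: lexicon matrix, one row per entry and one column per data category
matrix = [
    {"form": "poodle", "pos": "noun", "gloss": "curly-haired dog"},
    {"form": "dog", "pos": "noun", "gloss": "domestic canine"},
    {"form": "pet", "pos": "noun", "gloss": "animal kept for company"},
]

# Layer 3: the same matrix optimised for two different look-up directions
semasiological = {row["form"]: row["gloss"] for row in matrix}   # form -> meaning
onomasiological = {row["gloss"]: row["form"] for row in matrix}  # meaning -> form

print(corpus_lexicon)
print(semasiological["dog"])
```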
Which layer do standard dictionary types typically belong to?
I'd guess Layer 1, because of the wordlist, where the entries are explained in context (concordance).
What are the 6 main steps in KWIC concordance construction?
Corpus Creation
Tokenisation
keywordlist extraction
context collation
keyword search
output formatting
Explain each of these steps.
1. Corpus Creation: make a corpus of texts in electronic format
2. Tokenisation: (re-processes each): process punctuation marks, break the text into context units (lines, sentences)
3. keywordlist extraction: replace each space sequence by a linefeed / newline, sort the list alphabetically, remove duplicate words
4. context collation: pick context unit (left and right contexts, m words at beginnings, n words at ends), add m boundary marks at beginning and n boundary marks at the end, split into units of length m+1+n
5. keyword search: search for words which occur more than once and compare in which contexts they are used
6. output formatting: put all the information together
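Steps 3 and 4 as explained above can be tried out with a short Python sketch. The function names and the boundary mark "#" are assumptions for this example; the window length m+1+n follows the description in step 4.

```python
def keyword_list(text):
    """Step 3: one word per item, sorted alphabetically, duplicates removed."""
    return sorted(set(text.split()))

def collate_contexts(words, m=2, n=2, boundary="#"):
    """Step 4: add m boundary marks at the beginning and n at the end,
    then split into overlapping units of length m+1+n.
    The keyword of each unit sits at position m."""
    padded = [boundary] * m + words + [boundary] * n
    size = m + 1 + n
    return [padded[i:i + size] for i in range(len(words))]

unit = "poodle hybrids have become popular".split()
windows = collate_contexts(unit, m=1, n=1)
for w in windows:
    print(w)
```

Because of the boundary marks, the first and the last word of a context unit also get a full window.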
KWIC: Dictionary Making
- The function of a KWIC is
  - to make searching for lexical information more efficient by putting context information about words in one place
  - for making "Word Sketches" (Adam Kilgarriff)
  - grammatical descriptions: parts of speech
  - dictionaries: examples of use, collocations, ...
- Project: Make concordances from your text corpora and use them to collect lexical information for your Toolbox lexical databases
Evaluation:
This is an evaluation of the whole course. I think it's a good idea to have a portfolio to improve your soft skills. But I think it was too much work for the course to put all the lessons in it with an introduction, learner's diary and so on. To be successful and to learn something, a portfolio should contain all the tasks and quizzes and a glossary. That would be enough, because it took me a long time to put everything on the web and I didn't actually learn much about the topics we dealt with. So a portfolio makes sense, but it would be more effective with only the points mentioned above.
16.01.2007 Types of lexical information: semantics
Introduction:
Today's lecture was a revision session for the exam on 3 February.
Learner's Diary:
semasiological definition (reader's dictionary): mother = a female parent; it contains genus proximum and differentia specifica = syntagmatic context
onomasiological definition (writer's dictionary): similar and different words, similar items (paradigmatic)
Componential definition
- splits the meaning of a lexical item into components
- e.g. standard dictionary definition by genus proximum and differentia specifica
Syntagmatic definition
- contextual definition
- definition by text examples
Paradigmatic definition
- word fields (e.g. in a thesaurus, synonym dictionary)
- semantic relations
  - hyponyms, hyperonyms
  - co-hyponyms: synonyms, antonyms
"Analysing a Corpus"
A Poodle hybrid is a cross (hybrid) between a Poodle and
some other breed of dog.
Poodle hybrids have become very popular as pets.
They play a big role in the current designer dog trend.
The Poodle's no shedding coat is the usual impetus behind
such experimentation, where potential pet owners are
looking for a no shedding version of a breed for health or
hygienic reasons.
Some of these crosses have been developed deliberately,
while others have happened accidentally.
Task:
Find a word field in the text above (a collection of related words, such as synonyms, i.e. words with a similar meaning):
poodle (most specific)
dog
pet (less specific)
A poodle is a dog, a dog is a pet; in this example "pet" is the nearest kind.
[Poodle = a dog (genus proximum) with thick curly hair (differentia specifica)]
Additionally we talked about hyponyms and hypernyms.
Pet = hypernym
dog = hyponym
cross and hybrid = synonyms
poodle and terrier = co-hyponyms
male and female = antonyms
Tasks:
Semantic decomposition:
- Define "definition"...
  - a form of words which states the meaning of a term
  - either the general meaning or the meaning the speaker uses it for
  - the term to be defined is known as the definiendum (Latin: that which is to be defined)
- Select a small number of words from these texts, and provide definitions for them.
Poodle = a dog (genus proximum) with thick curly hair (differentia specifica)
- Imagine the meaning of the word "bread" is composed of lots of little bits of meaning. List these bits of meaning.
Semantic relations:
- Select a small number of words from these texts, and find antonyms for them.
Hybrid - purebred
Popular - unpopular
Behind - in front of
Accidentally - intentionally
Semantic fields:
- Sets of related words
Taxonomies:
Taxonomies are used in many contexts:
- traditional lexicography:
  - cross-references in standard definitions
  - thesaurus construction
- for the really dedicated:
  - Artificial Intelligence and Text Technology:
    - ISA hierarchies (inheritance hierarchies)
    - ontologies
  - theories of the lexicon:
    - type hierarchies (e.g. Head-driven Phrase Structure Grammar (HPSG))
    - default hierarchies (e.g. ILEX lexicon theory; DATR implementations)
Taxonomy hierarchy: 1. animal 2. dog 3. poodle/terrier/etc.
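The animal / dog / poodle hierarchy can be written down as a small ISA table in Python. This is only a sketch: the dictionaries ISA and PROPS, the feature name "sheds" and the function names are made up for the example, and real systems such as DATR work differently in detail. The idea of default inheritance is that a more specific node overrides what it would otherwise inherit from above.

```python
# child -> parent (ISA relation)
ISA = {"poodle": "dog", "terrier": "dog", "dog": "animal"}

# hypothetical features; "dog" provides the default, "poodle" overrides it
PROPS = {"dog": {"sheds": True}, "poodle": {"sheds": False}}

def hypernyms(word):
    """All more general terms, from nearest to most general."""
    chain = []
    while word in ISA:
        word = ISA[word]
        chain.append(word)
    return chain

def co_hyponyms(word):
    """Other words with the same direct hypernym."""
    parent = ISA.get(word)
    if parent is None:
        return []
    return sorted(w for w, p in ISA.items() if p == parent and w != word)

def inherit(word, feature):
    """Default inheritance: the most specific value found wins."""
    for node in [word] + hypernyms(word):
        if feature in PROPS.get(node, {}):
            return PROPS[node][feature]
    return None

print(hypernyms("poodle"))
print(co_hyponyms("poodle"))
print(inherit("terrier", "sheds"), inherit("poodle", "sheds"))
```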
Semantic relations (summary)
taxonomy (generalisation-specialisation relation, paradigmatic relations)
- hyperonym
- hyponym
synonym:
antonym:
- opposite [hot - cold]
- complementary [red - blue]
- inverse [father - son]
co-hyponym [poodle - terrier]
meronymy (part-whole relation, syntagmatic relations) - "graphic definition" [ostensive] - a different kind of hierarchy (syntagmatic)
Example: wheel is a part of a car
Source: Introduction to Linguistics
Evaluation:
Most topics are quite clear now because of our revision.
Homework:
Ginger beer
Fermentation has been used by mankind for thousands of
years for raising bread, fermenting wine and brewing beer.
The products of the fermentation of sugar by baker's yeast
Saccharomyces cerevisiae (a fungus) are ethyl alcohol and
carbon dioxide.
Carbon dioxide causes bread to rise and gives effervescent
drinks their bubbles.
This action of yeast on sugar is used to 'carbonate'
beverages, as in the addition of bubbles to champagne.
Discuss the following using the Ginger Beer text, giving
examples:
Semantic relations
antonyms: sugar - salt (opposites), wine - beer (complementary)
hyponym: alcohol - wine, alcohol - beer
synonym: baker's yeast - Saccharomyces cerevisiae
meronymy: beverages - champagne
Semantic fields
fermentation (brewing, wine, beer, sugar)
beverages (champagne, beer, wine)
chemistry components (carbon dioxide)
definitions
fermentation: componential definition
wine: paradigmatic definition
Glossary: (not in alphabetical order because the definitions build on one another)
Synonyms: Traditionally defined as words with the same meaning. If you want to avoid using the same word over and over again when writing, for example the word buy, you can look it up in a thesaurus, a dictionary in which words with similar meanings are grouped together.
Antonyms: Word pairs that are opposite in meaning. Antonyms are opposites with respect to at least one component of their meaning, but share all other aspects of their meaning. For example, the verbs come and go are opposites with respect to direction, but both involve the notion of movement.
Polysemy: One lexeme has a range of different but related meanings; we then speak of a polysemous word.
Homophones: Different words with an identical pronunciation but unrelated meanings, for example right and write, or bank (the side of a river) and bank (a financial institution).
Homographs: Words that are spelled the same but have different meanings. For example bank as a financial institution and bank as the side of a river.
Hyponymy: When the meaning of one word is included in the meaning of another. For example, words like peach, orange and mango are hyponyms of the more general expression fruit.
19.12.2006 Types of lexical information: grammar
Introduction:
Today's lecture was similar to a lecture in our other course, Introduction to Linguistics, so it was a kind of revision and helped me to understand the topics.
Learner's Diary:
Overview
- Types of lexical information: syntax
  - Sentence structure - "syntax", "phrasal syntax"
  - Syntactic categories
    - parts of speech (POS)
    - subcategories
    - phrasal categories
- The structure of language: constitutive relations:
  - structural relations
    - syntagmatic relations
    - paradigmatic relations
  - semiotic relations
    - interpretation relations
    - realisation relations
- Text structure - "text syntax"
Mr Bush (proper noun) accepted (main verb - finite form) Mr Rumsfeld's (proper noun) resignation (common noun) after (preposition) November (proper noun) mid-elections (common noun) in (preposition) which (conjunction) the (determiner - definite article) Republicans (proper noun) lost (main verb - finite form) control (common noun) of (preposition) both (determiner - quantifier - dual) the (determiner - definite article) House of Representatives (proper noun) and (co-ordinating conjunction) the (determiner - definite article) Senate (proper noun).
Public (adjective - polar) discontent (common noun) over (preposition) the (determiner - definite article) conduct (common noun) of (preposition) the (determiner - definite article) Iraq war (proper noun; noun compound) was seen (periphrastic verb - passive) as (conjunction) a (determiner - indefinite article) major (adjective - polar) factor (common noun) in (preposition) the (determiner - definite article) defeat (common noun).
We talked about syntax and all its categories:
For example:
Noun categories: determiners
- Articles
definite: the
indefinite: a
- Possessives
my, your, his, her, its, our, their
- Demonstratives
proximal: this
distal: that
- Quantifiers
cardinal numbers: one, two...
existential: some, several, few, many
dual: both
universal: each, every, all,...
Other categories:
- nouns
- adjectives
- pronouns
- adverbs
- verbs
- prepositions
Glue categories:
- conjunctions
- interjections
The structure of language:
The sign hierarchy: ranks
Signs are structured in terms of their position in a size hierarchy; the positions in the hierarchy are sometimes referred to as ranks.
The main ranks (there are subdivisions) are:
- dialogue
- monologue/text
- sentence
- word
- morpheme
- phoneme
Signs at each of these ranks have
- structure (internal structure)
- semiotic relations (functions and realisations)
The following website shows a picture of a table of ranks we talked about in class.
http://wwwhomes.uni-bielefeld.de/~gibbon/Classes/Classes2006WS/HTMD/htmd08-v01-grammar.pdf
After that we talked about text structure and had a look at the website of CNN.com.
What is structure?
- Language structure is determined by the following kinds of constitutive relation:
  - structural relations:
    - syntagmatic relations:
      - "glue" ☺
      - combinatory relations which create larger signs (and their realisations and interpretations) from smaller signs (and their realisations and interpretations)
    - paradigmatic relations:
      - "choice" ☺
      - classificatory relations of similarity and difference between signs.
  - semiotic relations:
    - realisation: the visual appearance or acoustic representation of signs (other senses may also be involved).
    - interpretation: the assignment of meaning to a sign.
Structures and syntagmatic relations:
Syllable structure
syllable
  onset
  rhyme
    nucleus
    coda
The structure of a syllable is shown in the figure above. In English all syllables contain a nucleus that is normally made up of a vowel. This vowel may be followed by a coda that consists of up to four consonants and is said to form the rhyme together with the nucleus. The nucleus may also be preceded by up to three consonants that form the onset of the syllable.
Syllables that have an empty coda are referred to as open syllables, as opposed to the so-called closed syllables, which are "closed" by one or more consonants following the vowel.
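The syllable structure described above can be tested with a small Python sketch. It is a simplification with two assumptions that are not part of the lecture: every letter stands for exactly one sound, and the input is a single syllable whose vowels are next to each other.

```python
VOWELS = set("aeiou")  # assumption: one letter = one sound

def parse_syllable(segments):
    """Split a single syllable into onset, nucleus and coda."""
    vowel_positions = [i for i, s in enumerate(segments) if s in VOWELS]
    if not vowel_positions:
        raise ValueError("a syllable needs a nucleus")
    first, last = vowel_positions[0], vowel_positions[-1]
    return {
        "onset": segments[:first],
        "nucleus": segments[first:last + 1],
        "coda": segments[last + 1:],
    }

def rhyme(syllable):
    """The rhyme is the nucleus together with the coda."""
    return syllable["nucleus"] + syllable["coda"]

def is_open(syllable):
    """Open syllables have an empty coda."""
    return not syllable["coda"]

strip = parse_syllable(list("strip"))
go = parse_syllable(list("go"))
print(strip, is_open(strip))
print(go, is_open(go))
```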
Paradigmatic relations:
- Paradigmatic relations: classificatory relations of similarity and difference between signs.
  - similarity and difference of
    - internal structure
    - external structure
    - meaning
    - appearance
Glossary: (all words already defined above)
coda
nucleus
onset
rhyme
syllable
Evaluation:
The lecture was easy to follow, well structured and linked to our other course, Introduction to Linguistics.
12.12.2006 Lexical Database: Toolbox
Introduction:
Today we listened to a presentation by Mr. Sascha Griffiths about Toolbox, a software package used by linguists and lexicographers. The program was developed by SIL International (Summer Institute of Linguistics) for field work purposes.
For further information have a look at the following website.
Source: (http://www.sil.org/sil/)
Learner's Diary:
SIL International lists its Toolbox under the following headwords:
- concordance
- database
- dictionary
- field notes
- interlinear text analysis
Basic functions:
- Viewing and Searching
- Browsing
- Editing
- Sorting
Evaluation:
It was quite difficult to follow and remember all the points Mr. Griffiths mentioned during his presentation, but afterwards Mr. Gibbon gave a revision of the most important points and most of it became clear.
Thursday, 28.12.2006
05.12.2006 Types of lexical information: morphology
Introduction:
This lecture was well structured and easy to follow. There were many terms we had already talked about, like compound, affix and so on, and because of this I was able to understand how most of the terms are connected.
Learner's Diary:
Why word information? Who needs it?
- Why?
- New concepts require new words
- Sometimes new words are invented on the spot
- Who?
- Scientists
- Engineers
- Product branding companies
- Poets
- Everybody else
Then we talked about Jabberwocky. Jabberwocky is a poem (of nonsense verse) written by Lewis Carroll, and found as a part of his novel Through the Looking-Glass, and What Alice Found There (1871). It is generally considered to be one of the greatest nonsense poems written in the English language.
Source: (http://en.wikipedia.org/wiki/Jabberwocky)
Morphological structure:
Branches of morphology:
MORPHOLOGY
  INFLECTION
  WORD FORMATION
    DERIVATION
    COMPOUNDING
Morphology sketch
- Inflection:
- Function (external structure):
- marks the relation of words to their contexts
- no change in the basic meaning of words
- Form (internal structure):
- affix (prefix, suffix, infix), superfix, stem vowel change
- Word formation:
- Function (external structure):
- creation of new words / parts of speech / meanings
- in principle infinite extendability of the lexicon
- Form (internal structure):
- Root/morpheme creation (blending, abbreviation,...)
- Derivation: 1 stem + affix (prefix, suffix, infix), superfix...
- Compounding: 2 stems, perhaps with interfix or inflection-like affix
Internal structure of words:
Smallest word parts: morphemes
- Morphemes are:
- smallest meaningful parts of words
- There are 2 main morpheme types:
- lexical morpheme (content morpheme, root):
- open set:
- girl, boy, car, box, spoon, grass, sky
- grammatical morpheme (structural morpheme):
- closed set
- free:
- prepositions, conjunctions, auxiliary verbs
- bound:
- affixes: prefixes, suffixes (in word formation and inflection)
Morphemes and allomorphs
- Morphemes are realised in different contexts by
- allomorphs
- i.e. variant pronunciations
- Example:
- Nouns:
- cats, dogs, horses, oxen, men, women, children
- Verbs:
- hits, bids, hisses, buzzes, itches
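For the regular examples above (cats, dogs, horses; hits, bids, hisses, buzzes, itches) the choice of allomorph depends on the last sound of the stem. The following Python sketch shows the usual textbook rule. The function name, the sound sets and the small table of irregular plurals are assumptions for this example, and the input is the final sound of the stem, not its spelling.

```python
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}

# irregular plurals from the example list are simply stored, not computed
IRREGULAR_PLURALS = {"ox": "oxen", "man": "men", "woman": "women", "child": "children"}

def s_allomorph(final_sound):
    """Allomorph of the plural / 3rd person singular -s after a given final sound."""
    if final_sound in SIBILANTS:
        return "ɪz"   # horses, hisses, buzzes, itches
    if final_sound in VOICELESS:
        return "s"    # cats, hits
    return "z"        # dogs, bids

for word, final in [("cat", "t"), ("dog", "g"), ("horse", "s"), ("itch", "tʃ")]:
    print(word, "+", s_allomorph(final))
```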
How are words built?
- Inflection:
- Function (external structure):
- marks the syntagmatic relation of words to their contexts
- syntactic contexts (agreement in person, number, case):
- subject-verb (English)
- subject-verb; determiner - adjective - noun, preposition-nominals (German)
- situational contexts:
- Verbs: temporal relations, spatial relations
- Nominals: quantity and definiteness relations
- Form (internal structure): stem + affix
- prefix
- suffix
- circumfix
- infix
- superfix
How words are built - form + function
- Root / morpheme creation:
- Function (external structure):
- creates new POS and meanings
- Form (internal structure): parts of 2 or more existing stems:
- brunch
- chortle
- galumph
- Derivation:
- Function (external structure):
- creates new POS and meanings from 1 existing stem
- Form (internal structure): 1 stem + affix
- prefix, suffix, circumfix, infix, superfix
- Compounding:
- Function (external structure):
- creates meanings (maybe new POS)
- Form (internal structure): from at least 2 existing stems
- lamp-post
- whisky-soda
- red-head
Internal structure:
English words consist of...
- a stem has a lexical meaning, e.g. table, chair, cabbage
- an inflection has a grammatical meaning
- relates a word to its syntactic context
- subject-verb agreement (person, case, number)
- relates a word to its semantic context
- tense/time, quantity, speaker-addressee,...
- e.g. cats, dogs, horses, oxen, men, women, children
Inflexions of English words are
- suffixes (or stem vowel changes):
- person
- number
- case
- However, inflexions in other languages may be:
- prefixes (many African languages)
- suffixes (as in English and German)
- circumfix (German)
- superfix (stress languages; tone languages)
Stems of English words are...
- Simple (i.e. roots, lexical morphemes)
- red, table, run, car, knit...
- Complex, i.e. at least one of the following:
- Derivations:
- a stem and a derivation affix, e.g.
- red + ish = reddish, beauty + ful = beautiful
- Compounds:
- a stem plus another stem, e.g.
- armchair, whisky-soda, red-head
- Both (synthetic compounds):
- a derivation plus a stem, e.g.
- bus-driver, steam-roller
A hierarchy of words:
Word = Stem + Inflection (English: nouns, pronouns, verbs)
  Stem/Base
    Compound stem: 2 stems
    Derived stem: 1 stem + affix
  Inflection: affix
    prefix
    suffix
    infix
    circumfix
    superfix
    ablaut
More precisely ...
A word is
a stem + an inflection
An inflection (in English) is
a suffix or an ablaut
A derived stem is
either
● a root (zero derivation)
or
● a derived stem with an affix
Nothing else is a derived stem
A compound stem is
a derived stem or a word +
a derived stem or a word
or
a compound stem +
a compound stem
Nothing else is a compound stem
A stem is either:
a root (lexical morpheme)
or
a derived stem
i.e. stem + affix (derivation)
or
a compound stem
stem + stem (compounding)
and nothing else is a stem.
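The definition of a stem above is recursive: a stem can contain other stems. A small Python sketch with three classes makes this visible. The class and function names are assumptions for this example, and the sketch only handles suffixes and a hyphen between the parts of a compound.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Root:
    form: str            # a root (lexical morpheme)

@dataclass
class Derived:
    stem: "Stem"         # stem + affix (derivation)
    affix: str

@dataclass
class Compound:
    left: "Stem"         # stem + stem (compounding)
    right: "Stem"

Stem = Union[Root, Derived, Compound]

def spell(stem):
    """Write out a stem; follows the three cases of the definition."""
    if isinstance(stem, Root):
        return stem.form
    if isinstance(stem, Derived):
        return spell(stem.stem) + stem.affix
    if isinstance(stem, Compound):
        return spell(stem.left) + "-" + spell(stem.right)
    raise TypeError("nothing else is a stem")

def roots(stem):
    """Collect all roots contained in a stem."""
    if isinstance(stem, Root):
        return [stem.form]
    if isinstance(stem, Derived):
        return roots(stem.stem)
    if isinstance(stem, Compound):
        return roots(stem.left) + roots(stem.right)
    raise TypeError("nothing else is a stem")

# synthetic compound: a stem plus a derivation
steam_roller = Compound(Root("steam"), Derived(Root("roll"), "er"))
print(spell(steam_roller), roots(steam_roller))
```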
Homework:
- Define:
- morpheme
A morpheme is the smallest unit of a word that carries meaning.
- lexical morpheme
A lexical morpheme represents a certain meaning (things or circumstances) within a specific context.
- grammatical morpheme
A grammatical morpheme represents grammatical relations within a sentence.
- stem
A stem is the morpheme which is the basis of all words of a word class and which carries the original lexical meaning of this word class. Stems of various word classes may be modified by adding affixes.
- derived stem / compound stem
A derived stem is either a root (zero affix) or a derived stem with an affix.
A compound stem is either a derived stem or a word plus a derived stem or a word, or a compound stem plus a compound stem.
- What is the difference between
- inflection and word formation
Inflection adds affixes or changes the stem vowel in order to fit a word into its grammatical context, whereas word formation (derivation and compounding) creates new lexical words.
- derivation and compounding?
Derivation uses the addition of affixes to create new lexical words, whereas compounding uses two different lexical morphemes in order to create new words.
- Translate Jabberwocky into your favorite other language
Der Zipferlake
von Christian Enzensberger
Verdaustig war's und glasse Wieben
rotterten gorkicht im Gemank;
Gar elump war der Pluckerwank,
Und die gabben Schweisel frieben.
»Hab acht vorm Zipferlak, mein Kind!
Sein Maul ist beiß, sein Griff ist bohr!
Vorm Fliegelflagel sieh dich vor,
Dem mampfen Schnatterrind!«
Er zückt' sein scharfbefifftes Schwert,
Den Feind zu futzen ohne Saum;
Und lehnt' sich an den Dudelbaum,
Und stand da lang in sich gekehrt.
In sich gekeimt, so stand er hier,
Da kam verschnoff der Zipferlak
Mit Flammenlefze angewackt
Und gurgt in seiner Gier!
Mit eins! Mit zwei! und bis aufs Bein!
Die biffe Klinge ritscheropf!
Trennt er vom Hals den toten Kopf,
Und wichernd springt er heim.
»Vom Zipferlak hast uns befreit?
Komm an mein Herz, aromer Sohn!
O blumer Tag! O schlusse Fron!«
So kröpfte er vor Freud.
Verdaustig war's und glasse Wieben
rotterten gorkicht im Gemank;
Gar elump war der Pluckerwank,
Und die gabben Schweisel frieben.
Source: (http://de.wikipedia.org/wiki/Jabberwocky)
Tuesday, 05.12.2006
28.11.2006 Types of lexical information: Pronunciation
Introduction:
In this lecture we talked about pronunciation, spelling and sounds. It was a well-structured lesson and easier to follow than the lectures before.
Learner's Diary:
Surface structure:
- Two levels:
- linguistic description - metalanguage
- units of language - object language
- Surface structure of
- DICTIONARIES:
metalanguage: the typography and layout of a book, hypertext...
- WORDS in dictionaries:
object language: spelling, pronunciation
If you have a look at a dictionary you will see two different kinds of surface structure:
- IPA (International Phonetic Alphabet) transcription, which represents the pronunciation
- Spelling, which is the visual surface structure and belongs to the object language (as mentioned above)
Model of types of lexical information:
internal: morphology
external: POS, valency, word structure
Rendering structures:
- Pronunciation rules (acoustic modality)
- Spelling (visual modality)
- Sound-spelling rules (inter-modality conversion)
Representation of sounds:
- prosodic hierarchy: (prosody: stress, rhythm, tone, intonation)
- phonemes:
- function: "smallest word-distinguishing segments"
- internal structure: "configurations of distinctive phonetic features"
- external structure (see syllables)
- rendering: "contextual variants", "allophones"
- syllables
- function: "word distinguishing phoneme configurations"
- internal structure: "configurations of sequential features (consonantal, vocalic; voiced, unvoiced....) and simultaneous features (tone, accent)"
- external structure (word)
- rendering: a function of the rendering of phonemes
Further definition:
Phonemes:
The speakers of a language know, consciously or unconsciously, which segments of their language distinguish meaning. For example, the words light and bite are distinguished only by their first sound (spelling is not important here). The initial sounds in [laIt] and [baIt] are thus said to contrast. Contrasting units like this are called phonemes and form the basis of phonology. Phonemes are defined as the smallest meaning-distinguishing units in language. Phonemes are written between slashes to show that they are not to be read as letters, e.g. /p/ /t/ /a/
Source: Introduction to English Linguistics, UTB basics
Phonemes are grouped into syllables, which consist of consonants (C) and vowels (V). (In English, syllables contain a nucleus that is normally made up of a vowel.)
For example:
The German word "streifst" has 8 phonemes: CCCVVCCC, but only one syllable
C - [sch]
C - [t]
C - [r]
V - [a]
V - [i]
C - [f]
C - [s]
C - [t]
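The CV pattern of "streifst" can be computed with a few lines of Python. The segmentation is copied from the list above; the vowel set and the function name are assumptions for this example.

```python
VOWELS = {"a", "e", "i", "o", "u"}

def cv_skeleton(segments):
    """Map every segment to C (consonant) or V (vowel)."""
    return "".join("V" if s in VOWELS else "C" for s in segments)

streifst = ["sch", "t", "r", "a", "i", "f", "s", "t"]
print(len(streifst), cv_skeleton(streifst))
```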
In addition to that we talked about minimal pairs. They differ in only one sound and are different in meaning.
For example:
fun and sun
sun and sum
fish and fit
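Whether two words form a minimal pair can be checked on their sounds, not on their spelling (fish and fit differ in two letters but only in one sound). A minimal Python sketch, in which the transcriptions and the function name are my own assumptions for the example:

```python
def is_minimal_pair(a, b):
    """True if two sound sequences have the same length and differ in exactly one sound."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1

fun = ["f", "ʌ", "n"]
sun = ["s", "ʌ", "n"]
sum_ = ["s", "ʌ", "m"]   # trailing underscore: sum is a Python built-in
fish = ["f", "ɪ", "ʃ"]
fit = ["f", "ɪ", "t"]

print(is_minimal_pair(fun, sun), is_minimal_pair(sun, sum_), is_minimal_pair(fish, fit))
print(is_minimal_pair(fun, sum_))  # two sounds differ, so no minimal pair
```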
At last there are Allophones.
We can conclude that different phones that (1) do not distinguish meaning (are non-contrastive), (2) are regarded as "the same" sound and (3) are phonetically similar are said to be allophones of the same phoneme.
As already mentioned above, it is a convention that phonemic symbols are enclosed by slashes / /, whereas allophones are phonetic realisations and are as such enclosed by square brackets [ ].
For example there are at least two allophones of /l/ in English: the clear [l] and the dark [ɫ].
The clear l is phonetically transcribed as [l] and occurs before a vowel, e.g. in the words lift and failure.
The dark l is phonetically transcribed as [ɫ] and occurs before a consonant or before silence, e.g. in words like silk and feel. (For more on the dark l see: http://en.wikipedia.org/wiki/Velarized_alveolar_lateral_approximant)
We can conclude that [l] and [ɫ] are allophones of the phoneme /l/ in English.
Description of sounds
- For general pronunciation representation in the lexicon:
- phonemic transcription
- just enough phonetic detail to distinguish words
- For detailed representation of speech pronunciation:
- phonetic transcription based on
- articulatory phonetics (about speech production)
Interactive Sagittal Section
IPA Chart IPA fonts
Transcription systems for English
- remember the other dimensions of speech description:
- acoustic phonetics (about speech wave transmission)
- auditory phonetics (about speech perception)
Task:
- make a list of 5 spelling rules
its, it's
affect, effect
advice, advise
accept, except
lose, loose
- make a list of 5 main spelling problems
- Double internal consonants
For example: recommend, accommodate, and committee
- Internal syllables and letters
These are words which have short, practically unpronounced internal syllables that are easily omitted or misspelled. For example: athletics, category, disastrous, optimistic, privilege, and desperate.
- Words with endings such as -ance and -able
Another source of spelling difficulties is words with similar-sounding endings: extravagant, occurrence, compatible, irresistible, and performance.
- Words ending in -sede and -cede
Still another group of confusingly spelled words is the group ending in -sede, -ceed, and -cede: for example, precede, proceed, exceed, supersede.
English and German: tasks
- Pronunciation:
- List
- the consonants of German which do not occur in English
- the consonants of English which do not occur in German
- the vowels of German which do not occur in English
- the vowels of English which do not occur in German
- Spelling:
- List
- the characters of German which do not occur in English
- the characters of English which do not occur in German
- 5 English graphemes containing more than one character
- 5 German graphemes containing more than one character
Monday, 04.12.2006
21.11.2006 Lexicon data and their structure
Introduction:
Our lecturer Mr. Gibbon was absent today so the lecture was held by Mr. Trippel. We discussed many technical terms in a dictionary e.g. lemma and lexeme.
Learner's Diary:
Revision:
Microstructure:
- number of lexicon articles/entries/records
- order of DatCats (Data categories)
Mesostructure:
- interrelation of lexicon entries
- relation to external information
Macrostructure:
- order of lexicon entries
- selection of sort key
- sorting order not trivial!
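Why the sorting order is not trivial can be seen in a short Python experiment. A plain sort compares character codes, so capital letters come before small letters and umlauts come last. The sort key below is only one possible choice and an assumption of this example (it ignores case and strips accents); real dictionaries may follow other conventions.

```python
import unicodedata

def sort_key(headword):
    """Sort key that ignores case and diacritics (Ä is sorted like a)."""
    decomposed = unicodedata.normalize("NFD", headword)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

entries = ["Zug", "apfel", "Äpfel", "Bär", "bar"]

print(sorted(entries))                # plain code point order
print(sorted(entries, key=sort_key))  # dictionary-like order
```

Entries with the same key (apfel / Äpfel) keep their original relative order, so a second sort key would be needed to order them reliably.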
Lexicon Microstructure: DatCats in Lexicons
- words
- grammatical information
- part of speech (POS)
- inflectional class (inflectional: a term in morphology for affixes which encode grammatical properties such as number and tense and do not change the part of speech of the stems to which they are attached)
Source: (http://www.essex.ac.uk/linguistics/clmt/MTbook/HTML/node98.html)
- valence: In linguistics, valency or valence refers to the capacity of a verb to take a specific number and type of arguments (noun phrase positions). A monovalent verb (for example, sleep) cannot take a direct object (*he sleeps it). A trivalent verb has three arguments (e.g. give has the giver, the recipient, and the thing given).
The linguistic meaning of valence is derived from the definition of valency in chemistry.
Valency is closely related, though not identical, to transitivity. Transitivity refers to the number of core arguments of the verb that are not optional (giving intransitive verbs, transitive verbs, and ditransitive verbs).
For example:
(1) Newlyn lies. (valency of lie = 1, intransitive-a verb that has a subject but does not have an object)
(2) John kicks the ball. (valency of kick = 2, transitive-a verb that requires both a subject and one or more objects)
(3) John gives Mary a flower. (valency of give = 3, ditransitive-a verb which takes a subject and two objects called direct and indirect object)
The concept of valency is however undermined by the fact that non-optional or core meanings are hard to pin down. For example:
(4) Ask, and God will give.
(5) John kicks Mary the ball.
(6) The horse kicks.
Source: (http://en.wikipedia.org/wiki/Valency_(linguistics))
- representation of meaning
- semantics
- definition
- corpus reference := usage examples
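The data categories listed above can be put together in one lexicon entry. The following Python sketch is only an illustration: the class Entry, the field names and the glosses are invented, and the inflectional classes are reduced to "regular" and "irregular". The valency values follow the examples given above.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    word: str
    pos: str                 # part of speech
    inflectional_class: str
    valency: int             # number of arguments
    definition: str
    examples: list = field(default_factory=list)  # corpus references

LABELS = {1: "intransitive", 2: "transitive", 3: "ditransitive"}

lexicon = {
    "sleep": Entry("sleep", "verb", "irregular", 1, "to be asleep", ["He sleeps."]),
    "kick": Entry("kick", "verb", "regular", 2, "to hit with the foot", ["John kicks the ball."]),
    "give": Entry("give", "verb", "irregular", 3, "to hand over", ["John gives Mary a flower."]),
}

def transitivity(word):
    return LABELS[lexicon[word].valency]

def fits_frame(word, n_arguments):
    """Strict check: does the number of arguments match the stored valency?"""
    return lexicon[word].valency == n_arguments

print(transitivity("give"))
print(fits_frame("kick", 2), fits_frame("kick", 3), fits_frame("kick", 1))
```

The strict check rejects "John kicks Mary the ball" (3 arguments) and "The horse kicks" (1 argument), which is exactly why a single fixed valency number per verb is problematic.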
Lexeme/Lemma - Lexeme refers to the set of all the forms that have the same meaning, and lemma refers to the particular form that is chosen by convention to represent the lexeme.
Source: (http://en.wikipedia.org/wiki/Lemma_%28linguistics%29)
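The difference between lexeme and lemma can be sketched as a small data structure. The names `LEXEMES`, `FORM_TO_LEMMA` and `lemmatise` and the listed word forms are assumptions chosen for this illustration: each lexeme is the set of its forms, and the lemma is the form used as the key.

```python
# Hypothetical mini lexicon: lemma (citation form) -> all forms of the lexeme.
LEXEMES = {
    "give": {"give", "gives", "gave", "given", "giving"},
    "kick": {"kick", "kicks", "kicked", "kicking"},
}

# Reverse index: every word form points back to its lemma.
FORM_TO_LEMMA = {form: lemma for lemma, forms in LEXEMES.items() for form in forms}

def lemmatise(word):
    """Return the lemma for a word form, or None if the form is not in the lexicon."""
    return FORM_TO_LEMMA.get(word.lower())

print(lemmatise("Gave"))
print(lemmatise("kicks"))
print(lemmatise("poodle"))
```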
Problematic issues in lexicography
- ambiguity
- synonyms : two word forms, same meaning
- polysemy: one word form, two (or more) slightly different meanings
- homonyms: one word form, completely different meanings
- search word
- languages with inflectional prefixes
- orthographic ambiguity
- picture lexicons?
- language change
- "new" words,
- new meaning
Affixes: Prefixes and Suffixes
There are different kinds of affixes, depending on where they are attached to a base.
Prefixes are affixes that are attached to the beginning of a base, such as anti- in the noun antihero (anti-hero), dis- in the verb disarm (dis-arm), or un- in the adjective unfair (un-fair).
Suffixes are attached to the end of a base, such as -ness in sadness (sad-ness), -ing in weeping (weep-ing), or -est in deepest (deep-est).
Most affixes of English are prefixes and suffixes.
Source: Introduction to English Linguistics, UTB basics, A.Francke Verlag
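The prefix and suffix examples above can be tried out with a very naive segmenter. This is a sketch under strong assumptions: the lists `PREFIXES` and `SUFFIXES` contain only the affixes mentioned above, and the function `segment` is a hypothetical helper that strips at most one prefix and one suffix without checking whether the rest is really a base (so a word like "under" would be split wrongly).

```python
# Only the affixes mentioned in the text above; a real list would be much longer.
PREFIXES = ("anti", "dis", "un")
SUFFIXES = ("ness", "ing", "est")

def segment(word):
    """Split a word into (prefix, base, suffix); empty strings mean 'no affix found'."""
    prefix = next((p for p in PREFIXES if word.startswith(p) and len(word) > len(p)), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s)), "")
    base = rest[:len(rest) - len(suffix)] if suffix else rest
    return prefix, base, suffix

for example in ["antihero", "disarm", "unfair", "sadness", "weeping", "deepest"]:
    print(example, segment(example))
```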
Glossary:
All the words already defined above
Affixes (Prefixes, Suffixes)
Ambiguity
Concordance (in First order lexicon - corpus lexicon)
a concordance is a special kind of dictionary:
- each word in a text corpus is linked with its contexts of occurrence in this corpus
- (Google is a special form of concordance)
Source: (http://www.uni-bielefeld.de/lili/personen/vraithel/teaching/ial/lexicography-intro-02.pdf)
Lemma
Lexeme
Valence, Valency
Verbs (intransitive, transitive, ditransitive)
Evaluation:
There are lots of definitions which we have to learn by heart, but I'm glad that at least I found good definitions for all of them.
Friday, 01.12.2006
14.11.2006 Lexical database
Introduction:
Today we talked about the surface structure and the deep structure of a dictionary. There are still some problems in defining these technical terms.
Learner's diary:
Deep structure and surface structure are terms which come from the area of generative grammar, which is due to Noam Chomsky.
The surface structure of a dictionary refers to the
- appearance and
- rendering
and the deep structure of a dictionary is the
- underlying organization
Beneath the surface...
- Dictionaries are simpler under the surface than real life...
- Semasiological dictionaries:
- the basic form is a TABLE
- the rows are lexical entries, with a specific microstructure
- the columns are single types of lexical information
- if the orthography or phonology of a lexical item is ambiguous, then
- either the item is repeated with the new information
- or a sub-table is created
- but this depends on the kind of ambiguity:
- homonymy (homography, homophony)
- polysemy
Define the words...
homonymy, homophony: occurs where one form has two or more completely distinct meanings but the same pronunciation. In dictionaries, homophones are usually represented by separate entries.
Example: write and right are homophones
lexeme: a lexeme (stem) is the minimal unit of language which
- has a semantic interpretation and
- embodies a distinct cultural concept
It is made up of one or more form-meaning composites called lexical units (senses).
Source:(http://www.sil.org/linguistics/glossaryoflinguisticterms/WhatIsALexeme.htm)
polysemy: occurs where one lexeme has a range of different but related meanings (more than one meaning). Take a look at a page of a dictionary and you will most likely find a number of words with more than one definition.
Example: to buy (with money)
1.) to obtain sth by paying money for it - He bought me a new coat, Where did you buy that dress?
2.) to be enough to pay for sth - He gave his children the best education that money can buy, Five pounds doesn't buy much nowadays
Both polysemy and homophony refer to a single form that has two or more meanings. Such words with more than one meaning are called ambiguous, and accordingly polysemy and homophony are said to create lexical ambiguity. The sentence She has bought it could mean either that she has obtained sth by paying money for it or, metaphorically, that she believes sth she has heard to be true. In this case, we cannot tell which of the two possible meanings of buy the speaker or writer of the sentence intended. The sentence is ambiguous. Many puns and jokes are based on ambiguity.
Source: Introduction to English Linguistics, A. Francke Verlag Tübingen und Basel, UTB basics
Then we talked about tables in Open Office/MS-Excel and HTML models and examples were given.
Basic model of a table:
- Table: a list of rows
- Row: a list of fields
- Column: a list of fields in the same row position
love | noun | a feeling of deep affection for s.b or s.th. |
poodle | noun | a dog with a haircut |
green | adjective | the colour of grass |
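The basic table model can be sketched directly with lists. The variable name `table` and the helper `column` are assumptions for this illustration: the table is a list of rows, each row is a list of fields, and a column is collected by taking the field at the same position from every row.

```python
# Table: a list of rows; Row: a list of fields (headword, part of speech, definition).
table = [
    ["love", "noun", "a feeling of deep affection for s.b or s.th."],
    ["poodle", "noun", "a dog with a haircut"],
    ["green", "adjective", "the colour of grass"],
]

def column(rows, position):
    """Column: the fields that sit at the same position in every row."""
    return [row[position] for row in rows]

headwords = column(table, 0)
parts_of_speech = column(table, 1)

print(headwords)
print(parts_of_speech)
```

In terms of the dictionary structure, each row corresponds to a lexical entry and each column to one type of lexical information.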
Glossary:
All the words are defined above.
Homophony
Lexical ambiguity
Lexeme
Polysemy
Table (of words and entries)
Wednesday, 22.11.2006
07.11.2006 The architecture of a dictionary
Introduction:
Today we talked about the several parts of a dictionary called the Megastructure, Macrostructure, Mesostructure and Microstructure.
Learner's Diary:
The Megastructure of a dictionary is the entire structure of the dictionary, including
- the front matter,
- the abbreviations and explanations of grammar
- the body of the dictionary
- the back matter
Quiz:
Give examples of the kinds of information contained in each of these structure types.
- the front matter - the name of the dictionary, the headline
- the abbreviations and explanations of grammar - e.g. "v" for verb, "n" for noun
- the body of a dictionary - lexical entries
- the back matter - hints for usage, example of the structure
The Macrostructure of a dictionary is the organisation of the lexical entries in the body of a dictionary into
- lists
- tree structures
- networks
Types of macrostructure: semasiological, onomasiological
Quiz:
Are semasiological macrostructures more like lists, trees, or networks?
Like lists (words in alphabetical order)
Quiz: megastructure, macrostructure
1.) What is the Megastructure/Macrostructure of a lexicon? Give examples.
2.) What is a Semasiological/Onomasiological dictionary? Give examples.
1.) Megastructure: The overall structure which consists of the front matter (metadata), abbreviations and explanations of grammar, the body of the dictionary and the back matter
Macrostructure: Organisation of entries (text corpora) - lists, trees, networks
2.) Semasiological dictionary: reader's dictionary, decoding dictionary (words listed in alphabetical order)
Onomasiological dictionary: writer's dictionary, encoding dictionary
The Microstructure of a dictionary is the consistent organisation of lexical information within entries in the dictionary.
Quiz:
- 1.) How many types of lexical information can you find?
- 2.) Is the microstructure of a semasiological dictionary typically a list or a tree or a network?
- 3.) What kind of structure do the combined macrostructure and microstructure of a semasiological dictionary have?
- 4.) And in an onomasiological dictionary?
- 1.) history of words, synonyms and antonyms, pictures, translation
- 2.) a list
- 3.) a list with different parts and these parts are lists as well - (embedded list structure (also kind of tree structure))
example: a table
- 4.) as in questions 1-3, but now much simpler: a tree structure
Types of lexical information:
Meaning - Pragmatics, Semantics
Structure - Syntax (Text, Phrase), Morphology (Inflexion, Word formation)
Appearance - Form (Pronunciation, Orthography)
Quiz: Microstructure:
1.) What is the microstructure of a dictionary?
2.) What kind of lexical information is contained in a dictionary's microstructure?
3.) Describe the two dimensions of types of lexical information.
4.) How do you define "definition" ? Give examples
1.) Consistent organisation of lexical information within entries
2.) It contains properties of linguistic units such as words
3.)
4.) Words in context (see Homework for 31-10-2006)
The Mesostructure of a dictionary is the set of relations between lexical entries and other entries such as other parts of a dictionary or a text corpus.
Quiz:
1.) How do lexical entries relate to each other?
2.) How do lexical entries relate to the mini-grammar in the megastructure?
3.) How do lexical entries relate to text corpora?
Lexicon mesostructure:
- Linguistically motivated class hierarchy of DATCAT subvectors, e.g. modality, grammar, object semantics
- Linguistic description references, e.g. use of abbreviations for parts of speech, characterisations of spelling
- Cross-references between related entries, e.g. co-hyponyms (synonyms, antonyms)
- Corpus references (concordance)
Quiz: Mesostructure:
1.) What is the mesostructure of a dictionary?
2.) Give examples for mesostructural elements concerning
- Types of information with reference to the sign model
- Linguistic description references
- Cross-references between related entries
- Corpus references
Saturday, 28.10.2006
24.10.2006 / 31.10.2006 On defining "definition"
Introduction:
The session today was about dictionaries and definitions, and we learned something about genus proximum and differentia specifica.
Learner's Diary:
The "meaning" of a dictionary is to get information.
- Metadata - catalogue information about the production of the dictionary intended for dictionary identification
- Types of lexical information in dictionary entries: FORM (cf. appearance), e.g. spelling, pronunciation; STRUCTURE (cf. formulation), e.g. construction of words, place of words in larger constructions (e.g. sentences); CONTENT (cf. meaning), e.g. definition, relation with other words, examples
In dictionaries you can find good definitions and bad definitions.
Good definitions:
- Standard dictionary definition
- Recursive definition
- Real definition (ostensive definitions, models)
Bad definitions:
- circular definitions (sometimes unavoidable)
Standard dictionary definition: X is a Y kind of Z. Definitio per genus proximum et differentiam specificam: definition by nearest kind and specific differences.
Example (DCE 1987):
baby: a very young child, especially one who has not yet learned to speak or walk
X is a Y kind of Z
X=baby, Y=very young, Z=child
Example for genus proximum and differentia specifica:
poodle: a dog with thick curling hair
genus proximum: a dog (nearest kind)
differentia specifica: with thick curling hair (specific differences)
In many contexts taxonomies are used. Coming back to the word "poodle": a taxonomy is a hierarchy of genus proximum relations (also called a tree structure).
Example: "poodle" (poodle=dog, dog=animal)
at the top of the hierarchy/tree structure: animal
then - dog
and at the bottom of the hierarchy/tree structure: poodle
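The poodle taxonomy can be sketched as a mapping from each word to its genus proximum. The names `GENUS`, `chain` and `is_a` are assumptions invented for this illustration. Following the mapping upwards gives the path from the bottom of the tree to the top.

```python
# Each word points to its genus proximum (nearest kind); "animal" is the top here.
GENUS = {"poodle": "dog", "dog": "animal"}

def chain(word):
    """Follow the genus proximum links from a word up to the top of the hierarchy."""
    path = [word]
    while path[-1] in GENUS:
        path.append(GENUS[path[-1]])
    return path

def is_a(word, kind):
    """True if 'kind' is the word itself or lies above it in the hierarchy."""
    return kind in chain(word)

print(chain("poodle"))
print(is_a("poodle", "animal"))
print(is_a("animal", "poodle"))
```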
Taxonomies are used in traditional lexicography:
- cross-references in standard definitions
- thesaurus construction
Now we come to the elements of definition. For this we have a look at the definition of the word animal.
animal: a living creature (genus proximum), not a plant (=definition by negation of co-hyponyms) that has senses and is able to move itself when it wants to...
creature: a living being of any kind...
being: a living thing, esp. a person... (=definition by enumeration of hyponyms)
thing: a material object...
object: a thing that can be seen or felt...
So you come again to the word "thing" (circular definition).
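The way the definitions lead back to "thing" can be checked with a short sketch. The mapping `DEFINED_BY` is a simplification made for this illustration: it keeps only the main defining noun of each definition above, and `follow` is a hypothetical helper that walks from word to word until it reaches a word it has already visited.

```python
# Simplified from the definitions above: each word -> the main noun used to define it.
DEFINED_BY = {
    "animal": "creature",
    "creature": "being",
    "being": "thing",
    "thing": "object",
    "object": "thing",
}

def follow(word):
    """Return (visited words, word at which the definitions become circular or None)."""
    visited = []
    while word in DEFINED_BY and word not in visited:
        visited.append(word)
        word = DEFINED_BY[word]
    return visited, (word if word in visited else None)

path, repeated = follow("animal")
print(path)
print(repeated)
```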
Recursive definition:
A recursive definition, sometimes also called an inductive definition, is one that defines a word in terms of itself, so to speak, albeit in a useful way. Normally this consists of three steps:
- At least one thing is stated to be a member of the set being defined; this is sometimes called a "base set".
- All things bearing a certain relation to other members of the set are also to count as members of the set. It is this step that makes the definition recursive.
- All other things are excluded from the set.
Task - define:
1.) ancestor: a parent or a parent of an ancestor
1. parent (base set)
2. parent of an ancestor (recursive definition)
3. nothing else is an ancestor (excluded set)
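The three steps of the ancestor definition can be sketched as a recursive function. The family in `PARENTS` is invented for this illustration, and `ancestors` is a hypothetical helper. Note one assumption: the recursion collects the ancestors of each parent, which gives the same set as "a parent of an ancestor" but is easier to compute.

```python
# Invented family for illustration: person -> list of parents.
PARENTS = {
    "Anna": ["Berta", "Carl"],
    "Berta": ["Dora"],
    "Carl": [],
    "Dora": [],
}

def ancestors(person):
    """Collect all ancestors of a person using the recursive definition."""
    result = set()
    for parent in PARENTS.get(person, []):
        result.add(parent)            # step 1: a parent is an ancestor (base set)
        result |= ancestors(parent)   # step 2: the function calls itself (recursive step)
    return result                     # step 3: nothing else is added

print(sorted(ancestors("Anna")))
print(sorted(ancestors("Dora")))
```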
Another example (poem by Gertrude Stein): "A rose is a rose is a rose"
2.) natural number:
For instance, we could define natural number as follows (after Peano):
- "0" is a natural number.
- Each natural number has a distinct successor, such that:
- the successor of a natural number is also a natural number, and
- no natural number is succeeded by "0".
- Nothing else is a natural number.
Notice that the second condition in the definition itself refers to natural numbers, and hence involves self-reference. Although this sort of definition involves a form of circularity, it is not vicious, and the definition is quite successful.
(Source: http://en.wikipedia.org/wiki/Definition)
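The definition of natural number can also be tried out in code. This is only a sketch under an assumption of my own: "0" is represented as the empty tuple and the successor of n as a tuple containing n. The names `ZERO`, `successor`, `is_natural` and `to_int` are invented for this illustration.

```python
ZERO = ()  # assumed representation of "0"

def successor(n):
    """The successor of n is a tuple wrapping n, so it can never be equal to ZERO."""
    return (n,)

def is_natural(x):
    if x == ZERO:
        return True                   # "0" is a natural number
    if isinstance(x, tuple) and len(x) == 1:
        return is_natural(x[0])       # a successor of a natural number is a natural number
    return False                      # nothing else is a natural number

def to_int(n):
    """Count how many times successor was applied to ZERO."""
    count = 0
    while n != ZERO:
        n = n[0]
        count += 1
    return count

three = successor(successor(successor(ZERO)))
print(is_natural(three), to_int(three))
print(is_natural("0"))
```

The second branch of `is_natural` calls `is_natural` again, which is the self-reference mentioned above; it always ends because every call removes one layer of the tuple.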
Glossary:
Media - a medium is a presentation of a text. It makes the structure of a text visible or audible. It combines various aspects, e.g. the layout, size, colour, positioning on page, font...
Pragmatics: - the relation between signs and the users of signs (the action you perform with signs).
The study of the way in which language is used to express or interpret real intentions in particular situations, especially when the actual words used may appear to mean something different (Oxford Advanced Learner's Dictionary, OALD).
Production: production in connection with semantics and media means that we produce media starting with semantics. We first start with a content (semantics) and then interpret it in media.
Semantics: the relation between the sign and reality. The branch of linguistics dealing with the meaning of words and sentences (OALD).
Shared world: the text properties in the shared world are connected to the knowledge all people have. For instance, we all know about politics or love, what chairs and tables look like... It's the knowledge about the world (it also includes the knowledge and learning about media).
World of the mind: the world of the mind is about personal knowledge, things that are going on in a person's mind and that only the thinking person knows.
Glossary:
The several types of definitions - already defined above
Evaluation:
I think it was good to repeat the lecture. Now the topics we dealt with are clear.