|
[ http://web.cs.dal.ca/~vlado/csci6509/coursecalendar.html ]
Winter 2012 (Jan4-Apr5) Faculty of Computer Science Dalhousie University |
| # | Date | Title | |
|---|---|---|---|
| 1 | We Jan 4 | Course Introduction
Course information: logistics and administrivia, textbook and other main references, evaluation scheme, academic integrity policy, tentative course schedule; Introduction to NLP: what is a natural language and other kinds of languages; Handouts: course home page, tentative schedule, academic integrity policy, A0, coursenotes. Files: Slides (PDF), Lecture notes (PDF). Reading: [JM] Ch.1 | A0 out |
| 2 | Fr Jan 6 | Introduction to NLP
NLP Applications, NLP as a Research Area, short history of NLP, levels of NLP, some reasons why NLP is hard, ambiguities at different levels of NLP, examples of lexical and syntactic ambiguities. Files: Slides (PDF), Lecture notes (PDF). | |
| Part I: Background | |||
| 3 | Mon Jan 9 | Words and Morphology
Project handout, project deliverables; ambiguities at all levels of NLP, examples of ambiguities; Part I: Linguistic background, Words and morphology, Elements of morphology: morphemes, stems, affixes, stemming; morphological processes: inflection, derivation, compounding, clitics; Parts-of-speech (POS), POS tagging, open and closed categories, POS tag sets; Closed word categories, determiners (DT), interrogative determiners (WDT), predeterminers (PDT), personal pronouns (PRP), possessive pronouns (PRP$), wh-pronouns (WP) and wh-possessive (WP$), prepositions (IN), particles (RP), possessive ending (POS), modal verbs (MD). Files: Slides (PDF), Lecture notes (PDF). Reading: [JM] Sec 3.1; 5.1-5.3 | |
| 4 | Wed Jan 11 | Open Word Categories
Infinitive word 'to' (TO), qualifiers (RB), wh-adverbs (WRB), conjunctions (CC), interjections (UH); Open word categories: nouns (NN, NNS, NNP, NNPS), adverbial nouns, adjectives (JJ, JJR, JJS), numbers (CD), verbs (VB, VBP, VBZ, VBG, VBD, VBN); adverbs (RB, RBR, RBS); Remaining POS classes: existential there (EX), foreign words (FW), list items (LS), punctuation; POS tagging examples; Syntax: phrase structure, phrases, clauses, sentences; parsing, parse tree examples. Files: Slides (PDF), Lecture notes (PDF). Reading: [JM] Ch 12 | |
| L1 | Wed Jan 11 | Perl Tutorial 1 Files: Slides (PDF). | |
| 5 | Fri Jan 13 | Syntax
Aside: Festival speech generation demo: http://festvox.org/voicedemos.html, Context-Free Grammars (CFG) review, examples. induced grammar, parse trees, derivations, language generated by a CFG, left-most and right-most derivations, ambiguous sentences, bracketed representation of parse trees; Typical phrase structure rules in English: Sentence (S), Noun Phrase (NP), Verb Phrase (VP), Prepositional Phrase (PP), Adjective Phrase (ADJP), Adverbial Phrase (ADVP). Are NLs context-free? Files: Slides (PDF), Lecture notes (PDF). | A0 due |
| 6 | Mon Jan 16 | Semantics and Pragmatics
Aside: WordNet http://wordnet.princeton.edu, Wiktionary http://en.wiktionary.org. Natural Language Phenomena: agreement, movement, subcategorization; Heads and dependency; head-feature principle, dependency trees, arguments and adjuncts; Elements of semantics: semantic analysis. Files: Slides (PDF), Lecture notes (PDF). Reading: [JM] Ch 17-17.2 (Representation of meaning). | A1 out |
| Part II: Statistical Approach to NLP | |||
| 7 | Wed Jan 18 | Introduction to Stochastic NLP
Aside: NL Principles in Perl. Lexical semantics: word senses and basic concepts; metonymy, WordNet resource, semantic relations between words, semantic compositionality, semantic roles; Part II: Statistical approach to NLP, logical and plausible reasoning, two paradigms of NLP: logical and plausible, counting words and n-grams, Zipf's law, Elements of information retrieval, basic task definition of ad-hoc retrieval, typical IR system architecture. Files: Slides (PDF), Lecture notes (PDF). Reading: 18.6 (Idioms and Compositionality), 19-19.3 (Lexical Semantics and WordNet) | |
| L2 | Wed Jan 11 | Perl Tutorial 2 Files: Slides (PDF). | |
| 8 | Fri Jan 20 | Vector Space Model
Aside: Lucene. Steps in document and query processing in IR, vector space model, tfidf term weighting formula, cosine similarity measure, term-by-document matrix, reducing the number of dimensions, Latent Semantic Analysis, IR evaluation: precision, recall, and other measures. Files: Slides (PDF), Lecture notes (PDF). Reading: [JM] 23.1 (Information Retrieval), [MS] Ch.15 (Topics in Information Retrieval). | A2 out |
| 9 | Mon Jan 23 | Text Mining
Aside: Introduction to IR on-line book http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html. Text mining: text mining tasks, Text classification, problem definition, types of text classification, evaluation measures in text classification, micro- and macro-averaging, Evaluation methods for classification: general issues - overfitting and underfitting, methods: 1. training error, 2. train and test, 3. n-fold cross-validation; Parser evaluation: PARSEVAL measures. Files: Slides (PDF), Lecture notes (PDF). Reading: [MS] Ch. 16 (Text Categorization) | A1 due |
| 10 | Wed Jan 25 | Text Clustering and CNG Classification
Aside: SpamAssassin http://spamassassin.apache.org/. Parser evaluation: PARSEVAL measures, labeled and unlabeled precision and recall, F-measure; Text clustering: task definition, the simple k-means method, hierarchical clustering, divisive and agglomerative clustering; evaluation of clustering: inter-cluster similarity, cluster purity, use of entropy or information gain; CNG -- Common N-Grams classification method. Elements of probability theory. Files: Slides (PDF), Lecture notes (PDF). Reading: [JM] 14.7 (page 479, Evaluating parsers) | |
| L3 | Wed Jan 25 | Perl Tutorial 3 Files: Slides (PDF). | |
| 11 | Fri Jan 27 | Probabilistic Modeling
Aside: Roulette Wheel applet by David Little. Handout: cng-paper.pdf. Generative models, Bayesian inference, Probabilistic modeling: random variables, random configurations, computational tasks in probabilistic modeling, spam detection example, joint distribution model; fully independent model: example, computational tasks. Files: Slides (PDF), Lecture notes (PDF). | |
| 12 | Mon Jan 30 | Naive Bayes Model
Sum-product formula; pros and cons of fully independent model; Naive Bayes model: basic idea, assumption, computational tasks, graphical representation, example, number of paramenters, pros and cons. Files: Slides (PDF), Lecture notes (PDF). | A2 due |
| 13 | Wed Feb 1 | N-gram Model
N-gram model, N-gram model, language modeling in speech recognition, n-gram model assumption, graphical representation, use of log probabilities; Markov chain: stochastic process, Markov process, Markov chain; Perplexity and evaluation of N-gram models, Text classification using language models; Add-one (Laplace) smoothing. Files: Slides (PDF), Lecture notes (PDF). Reading: [JM] Ch. 4 | |
| Fri Feb 3 | Munro Day, University closed, no class | ||
| 14 | Mon Feb 6 | Hidden Markov Model
Witten-Bell smoothing (Witten-Bell discounting); Hidden Markov Model (HMM): DFA-based representation, definition, graphical representation, HMM assumption, POS tagging example, learning parameters. Files: Slides (PDF), Lecture notes (PDF). Reading: [JM] Sec. 5.5 (HMM POS Tagging) | A3 out |
| 15 | Wed Feb 8 | Viterbi Algorithm
HMM Tagging example: brute-force approach, Viterbi algorithm. Bayesian Networks: graphical representation, Bayesian networks assumption, conditional probability tables. Files: Slides (PDF), Lecture notes (PDF). | P0 due |
| L4 | Wed Feb 8 | Guest Lecture
Yannick Marchand: From Speech Synthesis by Analogy to Computational Simulations of human reading. | |
| 16 | Fri Feb 10 | Inference in Bayesian Networks
Bayesian Networks defininition (review); Bayesian Networks: number of parameters, representational power, computational tasks: evaluation and simulation; Inference in Bayesian Networks, NP-hard if max. number of parents is 2. Files: Slides (PDF), Lecture notes (PDF). | |
| 17 | Mon Feb 13 | Sum-Product Algorithms
BN inference example using brute force, Sum-product algorithm: main steps, factor graph and message passing, marginalization in one variable, "hard-wiring" variables, marginalization with several variables, conditioning and competion; burglar-earthquake example. Files: Slides (PDF), Lecture notes (PDF). | |
| 18 | Wed Feb 15 | Project Discussion (P0 discussion), Sum-Product Algorithm Example 2
Individual discussion with students about P0 submissions. Sum-Product completion HMM example. Files: Slides (PDF), Lecture notes (PDF). | |
| L5 | Wed Feb 15 | Discussion about Assignment 2 | |
| 19 | Fri Feb 17 | Probabilistic Context-Free Grammars
HMM as a Bayesian Network, example; Probabilistic Contest-Free Grammar (PCFG): motivation, PCFG as a probabilistic model, computational tasks for PCFG model: evaluation, learning, simulation, proper PCFG. Files: Slides (PDF), Lecture notes (PDF). Reading: [JM] Chapters 13 and 14. | A3 due |
| Mon Feb 20 | Study break Mon-Fri, Feb 20-24 | ||
| 20 | Mon Feb 27 | CYK Parsing
Discussion about P1. Chomsky normal form, CYK algorithm by example. Files: Slides (PDF), Lecture notes (PDF). | |
| 21 | Wed Feb 29 | Probabilistic CYK Parsing
CFG parsing using CYK, PCFG Marginalization using CYK-style algorithm, example; PCFG conditioning; PCFG completion using CYK-style algorithm, example; Topics related to PCFGs: PCFG as a BN; Issues with PCFGs: structural dependencies, lexical dependencies; Probabilistic lexicalized CFGs. Files: Slides (PDF), Lecture notes (PDF). | P1 due |
| Part III: Unification-based approach to NLP | |||
| 22 | Fri Mar 2 | First-order Predicate Logic and Resolution
Unification-based approach to NLP: bits of history; First-order predicate logic: constants, variables, functions, terms, predicates, formulae, sentences, axioms, theorems, inference rules; examples, Resolution-based inference system by Robinson. Files: Slides (PDF), Lecture notes (PDF). | A4 out |
| 23 | Mon Mar 5 | Unification
Resolution inference example, substitution, unifiers and unifiability, composition of substitutions, most general unifier, Robinson's unification algorithm Files: Slides (PDF), Lecture notes (PDF). | |
| 24 | Wed Mar 7 | Efficient Unification, Prolog
Exponential running time of the Robinson's algorithm, Unification using graph representatino, Huet's unification algorithm, example. Prolog overview: Horn clauses, rules and facts, other Prolog elements. Files: Slides (PDF), Lecture notes (PDF). | |
| L6 | Wed Mar 7 | Prolog Tutorial 1 Files: Prolog - Quick overview (PDF), Prolog slides (PDF). | |
| 25 | Fri Mar 9 | DCG, Feature Structures
Using Prolog to parse natural language, Definite Clause Grammars (DCG), DCG example with parse tree, handling agreement, embedded code, expressing PCFG; Unification-based grammars using feature structures, feature structures or attribute-value matrices, DCG expressed using AVMs, re-entrancy and cyclic AVMs. Files: Slides (PDF), Lecture notes (PDF). Reading: [JM] chapter 15 | |
| Mon Mar 12 | No class. Make-up class to be scheduled in a Wednesday lab time | ||
| 26 | Wed Mar 14 | Feature Structure Unification and Chart Parsing
Lists in AVMs, PATR-II notation style; Feature structure unification: graph representation, example, Huet's unification algorithm; Parsing unification-based grammars: basic notions in chart parsing, top-down chart parsing (Earley's algorithm) Files: Slides (PDF), Lecture notes (PDF). | A4 due |
| L7 | Wed Mar 14 | Prolog Tutorial 2 | |
| 27 | Fri Mar 16 | Unification-based Grammar Example
Bottom-up chart parsing; Example of a unification-based grammar. Files: Slides (PDF), Lecture notes (PDF). | |
| 28 | Mon Mar 19 | Course Evaluation, Type Hierarchies.
Type hierarchies: trivial hierarchy, tree hierarchy, bounded complete partial order, type unification in BCPO; Implementing type hierarchies: unpacked-bit representation, packed bit representation. Files: Slides (PDF), Lecture notes (PDF). | |
| Part IV: Course review | |||
| 29 | Wed Mar 21 | Course Review
Review of lectures 1 to 21. Files: Slides (PDF), Lecture notes (PDF). | A5 out |
| Part V: Student Presentations | |||
| 30 | Fri Mar 23 | Student presentations (19-jewkes), Course Review 2 Files: Slides (PDF), Lecture notes (PDF). | |
| 31 | Mon Mar 26 | Student presentations (13-shali,14-mosquera,15-jyoti) | |
| 32 | Wed Mar 28 | Student presentations (10-jankowsk,11-atwater,12-story) | |
| 33 | Wed Mar 28 | (1:30pm) Student presentations (16-vsingh,17-atkinson,18-rouleau) | |
| 34 | Fri Mar 30 | Student presentations (7-patra,8-raza,9-zai) | A5 due |
| 35 | Mon Apr 2 | Student presentations (4-sajadi,5-niri,6-mukanova) | |
| 36 | Wed Apr 4 | Student presentations (1-mehta,2-ashah,3-soni) | |
| Thu Apr 5 | Project Reports due | Report due | |
| Th Apr 12 | Final Exam (15:30-17:30, Dalplex) (Exam schedule) | Exam | |