Schedule for CS 4650 and CS 7650

(subject to change)

1/8 Welcome. History of NLP and modern applications. Review of probability and linear algebra. Ground rules. J&M Chapter 1
1/10 Supervised learning. Bag-of-words models. Naive Bayes. Learning to classify WordNet senses. LxMLS lab guide 0-0.3 (inclusive), 1-1.2 (inclusive). J&M 19-19.3. Optional: word sense disambiguation survey paper. HW1 due.
1/15 Discriminative classification. Logistic regression, perceptron, regularization [notes]. Word sense disambiguation [slides]. J&M 20-20.3, 20.7, LxMLS lab guide 1.3-1.4. Bo Pang et al “Thumbs up?…” Optional: Pang and Lee’s survey of sentiment analysis. Tom Mitchell’s chapter on classification. Dan Klein’s tutorial on classification. Project 1 out.
1/17 Clustering and EM. The expectation-maximization algorithm. Document clustering. Word sense clustering. Nigam et al., J&M 20.10. Optional: tutorials on EM; Passive-Aggressive Learning; classic papers on word sense clustering; survey on word sense clustering. HW2 due.
1/22 Language models. N-grams, smoothing, speech recognition. J&M 4-4.9, 4.11.
1/24 Finite state automata. Edit distances, morphology. J&M 2-3. Project 1 due.
1/29 Finite state transducers. Semirings, finite-state composition, noisy channel model. [notes] J&M 5. HW3 due.
1/31 WFST applications. Translation, spelling correction, word segmentation. POS tags. [notes] [POS slides] J&M 6-6.5. Proj 2 out.
2/5 Sequence labeling 1. Hidden Markov Models. Viterbi algorithm. BIO tagging. [notes] [slides] LxMLS lab guide section 3. Sutton and McCallum CRF tutorial. Optional: Collins 2002. HW4 due.
2/7 Sequence labeling 2. Discriminative sequence labeling. Structured perceptron and conditional random fields. [notes] [slides] J&M 12-12.7
2/12 Syntax and CFG parsing. Context-free grammars. CKY algorithm, head propagation. [slides] J&M 13-13.4. Optional: PCFG models of Linguistic Tree Representations (Johnson 1998). The Penn Treebank Bracketing Guidelines.
2/14 Lexicalized parsing. State-splitting, CRF parsing. [notes] [slides] J&M 14-14.7. Optional: Intricacies of the Collins’ Parsing Model (Bikel 2003). Accurate unlexicalized parsing (Klein and Manning 2003). Spectral learning of latent variable PCFGs (Cohen et al, 2012). Proj 2 due.
2/19 Dependency parsing. Minimum spanning tree, Eisner algorithm, structured perceptron for parsing. [notes] [slides] McDonald et al, 2007. Optional: Eisner algorithm worksheet. HW5 due.
2/21 Midterm (everything through dependency parsing). Proj 3 out.
2/26 Structure induction. Expectation-maximization, sampling. [notes, more notes] A Statistical MT Tutorial Workbook, Sect. 1-14. Optional: EM worksheet. HW6 due.
2/28 Compositional semantics. Lambda calculus, CCG. [slides] J&M 17-17.3, 18-18.3. Optional: Manning, Introduction to Formal Computational Semantics. Zettlemoyer and Collins, Learning to Map Sentences to Logical Form. Intro to CCG. Much more about CCG.
3/5 Shallow semantics. FrameNet, semantic role labeling, relation extraction, integer linear programming. [slides] [notes] J&M 19-19.4. Optional video: Pereira, Low-Pass Semantics. Proj 3 due.
3/7 Distributional semantics. Latent semantic analysis, singular value decomposition. [slides] [notes] J&M 20-20.7. Optional: Lin; Pereira, Tishby, and Lee. HW7 due.
3/12 Anaphora resolution. Centering, learning to rank. [slides] [notes] J&M 21
3/14 Coreference resolution. Markov Random Fields. [slides] Raghunathan et al., A multi-pass sieve for coreference resolution. Optional: Singh et al., Large-scale cross-document coreference. Wick et al., Discriminative hierarchical. Proj 4 out. HW8 due.
3/26 Discourse structure. Rhetorical structure theory, Penn Discourse Treebank. [slides] Discourse Structure and Language Technology. Optional: Barzilay and Lapata, Modeling Local Coherence.
3/28 Semi-supervised learning. Bootstrapping, graph-based learning, domain adaptation. [slides] Semi-supervised learning. Optional: Much more about semi-supervised learning. HW9 due.
4/4 Information extraction. [slides] J&M 23. Proj 4 due.
4/9 Machine translation 1. Alignment, phrase-based MT. [slides] J&M 25; Optional: Lopez’s Survey. HW10 due.
4/11 Machine translation 2. Syntactic MT. J&M 25; Optional: Chiang, Introduction to Synchronous Grammars.
4/16 Multilingual learning. McDonald et al., Multisource transfer of delexicalized dependency parsers. Optional: Täckström et al., Cross-lingual Word Clusters. Snyder and Barzilay, Climbing the Tower of Babel. HW11 due.
4/18 Dialogue. Speech acts, POMDPs. J&M 24-24.6
4/23 Project presentations.
4/25 Project presentations.
4/26 Initial project reports due.
5/3 Final project reports due.

Office hours

Professor Eisenstein’s office hours will be 11-12 on Wednesday in TSRB 228A.


This course gives an overview of modern statistical techniques for analyzing natural language. The rough organization is to move from shallow bag-of-words models to richer structural representations. At each level, we will discuss the salient linguistic phenomena and the most successful computational models. Along the way we will cover machine learning techniques that are especially relevant to natural language processing.

The assignments, readings, and schedule are subject to change. I will try to give as much advance notice as possible.


This course assumes strong coding ability, and a good background in basic probability and linear algebra. Familiarity with machine learning (in particular: perceptron, naive Bayes, and logistic regression) would be helpful, but is not assumed.


The primary text for this course is: Jurafsky and Martin, Speech and Language Processing, 2nd edition. That’s the purple cover, not white. Other readings will include classic and recent papers, and PDFs from other textbooks and tutorials. Occasionally I’ll link to optional video lectures; these are not a substitute for my lecture and will not cover the same content.

Please have the readings done before class on the date assigned.

Grading and Collaboration Policy

The graded material for the course consists of:

  • 11 short homework assignments. These involve performing linguistic annotation on some text of your choice. Each assignment should take less than one hour. You may skip one. Each is worth 2 points (20 total). This grade includes attendance on the due date, because we will discuss them in class.
  • 4 assigned projects. These involve building and using NLP techniques which are at or near the state-of-the-art. They must be done individually. Each project is worth 10 points (40 total).
  • 1 independent project. This may be done in a group of up to three. It is worth 20 points, including points for the proposal, presentation, and report.
  • 1 in-class midterm exam, worth 20 points. Barring a medical emergency, you must take the exam on the day indicated in the schedule.
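
As a quick check that the components above add up to 100 points, here is a minimal Python sketch. The function name `course_total` is hypothetical, and modeling "You may skip one" by counting only the 10 best homework scores is an assumption, not a stated rule.

```python
def course_total(homework, projects, independent_project, midterm):
    """Sum the graded components described in the list above.

    homework: up to 11 scores, each between 0 and 2
    projects: 4 scores, each between 0 and 10
    independent_project: one score between 0 and 20
    midterm: one score between 0 and 20
    """
    # Assumption: a skipped homework is modeled by counting the 10 best scores.
    homework_points = sum(sorted(homework, reverse=True)[:10])
    return homework_points + sum(projects) + independent_project + midterm


if __name__ == "__main__":
    # Full marks everywhere, with all 11 homeworks turned in.
    print(course_total([2] * 11, [10] * 4, 20, 20))
```

Under this reading, the maximum is 20 + 40 + 20 + 20 = 100 points whether or not one homework is skipped.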

Students enrolled in the graduate section (CS 7650) will have an additional, research-oriented component to each homework assignment.

Homeworks and projects are due at the beginning of class; students should also bring a paper copy of homework (not projects) to class on the due date. Late homeworks will not be accepted; projects will be accepted up to three days late, at a penalty of 20% per day. This means that a project turned in at the end of class on the due date can receive a maximum score of 8/10 points towards your final grade.
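
The late-project arithmetic can be sketched as follows. This is only an illustration: the function name `max_project_score` is hypothetical, and it assumes that any lateness rounds up to a full day (consistent with the end-of-class example) and that the penalty is taken as 20% of the full 10 points per day.

```python
import math


def max_project_score(hours_late, full_points=10.0,
                      penalty_percent_per_day=20, max_days_late=3):
    """Return the highest score a project can still earn when turned in late."""
    if hours_late <= 0:
        return full_points  # on time: no penalty
    # Assumption: any fraction of a day late counts as a whole day.
    days_late = math.ceil(hours_late / 24)
    if days_late > max_days_late:
        return 0.0  # more than three days late: not accepted
    # Integer percentages avoid floating-point drift such as 0.2 * 3.
    return full_points * (100 - penalty_percent_per_day * days_late) / 100


if __name__ == "__main__":
    for hours in (0, 1, 25, 72, 73):
        print(hours, max_project_score(hours))
```

With these assumptions, a project handed in one hour after the start of class is capped at 8 of 10 points, matching the example in the policy.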

You may discuss the homework and projects with other students, but your work must be your own — particularly all coding and writing. Using external software resources is acceptable unless the assignment directs you not to, but you must clearly indicate which resources you have used. Using other people’s text or figures without attribution is plagiarism, and is never acceptable.

Suspected cases of academic misconduct will be referred to the Honor Advisory Council. For any questions involving these or any other Academic Honor Code issues, please consult me, my teaching assistants, or

  1. Nishant said:

    Could you please post Project 1 as early as possible? I am new to Python and will take longer than usual to complete the project.

    • nlpjacob said:

      The project will be posted on Tuesday. The deadline has also been postponed from the original date. You can spend the weekend learning about Python :)

    • Chad said:

      Nevermind. Link was already changed.

  2. lololikya said:

    The project page for project 4 has not been set up yet, but I have a question. In the vocab.10k file that came along with the project, the last line has no word. Do I have the wrong file, or is the last line supposed to be that way?

    A related problem is that the term-document matrix is of size 9979 * 100000, while the length of the dictionaries is 9980. Is this expected?

    • Sahil said:

      I am facing the same problem. I guess there has to be some problem with the two files.

  3. The project page is up now.

    Just ignore the last line of the vocab.10k file; it then has 9979 entries, corresponding to the number of rows in the term-document matrix.
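
One way to apply this when reading the file is sketched below. It assumes vocab.10k holds one word per line, which the thread does not confirm, and `load_vocab` is a hypothetical helper name rather than part of the project code.

```python
def load_vocab(path):
    """Read one vocabulary entry per line, dropping empty trailing lines."""
    with open(path, encoding="utf-8") as f:
        words = [line.rstrip("\n") for line in f]
    # Ignore the empty last line (or lines) so the list length matches
    # the number of rows in the term-document matrix.
    while words and not words[-1].strip():
        words.pop()
    return words


if __name__ == "__main__":
    vocab = load_vocab("vocab.10k")
    print(len(vocab))
```

If the file matches the description above, the printed length should equal the 9979 rows of the term-document matrix.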

  4. Arvind Krishnaa J said:

    Sir can you please upload the class lecture slides from 4/4? Thank you!
