Recent Changes - Search:
ECG Home






Fall 2017

Older Courses

Spring 2017

Fall 2016

Spring 2016

Fall 2015

Spring 2015

Fall 2014

Spring 2014

Fall 2013

Spring 2013

Fall 2012

Spring 2012

Fall 2011

Spring 2011

Fall 2010

Spring 2010

Fall 2009

Spring 2009

Fall 2008

Spring 2008

Fall 2007


edit SideBar


Home Assignments Lecture Blog Resources Discussion Group

Markov Model of Natural Language

In this assignment, we analyze an input text for transitions between k-grams (a fixed number of characters) and the following letter.

Then we produce a probabilistic model of the text: given a particular k-gram series of letters, what letters follow at what probabilities?

Then we use the model to generate nonsense text that's surprisingly reasonable.

See the full assignment at the Princeton site:


Begin by implementing the following class:

class MarkovModel
// Note: all of the below constructors/methods should be public.

       MarkovModel(string text, int k)// create a Markov model of order k from given text
                                      // Assume that text has length at least k.

   int order()                        // order k of Markov model

   int freq(string kgram)             // number of occurrences of kgram in text
                                      // (throw an exception if kgram is not of length k)

   int freq(string kgram, char c)     // number of times that character c follows kgram
                                      // if order=0, return num of times char c appears
                                      // (throw an exception if kgram is not of length k)

  char randk(string kgram)            // random character following given kgram
                                      // (Throw an exception if kgram is not of length k.
                                      //  Throw an exception if no such kgram.)

string gen(string kgram, int T)       // generate a string of length T characters
                                      // by simulating a trajectory through the corresponding
                                      // Markov chain.  The first k characters of the newly
                                      // generated string should be the argument kgram.
                                      // Throw an exception if kgram is not of length k.
                                      // Assume that T is at least k.

           <<                         // overload the stream insertion operator and display
                                      // the internal state of the Markov Model. Print out
                                      // the order, the alphabet, and the frequencies of
                                      // the k-grams and k+1-grams.


  • This is the same as the Princeton spec except that the class method rand has been renamed randk.
  • Here is a header file to get you started: Attach:MarkovModel.hpp.
  • Make sure you throw std::runtime_error's in the methods per the spec. Review how to do this in PS5a.
  • Test your implementation against the following test file: Attach:mmtest.cpp.
  • Consider the behavior of a 0-order model. This model will generate new characters with a distribution proportional to the ratio they appear in the input text. This model is context-free; it does not use an input kgram (representing the current state of the model) when generating a new character. As such:
    • The freq(string kgram) method takes an empty string as its input kgram (length of kgram must equal the order of the model).
    • This method call should produce as a result the length of the original input text (given in the constructor).
    • The freq(string kgram, char c) also will take a null string as input. It should produce as output the number of times the character c appears in the original input text.
  • Per discussion in class, consider using a C++ map ( store frequency counts of each kgram and ``k+1''-gram as it's encountered when you traverse the string in the constructor.
  • Make sure to wrap around the end of the string during traversal, as described in the Princeton assignment.
  • At construction time, it is highly recommended to build up a string that represents the active alphabet and store it as a private class member variable. This will be useful when implementing the randk method, so that you can produce all of the k+1-grams that will follow the provided kgram, test for the frequency of each k+1-gram, and then produce output characters with proportional frequencies.
  • Overload stream insertion operator to display internal state of Markov model: See these sample instructions for how to do the overload. You should print out all of the k-gram and k+1-gram frequencies, the order of the model, and its alphabet. Here are sample instructions for how to build an iterator over the map of k-gram and k+1-gram keys.
  • Create a Makefile or include short, direct instructions for how to build your project.
  • Fill out the Attach:ps6-readme.txt.


Submit the following:

  • your code files MarkovModel.cpp and MarkovModel.hpp
  • a source file TextGenerator.cpp for the Text generation client per the Princeton spec
  • your Makefile which must build an executable named TextGenerator
  • your ps6-readme.txt file

Submit as follows:

Grading Rubric

MM implementation4full & correct implementation=4 pts; nearly complete=3pts; part way=2 pts; started=1 pt
Text Generation implementation2full & correct implementation=2 pts; part way=1 pt; not=0 pt
Makefile2Makefile or explicit build/link instructions included
Exceptions implementation2full & correct implementation=2 pts; part way=1 pt; not=0 pt
Readme2discussion is expected -- at least mentioning something per section.
Edit - History - Print - Recent Changes - Search
Page last modified on May 17, 2015, at 12:04 PM