# Markov Model of Natural Language

In this assignment, we analyze an input text for transitions between k-grams (a fixed number of characters) and the following letter.

Then we produce a probabilistic model of the text: given a particular k-gram series of letters, what letters follow at what probabilities?

Then we use the model to generate nonsense text that's surprisingly reasonable.

See the full assignment at the Princeton site: http://www.cs.princeton.edu/courses/archive/fall13/cos126/assignments/markov.html

## Implementation

Begin by implementing the following class:

```class MarkovModel
----------------------------------------------------------------------------------------
// Note: all of the below constructors/methods should be public.

MarkovModel(string text, int k)// create a Markov model of order k from given text
// Assume that text has length at least k.

int order()                        // order k of Markov model

int freq(string kgram)             // number of occurrences of kgram in text
// (throw an exception if kgram is not of length k)

int freq(string kgram, char c)     // number of times that character c follows kgram
// if order=0, return num of times char c appears
// (throw an exception if kgram is not of length k)

char randk(string kgram)            // random character following given kgram
// (Throw an exception if kgram is not of length k.
//  Throw an exception if no such kgram.)

string gen(string kgram, int T)       // generate a string of length T characters
// by simulating a trajectory through the corresponding
// Markov chain.  The first k characters of the newly
// generated string should be the argument kgram.
// Throw an exception if kgram is not of length k.
// Assume that T is at least k.

<<                         // overload the stream insertion operator and display
// the internal state of the Markov Model. Print out
// the order, the alphabet, and the frequencies of
// the k-grams and k+1-grams.
```

## Details

• This is the same as the Princeton spec except that the class method `rand` has been renamed `randk`.
• Here is a header file to get you started: Attach:MarkovModel.hpp.
• Make sure you throw `std::runtime_error`'s in the methods per the spec. Review how to do this in PS5a.
• Test your implementation against the following test file: Attach:mmtest.cpp.
• Consider the behavior of a 0-order model. This model will generate new characters with a distribution proportional to the ratio they appear in the input text. This model is context-free; it does not use an input kgram (representing the current state of the model) when generating a new character. As such:
• The `freq(string kgram)` method takes an empty string as its input kgram (length of kgram must equal the order of the model).
• This method call should produce as a result the length of the original input text (given in the constructor).
• The `freq(string kgram, char c)` also will take a null string as input. It should produce as output the number of times the character `c` appears in the original input text.
• Per discussion in class, consider using a C++ map (http://www.cplusplus.com/reference/map/map/) store frequency counts of each kgram and ``k+1''-gram as it's encountered when you traverse the string in the constructor.
• Make sure to wrap around the end of the string during traversal, as described in the Princeton assignment.
• At construction time, it is highly recommended to build up a string that represents the active alphabet and store it as a private class member variable. This will be useful when implementing the `randk` method, so that you can produce all of the k+1-grams that will follow the provided kgram, test for the frequency of each k+1-gram, and then produce output characters with proportional frequencies.
• See Willie Boag's suggestion in the discussion group for how to generate output characters that follow the probability distribution that's discovered in the input: https://groups.google.com/forum/#!topic/91-204-202-s14/16yuB54d5Fw. You don't have to do it this way; it's just a suggestion.
• Overload stream insertion operator to display internal state of Markov model: See these sample instructions for how to do the overload. You should print out all of the k-gram and k+1-gram frequencies, the order of the model, and its alphabet. Here are sample instructions for how to build an iterator over the map of k-gram and k+1-gram keys.
• Create a Makefile or include short, direct instructions for how to build your project.

## Submit

Submit the following:

• your code files `MarkovModel.cpp` and `MarkovModel.hpp`
• your `main.cpp` that does the text generation task (it may be named `TextGenerator.cpp`)
• the `mmtest.cpp`
• your Makefile or make instructions
• your `ps6-readme.txt` file
`submit fredm 204-ps6 files`