AIfall11 / ImpersonatingMyselfOnChat

Abe Shultz

My project is a chatbot that attempts to impersonate me on instant messenger (IM). It tries to identify the important words in messages that people send to my IM account and to respond appropriately. To do this, it uses a Bayesian model trained on two years of my chat logs.

Concepts Demonstrated
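The first step of the goal stated above, finding the most important word in a chat message, is done with TF-IDF (see the Innovation section). The following minimal Python sketch shows what a per-message TF-IDF tagger might look like. It is not the project's code. The function names, the regex tokenizer, the raw-count TF weighting, and the `MIN_SCORE` cutoff are all assumptions made for illustration. Each message is treated as its own "document", which is what makes TF-IDF unusual here.

```python
import math
import re
from collections import Counter
from typing import Optional

# Assumed cutoff (not from the write-up): if no word in a message scores at
# least this much, the message is treated as having no important word.
MIN_SCORE = 1.0


def tokenize(message: str) -> list[str]:
    """Lowercase a chat message and split it into word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def document_frequencies(corpus: list[list[str]]) -> Counter:
    """Count how many messages contain each word at least once."""
    df: Counter = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    return df


def keyword(tokens: list[str], df: Counter, n_messages: int,
            min_score: float = MIN_SCORE) -> Optional[str]:
    """Return the highest-scoring TF-IDF word in one message, or None.

    score(w) = (count of w in the message) * log(n_messages / df[w]).
    Words never seen in the corpus are skipped, because no logged reply
    could be associated with them.
    """
    best_word, best_score = None, 0.0
    for word, count in Counter(tokens).items():
        if df[word] == 0:
            continue
        score = count * math.log(n_messages / df[word])
        if score > best_score:  # strict '>' keeps the first word on ties
            best_word, best_score = word, score
    return best_word if best_score >= min_score else None


def tag_messages(messages: list[str]) -> list[Optional[str]]:
    """Tag every message in a chat log with its keyword (or None)."""
    corpus = [tokenize(m) for m in messages]
    df = document_frequencies(corpus)
    return [keyword(tokens, df, len(corpus)) for tokens in corpus]
```

For example, `tag_messages(["hi", "hi there", "do you want pizza", "do you want sushi", "you there"])` returns `[None, None, 'pizza', 'sushi', None]`. The greetings score too low to carry a keyword, so under this scheme they would join the pool of generic replies.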
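The response step pairs a Bayesian classifier with two heuristics: reuse my logged reply when an identical message has been seen before, and otherwise fall back to a random keyword-free reply when no keyword can be found. The write-up does not say exactly how Bayes' rule is applied. This hypothetical sketch therefore assumes a naive Bayes classifier with add-one smoothing. The classifier predicts the keyword tag of my reply from the words of the incoming message. `ChatResponder`, its `(incoming, reply, tag)` input format, and the tokenizer are illustrative assumptions, not the author's implementation.

```python
import math
import random
import re
from collections import Counter, defaultdict
from typing import Optional


def tokenize(message: str) -> list[str]:
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z']+", message.lower())


class ChatResponder:
    """Hypothetical responder: naive Bayes keyword guess plus two heuristics.

    Each training pair is (incoming message, my logged reply, keyword tag of
    my reply); a tag of None marks a reply with no high-scoring word.
    """

    def __init__(self, pairs, seed: Optional[int] = None):
        self.exact = {}                          # incoming text -> my logged reply
        self.replies = defaultdict(list)         # tag -> my replies with that tag
        self.generic = []                        # replies with no keyword
        self.word_counts = defaultdict(Counter)  # tag -> counts of incoming words
        self.class_counts = Counter()            # tag -> number of training pairs
        self.vocab = set()
        self.rng = random.Random(seed)
        for incoming, reply, tag in pairs:
            self.exact.setdefault(incoming, reply)  # keep the first logged reply
            if tag is None:
                self.generic.append(reply)
                continue
            words = tokenize(incoming)
            self.replies[tag].append(reply)
            self.word_counts[tag].update(words)
            self.class_counts[tag] += 1
            self.vocab.update(words)

    def classify(self, message: str) -> Optional[str]:
        """Return the tag maximizing log P(tag) + sum of log P(word | tag)."""
        words = [w for w in tokenize(message) if w in self.vocab]
        if not words:
            return None  # no known words, so keyword detection fails
        total = sum(self.class_counts.values())
        best_tag, best_logp = None, -math.inf
        for tag, n in self.class_counts.items():
            counts = self.word_counts[tag]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            logp = math.log(n / total)
            for w in words:
                logp += math.log((counts[w] + 1) / denom)
            if logp > best_logp:
                best_tag, best_logp = tag, logp
        return best_tag

    def respond(self, message: str) -> Optional[str]:
        # Heuristic 1 (before the classifier): identical message seen before.
        if message in self.exact:
            return self.exact[message]
        tag = self.classify(message)
        if tag is not None:
            return self.rng.choice(self.replies[tag])
        # Heuristic 2 (after the classifier): random keyword-free reply.
        return self.rng.choice(self.generic) if self.generic else None
```

Suppose it is trained on pairs such as `("hi", "Hey!", None)` and `("want pizza tonight", "Pizza sounds great.", "pizza")`. Then `respond("hi")` returns the logged reply directly, and a new message mentioning pizza is answered with a pizza-tagged reply. A message containing no known words gets a random generic reply.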
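The project also includes a Markov chain text generator, which is based on an older Perl script of mine that is not reproduced here. The following is a generic word-level version of the technique, written as a hypothetical Python sketch rather than a port of that script. It maps each run of `order` consecutive words to the words observed after it, then takes a random walk through that table. The names `build_chain` and `generate` and the default `order=2` are assumptions.

```python
import random
from collections import defaultdict
from typing import Optional


def build_chain(text: str, order: int = 2) -> dict[tuple[str, ...], list[str]]:
    """Map each run of `order` consecutive words to every word seen after it.

    Repeated followers stay in the list, so sampling from it reproduces the
    observed transition frequencies.
    """
    words = text.split()
    chain: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def generate(chain, max_words: int = 20, seed: Optional[int] = None) -> str:
    """Random walk: start from a random state, then sample successors."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    output = list(state)
    while len(output) < max_words:
        followers = chain.get(state)
        if not followers:  # dead end: this state only occurred at the end of the text
            break
        nxt = rng.choice(followers)
        output.append(nxt)
        state = state[1:] + (nxt,)  # slide the window forward by one word
    return " ".join(output)
```

Each step depends only on the previous `order` words. The output is therefore locally plausible but rarely coherent over a whole sentence, which fits the kind of sample quoted in the Innovation section.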
Innovation

The main innovation in the project is its set of tools for tagging chat conversations. TF-IDF is usually applied to longer, more coherent texts, such as books, rather than to conversations. Despite this, the TF-IDF tagger does a reasonable job of detecting important words and tagging sentences accordingly.

The Markov chain text generator is based heavily on a Perl version of the same algorithm that I wrote 8-10 years ago. It is useful for producing amusing output, such as "Arctic white owl, has less value than cowdung. Its power is the gun!", but it rarely generates anything that could easily be passed off as the product of a sane, sober human writer.

The Bayesian classifier is a simple implementation of Bayes' rule, calculated from the word frequencies of other people's chat messages. Two heuristics, one run before the classifier and one after it, try to make the chatbot more convincing.

First, if someone sends a message that exactly matches one that someone has sent before, the chatbot sends my logged reply to that message. This heuristic assumes that whatever I said immediately after receiving the incoming message was an appropriate response, which holds for greetings and other social call-and-response pairs. It also saves the time that would otherwise be spent computing the message's keyword.

Second, if computing the keyword of the message fails (that is, the keyword does not match any word for which a response is available), the chatbot selects a random response from the messages in which the TF-IDF scoring algorithm found no high-scoring word. These messages are typically short utterances that work in many contexts, such as "Hmm." or "Um, yeah." Unfortunately, "*hug*" and "*nuzzle*" also appear in this list, so the chatbot is randomly affectionate. Before this heuristic was added, the chatbot had exactly one response when it did not find a good keyword:

(11:37:05 AM) Girlfriend: Hi. *curls up in lap, nuzzles*
(11:37:11 AM) Me: Puny mortal, I have no good response!
(11:37:22 AM) Girlfriend: . . . ? XD

Evaluation of Results

The project succeeded in creating a chatbot that can frequently recognize the most important word of an incoming IM message. However, the project does not include word sense disambiguation, so it cannot tell the difference between "apple" the fruit, "apple" the record company, and "apple" the computer manufacturer. Further, because it uses my old chat messages, it tends to send responses that are outdated or inaccurate. Because most of my IM chatting is with my romantic partners, the chatbot's responses may be inappropriately familiar or affectionate.

Additional Remarks

This is my writeup: Attach:ashultz_impersonating_myself_on_chat.pdf

This is the code for the chatbot, log processing stuff, and Markov chain text generation: