Jerron St. Armand
December 10, 2008

Overview

The goal of my project is to extract XML data from a baseball statistics website and to provide easy-to-use functions and bindings for querying the XML file, including data that the site's HTML does not render. The Scheme code also exports data from the website to a document in RSS feed format.

Screenshot

Concepts Demonstrated

  • Data abstraction is used to provide access to the elements of the XML file.
  • Lists are built from information obtained from the XML file; recursion and mapping are then performed on those lists.
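
As a rough illustration of these two concepts, here is a minimal sketch written for Racket (the Scheme dialect is an assumption). The element and attribute names (team, player, hits, name) and the sample values are made up for the example and are not the website's real schema. The selectors hide the list layout of the tree, and the last two functions show a mapping and a recursion over the resulting list of players.

```scheme
#lang racket
;; Hypothetical SXML fragment: tag names and values are illustrative only.
(define sample-team
  '(team (@ (name "Home"))
         (player (@ (name "Smith")) (hits "2") (at-bats "4"))
         (player (@ (name "Jones")) (hits "3") (at-bats "5"))))

;; Selectors: the rest of the program never uses car/cdr on the raw tree.
(define (children node tag)            ; child elements named `tag`
  (filter (lambda (c) (and (pair? c) (eq? (car c) tag)))
          (cdr node)))

(define (attribute node name)          ; attribute value, or #f if absent
  (let ((body (cdr node)))
    (and (pair? body)
         (pair? (car body))
         (eq? (caar body) '@)          ; SXML keeps attributes in a leading (@ ...) list
         (let ((hit (assq name (cdar body))))
           (and hit (cadr hit))))))

(define (text-of node tag)             ; text of the first child named `tag`, or #f
  (let ((kids (children node tag)))
    (and (pair? kids) (cadr (car kids)))))

(define (player-name p) (attribute p 'name))
(define (player-hits p) (string->number (text-of p 'hits)))

;; Mapping: one name per player.
(define (team-player-names team)
  (map player-name (children team 'player)))

;; Recursion: add up hits over a list of players.
(define (sum-hits players)
  (if (null? players)
      0
      (+ (player-hits (car players)) (sum-hits (cdr players)))))
```

With the sample data above, (team-player-names sample-team) gives the two player names and (sum-hits (children sample-team 'player)) gives 5.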

External Technology

I used the Scheme libraries SXML and Pretty Print. SXML is the main driving force behind my project. SXML represents the abstract syntax tree of an XML document as S-expressions, which means the document takes a form that a programming language like Scheme can process directly. The Scheme library, also named SXML, is "a collection of tools for processing markup documents in the form of S-expressions". With this library, I dove into the baseball statistics XML files I found on the internet and pulled out values for the game, the teams, and the players.
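
To make the idea of XML as S-expressions concrete, the sketch below shows a small made-up XML fragment next to an SXML form of it, written for Racket (an assumed dialect). The tag names, the date, and the scores are invented for illustration. The find-all helper is hypothetical and uses only plain list operations; the SXML library offers its own, more general query tools such as sxpath, which this sketch does not use.

```scheme
#lang racket
;; XML (made-up example):
;;   <game date="2008-10-01">
;;     <team name="Home"><runs>5</runs></team>
;;     <team name="Away"><runs>3</runs></team>
;;   </game>
;;
;; SXML convention: (tag (@ (attr "value") ...) child ...)
(define game
  '(*TOP*
    (game (@ (date "2008-10-01"))
          (team (@ (name "Home")) (runs "5"))
          (team (@ (name "Away")) (runs "3")))))

;; Collect every element named `tag` at any depth (roughly XPath //tag).
(define (find-all node tag)
  (cond ((not (pair? node)) '())        ; text nodes have no children
        ((eq? (car node) '@) '())       ; skip attribute lists
        (else
         (append (if (eq? (car node) tag) (list node) '())
                 (append-map (lambda (child) (find-all child tag))
                             (cdr node))))))
```

Because the tree is an ordinary list, (map cadr (find-all game 'runs)) returns the run totals as strings, in document order.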

I also used Pretty Print, a simple library for displaying output in the interactions window. It removes quotes and can add indentation to text. I used it only for displaying the menus.

Innovation

What I feel is the most important part of my project is its ability to go into these XML files on the web and to parse and extract the data I want. If you go to the website http://erikberg.com/mlb/ and click on any date, you will see the box scores for all the baseball games played on that day. You can then click on an individual game to see even more stats. Just copy the URL of that page and paste it into the third line of my program, in the "link" definition.

What I discovered after viewing the source of the XML page was that it contains many more statistics than the website displays through its HTML. The major statistics like hits, singles, doubles, and home runs are all displayed, but if you wanted to know an individual player's sacrifice flies, times hit by a pitch, or total bases, you would have to scroll through the complicated XML file by viewing the source. My program has a list of all these statistics and the capability to extract their values.
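
One way to picture a list of statistics driving the extraction is the table-driven sketch below, written for Racket (an assumed dialect). The statistic names in stat-names and the sample player record are hypothetical and do not come from the real XML file; a statistic that is missing from a record is reported as #f.

```scheme
#lang racket
;; Hypothetical list of statistic tags to look up for each player.
(define stat-names
  '(hits doubles home-runs sac-flies hit-by-pitch total-bases))

;; Made-up player record in SXML form.
(define sample-player
  '(player (@ (name "Smith"))
           (hits "2") (doubles "1") (sac-flies "1") (total-bases "3")))

;; Numeric value of one statistic, or #f when the record lacks it.
(define (stat-value player stat)
  (let ((entry (assq stat (filter pair? (cdr player)))))
    (and entry (string->number (cadr entry)))))

;; Pair every known statistic with its value for this player.
(define (all-stats player)
  (map (lambda (stat) (cons stat (stat-value player stat)))
       stat-names))
```

Adding a new statistic then only means adding its tag to stat-names; the lookup code stays the same.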

A text file in standard RSS 2.0 feed format, containing information about the game, is created each time the program is run. This wasn't very complicated. I first figured out how to create a text file with Scheme and write to it. Then I learned the syntax of an RSS feed. Finally, I hard-coded the necessary XML tags for the RSS feed into my program and combined them with some of my functions to create a simple RSS feed of the baseball game.
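
The approach described above, hard-coded tags combined with computed values and written to a text file, might look roughly like the following sketch for Racket (an assumed dialect). The function names rss-item, rss-feed, and write-feed, and the sample titles and scores, are hypothetical. The title, link, and description elements inside channel are the ones RSS 2.0 requires.

```scheme
#lang racket
;; One <item> with hard-coded tags around computed text.
;; Note: the text is not escaped here; real data containing & or <
;; would need escaping first.
(define (rss-item title description)
  (string-append "    <item>\n"
                 "      <title>" title "</title>\n"
                 "      <description>" description "</description>\n"
                 "    </item>\n"))

;; A complete RSS 2.0 document as a string; `items` is a list of strings
;; produced by rss-item.
(define (rss-feed title link description items)
  (string-append
   "<?xml version=\"1.0\"?>\n"
   "<rss version=\"2.0\">\n"
   "  <channel>\n"
   "    <title>" title "</title>\n"
   "    <link>" link "</link>\n"
   "    <description>" description "</description>\n"
   (apply string-append items)
   "  </channel>\n"
   "</rss>\n"))

;; Create (or overwrite) a text file holding the feed.
(define (write-feed path feed)
  (call-with-output-file path
    (lambda (out) (display feed out))
    #:exists 'truncate))

;; Made-up sample feed for illustration.
(define sample-feed
  (rss-feed "Baseball game" "http://erikberg.com/mlb/" "Box score summary"
            (list (rss-item "Final score" "Home 5, Away 3"))))
```

Calling (write-feed "game.rss" sample-feed) would write the sample feed to a file named game.rss in the current directory; the file name is only an example.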

Technology Used Block Diagram

Additional Remarks

Whenever I load an XML file from the baseball statistics website, I get red warnings because the SXML library doesn't like the third line of the file. The library is smart enough to ignore that line, so there are no errors, but it does display an ugly red warning in the interactions window almost 20 times. To get around this, you can simply save the XML file to your computer and delete that one line from the file. My program will also run much faster as a result of loading the XML locally.
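
The manual workaround above could also be automated. The following is a minimal sketch for Racket (an assumed dialect), with hypothetical function names, that reads a saved copy of the XML file, drops its third line, and writes the result to a new file. It assumes the offending content sits on exactly one line of the saved file.

```scheme
#lang racket
;; Remove the n-th line (counting from 1) from a list of lines.
(define (drop-line lines n)
  (cond ((null? lines) '())
        ((= n 1) (cdr lines))
        (else (cons (car lines)
                    (drop-line (cdr lines) (- n 1))))))

;; Read the saved XML file, drop its third line, and write a cleaned copy.
(define (strip-third-line in-path out-path)
  (display-lines-to-file (drop-line (file->lines in-path) 3)
                         out-path
                         #:exists 'truncate))
```

For example, (strip-third-line "game.xml" "game-clean.xml") would produce a cleaned copy to load instead of the original; both file names are placeholders.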