Patrick Lozzi

Overview
CrawLS is a basic web (domain name) crawler whose goal is to report a domain's total number of unique hyperlinks. Each unique URL is displayed in real time as it is found.

Screenshot
[two screenshots]

Concepts Demonstrated
External Technology
This application uses several Scheme libraries: net/url, xml, scheme/path, and Alex Shinn's html-parser. One technology that is hard to ignore, though it often goes without saying, is the Internet itself: the application needs a network connection to download or visit pages as it finds them.

Innovation
Crawler designs are closely guarded secrets at some well-known organizations, such as Google and Yahoo. Besides helping me maintain my site and gather statistics about it, I wanted to build something that might explain why these companies consider crawlers a crucial ingredient of their business.

Technology Used Block Diagram
[block diagram]

Additional Remarks
Contrary to common practice, I implemented a path-descending crawler rather than the more widely accepted path-ascending kind. The application has room for a wide variety of features and enhancements, and since I use it personally, I plan to upgrade it.
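
The goal stated in the overview, collecting a domain's unique hyperlinks and showing each one as it is found, can be sketched roughly as follows. This is not the CrawLS source, which is written in Scheme with net/url and html-parser. It is a Python stand-in built on the standard library, and the names `LinkCollector`, `extract_links`, `crawl`, `fetch`, and `report` are all hypothetical. It also assumes a plain breadth-first traversal restricted to the starting host, and does not try to reproduce the path-descending strategy mentioned above.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute, fragment-free URLs for the links in `html`."""
    collector = LinkCollector()
    collector.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in collector.hrefs]


def crawl(start_url, fetch, report=print):
    """Breadth-first crawl of one host; returns the set of unique URLs seen.

    `fetch` is a hypothetical callable mapping a URL to its HTML text,
    or None if the page cannot be retrieved. The start URL itself is
    counted in the returned set.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        html = fetch(page)
        if html is None:
            continue
        for link in extract_links(html, page):
            if link in seen:
                continue
            seen.add(link)
            report(link)  # show each unique URL as soon as it is found
            # Only follow links that stay on the starting host.
            if urlparse(link).netloc == host:
                queue.append(link)
    return seen
```

Passing `fetch` in as a parameter keeps the traversal logic separate from the network, so it can be exercised against an in-memory dictionary of pages. A real run would supply a function that downloads each page, for example one built on `urllib.request.urlopen`, and the total reported by the crawler would then be `len(crawl(...))`.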