A friend of mine wanted to get the price of a specific product from a price-comparison site programmatically, so he asked me how he could do that.
Now, this price-comparison service doesn’t have an API, so to get the price from a page he needs to write a script that can do the following:
1. Request and download the page for the specific product.
2. Parse the received HTML page and pull the price out of it.
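Those two steps can be sketched roughly as follows. This is only an illustration, written in Python with nothing but the standard library to keep it short; the `price` CSS class, the sample HTML and the `User-Agent` string are all made up, and a real site will need its own selector.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class PriceParser(HTMLParser):
    """Collects the text of the first element whose class list contains css_class."""

    def __init__(self, css_class="price"):  # "price" is an assumed class name
        super().__init__()
        self.css_class = css_class
        self.tag = None      # tag name of the matching element, once found
        self.depth = 0       # nesting depth of same-named tags inside it
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.tag is None:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.tag, self.depth = tag, 1
        elif tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.tag is not None and not self.done and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.tag is not None and not self.done:
            self.chunks.append(data)


def fetch_page(url):
    """Step 1: request the page and return its HTML as text."""
    request = Request(url, headers={"User-Agent": "price-check-script"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_price(html):
    """Step 2: parse the HTML and return the price text, or None if not found."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks).strip()
    return text or None


# Made-up page fragment standing in for a real product page.
sample = '<div><h1>Widget</h1><span class="price main">$19.99</span></div>'
print(extract_price(sample))  # prints $19.99
```

Against a real page the two functions would be chained, as in `extract_price(fetch_page(url))`, after checking which tag and class actually hold the price.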
Seems like a doable task, but the problem is that the only programming he has ever done was a bit of C++. So I compiled a checklist for him of the things he has to learn in order to implement such a script. Of course, many other alternatives exist (Python, Java, PHP, etc.), so the following list simply reflects my .NET and C# background.
So here is the checklist:
1. Learn the basics of C# (the programming language in which to implement the script)
http://www.ssw.uni-linz.ac.at/Teaching/Lectures/CSharp/Tutorial/ (2 short PDF files)
2. Download and install Visual C# 2008 Express Edition for free (the programming environment in which to work on the implementation)
3. Download and learn HtmlAgilityPack, a library that lets you easily parse HTML pages into nice C# objects and collections (the .NET libraries don’t have this feature built in)
4. Use the “DOM Inspector” tool, which comes built into Firefox. It parses HTML pages and shows the structure of a page as a tree of tags, which will be very useful when implementing the script.
Just download Firefox; the tool comes with it.
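To give a feel for the "tree of tags" view mentioned in item 4, here is a rough sketch that prints an indented tag outline for a piece of HTML. It is in Python with the standard library only, purely for illustration; the sample markup is invented, and the sketch assumes well-formed HTML in which every non-void tag is closed, which real pages often violate.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagTree(HTMLParser):
    """Records one indented line per opening tag, e.g. '    div#product'."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        label = tag
        if attr_map.get("id"):
            label += "#" + attr_map["id"]
        if attr_map.get("class"):
            label += "." + ".".join(attr_map["class"].split())
        self.lines.append("  " * self.depth + label)
        if tag not in VOID_TAGS:
            self.depth += 1  # children of this tag are indented one level deeper

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def tag_tree(html):
    """Return the tag outline of html as a multi-line string."""
    parser = TagTree()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


# Invented markup standing in for a product page.
sample_page = ('<html><body><div id="product"><h1>Widget</h1>'
               '<span class="price">$19.99</span><br></div></body></html>')
print(tag_tree(sample_page))
```

An outline like this makes it easier to see which tag, id or class to look for when writing the actual parsing code.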
Now all that is left for my friend is to find lots of free time and a big pot of coffee.
Firebug for Firefox 2.0: it has many features, but I haven’t had time to check them all out yet.