Scrape anything, anywhere

March 15th, 2016

Problem

Remember how many times, while studying or working, you needed to get some information from a website?
It is right there, often open for public use, yet impossible to extract. You have tried everything and still haven't managed to get your hands on the data you want. Or you have found the data on the web, but there are no download options, and copy-and-paste has failed you too.

But there's still a way to get the data out, and this is where web scraping comes in.

Idea: web scraping

What is “web scraping”? Put simply, it is a technique for automatically extracting data from a website: a program downloads the page and pulls out the pieces you need.
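
To make the idea concrete, here is a minimal Python sketch of those two steps: fetching a page's HTML and pulling data out of it. It uses only the standard library (`urllib.request` and `html.parser`); collecting links and the placeholder User-Agent string are illustrative choices, not a prescribed method.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url: str) -> str:
    """Step 1: download a page and decode it as text."""
    # Placeholder User-Agent; identify your scraper honestly in real use.
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=10) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class LinkParser(HTMLParser):
    """Step 2: collect the href of every <a> tag in the HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links(fetch_html("https://example.com/"))` would list the links found on that page. Keeping download and parsing separate makes the parsing step easy to test on saved HTML.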

Imagine you are a graduate student writing a huge thesis on the macroeconomic mechanisms of international trade (just an example), and you need to include some specific statistics. You go to the OECD website and find exactly the table you need, but you cannot copy anything without it turning into a mess. So you spend half the night retyping all the necessary data by hand. If you knew how to scrape the web, it would take you only a few minutes.
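
In the thesis scenario, the key step is turning an HTML `<table>` into rows a spreadsheet can open. A minimal standard-library sketch, assuming the table is present in the page's HTML (tables built by JavaScript will not show up this way); it collects every row on the page, tolerates missing `</td>` and `</tr>` tags, and the function names are hypothetical. A real page may require singling out one table first.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _finish_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse whitespace so " 1 234\n" becomes "1 234".
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._finish_cell()  # tolerate a missing </td>
            if self._row is not None:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def close(self):
        super().close()
        self._finish_row()


def parse_rows(html: str) -> list:
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows: list) -> str:
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()
```

Saving the output of `rows_to_csv(parse_rows(html))` to a `.csv` file gives you something a spreadsheet or statistics package can import directly, with no retyping.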

[Image: web scraping illustration. Source: http://prowebscraping.com/]

Web scraping has many other uses, too (there are even job offers for freelance scrapers). Its advantage is that it works with virtually any website, from weather forecasts to government spending, even if the site doesn't offer an API for accessing its raw data. Moreover, web scraping is not only a tool for data extraction but also a good way to learn a programming language.

To Do

If you have already had some fun scraping the data you need and have a basic idea of how to get structured data from the web, we would like to introduce you to an amazing service, morph.io. It is described as “A Heroku for Scrapers” and makes it easy to create, manage and host your own scrapers.

In his blog, Chris Mytton explains how to run these scrapers regularly, so the information you collect stays up to date.
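
As we understand morph.io's documentation, a scraper there saves its results to an SQLite database file named `data.sqlite`. When a scraper runs on a schedule, it also helps if saving is idempotent, so each run updates existing records instead of piling up duplicates. A minimal sketch with Python's built-in `sqlite3`; the `data` table, its columns and the `id` key are hypothetical, and this is not the method described in the blog post above.

```python
import sqlite3


def save_rows(conn: sqlite3.Connection, rows: list) -> None:
    """Insert or update rows keyed on 'id', so repeated runs don't duplicate data."""
    # Hypothetical schema: one record per id, refreshed on every run.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data ("
        " id TEXT PRIMARY KEY, country TEXT, value REAL, scraped_at TEXT)"
    )
    # A row whose id already exists is replaced with the fresh values.
    conn.executemany(
        "INSERT OR REPLACE INTO data (id, country, value, scraped_at) "
        "VALUES (:id, :country, :value, :scraped_at)",
        rows,
    )
    conn.commit()
```

In a real run you would open `sqlite3.connect("data.sqlite")` and pass in the freshly scraped rows. `INSERT OR REPLACE` swaps out the whole row on a key collision; SQLite 3.24 and later also support `INSERT ... ON CONFLICT DO UPDATE` if you only want to update some columns.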

Keep Improving With Gera-IT!
