Scrapy

Not to be confused with Scrapie.
Scrapy
Developer(s): Scrapinghub, Ltd.
Initial release: June 26, 2008
Stable release: 1.0.5 (February 3, 2016)
Repository: github.com/scrapy/scrapy
Development status: Active
Written in: Python
Operating system: Linux, Mac OS X, Windows
Type: Web crawler
License: BSD License
Website: scrapy.org

Scrapy (/ˈskreɪpi/ SKRAY-pee)[1] is a free and open-source web crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler.[2] It is currently maintained by Scrapinghub Ltd., a web scraping development and services company.

Scrapy's project architecture is built around ‘spiders’, self-contained crawlers that are each given a set of instructions. In the spirit of other don’t-repeat-yourself frameworks, such as Django,[3] Scrapy makes it easier to build and scale large crawling projects by allowing developers to reuse their code. It also provides a web crawling shell that developers can use to test their assumptions about a site’s behavior.[4]
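To make the idea of a spider concrete, the following is a minimal, dependency-free sketch of the pattern. It is not Scrapy's actual API: the names `TitleSpider`, `PageParser`, `crawl`, and the in-memory `PAGES` site are all hypothetical and exist only for illustration, using nothing but the Python standard library. The spider holds the "set of instructions" (where to start and what to extract from each page), while a small engine loop schedules pages and skips duplicates.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website" so the sketch runs without network access.
PAGES = {
    "http://example.test/": (
        '<title>Home</title><a href="/about">About</a><a href="/">Home</a>'
    ),
    "http://example.test/about": '<title>About</title><a href="/">Home</a>',
}


class PageParser(HTMLParser):
    """Collects the page title and all link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class TitleSpider:
    """A self-contained crawler: where to start and what to do with each page."""

    name = "titles"
    start_urls = ["http://example.test/"]

    def parse(self, url, html):
        parser = PageParser()
        parser.feed(html)
        # Yield one extracted item (a dict) for this page...
        yield {"url": url, "title": parser.title}
        # ...then yield absolute follow-up URLs (strings) for the engine to visit.
        for href in parser.links:
            yield urljoin(url, href)


def crawl(spider, fetch):
    """Minimal engine: schedules URLs, skips duplicates, collects items."""
    queue = list(spider.start_urls)
    seen = set(queue)
    items = []
    while queue:
        url = queue.pop(0)
        for result in spider.parse(url, fetch(url)):
            if isinstance(result, dict):
                items.append(result)
            elif result not in seen:
                seen.add(result)
                queue.append(result)
    return items


# Crawl the in-memory site; `fetch` is just a dictionary lookup here.
items = crawl(TitleSpider(), fetch=PAGES.__getitem__)
```

Keeping the extraction logic inside the spider and the scheduling inside a separate engine is what lets the same engine be reused across many spiders. A real Scrapy project expresses the same split with its own classes and should be written against the Scrapy documentation rather than this sketch.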

Well-known companies and products that use Scrapy include Lyst,[5] CareerBuilder,[6] Parse.ly,[7] Sciences Po Medialab,[8] and Data.gov.uk’s World Government Data site.[9]

History

Scrapy was created at the London-based web aggregation and e-commerce company Mydeco, where it was developed and maintained by employees of Mydeco and Insophia (a web consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license, and the milestone 1.0 release followed in June 2015.[10] In 2011, Scrapinghub became the new official maintainer.[11][12]

References

This article is issued from Wikipedia, version of 8/22/2016. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply to the media files.