• Scrapy is a framework for extracting data from websites. It can be used to build crawlers (spiders) that crawl multiple websites and retrieve selected data.
  • Requests is an HTTP library for Python that provides the APIs needed to scrape websites. Requests can make complex requests to visit a page and get its content, such as those requiring additional headers, complex POST data, or authentication credentials.
  • urllib (and, in Python 2, urllib2) is part of the Python standard library for making simple HTTP requests to visit web pages and get their content; in Python 3 the relevant module is urllib.request.
  • Tweepy is a Python library for accessing the Twitter API to extract tweets.
  • DataSynthesizer can generate a synthetic dataset from a sensitive one for public release.
  • There are other tools as well, such as Apache Nutch and Norconex HTTP Collector (both in Java), and more…
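To make the Scrapy entry above concrete, here is a minimal spider sketch. The target site (quotes.toscrape.com, a public demo site) and the CSS selectors are illustrative assumptions, not part of the list above.

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    """A minimal spider that crawls a demo site and yields selected data."""

    name = "quotes"
    # Illustrative target: a public site built for scraping practice.
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Extract each quote's text and author with CSS selectors.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
```

Running `scrapy crawl quotes` from a Scrapy project containing this spider would fetch the pages and emit one item per quote.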
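The Requests entry mentions complex requests with extra headers, POST data, or credentials. The sketch below builds such a request without sending it; the URL, header value, and form fields are hypothetical placeholders.

```python
import requests

# Hypothetical endpoint and credentials, for illustration only.
url = "https://example.com/login"
session = requests.Session()

# Assemble a POST request with a custom header and form data.
req = requests.Request(
    "POST",
    url,
    headers={"User-Agent": "my-scraper/0.1"},
    data={"username": "alice", "password": "secret"},
)
prepared = session.prepare_request(req)
print(prepared.method)                  # POST
print(prepared.headers["User-Agent"])   # my-scraper/0.1

# Actually sending it requires network access:
# resp = session.send(prepared)
# print(resp.status_code, resp.text[:200])
```

Preparing the request separately from sending it makes it easy to inspect exactly what will go over the wire.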
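For the standard-library option, a simple request can be built with urllib.request alone, no third-party install needed. The URL and User-Agent string below are illustrative.

```python
from urllib.request import Request, urlopen

# Build a simple GET request with a custom User-Agent header.
req = Request("https://example.com/", headers={"User-Agent": "my-scraper/0.1"})
print(req.get_method())   # GET
print(req.full_url)       # https://example.com/

# Fetching the page content (requires network access):
# with urlopen(req) as resp:
#     html = resp.read().decode("utf-8")
```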

*Are there more cool tools to add to the list? Tell us about them.