BigGorilla - Data Integration & Preparation

Setup BigGorilla

An Introduction to Data Integration and Data Preparation

Materials: document

This tutorial gives an overview of BigGorilla and, in particular, of some basic concepts in data integration and data preparation.

Example of Matching Movie Datasets

Materials: document, ipynb notebook (Python 3), ipynb notebook (Python 2), Python 3 code, Python 2 code

This is an in-depth tutorial (with code and data about movies) that shows how one typically acquires data, extracts relevant information, profiles and cleans it, and matches and merges datasets.
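A minimal sketch of the match-and-merge step, written with the Python standard library only; it is not the tutorial's own code. It assumes each record is a dict with hypothetical `title` and `year` fields, uses the year as a blocking key so that only same-year movies are compared, scores normalized titles with `difflib.SequenceMatcher`, and treats an assumed similarity of 0.85 or more as a match.

```python
import difflib
import re


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def match_movies(left, right, threshold=0.85):
    """Pair each record in `left` with its most similar same-year record in `right`.

    Field names ("title", "year") and the threshold are illustrative assumptions.
    Returns the merged records; left-hand values win on conflicting fields.
    """
    # Blocking: index right-hand records by year so we only compare
    # movies that could plausibly be the same film.
    by_year = {}
    for rec in right:
        by_year.setdefault(rec["year"], []).append(rec)

    merged = []
    for rec in left:
        best, best_score = None, 0.0
        for cand in by_year.get(rec["year"], []):
            score = difflib.SequenceMatcher(
                None, normalize_title(rec["title"]), normalize_title(cand["title"])
            ).ratio()
            if score > best_score:
                best, best_score = cand, score
        if best is not None and best_score >= threshold:
            # Merge: combine fields from both sources into one record.
            merged.append({**best, **rec})
    return merged
```

For example, matching `{"title": "Spider-Man", "year": 2002, "rating": 7.3}` against `{"title": "Spider Man", "year": 2002, "budget": 139}` yields one merged record carrying both `rating` and `budget`. Blocking keeps the number of comparisons small; real movie data would also need a fallback for missing or slightly wrong years.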

Example of Converting Wikipedia Dump into JSON

Materials: document, ipynb notebook

This is a simple example (with code and data) that shows how to convert Wikipedia files from text to JSON format.
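A minimal sketch of the text-to-JSON conversion. The input layout is an assumption, not something the example specifies: each article is taken to start with a header line such as `<doc id="..." url="..." title="...">`, end with a `</doc>` line, and contain plain article text in between. Each article becomes one JSON string, ready to be written one per line.

```python
import json
import re

# Assumed header line, e.g. <doc id="12" url="https://..." title="Anarchism">
DOC_START = re.compile(
    r'<doc id="(?P<id>[^"]*)" url="(?P<url>[^"]*)" title="(?P<title>[^"]*)">'
)


def dump_to_json_lines(lines):
    """Convert an iterable of text lines into one JSON string per article."""
    records, header, body = [], None, []
    for raw in lines:
        line = raw.rstrip("\n")
        start = DOC_START.match(line)
        if start:
            # A new article begins: remember its metadata, reset the body.
            header, body = start.groupdict(), []
        elif line == "</doc>" and header is not None:
            # Article finished: attach the text and serialize it.
            record = {**header, "text": "\n".join(body).strip()}
            records.append(json.dumps(record, ensure_ascii=False))
            header = None
        elif header is not None:
            body.append(line)
    return records
```

Because the function accepts any iterable of lines, an open text file can be passed directly, so a large dump is read line by line rather than loaded whole.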

Example of Extracting Info from Wikipedia Pages

Materials: Python 2 code

This example code shows how the titles and first paragraphs of selected Wikipedia articles are extracted and stored in a JSON file.
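A minimal sketch of the extraction-and-storage step. It assumes the selected articles are already available as plain text keyed by title (with the title not repeated in the body) and that paragraphs are separated by blank lines; the function names are hypothetical, not those of the example code.

```python
import json


def first_paragraph(text: str) -> str:
    """Return the first non-empty, blank-line-separated paragraph, whitespace collapsed."""
    for block in text.split("\n\n"):
        block = block.strip()
        if block:
            return " ".join(block.split())
    return ""


def build_summaries(articles):
    """Turn {title: body_text} into [{"title": ..., "first_paragraph": ...}, ...]."""
    return [
        {"title": title, "first_paragraph": first_paragraph(body)}
        for title, body in articles.items()
    ]


def save_summaries(articles, path):
    """Write the summaries to a UTF-8 JSON file and return them."""
    summaries = build_summaries(articles)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(summaries, f, ensure_ascii=False, indent=2)
    return summaries
```

Keeping `build_summaries` separate from the file write makes the extraction logic easy to check without touching the disk.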

Example of Matching Schemas with FlexMatcher

Materials: Python code

This example code shows how different schemas can be matched to a mediated schema using BigGorilla’s FlexMatcher package.
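FlexMatcher's own API is not shown here. As a stand-in, the sketch below illustrates the task (mapping source columns onto mediated-schema attributes) with a much simpler technique: column-name similarity via `difflib.SequenceMatcher`, with a hypothetical cutoff of 0.6 below which a column is left unmatched. The column and attribute names are illustrative only.

```python
import difflib


def normalize(name: str) -> str:
    """Lowercase a column name and unify separators, e.g. 'Movie_Title' -> 'movie title'."""
    return " ".join(name.replace("_", " ").replace("-", " ").lower().split())


def match_to_mediated(source_columns, mediated_schema, threshold=0.6):
    """Map each source column to its most similar mediated attribute, or None.

    A name-similarity heuristic for illustration; not FlexMatcher's method.
    """
    mapping = {}
    for col in source_columns:
        scores = {
            attr: difflib.SequenceMatcher(None, normalize(col), normalize(attr)).ratio()
            for attr in mediated_schema
        }
        best = max(scores, key=scores.get)
        # Leave the column unmatched if even the best candidate is too dissimilar.
        mapping[col] = best if scores[best] >= threshold else None
    return mapping
```

With source columns `Title`, `Release Year`, `Director`, and `Budget` and a mediated schema of `movie_title`, `release_year`, and `director_name`, the first three columns are mapped and `Budget` stays unmatched. Name similarity alone breaks down when columns are abbreviated or arbitrarily named, which is why richer matchers also look at other evidence.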

Example of Scraping Restaurant Reviews

Materials: Python 2 code

This example code uses the Scrapy package to scrape reviews from multiple pages of a website.
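Scrapy itself is not used in the sketch below; it is a standard-library illustration of the same crawl pattern: extract the reviews from one page, then follow the "next page" link until there is none. The page markup (`<p class="review">` for reviews, `<a class="next" href="...">` for pagination) is a hypothetical assumption, and `fetch` stands for any function that returns a page's HTML. In Scrapy, this loop would typically live in a spider's `parse` callback, which yields items and follows the next link with `response.follow`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ReviewPageParser(HTMLParser):
    """Collect review texts and the 'next page' link from one HTML page.

    Assumes hypothetical markup: <p class="review"> and <a class="next" href="...">.
    """

    def __init__(self):
        super().__init__()
        self.reviews = []
        self.next_url = None
        self._in_review = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "p" and "review" in classes:
            self._in_review, self._buf = True, []
        elif tag == "a" and "next" in classes:
            self.next_url = attrs.get("href")

    def handle_data(self, data):
        if self._in_review:
            self._buf.append(data)  # includes text inside nested tags like <b>

    def handle_endtag(self, tag):
        if tag == "p" and self._in_review:
            self.reviews.append(" ".join("".join(self._buf).split()))
            self._in_review = False


def scrape_all(fetch, start_url, max_pages=50):
    """Follow 'next' links from start_url, collecting reviews from every page."""
    reviews, url, seen = [], start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = ReviewPageParser()
        parser.feed(fetch(url))
        parser.close()
        reviews.extend(parser.reviews)
        # Resolve relative links such as "/reviews?page=2" against the current URL.
        url = urljoin(url, parser.next_url) if parser.next_url else None
    return reviews
```

The `seen` set and `max_pages` limit guard against pagination loops; passing `fetch` in as a parameter keeps the parsing logic testable without network access.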
