Use this component when you wish to convert data from one format into another, such as from CSV to JSON, or reshape or reorganize data into a desired form.
Data conversion is the transformation of data from one format to another, for example CSV to JSON, Excel to CSV, or PDF to JSON. There are numerous packages for converting data, and we list only some of them here.
- There are a number of tools for parsing PDF files. Here are some of them, in no particular order:
  - Tabula is a tool for extracting tables from PDF files into CSV.
  - PDFminer, Slate, and PDFtables are Python packages for extracting information from PDF files into text.
    - PDFminer includes a tool that can convert PDF files into HTML in addition to text. Slate is a package that depends on PDFminer and presents each page of a PDF file as separate text. PDFtables is a tool that parses PDF files and extracts the tables they contain.
- xlrd is a Python package that parses Excel data. It has accompanying packages for writing and formatting information in Excel format.
- To read in and do lightweight manipulation of CSV and JSON data, use Python's built-in csv and json modules, respectively.
- To read in and manipulate relational data, use pandas.
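As a rough illustration of the CSV-to-JSON case mentioned above, here is a minimal sketch that uses only Python's built-in csv and json modules. The function name `csv_to_json` and the sample data are made up for this example, and the sketch assumes the first row of the CSV is a header.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (first row is the header) into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)  # every value stays a string; csv does no type inference
    return json.dumps(rows, indent=2)


# Hypothetical sample data, invented for this illustration.
sample = "name,skill\nAda,data integration\nLin,visualization\n"
converted = csv_to_json(sample)
print(converted)
```

To convert a file rather than a string, the same `csv.DictReader` call can be given an open file object in place of the `io.StringIO` wrapper.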
More generally, data can also be reshaped or reorganized without changing its format. For example, you might keep only the name and skill set of each candidate in a CSV file and omit all other data, or select only the candidates with a data integration skill set from a JSON file and combine that information with the salary and rank information from another file.
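The two reshaping examples above could be sketched roughly as follows with the standard library alone. All data, field names (`name`, `skills`, `salary`, `rank`), and helper names here are hypothetical and chosen only to mirror the prose; real files would likely need extra handling for missing fields.

```python
import csv
import io
import json

# Hypothetical inputs, invented for this illustration.
candidates_csv = (
    "name,skills,city,phone\n"
    "Ada,data integration,Oslo,555-0100\n"
    "Lin,visualization,Lima,555-0101\n"
)
candidates_json = json.dumps([
    {"name": "Ada", "skills": ["data integration", "sql"]},
    {"name": "Lin", "skills": ["visualization"]},
])
salaries_csv = "name,salary,rank\nAda,120000,senior\nLin,95000,junior\n"


def keep_columns(csv_text, columns):
    """Return the rows of csv_text restricted to the given columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{col: row[col] for col in columns} for row in reader]


def with_skill_and_salary(json_text, salary_csv_text, skill):
    """Select candidates that have `skill` and attach salary and rank by name."""
    salary_by_name = {
        row["name"]: row for row in csv.DictReader(io.StringIO(salary_csv_text))
    }
    result = []
    for person in json.loads(json_text):
        # Candidates without a matching salary row are skipped (an assumption).
        if skill in person["skills"] and person["name"] in salary_by_name:
            extra = salary_by_name[person["name"]]
            result.append(
                {**person, "salary": int(extra["salary"]), "rank": extra["rank"]}
            )
    return result


projected = keep_columns(candidates_csv, ["name", "skills"])
selected = with_skill_and_salary(candidates_json, salaries_csv, "data integration")
```

Joining on the candidate's name is a simplification; a unique identifier would be a safer key if the data has one.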
There is an extensive suite of tools for transforming data. The Python standard library itself contains numerous functions and modules for this. For manipulating and transforming tabular data, the pandas package is quite popular.
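To give a feel for tabular reshaping with pandas, the sketch below pivots a "long" table (one row per candidate and year) into a "wide" one (one row per candidate, one column per year). The column names and figures are invented for illustration, and the sketch assumes pandas is installed.

```python
import io

import pandas as pd

# Hypothetical long-format data: one row per (name, year) pair.
long_csv = (
    "name,year,salary\n"
    "Ada,2022,110000\n"
    "Ada,2023,120000\n"
    "Lin,2022,90000\n"
    "Lin,2023,95000\n"
)
long_df = pd.read_csv(io.StringIO(long_csv))

# Reshape so each candidate is a row and each year is a column.
wide_df = long_df.pivot(index="name", columns="year", values="salary")
print(wide_df)
```

`DataFrame.pivot` requires each (name, year) pair to appear only once; when duplicates are possible, `pivot_table` with an aggregation function is the usual alternative.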
*Are there more cool tools to add to the list? Tell us about them.