This lesson is still being designed and assembled (Pre-Alpha version)

How to use csvkit

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • How can I open and view data with csvkit?

  • How can I select certain rows and columns of the data, and how can I append (stack) the results after filtering?

  • How can I sort the data and describe its basic characteristics?

Objectives
  • Learn how to install csvkit and how to use csvlook

  • Learn csvgrep, csvcut and csvstack commands

  • Learn csvsort and csvstat commands

The use of csvkit

csvkit is a suite of command-line tools, written in Python, for converting to and working with CSV files. It is well suited to simple data wrangling and analysis tasks. This tutorial presents its most important commands, and the following sections rely heavily on the official csvkit tutorial.

Installing csvkit

csvkit can be installed with pip using the following command (with Python 2.7, type sudo pip install csvkit instead):

$ sudo pip3 install csvkit

This tutorial uses an example dataset with information on cars and their characteristics. Download it with the commands below. The second row of the file describes the data type of each column rather than holding data, so it is removed for the later analysis: head -1 copies the header row into cars-tutorial.csv, and tail -n+3 appends every row from the third one onward. An alternative way to do this is sed 2,2d cars.csv > cars-tutorial.csv.

$ wget https://perso.telecom-paristech.fr/eagan/class/igr204/data/cars.csv
$ head -1 cars.csv > cars-tutorial.csv
$ tail -n+3 cars.csv >> cars-tutorial.csv
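The two ways of dropping the data-type row can be compared on a small stand-in file with the same structure as cars.csv: a header, a data-type row, then data rows. This is a sketch for illustration only; the file names (sample-cars.csv and the two output files) and the rows in it are hypothetical, not taken from the real dataset.

```shell
# Hypothetical stand-in for cars.csv: header, data-type row, then data rows
cat > sample-cars.csv <<'EOF'
Car;MPG;Cylinders
STRING;DOUBLE;INT
Chevrolet Chevelle Malibu;18.0;8
Buick Skylark 320;15.0;8
EOF

# Method 1: keep line 1, then append everything from line 3 onward
head -1 sample-cars.csv > sample-headtail.csv
tail -n +3 sample-cars.csv >> sample-headtail.csv

# Method 2: delete line 2 with sed
sed 2,2d sample-cars.csv > sample-sed.csv

# cmp is silent and succeeds when the two files are byte-for-byte identical
cmp sample-headtail.csv sample-sed.csv && echo "identical"
```

The same cmp check can be run on cars-tutorial.csv and a sed-produced copy of cars.csv to confirm that both approaches give the same result on the real file.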

The most important csvkit commands

The example dataset is separated by semicolons, not commas. For all the commands presented below, the input delimiter can be set with the -d option, in this case as -d ";". Note that -d only controls how the input is read: csvkit commands write standard comma-separated output whatever the input delimiter was. If the output should use another delimiter, pipe it through csvformat with its -D (output delimiter) option, for example csvformat -D ";" to restore semicolons.

Useful resources for learning csvkit:

Key Points