Read CSV files using Apache Commons CSV

Introduction

Apache Commons CSV is one of the components in the Apache Commons project. We can use it to read and write Comma Separated Value (CSV) files. It makes working with CSV files easier than using Java's bare-bones file reading mechanisms. In this post, we will see how to read CSV files using Apache Commons CSV. We will also look at some of the configuration options it provides.

Importing Apache Commons CSV

If you are using Maven, add the following dependency to import Apache Commons CSV into your project (replace the version with the latest one available).

<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-csv</artifactId>
  <version>1.6</version>
</dependency>

For Gradle (recent Gradle versions use the implementation configuration; older builds used compile):

implementation "org.apache.commons:commons-csv:1.6"

A simple CSV Reader

The simple CSV file below contains details about students: their id, name and GPA.

100,John Doe,4.5
101,Mark Cooper,3.8
102,John Smith,4.1

Let us dive straight in to read this CSV file.

package com.javadevcentral;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SimpleCSVReader {
    public static void main(String[] args) throws IOException {
        String csvFilePath = "…/csvFile.csv";
        Reader reader = Files.newBufferedReader(Paths.get(csvFilePath));

        CSVParser csvParser = CSVFormat.DEFAULT.parse(reader);
        csvParser.getRecords()
                .forEach(csvRecord -> System.out.println("Id: " + csvRecord.get(0) + " Name: " + csvRecord.get(1)
                        + " GPA: " + csvRecord.get(2)));
    }
}

We first create a Reader using the newBufferedReader method in the Files class. This method removes the burden of creating the BufferedReader ourselves. The main part here is the CSVFormat. We use the DEFAULT CSVFormat here. It uses a comma as the field delimiter, a double quote as the quote character and CRLF (\r\n) as the record separator.
We then create a CSVParser by calling the parse method on the CSVFormat. Once we have a CSVParser, we can get the records by calling the getRecords method, which returns a List<CSVRecord>. A CSVRecord represents one row in the CSV file. We can get the individual fields by index (csvRecord.get(0) returns the first column, the student id, and so on).
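
To see what the DEFAULT format buys us over splitting lines by hand, here is a small sketch that parses a record whose name field contains a quoted comma. The input line, the class name QuotedFieldDemo and its helper method are made up for illustration, and a StringReader stands in for the file so that the snippet runs on its own.

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.IOException;
import java.io.StringReader;
import java.util.List;

public class QuotedFieldDemo {
    // Hypothetical sample line: the name contains a comma, so it is quoted.
    static final String LINE = "100,\"Doe, John\",4.5";

    // Parses the line with CSVFormat.DEFAULT and returns its single record.
    static CSVRecord parseWithCommonsCsv(String line) throws IOException {
        try (CSVParser parser = CSVFormat.DEFAULT.parse(new StringReader(line))) {
            List<CSVRecord> records = parser.getRecords();
            return records.get(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Naive splitting cuts the quoted name into two pieces.
        System.out.println("split: " + LINE.split(",").length + " fields");

        // The parser honours the quotes and keeps the name as one field.
        CSVRecord record = parseWithCommonsCsv(LINE);
        System.out.println("Commons CSV: " + record.size() + " fields, name = " + record.get(1));
    }
}
```

The quoted name comes back as a single field with the quotes removed, which is the kind of detail that is easy to get wrong with String.split.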

Another way to access the records is to use an enhanced-for loop. This is possible since the CSVParser class implements Iterable<CSVRecord>.

for (CSVRecord csvRecord : csvParser) {
    System.out.println("Id: " + csvRecord.get(0) + " Name: " + csvRecord.get(1)
            + " GPA: " + csvRecord.get(2));
}
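
The snippets so far never close the reader. Since CSVParser implements Closeable, one way to handle this is a try-with-resources block, sketched below. It iterates over the parser rather than calling getRecords, so the records are processed one at a time instead of being collected into a list first. The class name, the countRecords helper and the inline sample data are assumptions made for the sake of a self-contained example.

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ClosingCSVReader {
    // Prints each record as it is parsed and returns how many were seen.
    // Closing the parser at the end of the try block also closes the reader.
    static int countRecords(Reader source) throws IOException {
        int count = 0;
        try (CSVParser csvParser = CSVFormat.DEFAULT.parse(source)) {
            for (CSVRecord csvRecord : csvParser) {
                System.out.println("Id: " + csvRecord.get(0) + " Name: " + csvRecord.get(1)
                        + " GPA: " + csvRecord.get(2));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Inline stand-in for the student file shown earlier.
        String csv = "100,John Doe,4.5\n101,Mark Cooper,3.8\n102,John Smith,4.1\n";
        System.out.println(countRecords(new StringReader(csv)) + " records");
    }
}
```

For a real file, the StringReader would be replaced by the Files.newBufferedReader call used earlier.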

Reading CSV file with headers

A CSV file can have a header row as its first record. Adding one to our previous file gives:

Id,Name,GPA
100,John Doe,4.5
101,Mark Cooper,3.8
102,John Smith,4.1

If we attempt to read this file using the previous code, it would treat the header row just as any other row. Hence, we need to configure the CSVFormat specifying the first row as the header row. 

package com.javadevcentral;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CSVReaderWithHeaders {
    public static void main(String[] args) throws IOException {
        String csvFilePath = "../csvFileWithHeaders.csv";
        Reader reader = Files.newBufferedReader(Paths.get(csvFilePath));

        CSVFormat csvFormat = CSVFormat.DEFAULT
                .withFirstRecordAsHeader();
        CSVParser csvParser = csvFormat.parse(reader);

        System.out.println(csvParser.getHeaderMap()); //{Id=0, Name=1, GPA=2}
        csvParser.getRecords()
                .forEach(csvRecord -> System.out.println(csvRecord.toMap()));
    }
}

This prints:

{GPA=4.5, Id=100, Name=John Doe}
{GPA=3.8, Id=101, Name=Mark Cooper}
{GPA=4.1, Id=102, Name=John Smith}

We are still using the CSVFormat with the DEFAULT configuration, but we have called the withFirstRecordAsHeader method on it. CSVFormat is immutable, so each with... method returns a new CSVFormat in a fluent, builder-like style.
We can verify that the CSV parser has correctly picked up the header record using the getHeaderMap method on the CSVParser. It returns the header (column) names mapped to their column index, as shown in the comment in the code.
Now, we can use the toMap method of the CSVRecord to print it. It maps each column name to the column value (as can be seen from the above output).
We can also get specific column values by column name. To get the id of a row, we can call csvRecord.get("Id"). Similarly, calling csvRecord.get("Name") and csvRecord.get("GPA") returns the student name and GPA for any CSVRecord.

for (CSVRecord csvRecord : csvParser) {
    System.out.println("Id: " + csvRecord.get("Id"));
    System.out.println("Name: " + csvRecord.get("Name"));
    System.out.println("GPA: " + csvRecord.get("GPA"));
}
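
Name-based access pairs naturally with converting each row into a domain object. The sketch below assumes a hypothetical Student class (it is not part of the library or of the examples above) and converts the Id and GPA columns to numbers. The inline CSV text is a stand-in for the file with headers.

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StudentCsvMapper {
    // Hypothetical value class used only for this illustration.
    static final class Student {
        final int id;
        final String name;
        final double gpa;

        Student(int id, String name, double gpa) {
            this.id = id;
            this.name = name;
            this.gpa = gpa;
        }
    }

    // Reads students from CSV whose first record is the header Id,Name,GPA.
    static List<Student> readStudents(Reader source) throws IOException {
        List<Student> students = new ArrayList<>();
        try (CSVParser csvParser = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(source)) {
            for (CSVRecord csvRecord : csvParser) {
                // By default, header names must match exactly, including case.
                students.add(new Student(
                        Integer.parseInt(csvRecord.get("Id")),
                        csvRecord.get("Name"),
                        Double.parseDouble(csvRecord.get("GPA"))));
            }
        }
        return students;
    }

    public static void main(String[] args) throws IOException {
        String csv = "Id,Name,GPA\n100,John Doe,4.5\n101,Mark Cooper,3.8\n";
        for (Student s : readStudents(new StringReader(csv))) {
            System.out.println(s.id + " " + s.name + " " + s.gpa);
        }
    }
}
```

Looking columns up by name keeps this mapping code working even if the column order in the file changes.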

Custom headers

We can configure the CSVFormat to use custom headers and not use the headers in the CSV file. An advantage is that the names can be configured at runtime.

package com.javadevcentral;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CSVRecordWithCustomHeaders {
    public static void main(String[] args) throws IOException {
        String csvFilePath = "../csvFileWithHeaders.csv";
        Reader reader = Files.newBufferedReader(Paths.get(csvFilePath));

        CSVFormat csvFormat = CSVFormat.DEFAULT
                .withFirstRecordAsHeader()
                .withHeader("StudentId", "StudentName", "StudentGPA");
        CSVParser csvParser = csvFormat.parse(reader);

        System.out.println(csvParser.getHeaderMap()); //{StudentId=0, StudentName=1, StudentGPA=2}
        System.out.println();
        csvParser.getRecords()
                .forEach(csvRecord -> System.out.println(csvRecord.toMap()));

    }
}

outputs,

{StudentName=John Doe, StudentGPA=4.5, StudentId=100}
{StudentName=Mark Cooper, StudentGPA=3.8, StudentId=101}
{StudentName=John Smith, StudentGPA=4.1, StudentId=102}

We pass three header values to the withHeader method when building the CSVFormat object.
Note that getHeaderMap no longer returns the header names from the CSV file. We still have to mark the first record as the header (withFirstRecordAsHeader) so that the parser skips it, even though its values are not used. Without it, the parser would return the header row of the CSV file as a data record.
In this setup, to access a column value by name, we have to use the header value that we configured. To get the student name, we have to use csvRecord.get("StudentName") and not csvRecord.get("Name"). The latter results in the following exception:

java.lang.IllegalArgumentException: Mapping for Name not found, expected one of [StudentId, StudentName, StudentGPA]
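
The role of the first-record setting can be checked with a small experiment. The sketch below parses the same input twice with custom headers: once without skipping the first record, and once with withSkipHeaderRecord(), which is the flag that withFirstRecordAsHeader() turns on (its Javadoc describes it as equivalent to withHeader().withSkipHeaderRecord()). The class name, the helper method and the inline data are illustrative.

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

import java.io.IOException;
import java.io.StringReader;

public class CustomHeaderSkipDemo {
    // Inline stand-in for the CSV file with headers.
    static final String CSV = "Id,Name,GPA\n100,John Doe,4.5\n101,Mark Cooper,3.8\n";

    // Returns the StudentId value of the first record produced by the given format.
    static String firstStudentId(CSVFormat format) throws IOException {
        try (CSVParser parser = format.parse(new StringReader(CSV))) {
            return parser.getRecords().get(0).get("StudentId");
        }
    }

    public static void main(String[] args) throws IOException {
        CSVFormat customOnly = CSVFormat.DEFAULT
                .withHeader("StudentId", "StudentName", "StudentGPA");
        CSVFormat customAndSkip = customOnly.withSkipHeaderRecord();

        // Without skipping, the file's own header line comes back as data.
        System.out.println(firstStudentId(customOnly));    // Id
        // With skipping, the first record is the first student.
        System.out.println(firstStudentId(customAndSkip)); // 100
    }
}
```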

CSV files with different delimiter

If the CSV file uses a delimiter other than the default comma, it is easy to configure the CSVFormat to handle it.
Say the file is delimited with hyphens:

package com.javadevcentral;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CSVReaderWithDifferentDelimiter {
    public static void main(String[] args) throws IOException {
        String csvFilePath = "../csvFileWithDiffDelimiter.csv";
        Reader reader = Files.newBufferedReader(Paths.get(csvFilePath));

        CSVFormat csvFormat = CSVFormat.newFormat('-')
                .withFirstRecordAsHeader();
        CSVParser csvParser = csvFormat.parse(reader);

        csvParser.getRecords()
                .forEach(csvRecord -> System.out.println(csvRecord.toMap()));

    }
}

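Note that we are not using the DEFAULT CSVFormat anymore and are constructing one from scratch. We pass the delimiter as the newFormat method argument. Every other setting (quote character, escape character, record separator and so on) is left unset rather than copied from DEFAULT, so this format does not recognize quoted fields. To keep the DEFAULT behaviour and change only the delimiter, we can call CSVFormat.DEFAULT.withDelimiter('-') instead.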
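
One thing to watch for: a format created with newFormat has no quote character configured, so a quoted field that contains the delimiter gets split. The sketch below compares it with DEFAULT.withDelimiter('-') on the same input. The sample data (including the student name) and the class name are made up for illustration.

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

import java.io.IOException;
import java.io.StringReader;

public class HyphenDelimiterDemo {
    // Hypothetical hyphen-delimited data; the quoted name itself contains a hyphen.
    static final String CSV = "Id-Name-GPA\n100-\"Mary-Ann Lee\"-4.5\n";

    // Returns the number of fields in the first data record.
    static int fieldCount(CSVFormat format) throws IOException {
        try (CSVParser parser = format.withFirstRecordAsHeader().parse(new StringReader(CSV))) {
            return parser.getRecords().get(0).size();
        }
    }

    public static void main(String[] args) throws IOException {
        // DEFAULT with a swapped delimiter keeps the double-quote handling.
        System.out.println(fieldCount(CSVFormat.DEFAULT.withDelimiter('-'))); // 3
        // newFormat has no quote character, so the hyphen inside the quotes splits the field.
        System.out.println(fieldCount(CSVFormat.newFormat('-')));             // 4
    }
}
```

If the data never contains quoted fields, both formats behave the same for this purpose, so the choice only matters when fields may contain the delimiter.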

Conclusion

We have seen how to read CSV files using Apache Commons CSV. We also looked at several of the configuration options it provides: reading CSV files with headers, using custom headers and reading CSV files with a different delimiter. These are the most common configurations you will need, but there is no need to stop here. You can explore the other methods in the CSVFormat class.
