I recently needed to add support for extracting data from Excel files (which are still ubiquitous in labs) to a Java application.
First I tried using POI-HSSF (a well-known Apache project). The code is reasonably straightforward – though I had to get it from tutorials on the Web as I couldn’t find much in terms of documentation on their site, and the API isn’t exactly self-evident:
POIFSFileSystem fs = new POIFSFileSystem(in);
HSSFWorkbook wb = new HSSFWorkbook(fs);
HSSFSheet sheet = wb.getSheetAt(0);
Iterator rows = sheet.rowIterator();
while (rows.hasNext()) {
HSSFRow row = (HSSFRow) rows.next();
Iterator cells = row.cellIterator();
List<string> fields = new ArrayList<string>();
while (cells.hasNext()) {
HSSFCell cell = (HSSFCell) cells.next();
switch (cell.getCellType()) {
case HSSFCell.CELL_TYPE_STRING:
fields.add(cell.getStringCellValue());
break;
case HSSFCell.CELL_TYPE_NUMERIC:
fields.add(Double.toString(cell.getNumericCellValue()));
break;
default:
log.warn("Unsupported cell type: " + cell.getCellType());
break;
}
}
process(fields);
}
This works fine–except that there doesn’t seem to be a way to keep the precision of numbers, i.e. if you don’t want to convert “4” to “4.0”! Fortunately POI isn’t the only game in town. I ended up using JExcel API which has a much simpler API (and behaves the way I want it to):
Workbook workbook = Workbook.getWorkbook(in);
Sheet sheet = workbook.getSheet(0);
for (int i = 0; i < sheet.getRows(); ++i) {
List<string> fields = new ArrayList<string>();
for (int j = 0; j < sheet.getColumns(); ++j) {
fields.add(sheet.getCell(j, i).getContents());
}
process(fields);
}