How to work with MS Access under Unix


Introduction

Many companies use MS Access even until now. So, from time to time, someone want to connect to an ancient database. Moreover they want to extract data from there and to make this within standard Unix-based server. Of course, this data needed for an web application. And therefore someone want to use data from MS Access together with data from modern DB.

So, I want to describe several approach to creature called MS Access. Therefore, initial task is follows: to establish connect from Ruby on Rails application or from PostgreSQL (using FDW) to MS Access database and be able to extract data, preferable, in real time.

Here I'll try to aggregate all information about this problem and try to describe unobvious cases. Hope, this will save time for someone.

Conversion into CSV

In the beginning I describe a simple working solution. This works for Ubuntu 14.04 ( and must works for most other Unix system) and not required any special knowledge, skills or any other magic.

This is mdbtools:

sudo apt-get install mdbtools

More details about dependency and about installation process can be found on GitHub page of mdbtool.

This package provide lots of different tools for working with MS Access. For current purpose required mdb-export. This can convert mdb files into csv:

mdb-export 'mdb-file' 'table-name' > result.csv

As a result generated csv file with data from table with specified name.

Then csv file can be processed with wide range of tools. Because csv is extremely simple.

Querying over MS Access database

This is more difficult task. For this purpose required additional packages and configuration.

First of all, we need ODBC. This is standard API for accessing database management systems (DMBS). For Unix this is unixODBC.

Installation is simple:

sudo apt-get install unixodbc libmdbodbc1

Second package contain libmdbodbc.so. This needs later for driver configuration.

Next step concluded into search an ODBC-driver for MS Access. Closest one is the previously described mdbtools.

After installation we need to configure ODBC-driver. Write into /etc/odbcinst.ini follows:

[MDBTools]
Description = MDBTools Driver Driver = libmdbodbc.so Setup = libmdbodbc.so FileUsage = 1 UsageCount = 1

Then comes the turn of database configuration. In file /etc/odbc.ini describe all your databases. Description for single DB is follows:

[testdb]
Description = test
Driver = MDBTools
Database = /opt/db/MS_Access.mdb

Here in 'Driver' argument specified previously configured ODBC-driver name.

More about odbcinst.ini and odbc.ini can be found here.

So, config is ready. Now we can run utility from unixODBC which called isql to submit SQL queries to a data source.

$ isql testdb 

After this must appear prompt for queries:

SQL> SELECT * from "Раздел"
+------------+-----------------------------------------------------------------------------------------------------+
| Код        | Раздел                                                                                              |
+------------+-----------------------------------------------------------------------------------------------------+
| 1          | Документация                                                                                        |
| 2          | Сборочные единицы                                                                                   |
| 3          | Детали                                                                                              |
| 4          | Комплекты                                                                                           |
+------------+-----------------------------------------------------------------------------------------------------+
SQLRowCount returns 4
4 rows fetched

Mention: there is an unicode version isql called iusql.

Oddities of isql

Tool called isql is very squeamish and unfriendly. There is lots of limitation on syntax. For example, if you end a command with semicolon you get an error. Moreover there is no way to write multiline queries, no hints or tips with error localization, no normal documentation in case of working with nonstandard table/column names and so on. Therefore there is an simple shell called pyodbc-cli. This can help to evade some inconvenience of isql and concentrate on querying, not syntax.

Exotic encoding of tables/columns

There is rumors about using 'Changset' argument in case of usage uncommon code page. Here is example of /etc/odbc.ini this this options:

[testdb]
Description = test
Driver = MDBTools
Database = /opt/db/MS_Access.mdb
Charset = CP1251

For me this argument do nothing. In isql console I can work with both: Unicode and CP1251 mdb files regardless of 'Changset' option. However, in iusql I obtain lots of question marks (i.e. something like this: ��������) again regardless of 'Changset' option.

Alternative for isql

Alternative variant of isql console is a mdb-sql from mdbtools package. For this tool no needs to configure  /etc/odbcinst.ini and /etc/odbc.ini. Console run for certain mdb file through following command:

mdb-sql /opt/db/MS_Access.mdb

All other description can be found in man page. Single exception: I can't execute query over DB with Russian table names using mdb-sql. For standard DB with unicode all works fine.

Ruby/Rails approach

For now (mid 2016), last release of MS Access were in September 22, 2015. But any works with adapters for ActiveRecord ended near 2008 year. So, lets take a look on available variants.

First of all, good news: there is odbc-rails and it reincarnation activerecord-odbc-adapter.

And then immediately bad news:

  • the last update were in 2008 year;
  • I have no idea how to get work this adapter under Rails 3+. Because there is no appropriate documentation about installation and configuration of gem and related ODBC-driver. If you have enough time and skill you can submerge into this gem and try to get them work with ActiveRecord 3+. In this case - good luck!

Ruby-ODBC

Other way is a ruby-odbc. This is an ODBC binding for Ruby.

Last update of gem were in 2011 year. But more or less it works. For install gem do follows in terminal:

sudo apt-get install unixodbc unixodbc-dev
gem install ruby-odbc

Without package unixodbc-dev you obtain error: ERROR: sql.h not found.

Then, in assumption that you have configured odbcinst.ini and odb.ini you can do follows in ruby console:

001 > require 'odbc'
 => true
002 > client = ODBC.connect("testdb")
 => #<odbc::database:0x00000000e38d98>
003 > statement = client.prepare 'SELECT * FROM "Раздел"'
 => #<odbc::statement:0x00000000e11608 @_a="[]," @_h="{}," @_c0="{}," @_c1="{}," @_c2="{}," @_c3="{}"> 
004 > statement.execute
 => #<odbc::statement:0x00000000e11608 @_a="[]," @_h="{}," @_c0="{}," @_c1="{}," @_c2="{}," @_c3="{}">
005 > first_row = statement.fetch
 => [1, "\xD0\x94\xD0\xBE\xD0\xBA\xD1\x83\xD0\xBC\xD0\xB5\xD0\xBD\xD1\x82\xD0\xB0\xD1\x86\xD0\xB8\xD1\x8F\x00"] 
006 > first_row[1].force_encoding("utf-8")
 => "Документация\u0000" 

This works in the same way as isql.

More details about syntax and available command of ruby-odbc gem can be found in ruby-odbc/test folder on GitHub.

Mdb gem

This gem provide a DSL for working with mdb files. And this interface looks nice. But gem is just a wrapper for previously described mdbtoolsI.e. under the hood there is same convert operation from mdb into cvs and further processing of csv within RAM.

Alternative variant of adapter

There is a commercial Access ODBC Driver. But there is less information about this. In optimistic case this adapter can help with advanced queries over Access. I.e. maybe, this provide full list of statements (LIMIT, AS, GROUP etc) instead of reduced one in libmdbodbc.so. But this is only a guess. I don't know what happens in fact. Moreover, there is no any information and/or user opinion about this adapter. The single advantage is a 14 days trial after registration. This provide time and ability to understand that is the adapter in reality.

PostgreSQL approach

For PotgreSQL exist an extension OGR. This extension is a part of GDAL. Which is a translator library for raster and vector geospatial data formats. I.e. this is huge library and it provide ability to work with lots of formats. But we interested only in one: MDB.

Installation

In the beginning, this PG extensions depend from several packages. Here it is:

sudo apt-get install gdal-bin libgdal-dev
sudo apt-get install postgis postgresql-9.3-postgis-2.1

This command install tons of related packages and most of them is useless for current target. First one is for ogr_fdw extension, second one is for postgis.

Step two: compile from source pgsql-ogr-fdw. Here is a small manual:

git clone git@github.com:pramsey/pgsql-ogr-fdw.git
cd pgsql-ogr-fdw

sudo apt-get install postgresql-server-dev-9.3
sudo apt-get install checkinstall

make
sudo checkinstall

Within checkinstall edit parameter 'version'. Input random digit separated with dot (for example: '0.1.0'). Otherwise occur an error during package generation.

Step three: go to psql console and install extensions:

CREATE EXTENSION ogr_fdw;
CREATE EXTENSION postgis;

Step four: time to create FDW. In org_fwd exists two ways (or formats in OGR terminology) for connect to MS Access. First way: using standard ODBC. I.e. using database described in /etc/odbc.ini within specified adapter. Details about this approach can be found here. Second way: use MDB format from ORG, provided direct connection to mdb file using Jackcess. Details about MDB format located here. Below I describe both ways. One more thing: ORG is extremely powerful tools; an ability to work with MS Access is a small piece of a huge amount of formats; full list of supported formats presents on GDAL official site.

ORG ODBC format

This approach use system ODBC and works in similar way as previously described isql and ruby-odbc, but within database. All available options for initialize FDW presents on GDAL ODBC driver page. Below I show only simple example of usage.

So, here an example of usage in format of small piece of pgSQL with commands output:

postgres=# CREATE SERVER testdb_access
postgres-# FOREIGN DATA WRAPPER ogr_fdw
postgres-# OPTIONS(
postgres(#   datasource 'ODBC:testdb',
postgres(#   format 'ODBC');
CREATE SERVER
postgres=# CREATE FOREIGN TABLE access_sections (
postgres(#     "Код" decimal,
postgres(#     "Раздел" varchar)
postgres-#     SERVER testdb_access
postgres-#     OPTIONS (layer 'Раздел');
CREATE FOREIGN TABLE
postgres=# SELECT * FROM access_sections;
ERROR:  unable to connect to layer to "Раздел"
HINT:  Does the layer exist?

As I understand from ORG docs, layer in terms of ORG is same thing as table in terms of database.

List of layers can be obtained by ogrinfo tool:

$ ogrinfo -al 'ODBC:testdb'
geometry_columns is not a table in this database
Got no result for 'SELECT f_table_name, f_geometry_column, geometry_type FROM geometry_columns' command
INFO: Open of `ODBC:testdb'
      using driver `ODBC' successful.

Based on this message, all works fine, but database (i.e. in mdb file) doesn't contain Geo-format and ORG stuck on finding geometry_columns table. I can't find how to evade this. But someone wrote what this approach fine works under Windows. So, if you know how to get it work with any nongeo mdb, please, say about this (for example in comments).

Another question is how PG and FDW works with nonlatin (Russian in my case) names of tables and columns from mdb. From one side: Postgres is indifferent to table/column names. Wrap them in double quotes and use anything including special characters. But I don't know is this applied to FDW.

ORG MDB format

This approach based on pure Java library called Jackcess. Due to Java this approach has no relations with system ODBC and has own driver and connection mechanism.

I spent 3 days and ~20 recompilation of package to reach a working variant. So, I want to say about some limitation on below description:

  • this method can help only with mdb files without encryption (i.e. without password);

  • due to custom package compilation all specified version, paths and dependencies can change based on environment (OS type and version, installed packages, environment variables etc).

Be very careful, read all related docs and be patient.

All below description is extended version of official GDAL ACCESS MDB database driver.

Step one: install openjdk-6-jdk.

sudo apt-get install openjdk-6-jdk

Based on source code of GDAL, it support openjdk-7-jdk too. But I can't build worked package with 7 version.

Then required libgdal-dev package.

sudo apt-get install libgdal-dev

Remember version of installed package. This version related with main GDAL package. For me this is 1.10.1 version.

Important moment: MDB support appears in 1.9.0 version and later.

Finally, remove gdal-bin:

sudo apt-get remove gdal-bin

Step two: download jackcess-1.2.2.jar, commons-lang-2.4.jar and commons-logging-1.1.1.jar and put the 3 JARs in the lib/ext. In my case this is /usr/lib/jvm/java-6-openjdk-amd64/jre/lib/ext. Required versions of JAR dependencies can be found in this utility. For me, all works with any latest version of commons-logging (1.*), with any other minor version of commons-lang (2.*) and jackcess (1.*). Errors appear only in case of new major version of jaccess (2.1.4).

Step three: download and configure GDAL.

git clone git@github.com:OSGeo/gdal.git
cd gdal/gdal/
git checkout 1.10

Here needs to move to branch with same version as version of installed libgdal-dev. Otherwise lib and binary become incompatible.

Then configure. There is two variant. Simple:

./configure --with-java=yes --with-jvm-lib-add-rpath=yes --with-mdb=yes

and with explicit path

./configure --with-java=/usr/lib/jvm/java-6-openjdk-amd64 \
            --with-jvm-lib=/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server \
            --with-jvm-lib-add-rpath=yes \
            --with-mdb=yes

Second variant can help in case if multiple installed version of Java. For example, in case of installed openjdk-6-jdk and openjdk-7-jdk.

In the end of configure output must appear 'yes' in front of MDB support.

Step four: find cup of tea and run compilation.

sudo checkinstall

Go through wizard. Specify 'description' and 'version' for package and wait ~10min.

As result must appear package with size ~300Mb. If you want to reduce size, you can manually configure package, remove all redundant parts and make it more light. But this actions is beyond of scope of this article and hence will be skipped.

Step five: if you obtain any error during compilation - solve them using Google and own knowledge.

Step six: run follows command:

$ ogrinfo --formats | grep MDB
-> "MDB" (readonly)

If all goes right, in output must appear information about MDB format. Otherwise go back, read mans, reconfigure and recompile GDAL and try again.

Now ogrinfo can work with mdb files and provide content and information about them.

$ ogrinfo /opt/db/test-database.mdb
INFO: Open of `/opt/db/test-database.mdb'
      using driver `MDB' successful.
1: closeouts
2: economics

Step seven: time to configure FDW in PotsagreSQL. Here is example of initialization as pgSQL script with commands output:

postgres=# CREATE SERVER acc
    FOREIGN DATA WRAPPER ogr_fdw
    OPTIONS (
      datasource '/opt/db/test-database.mdb',
      format 'MDB' );
CREATE SERVER
postgres=# CREATE FOREIGN TABLE economics(
    ID integer)
    SERVER acc
    OPTIONS(layer 'economics');
CREATE FOREIGN TABLE
postgres=# SELECT * FROM economics;
 id 
----
  1
  2
  3
  4
  5
(5 rows)

So, thats all. In the end of this section I want to say about encrypted mdb files.

If ogrinfo say something like this:

Exception in thread "main" com.healthmarketscience.jackcess.UnsupportedCodecException: Decoding not supported.
Please choose a CodecProvider which supports reading the current database encoding.
        at com.healthmarketscience.jackcess.DefaultCodecProvider$UnsupportedHandler.decodePage(DefaultCodecProvider.java:115)

most likely your mdb file is encrypted. In this case you can look at Jaccess FAQ and think about contribution in OGR Access driver. As I understand, there is an project called Jackcess Encrypt. This project provide CryptCodecProvider, which implements the Jackess CodecProvider interface and supports some forms of Access database encryption. But implementation of Access driver from GDAL doesn't use Jackcess Encrypt and hence doesn't support encrypted files. So, you can improve GDAL in this direction.

Other FDWs

List of other available FDWs can be found on official PG-wiki. There you can find ZhengYang/odbc_fdw with last changes in 2011 and note "Does not compile with PostgreSQL >= 9.2!". And CartoDB/odbc_fdw with active development process and support Postgres 9.5+.

Conclusion

Working with Access is painful. In double size in case of Unix environment. So, first advice: extract data from MS Access into any modern database and you will get rid of many problems. If no way to drop out MS Access - use Windows for dealing with it. Microsoft provide drivers and tools for dealing with own products. Otherwise you have two ways: use transition of data through CSV files or direct access to MS Access. First way is simple and works "out of the box". Second way is quite difficult, require a lot of time and has set of limitations, pitfalls and other unpleasant things. So, select wisely.

Links

Translations

Russian variant of this article located on HabraHabr.

Thanks

Thanks to Roman Shipiev for help with initial exploration of MS Access and for description of several related tools.

MS Access PostgreSQL Ruby on Rails Foreign data wrapper pgsql RoR Ruby mdb csv FDW GDAL OGR ODBC Ubuntu

comments powered by Disqus