Many companies use MS Access even until now. So, from time to time, someone want to connect to an ancient database. Moreover they want to extract data from there and to make this within standard Unix-based server. Of course, this data needed for an web application. And therefore someone want to use data from MS Access together with data from modern DB.
So, I want to describe several approach to creature called MS Access. Therefore, initial task is follows: to establish connect from Ruby on Rails application or from PostgreSQL (using FDW) to MS Access database and be able to extract data, preferable, in real time.
Here I'll try to aggregate all information about this problem and try to describe unobvious cases. Hope, this will save time for someone.In the beginning I describe a simple working solution. This works for Ubuntu
14.04 ( and must works for most other Unix system) and not required any special
knowledge, skills or any other magic.
This is mdbtools:
sudo apt-get install mdbtools
More details about dependency and about installation process can be found on
GitHub page of mdbtool.
This package provide lots of different tools for working with MS Access. For current purpose required mdb-export. This can convert mdb files into csv:
mdb-export 'mdb-file' 'table-name' > result.csv
As a result generated csv file with data from table with specified name.
Then csv file can be processed with wide range of tools. Because csv is extremely simple.
This is more difficult task. For this purpose required additional packages
and configuration.
First of all, we need ODBC. This is standard API for accessing database management systems (DMBS). For Unix this is unixODBC.
Installation is simple:
sudo apt-get install unixodbc libmdbodbc1
Second package contain libmdbodbc.so. This needs later for driver configuration.
Next step concluded into search an ODBC-driver for MS Access. Closest one is the previously described mdbtools.
After installation we need to configure ODBC-driver. Write into /etc/odbcinst.ini follows:
[MDBTools]
Description = MDBTools Driver Driver = libmdbodbc.so Setup = libmdbodbc.so FileUsage = 1 UsageCount = 1
Then comes the turn of database configuration. In file /etc/odbc.ini describe all your databases. Description for single DB is follows:
[testdb]
Description = test
Driver = MDBTools
Database = /opt/db/MS_Access.mdb
Here in 'Driver' argument specified previously configured ODBC-driver name.
More about odbcinst.ini and odbc.ini can be found here.
So, config is ready. Now we can run utility from unixODBC which called isql to submit SQL queries to a data source.
$ isql testdb
After this must appear prompt for queries:
SQL> SELECT * from "Раздел" +------------+-----------------------------------------------------------------------------------------------------+ | Код | Раздел | +------------+-----------------------------------------------------------------------------------------------------+ | 1 | Документация | | 2 | Сборочные единицы | | 3 | Детали | | 4 | Комплекты | +------------+-----------------------------------------------------------------------------------------------------+ SQLRowCount returns 4 4 rows fetched
Mention: there is an unicode version isql called iusql.
Tool called isql is very squeamish and unfriendly. There is lots of limitation on syntax. For example, if you end a command with semicolon you get an error. Moreover there is no way to write multiline queries, no hints or tips with error localization, no normal documentation in case of working with nonstandard table/column names and so on. Therefore there is an simple shell called pyodbc-cli. This can help to evade some inconvenience of isql and concentrate on querying, not syntax.
There is rumors about using 'Changset' argument in case of usage uncommon code page. Here is example of /etc/odbc.ini this this options:
[testdb]
Description = test
Driver = MDBTools
Database = /opt/db/MS_Access.mdb
Charset = CP1251
For me this argument do nothing. In isql console I can work with both: Unicode and CP1251 mdb files regardless of 'Changset' option. However, in iusql I obtain lots of question marks (i.e. something like this: ��������) again regardless of 'Changset' option.
Alternative variant of isql console is a mdb-sql from mdbtools package. For this tool no needs to configure /etc/odbcinst.ini and /etc/odbc.ini. Console run for certain mdb file through following command:
mdb-sql /opt/db/MS_Access.mdb
All other description can be found in man page. Single exception: I can't execute query over DB with Russian table names using mdb-sql. For standard DB with unicode all works fine.
For now (mid 2016), last release of MS Access were in September 22, 2015. But any works with adapters for ActiveRecord ended near 2008 year. So, lets take a look on available variants.
First of all, good news: there is odbc-rails and it reincarnation activerecord-odbc-adapter.
And then immediately bad news:
Other way is a ruby-odbc. This is an ODBC binding for Ruby.
Last update of gem were in 2011 year. But more or less it works. For install gem do follows in terminal:
sudo apt-get install unixodbc unixodbc-dev
gem install ruby-odbc
Without package unixodbc-dev you obtain error: ERROR: sql.h not found.
Then, in assumption that you have configured odbcinst.ini and
odb.ini you can do follows in ruby console:
001 > require 'odbc' => true 002 > client = ODBC.connect("testdb") => #<odbc::database:0x00000000e38d98> 003 > statement = client.prepare 'SELECT * FROM "Раздел"' => #<odbc::statement:0x00000000e11608 @_a="[]," @_h="{}," @_c0="{}," @_c1="{}," @_c2="{}," @_c3="{}"> 004 > statement.execute => #<odbc::statement:0x00000000e11608 @_a="[]," @_h="{}," @_c0="{}," @_c1="{}," @_c2="{}," @_c3="{}"> 005 > first_row = statement.fetch => [1, "\xD0\x94\xD0\xBE\xD0\xBA\xD1\x83\xD0\xBC\xD0\xB5\xD0\xBD\xD1\x82\xD0\xB0\xD1\x86\xD0\xB8\xD1\x8F\x00"] 006 > first_row[1].force_encoding("utf-8") => "Документация\u0000"
This works in the same way as isql.
More details about syntax and available command of ruby-odbc gem can be
found in ruby-odbc/test
folder on GitHub.
This gem provide a DSL for working with mdb files. And this interface looks nice. But gem is just a wrapper for previously described mdbtools. I.e. under the hood there is same convert operation from mdb into cvs and further processing of csv within RAM.
There is a commercial Access ODBC Driver. But there is less information about this. In optimistic case this adapter can help with advanced queries over Access. I.e. maybe, this provide full list of statements (LIMIT, AS, GROUP etc) instead of reduced one in libmdbodbc.so. But this is only a guess. I don't know what happens in fact. Moreover, there is no any information and/or user opinion about this adapter. The single advantage is a 14 days trial after registration. This provide time and ability to understand that is the adapter in reality.
For PotgreSQL exist an extension OGR. This extension is a part of GDAL. Which is a translator library for raster and vector geospatial data formats. I.e. this is huge library and it provide ability to work with lots of formats. But we interested only in one: MDB.
In the beginning, this PG extensions depend from several packages. Here it is:
sudo apt-get install gdal-bin libgdal-dev
sudo apt-get install postgis postgresql-9.3-postgis-2.1
This command install tons of related packages and most of them is useless for current target. First one is for ogr_fdw extension, second one is for postgis.
Step two: compile from source pgsql-ogr-fdw. Here is a small manual:
git clone git@github.com:pramsey/pgsql-ogr-fdw.git cd pgsql-ogr-fdw sudo apt-get install postgresql-server-dev-9.3 sudo apt-get install checkinstall make sudo checkinstall
Within checkinstall edit parameter 'version'. Input random digit separated with dot (for example: '0.1.0'). Otherwise occur an error during package generation.
Step three: go to psql console and install extensions:
CREATE EXTENSION ogr_fdw; CREATE EXTENSION postgis;
Step four: time to create FDW. In org_fwd exists two ways (or formats
in OGR terminology) for connect to MS Access. First way: using standard ODBC.
I.e. using database described in /etc/odbc.ini within specified adapter.
Details about this approach can be found here. Second way: use MDB format from
ORG, provided direct connection to mdb file using Jackcess. Details about MDB format
located here. Below I describe
both ways. One more thing: ORG is extremely powerful tools; an ability to work
with MS Access is a small piece of a huge amount of formats; full list of
supported formats presents on GDAL official site.
This approach use system ODBC and works in similar way as previously described isql and ruby-odbc, but within database. All available options for initialize FDW presents on GDAL ODBC driver page. Below I show only simple example of usage.
So, here an example of usage in format of small piece of pgSQL with commands output:
postgres=# CREATE SERVER testdb_access postgres-# FOREIGN DATA WRAPPER ogr_fdw postgres-# OPTIONS( postgres(# datasource 'ODBC:testdb', postgres(# format 'ODBC'); CREATE SERVER postgres=# CREATE FOREIGN TABLE access_sections ( postgres(# "Код" decimal, postgres(# "Раздел" varchar) postgres-# SERVER testdb_access postgres-# OPTIONS (layer 'Раздел'); CREATE FOREIGN TABLE postgres=# SELECT * FROM access_sections; ERROR: unable to connect to layer to "Раздел" HINT: Does the layer exist?
As I understand from ORG docs, layer in terms of ORG is same thing as table in terms of database.
List of layers can be obtained by ogrinfo tool:
$ ogrinfo -al 'ODBC:testdb' geometry_columns is not a table in this database Got no result for 'SELECT f_table_name, f_geometry_column, geometry_type FROM geometry_columns' command INFO: Open of `ODBC:testdb' using driver `ODBC' successful.
Based on this message, all works fine, but database (i.e. in mdb file) doesn't contain Geo-format and ORG stuck on finding geometry_columns table. I can't find how to evade this. But someone wrote what this approach fine works under Windows. So, if you know how to get it work with any nongeo mdb, please, say about this (for example in comments).
Another question is how PG and FDW works with nonlatin (Russian in my case) names of tables and columns from mdb. From one side: Postgres is indifferent to table/column names. Wrap them in double quotes and use anything including special characters. But I don't know is this applied to FDW.
This approach based on pure Java library called Jackcess. Due to Java this approach has no relations with system ODBC and has own driver and connection mechanism.
I spent 3 days and ~20 recompilation of package to reach a working variant. So, I want to say about some limitation on below description:
this method can help only with mdb files without encryption (i.e. without password);
Be very careful, read all related docs and be patient.
All below description is extended version of official GDAL ACCESS MDB database driver.
Step one: install openjdk-6-jdk.
sudo apt-get install openjdk-6-jdk
Based on source code of GDAL, it support openjdk-7-jdk too. But I can't build worked package with 7 version.
Then required libgdal-dev package.
sudo apt-get install libgdal-dev
Remember version of installed package. This version related with main GDAL package. For me this is 1.10.1 version.
Important moment: MDB support appears in 1.9.0 version and later.
Finally, remove gdal-bin:
sudo apt-get remove gdal-bin
Step two: download jackcess-1.2.2.jar, commons-lang-2.4.jar and commons-logging-1.1.1.jar and put the 3 JARs in the lib/ext. In my case this is /usr/lib/jvm/java-6-openjdk-amd64/jre/lib/ext. Required versions of JAR dependencies can be found in this utility. For me, all works with any latest version of commons-logging (1.*), with any other minor version of commons-lang (2.*) and jackcess (1.*). Errors appear only in case of new major version of jaccess (2.1.4).
Step three: download and configure GDAL.
git clone git@github.com:OSGeo/gdal.git
cd gdal/gdal/
git checkout 1.10
Here needs to move to branch with same version as version of installed libgdal-dev. Otherwise lib and binary become incompatible.
Then configure. There is two variant. Simple:
./configure --with-java=yes --with-jvm-lib-add-rpath=yes --with-mdb=yes
and with explicit path
./configure --with-java=/usr/lib/jvm/java-6-openjdk-amd64 \ --with-jvm-lib=/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server \ --with-jvm-lib-add-rpath=yes \ --with-mdb=yes
Second variant can help in case if multiple installed version of Java. For example, in case of installed openjdk-6-jdk and openjdk-7-jdk.
In the end of configure output must appear 'yes' in front of MDB support.
Step four: find cup of tea and run compilation.
sudo checkinstall
Go through wizard. Specify 'description' and 'version' for package and wait ~10min.
As result must appear package with size ~300Mb. If you want to reduce size, you can manually configure package, remove all redundant parts and make it more light. But this actions is beyond of scope of this article and hence will be skipped.
Step five: if you obtain any error during compilation - solve them using Google and own knowledge.
Step six: run follows command:
$ ogrinfo --formats | grep MDB
-> "MDB" (readonly)
If all goes right, in output must appear information about MDB format.
Otherwise go back, read mans, reconfigure and recompile GDAL and try again.
Now ogrinfo can work with mdb files and provide content and information about them.
$ ogrinfo /opt/db/test-database.mdb INFO: Open of `/opt/db/test-database.mdb' using driver `MDB' successful. 1: closeouts 2: economics
Step seven: time to configure FDW in PotsagreSQL. Here is example of initialization as pgSQL script with commands output:
postgres=# CREATE SERVER acc FOREIGN DATA WRAPPER ogr_fdw OPTIONS ( datasource '/opt/db/test-database.mdb', format 'MDB' ); CREATE SERVER postgres=# CREATE FOREIGN TABLE economics( ID integer) SERVER acc OPTIONS(layer 'economics'); CREATE FOREIGN TABLE postgres=# SELECT * FROM economics; id ---- 1 2 3 4 5 (5 rows)
So, thats all. In the end of this section I want to say about encrypted mdb files.
If ogrinfo say something like this:
Exception in thread "main" com.healthmarketscience.jackcess.UnsupportedCodecException: Decoding not supported. Please choose a CodecProvider which supports reading the current database encoding. at com.healthmarketscience.jackcess.DefaultCodecProvider$UnsupportedHandler.decodePage(DefaultCodecProvider.java:115)
most likely your mdb file is encrypted. In this case you can look at
Jaccess FAQ and
think about contribution in OGR Access driver. As I understand, there is an
project called Jackcess
Encrypt. This project provide CryptCodecProvider, which implements the
Jackess CodecProvider interface and supports some forms of Access database
encryption. But implementation of Access driver from GDAL doesn't use
Jackcess Encrypt and hence doesn't support encrypted files. So, you can
improve GDAL in this direction.
Working with Access is painful. In double size in case of Unix environment. So, first advice: extract data from MS Access into any modern database and you will get rid of many problems. If no way to drop out MS Access - use Windows for dealing with it. Microsoft provide drivers and tools for dealing with own products. Otherwise you have two ways: use transition of data through CSV files or direct access to MS Access. First way is simple and works "out of the box". Second way is quite difficult, require a lot of time and has set of limitations, pitfalls and other unpleasant things. So, select wisely.
Russian variant of this article located on HabraHabr.
Thanks to Roman Shipiev for help with initial exploration of MS Access and for description of several related tools.
MS Access PostgreSQL Ruby on Rails Foreign data wrapper pgsql RoR Ruby mdb csv FDW GDAL OGR ODBC Ubuntu