cdat_lite is born
After much tinkering, cdat_lite is ready for a wider audience. Developed primarily for use as a component of the NERC Data Grid, cdat_lite is a simple repackaging of the I/O layer of the Climate Data Analysis Tools (CDAT) as a Python Egg.
The BADC has found CDAT’s data management layer (CDMS) invaluable in developing server-side analysis tools. It handles the sorts of calendars only found in numerical modelling, abstracts away NetCDF coordinate variables on a variety of grids and allows aggregation of huge multi-file datasets into a logical dataset. We like it so much that we developed an input layer for the UK Met Office PP format.
However, CDAT aspires to be much more than this. It is a comprehensive data analysis environment with a GUI and visualisation components and, as such, has grown into rather a hefty package. It’s a 160MB download from SourceForge including, among other things, its own Python tarball. This can be rather inconvenient if you have your own personalised Python environment. Non-default installation can be tricky and time consuming. That is not too much of a problem when setting up a single workstation, but it is more arduous at a place such as the BADC, where we have quite a heterogeneous network of Linux systems. Add to this an evolving codebase, as the cdunifpp component has matured, and you have a recipe for multiple installations, most of which are out of date.
cdat_lite tries to fix this by taking out the bits of CDAT we find most useful: the libcdms I/O layer and the core Python packages cdms, cdutil, cdtime, unidata, genutil, regrid and xmgrace.
The first cut of cdat_lite was relatively easy to create. Eggifying the CDAT packages only required a new setup.py script and a single patch to unidata so that it loads a data file with the pkg_resources API. Similarly, libcdms had its own configure script and Makefile that could be called from setup.py. I could build binary eggs for x86_64 and i686 from within my sandbox that seemed to work once deployed. The fun came when I became fussier about how easy it should be to install. I wanted the tarball to work with easy_install as well as the pre-built eggs, and I wanted this to work on as many machines at the BADC as possible, in particular Red Hat Enterprise, SUSE 10 i686 and x86_64, and Kubuntu i686. This opened up several winding roads.
Dependencies 1: Numeric
CDAT needs Numeric. It needs the Numeric header files to compile libcdms. Numeric is quite often present on the target Python installation but not always. When it is installed it may or may not include the Numeric headers, and it probably doesn’t have an EGG-INFO directory (so it isn’t detected by setuptools). Numeric isn’t on the cheeseshop and isn’t easily downloadable from SourceForge because it is a legacy package.
At first I just said you needed Numeric before you begin, but this didn’t seem a very good advert for the ease of using eggs. If Joe User had a Numeric installation without the header files he’d be stuck. In the end I decided to mirror the Numeric tarball on the cdat_lite site and include it as a dependency. This means Numeric will be downloaded and built automatically by easy_install. You might end up installing Numeric twice (one egg, one not) but you’ll always have the headers.
Sorting this out was a big learning experience for me in how to use setuptools or, more precisely, how to extend distutils in a setuptools compatible way. The location of the Numeric headers can be determined by:
>>> from Numeric_headers import get_numeric_include
>>> get_numeric_include()
'/usr/lib/python2.4/site-packages/Numeric-24.2-py2.4-linux-i686.egg/Numeric_headers'
If Numeric isn’t installed you can tell setuptools to install it with the setup_requires keyword. The problem is that in this case Numeric will be installed from within setup(), so you can’t import Numeric_headers at the top of setup.py because it might not be installed yet. The solution is to subclass setuptools.command.build_ext, adding get_numeric_include() to self.include_dirs. This way you can still add your own include_dirs on the command line or in a configuration file. It’s a pity it’s so difficult to work out how to do this without reading the distutils source.
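For illustration, here is a minimal sketch of that subclassing trick. It is not the actual cdat_lite setup.py: the names NumericIncludeMixin and make_build_ext are made up for this example, hooking finalize_options is just one reasonable place to do it, and the syntax is modern Python rather than the Python 2.4 of the time. The point is that the import of Numeric_headers is deferred until the command runs.

```python
class NumericIncludeMixin(object):
    """Hypothetical mixin: add Numeric's header directory to a build_ext command."""

    def finalize_options(self):
        # Let the real build_ext handle command-line and config-file
        # options first, so any user-supplied include_dirs are kept.
        super().finalize_options()
        # Import late: by the time the command is finalized, setup_requires
        # has had its chance to install Numeric.
        from Numeric_headers import get_numeric_include
        self.include_dirs.append(get_numeric_include())


def make_build_ext():
    """Combine the mixin with setuptools' build_ext (imported lazily)."""
    from setuptools.command.build_ext import build_ext

    class build_ext_with_numeric(NumericIncludeMixin, build_ext):
        pass

    return build_ext_with_numeric
```

In a setup.py this would be wired in with something like setup(..., setup_requires=['Numeric'], cmdclass={'build_ext': make_build_ext()}).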
Dependencies 2: NetCDF
Again, NetCDF is usually installed but not always. An added complication is that on x86_64 the library must be compiled with the -fPIC option for it to be usable within a DSO. I discovered that on some of our x86_64 machines we had a NetCDF installation that seemed fine (libnetcdf.a, ncdump and ncgen present) but was useless for building Python extension modules.
Clearly there was a case for cdat_lite building its own libnetcdf.a. The current version includes the NetCDF tarball, although it could download it from Unidata in the future. When compiled, the egg includes its own copy of the NetCDF libraries and headers. This will allow future eggified CDAT components (vcs_lite is in the works) to compile and link against a consistent NetCDF installation.
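As a rough sketch of what building a private, position-independent libnetcdf.a from a setup script can look like (this is not the code cdat_lite uses; the function names, src_dir and prefix are assumptions for the example, and it relies only on the usual autoconf convention that configure picks up CFLAGS from the environment):

```python
import os
import subprocess


def pic_environment(base_env=None):
    """Return a copy of the environment with -fPIC added to CFLAGS."""
    env = dict(os.environ if base_env is None else base_env)
    flags = env.get("CFLAGS", "").split()
    if "-fPIC" not in flags:
        flags.append("-fPIC")
    env["CFLAGS"] = " ".join(flags)
    return env


def build_static_netcdf(src_dir, prefix):
    """Configure, build and install an unpacked netcdf source tree.

    src_dir is the directory holding the configure script and prefix is
    where the library and headers should end up; both are placeholders.
    """
    env = pic_environment()
    commands = (
        ["./configure", "--prefix=" + prefix],
        ["make"],
        ["make", "install"],
    )
    for cmd in commands:
        # check_call raises CalledProcessError if any step fails.
        subprocess.check_call(cmd, cwd=src_dir, env=env)
```

Keeping the environment tweak in its own function means any CFLAGS the user has already set are preserved rather than overwritten.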
x86_64 compatibility
There is a cautionary tale about the virtue of unit tests here. I was quite pleased with how easy it was to compile an x86_64 egg. The modules imported fine and I tested reading a PP file. I publicised cdat_lite at the BADC and moved on. When I returned to polish the code I built a handful of trivial unit tests, including reading a NetCDF file. I then discovered that it couldn’t read NetCDF on x86_64!
It turns out that CDAT 4.1 isn’t compatible with x86_64. There is an important patch in the CDAT SVN which was obviously going to fix this so I merged in the SVN trunk. cdat_lite is now based on a particular revision of the CDAT SVN. This isn’t ideal but is better than not having NetCDF.
Future
I hope vcs_lite will be ready for registering at the cheeseshop soon. It will provide the vcs canvas and hardcopy output without any Tk widgets or VCDAT. This is all we need to build web applications based on CDAT.
It would be fun to try to write a pydap plugin on top of cdat_lite. It looks easy enough (famous last words …)