I've had some time over the last couple of evenings to do some coding, so I've made a start on my package manager. So far I'm calling it NPackage.
I defined a Yaml-format package file, containing the package name, description and version number; the name and email address of the maintainer; and an optional field for the URL of the download site. The package file then contains a list of libraries (i.e. .NET assemblies) contributed by the package, with download URLs specified relative to the download site.
The idea behind making the download URL optional is that these package files can point to an existing download site for established libraries like NUnit. If the download URL is omitted then the NPackage client will look for binaries on the NPackage server, in the same location as the package file itself. A lot of libraries are going to be distributed in .zip files, so I was planning on having the NPackage client download these .zip files and unpack them into the layout dictated by the package file.
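To make this concrete, here's the rough shape of package file I have in mind. The field names and layout below are illustrative only, not NPackage's final syntax:

```yaml
# Hypothetical NPackage package file -- field names are illustrative
Name: NUnit
Version: 2.5.2
Description: Unit testing framework for .NET
Maintainer: Somebody <somebody@example.com>
# Optional download site; if omitted, the client looks for binaries
# on the NPackage server, next to the package file itself
DownloadSite: http://downloads.example.com/nunit/
Libraries:
  - bin/nunit.framework.dll
  - bin/nunit.core.dll
```

A package pointing at an existing download site for an established library would fill in the download site URL; a package hosted entirely on the NPackage server would leave it out.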
I'm using NPackage to develop NPackage, which means I had to hack together the parsing and download code myself without any unit testing framework or parsing library. Now that I've done that, I've hand-crafted a package file for NUnit (nunit.np) that'll let me start writing unit tests.
There are a few areas I'm not sure about:
- I've started writing it in C#, but I'm tempted to switch to F#, at least for the core functionality. I'm expecting to need some strong dependency graph logic (for dependencies between packages and between files within a package), which will be easier to code in F#. However, I'd like to be able to build the sources on Mono, and I'm not aware of a standard way of invoking the F# compiler from a Mono build process.
- I'm only dealing with binary distribution for now (and xcopy deployment at that). Building .NET libraries from source in a standard way could be tricky.
- I've picked a Yaml-based file format over XML because I expect these package files to be created by hand. As a result, it's going to be harder to generate or parse these files as part of an automated build system.
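As an illustration of why F# appeals to me here, the core of the dependency logic is a topological sort over the package graph, which comes out concisely in F#. This is only a sketch -- the Package record and its fields are hypothetical, not NPackage's real types:

```fsharp
// Hypothetical package record -- NPackage's actual types may differ
type Package = { Name : string; Dependencies : string list }

// Topological sort: returns packages in an order where each package's
// dependencies appear before the package itself; fails on cycles.
let installOrder (packages : Package list) =
    let byName = packages |> List.map (fun p -> p.Name, p) |> Map.ofList
    let visited = System.Collections.Generic.HashSet<string>()
    let visiting = System.Collections.Generic.HashSet<string>()
    let result = System.Collections.Generic.List<Package>()
    let rec visit name =
        if visiting.Contains name then failwithf "dependency cycle at %s" name
        elif not (visited.Contains name) then
            match Map.tryFind name byName with
            | Some p ->
                visiting.Add name |> ignore
                List.iter visit p.Dependencies
                visiting.Remove name |> ignore
                visited.Add name |> ignore
                result.Add p
            | None -> ()    // dependency on a package outside this set
    packages |> List.iter (fun p -> visit p.Name)
    List.ofSeq result
```

Given, say, a MyApp package that depends on NUnit, installOrder would place NUnit before MyApp; the same shape of traversal should work for files within a package.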
Here are the notes I made before I got started:
- Find the package
- Package files, like in Cabal
- A couple of standard locations: Hackage-like web server, internal source control repository
- Package identified by: name and version
- Deal with variants (like: 2.0 vs 3.5 vs 4.0; 32-bit vs 64-bit) by having separate packages released at the same time
- Install dependencies
- Package files declare their own dependencies, Cabal-style
- Recursively fetch and install dependencies
- Download the code
- Package file specifies location of source code (default is src/ directory relative to package file)
- Packages can release binaries only, e.g. NUnit, log4net etc. get downloaded from their normal locations
- Support fetching from source control as well as HTTP? May make sense for internal deployments, but what about mixing and matching SCC systems?
- Build the code
- Skip this if the package just has binaries
- Reference the binaries
- Update VS solution and project files
I don't seem to be the first one to be having problems keeping track of his .NET binaries:
- OJ offered to build one :)
- John Mandia pointed me towards several Java-based tools: Artifactory, Anthill Pro and Nexus. Of these, Artifactory and Nexus are open source. Of course, Apache Maven is a fairly well-known Java packaging tool.
- James Webster showed me the Visual Studio Dependencies Manager, an add-in for Visual Studio 2003.
Finally, Terry Spitz had some fairly enthusiastic feedback:
hell yes! we've got various vbscripts to do this. shouldn't it be 'easy' in say MSI (if too heavyweight), or powershell. additional points if it can handle multi-level caching, i.e. cross-region or internet code is cached on a team share as well as locally.
Windows Installer occurred to me when I started thinking about this. However, I think such a tool should be limited to deploying assemblies to a particular project's source tree -- deploying them via MSIs suggests putting them into a central location on each machine, and I predict that individual projects will start interfering with each other this way, particularly on a build server. On the other hand, Windows Installer does have the concept of merge modules: mini MSIs for software components that get merged into the final application installer.
Terry's multi-level caching idea is nice. There should definitely be local team and Internet repositories. Additionally, geographically distributed teams probably want local caches to keep overhead to a minimum. And I noticed that my Amazon-based web server cleverly goes to a special Ubuntu package repository hosted on S3, which keeps things quick and hopefully reduces my bandwidth costs.
Inspired by source code packaging systems like Haskell's Cabal, I'd like a standard tool for publishing shared code. I've seen teams take a variety of sub-optimal approaches, including: building everything from source each time; building binaries once then referencing them from a network share; and building binaries then checking them into source control.
As a developer, what I'd like to be able to do is to declare the external libraries that my code depends on. I'd like it to work the same way for third-party code (say, NUnit and log4net) as for my own code (say, a library that gets re-used across several apps on the same team). There should be a standard format for publishing libraries, but there should be minimal effort involved in pulling one of these libraries into my code.
What I propose is:
- One or more central package sites. One could be a public web site that maintains a list of open-source libraries; teams could host their own for internal libraries. These internal sites could just be directories in the file system or in source control. Package sites just contain enough information to find the right binaries or source code -- the binaries and sources themselves don't have to live at the same site.
- A command line app that updates packages to the current version, either by downloading binaries or by downloading and building source code. This would run on individual developers' PCs and on a continuous integration server.
- A tool for keeping Visual Studio project references up to date. I spend far too much time in the Project References dialog -- or editing .csproj files by hand -- to fix broken reference paths and libraries accidentally referenced from a developer's GAC.
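For anyone unfamiliar with what's inside a .csproj file, this is the kind of MSBuild fragment such a tool would keep up to date -- the HintPath is the part that breaks when assemblies move (the path shown here is just an example):

```xml
<!-- A file reference in a .csproj; HintPath tells MSBuild where to find
     the assembly when it isn't in the GAC -->
<Reference Include="nunit.framework">
  <HintPath>..\lib\NUnit\nunit.framework.dll</HintPath>
  <Private>True</Private>
</Reference>
```

Keeping that HintPath pointed at a package-managed copy of the assembly, rather than at whatever happened to be on one developer's machine, is exactly the chore I'd like automated.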
I don't know of anything that solves the problem as cleanly as in other languages. Am I missing something?
I'm a big fan of virtual servers and I've always run this web site from one. Until recently I had it on a VMware instance on my home PC, although my recent experience with Amazon EC2 and a couple of large traffic spikes prompted me to move it.
In the end the process turned out to be pretty easy:
- Back up to Amazon S3 using duplicity
- Take a MySQL dump as part of the backup
- Start an EC2 small instance running AMI Ubuntu 9.04 (ami-ccf615a5)
- Restore from Amazon S3, first installing the packages the site needs:
  apt-get -y install apache2 duplicity libapache2-mod-perl2 libdbd-mysql-perl libdbi-perl mysql-server perlmagick python-boto
- Restore MySQL dump, /etc/apache2 and /var/www using duplicity
- Run MySQL script against the local instance
- Start Apache. Check whether the static HTML pages and Movable Type's admin interface work.
- Assign an Amazon elastic IP address to the EC2 instance. This gives me a static IP address that I can refer to from DNS.
- Remap the DNS alias (an A record and a CNAME record) via my ISP's web site
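For reference, the backup and restore steps above look roughly like this. Bucket names, passwords and paths are placeholders, and the exact options may differ from what I actually ran:

```shell
# -- On the old server: nightly backup --
# Dump MySQL somewhere duplicity will pick it up
mysqldump --all-databases -u root -pSECRET > /var/www/mysql-dump.sql
# Ship /etc/apache2 and /var/www to S3 (duplicity's S3 backend uses python-boto)
export AWS_ACCESS_KEY_ID=MY_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=MY_SECRET_KEY
duplicity --include /etc/apache2 --include /var/www --exclude '**' \
    / s3+http://my-backup-bucket

# -- On the new EC2 instance: restore --
apt-get -y install apache2 duplicity mysql-server python-boto
duplicity restore s3+http://my-backup-bucket /tmp/restore
cp -a /tmp/restore/etc/apache2 /etc/
cp -a /tmp/restore/var/www /var/
mysql -u root -pSECRET < /var/www/mysql-dump.sql
apache2ctl restart
```

The nice property is that the restore half doubles as a disaster-recovery script: starting a fresh instance and replaying it is the whole recovery procedure.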
I'm happy with the changes so far:
- Performance has been fine: publishing the site now takes 30 seconds instead of 15, but I'm getting much better response times and bandwidth
- On the other hand, I'm now paying to run an EC2 instance full-time, whereas before I was just paying for home power bills
- I'm not going to get shot by my ISP next time one of my posts appears on Reddit
The fact that I was taking daily backups made the move risk-free. It took a couple of attempts to get a working site on the EC2 server, but I was able to start a fresh instance and restore from backup each time. I also know that, if the site does fall over in future, restoring from backup will take a few minutes and I'll lose one day of data at most.
Summer has almost arrived in London -- I took this photo at the weekend in one of the parks close to where I work.