Berkeley Bits

June 3, 2009

Gripes with Distributed OSGi

Filed under: What I'm Up To — jholtzman @ 5:17 pm

For the most part, I’ve been pretty happy with the distributed OSGi programming model.  Unfortunately, there are a couple of areas where the reference implementation is getting under my skin:

  1. WSDLs are not easily usable outside DOSGi
  2. There is no obvious way to handle cross-cutting concerns

It may be possible to fix my WSDL gripe by modifying the CXF DOSGi implementation to use JAXB and JAX-WS annotations.  This would make the WSDLs more usable, but there was quite a bit of disagreement on the CXF mailing list about the wisdom of annotating the service interfaces.  Still, I might spend some time investigating whether this is a workable solution.

To handle cross-cutting concerns, I’m about to take a look at Equinox Aspects, but I’m hesitant to go down that path for two reasons.  First, it seems to be designed specifically for the Equinox container.  Second, it is yet another pre-1.0 project, and I’m wary of continuing to build on such bleeding-edge technologies.

May 31, 2009

Matterhorn’s value proposition

Filed under: What I'm Up To — jholtzman @ 8:16 am

I’ve been thinking a bit about the media capture, processing, and distribution software ecosystem.  Most of the available systems are proprietary, and the open source options are typically licensed under the GPL.  Matterhorn’s major value proposition at this point (that would be the vapor-ware point) consists of the community and the license, both being Apache-style.

But we can’t stop there.  The software itself needs to add value, and I think flexibility is where we should focus.  We know that we cannot make any assumptions about the deployment topologies of Matterhorn capture devices or application servers.  Depending on their needs, institutions will choose the number of capture devices per venue, the number of venues, the number of encoding nodes, and the number of media analysis nodes needed to handle the quantity and types of media captured and processed.  It seems that the Matterhorn partners are in agreement about the need for this type of flexibility.

What we haven’t talked much about yet is flexibility of storage.  Storage presents a fundamental budget and IT challenge for any institution intending to produce and manage many terabytes of content per semester.  Institutions should be able to utilize a SAN or a content repository if they choose, but neither of these should be a requirement.  Considering the ever-expanding cloud computing options, a distributed file system such as Hadoop’s HDFS might be an attractive choice as well.

This kind of infrastructure flexibility will necessarily shape the software design.  Direct access to java.io.File will simply not be an option.  It remains to be seen whether the performance penalty of streaming multi-gigabyte files between capture clients, repositories, and application servers will be acceptable.
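
As a rough sketch of what avoiding java.io.File could look like, components might depend on a stream-based interface and leave the backend choice to each institution.  Everything named here (`MediaStore`, `InMemoryMediaStore`, the `put`/`get` methods) is hypothetical and only for illustration; nothing like it has been defined for Matterhorn, and the in-memory backend is just a stand-in for a SAN, content repository, or HDFS implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical storage abstraction: callers see streams, never java.io.File. */
interface MediaStore {
    /** Copies everything readable from the stream to the given logical path. */
    void put(String path, InputStream content) throws IOException;

    /** Opens a stream over the content previously stored at the path. */
    InputStream get(String path) throws IOException;
}

/**
 * In-memory stand-in so the sketch runs on its own. It buffers whole files in
 * memory, so it is only suitable for small test content, not real media.
 */
class InMemoryMediaStore implements MediaStore {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    @Override
    public void put(String path, InputStream content) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        // Copy in fixed-size chunks until the stream is exhausted.
        while ((read = content.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        blobs.put(path, buffer.toByteArray());
    }

    @Override
    public InputStream get(String path) throws IOException {
        byte[] data = blobs.get(path);
        if (data == null) {
            throw new IOException("No content stored at " + path);
        }
        return new ByteArrayInputStream(data);
    }
}
```

Code written against `MediaStore` would not need to change if the backend were swapped, though the streaming cost mentioned above would still have to be measured for each backend.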

January 30, 2009

Cooper interview

Filed under: What I'm Up To — jholtzman @ 4:31 pm

I had neither read Cooper’s books nor seen or heard him speak before today, when I found this interview recorded at “Agile 2008”.  It’s a discussion of agile programming, why it came about, and what its popularity says about the converging motivations of programmers and interaction designers.  It’s long (~45 min), but I found it worthwhile.
(more…)

RESTEasy in Sakai 2.5.x

Filed under: What I'm Up To — jholtzman @ 11:29 am

I did a bit of fiddling with RESTEasy yesterday.  Following the examples in the JBoss source, I was able to resolve a URL via RESTEasy after about 30 min of pom and web.xml wrangling.  Not too bad.

RESTEasy pulled the following jars into my testing webapp (I chose the gradebook, since I’m familiar with its structure):

(more…)

January 15, 2009

A quick review of some open source workflow engines

Filed under: What I'm Up To — jholtzman @ 5:16 pm

Matterhorn must provide a mechanism to allow institutions to easily configure workflows that match their local business processes for capturing, processing, and distributing media.  I’ve been looking at some open source workflow engines, hoping that one of them will meet our needs.  This is a cursory review of the engines I’ve found to date:

(more…)

December 19, 2008

Stress-testing Sakai

Filed under: What I'm Up To — jholtzman @ 4:43 pm

I’ve been using JWebUnit to put load on our Sakai instance in order to work through some performance problems we’re seeing in production, but can’t seem to reproduce on dev or qa.  JWebUnit has been a pleasure to work with, and creating 100+ concurrent users is a snap.  With code like:

(more…)

November 20, 2008

Matterhorn POC: OSGi

Filed under: What I'm Up To — jholtzman @ 11:58 am

With the deadline for a grant proposal fast approaching, I’ve been placed in charge of writing about the development techniques and technologies we’ll be using in the project.  Apparently, we don’t have to include promises to use specific products, but we should reference the options we’re considering and what standards we’re planning on using.  In order to write a reasonably well-thought-out proposal, I’ve been researching:

1) Containers: JEE app servers, OSGi frameworks (Felix and Equinox), and Spring DM
2) Web services frameworks: Axis2, CXF, and JAX-RS implementations (CXF and Jersey)
3) Messaging and service buses: ServiceMix, Mule, and ActiveMQ

From my brief experiments with these products, I’ve come up with a pretty trim proof-of-concept that might serve our needs in Matterhorn.  It works like this:

A main class boots an instance of Apache Felix, which is configured to auto-start two OSGi bundles (there are others, but only two are of particular interest): CXF and a Jackrabbit JCR server.  Other OSGi bundles can be added, removed, or refreshed at any time via Felix’s web interface (I don’t have it up and running yet in this environment, though I have tested the management webapp that comes with Felix, and it’s pretty easy to use).

I’ve added a sample component that uses the JCR server to get or set content (currently a string… it’s just a proof-of-concept after all :) ) at a particular path in the repository, and this functionality is exposed as a web service.  I was pleasantly surprised that it took only a few minutes to publish a web service.  The implementation of the component is under 100 lines of code, too.
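
To give a feel for the shape of such a component, here is a minimal sketch of a get/set-by-path service.  The names `ContentService` and `InMemoryContentService` are made up for illustration and are not the proof-of-concept’s actual code; a plain map stands in for the Jackrabbit repository so the example runs without any OSGi, CXF, or JCR libraries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical service interface: get or set a string at a repository path. */
interface ContentService {
    /** Returns the content at the path, or null if nothing has been set there. */
    String getContent(String path);

    /** Stores the content at the path, replacing any previous value. */
    void setContent(String path, String value);
}

/** Stand-in implementation: a map replaces the JCR repository in this sketch. */
class InMemoryContentService implements ContentService {
    private final Map<String, String> nodes = new ConcurrentHashMap<>();

    @Override
    public String getContent(String path) {
        return nodes.get(path);
    }

    @Override
    public void setContent(String path, String value) {
        nodes.put(path, value);
    }
}
```

In the real setup, an implementation backed by the JCR server would sit behind an interface of this kind, and the container would take care of publishing it as a web service.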

What this setup buys us is a very lightweight way to manage components in a running system, so we should be able to hot-swap services as we make changes during development without stopping/restarting the entire system.

In addition to the cost of migrating code from existing projects (such as REPLAY) into OSGi bundles, the strict classloading rules of the OSGi environment take some time to learn, and will most likely slow developers down (not that everyone will need to work in this area of the system).  Still, with the number of projects moving toward OSGi, it seems like this is becoming the de facto Java component system, and it will behoove us all to become adept in this environment.  That, and we don’t have to get into the business of providing our own container, or trying to use one that doesn’t meet our needs for component isolation.

October 22, 2008

JCR Cluster Experiments

Filed under: What I'm Up To — jholtzman @ 12:59 pm

I’m starting to work on my task list for Matterhorn (Opencast v1.0).  One of the research tasks I have is answering how a number of components spread across different JVMs (or not running in a JVM at all) and different servers can efficiently access a common content repository.  There may not be an efficient mechanism for non-JVM components, but there must be some kind of network access.

I’ve put together some experiments to help me understand JCR in general and Jackrabbit in particular, with the hope that we’ll be able to leverage this JCR implementation in Matterhorn.  So far, I’ve had luck reading and writing to JCR from multiple JVMs.  The writes from JVM1 show up in JVM2, so this looks like a good start.  The code is at https://opencastproject.svn.sourceforge.net/svnroot/opencastproject/jcr/

September 25, 2008

REPLAY –> Opencast update

Filed under: What I'm Up To — jholtzman @ 4:33 pm

Quick progress update…

I’ve managed to get the REPLAY manager webapp to run individual jobs, return the ID of the job, and then return the status of the job.  This proof-of-concept (based on REPLAY v0.4) just executes an indexing job on a pre-defined media bundle.  More work will obviously be needed to make this fully functional.
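
The submit-then-poll pattern described here can be sketched in plain Java as follows.  This is not REPLAY’s code or API: `JobManager`, `JobStatus`, and their methods are hypothetical names, and the job is just a `Runnable` standing in for an indexing run.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical job states for this sketch. */
enum JobStatus { QUEUED, RUNNING, FINISHED, FAILED }

/** Hypothetical manager: submit a job, get an ID back, ask for its status later. */
class JobManager {
    private final Map<String, JobStatus> statuses = new ConcurrentHashMap<>();
    private final ExecutorService workers = Executors.newSingleThreadExecutor();

    /** Queues the job and immediately returns an ID the caller can poll with. */
    String submit(Runnable job) {
        String id = UUID.randomUUID().toString();
        statuses.put(id, JobStatus.QUEUED);
        workers.submit(() -> {
            statuses.put(id, JobStatus.RUNNING);
            try {
                job.run();
                statuses.put(id, JobStatus.FINISHED);
            } catch (RuntimeException e) {
                // A job that throws is recorded as failed rather than lost.
                statuses.put(id, JobStatus.FAILED);
            }
        });
        return id;
    }

    /** Returns the current status, or null if the ID is unknown. */
    JobStatus status(String id) {
        return statuses.get(id);
    }

    /** Stops accepting jobs and waits for already queued jobs to complete. */
    boolean shutdownAndWait(long seconds) throws InterruptedException {
        workers.shutdown();
        return workers.awaitTermination(seconds, TimeUnit.SECONDS);
    }
}
```

A webapp would expose `submit` and `status` over HTTP; here they are ordinary method calls to keep the sketch self-contained.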

This is no big deal, really, but it does mean that REPLAY can fulfill the “media processing manager” component, as described in the Opencast draft architecture document.

September 14, 2008

Replay requires Perian

Filed under: Hard-Won Wisdom, What I'm Up To — jholtzman @ 9:05 am

I spent a lot of time last week trying to get REPLAY to encode, OCR, and index some video files that were captured and encoded for webcast.berkeley.edu.  It was productive to go through the source code, looking for the reason I couldn’t get anything to index.  After a couple of hours of help from Tobias, we tracked down the problem: there is an undocumented dependency on the Perian toolkit on OS X.  Now that Perian is installed, MP4s, DV files, and the like are properly encoded, OCR’d, and indexed.
