I’ve been thinking a bit about the ecosystem of software for media capture, processing, and distribution. Most of the available systems are proprietary, and the open source options are typically licensed under the GPL. At this point (that would be the vaporware point), Matterhorn’s major value proposition consists of its community and its license, both of which follow the Apache model.
But we can’t stop there. The software itself needs to add value, and I think flexibility is where we should focus. We know that we cannot make any assumptions about the deployment topologies of Matterhorn capture devices or application servers. Depending on their needs, institutions will choose the number of venues, the number of capture devices per venue, and the number of encoding and media analysis nodes required to handle the quantity and types of media they capture and process. The Matterhorn partners seem to agree on the need for this type of flexibility.
What we haven’t talked much about yet is flexibility of storage. Storage presents a fundamental budget and IT challenge for any institution that intends to produce and manage many terabytes of content per semester. Institutions should be able to use a SAN or a content repository if they choose, but neither should be a requirement. Given the ever-expanding range of cloud computing options, a distributed file system such as Hadoop’s HDFS might be an attractive choice as well.
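This kind of infrastructure flexibility will necessarily shape the software design. Direct access to java.io.File will simply not be an option, because a given recording may live in a content repository or on a distributed file system rather than on a locally mounted disk; components will have to read and write media as streams through some storage abstraction. It remains to be seen whether the performance penalty of streaming multi-gigabyte files between capture clients, repositories, and application servers will be acceptable.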
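To make the java.io.File point concrete, here is a minimal sketch, assuming a hypothetical MediaStore interface (it is not an existing Matterhorn API) that exposes media only as keys and streams. LocalDiskStore, InMemoryStore, and MediaTransfer are likewise illustrative names: one backend covers a local disk or a mounted SAN volume, and a toy in-memory backend stands in for a remote repository. Code written against the interface never learns where the bytes actually live.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage abstraction (not an existing Matterhorn API): media is
// addressed by a key and moved as streams, so callers never touch java.io.File.
interface MediaStore {
    // Stores everything read from `in` under `key`; returns the number of bytes stored.
    long put(String key, InputStream in) throws IOException;

    // Opens a stream over the media stored under `key`; the caller must close it.
    InputStream get(String key) throws IOException;
}

// Hypothetical backend for a local disk or a SAN volume mounted into the file system.
class LocalDiskStore implements MediaStore {
    private final Path root;

    LocalDiskStore(Path root) {
        this.root = root;
    }

    @Override
    public long put(String key, InputStream in) throws IOException {
        Path target = root.resolve(key); // keys are assumed to be relative paths
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Files.copy streams in chunks instead of loading the whole file into memory.
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public InputStream get(String key) throws IOException {
        return Files.newInputStream(root.resolve(key));
    }
}

// Hypothetical toy backend standing in for a remote repository in tests.
// It buffers whole items in memory, so it only suits small payloads.
class InMemoryStore implements MediaStore {
    private final Map<String, byte[]> items = new ConcurrentHashMap<>();

    @Override
    public long put(String key, InputStream in) throws IOException {
        byte[] data = in.readAllBytes();
        items.put(key, data);
        return data.length;
    }

    @Override
    public InputStream get(String key) throws IOException {
        byte[] data = items.get(key);
        if (data == null) {
            throw new IOException("no media stored under key: " + key);
        }
        return new ByteArrayInputStream(data);
    }
}

// Hypothetical helper that moves one item between any two backends; it depends
// only on the MediaStore interface, so adding a backend does not change it.
final class MediaTransfer {
    private MediaTransfer() {}

    static long copy(MediaStore from, MediaStore to, String key) throws IOException {
        try (InputStream in = from.get(key)) {
            return to.put(key, in);
        }
    }
}
```

Under this design, supporting HDFS would mean writing one more MediaStore implementation, for instance on top of the open and create methods of Hadoop's FileSystem class, without touching MediaTransfer or its callers. The sketch says nothing about the performance question; it only shows where the indirection would sit.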