Big Data in Academia

Just had an interesting couple of days. Went to give a seminar to my old University, Aberystwyth, on Monday. Talked about Pre-Sales and Big Data. Had a look around the Department of Computer Science too. Lots of interesting things going on, and the University is going from strength to strength.

I was particularly interested in their involvement in the ExoMars project. They're currently building a half-size rover and testing it in their planetary surface lab, which is filled with material of the same particle size as the Martian surface, so they can test the rovers as accurately as possible. This year they've been testing on Tenerife, in an area very much like Mars. They've even connected the rover to an Aerobot flying above it, carrying a high-resolution camera that photographs the area around the rover. Given the movement of the Aerobot, they've needed special software to stitch the photos together and build a 3D model of the ground's surface; this was done in conjunction with a Czech university that has developed some amazing software. The rover they are building uses nine microcontrollers to control its six fully articulated wheels.

The research in the Department is very interesting too. The Bioinformatics group in particular have lots of projects on the go at the moment. I was also talking yesterday at an SGI event to various academic and research groups. Lots of very interesting projects, but many are hitting a painful wall: the better their sensors get and the more information they collect, the sooner they hit a big data challenge, both in storage and in analysis. It was very interesting to hear from SGI's CTO that a lot of projects are collecting data but only storing 1% or even 0.1% of it. They're triaging data as it comes in and only storing what is likely to be useful.
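That kind of ingest-time triage can be surprisingly simple. Here's a minimal sketch (my own illustration, not anything SGI described): keep any reading that crosses an "interesting" threshold, plus a small random sample of the rest so the baseline isn't lost entirely. The threshold and sample fraction are made-up parameters you'd tune per experiment.

```python
import random

def triage(readings, keep_fraction=0.01, threshold=5.0):
    """Keep readings that look interesting (above threshold), plus a
    small random sample of everything else as a baseline."""
    kept = []
    for value in readings:
        if value >= threshold or random.random() < keep_fraction:
            kept.append(value)
    return kept

random.seed(42)
# 100,000 mundane sensor readings plus two genuine spikes
readings = [random.gauss(0, 1) for _ in range(100_000)] + [9.9, 7.5]
stored = triage(readings)
print(f"stored {len(stored)} of {len(readings)} readings "
      f"({100 * len(stored) / len(readings):.2f}%)")
```

The point is that the spikes are always retained while storage drops to roughly 1% of the raw feed, which matches the sort of ratios quoted above.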

It's also an interesting idea of SGI's to build a single UV SuperNode rather than use a cluster. You don't need special programming languages; you can just run your normal analysis programmes on a single node with terabytes of RAM. Something they also saw as an important trend was power use. When you store such large data there is a massive cost associated with just spinning discs around, many of which may not be in use at any given time. Yes, you could use SSDs, but they're still very expensive. SGI have developed a zero-watt drive that stores information that hasn't been accessed in a while (you can configure your own policies) and spins down. When data living on these drives is requested, they are spun up and the data is copied into the set of 'normal' hard discs for immediate access. You can even pre-fetch data prior to running analysis. Oh, and fetching this large data takes under 15 seconds, much better than the 2 minutes of old. An interesting idea for tiered storage that's well worth a look.
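To make the tiering idea concrete, here's a toy model of the policy described above (my own sketch, not SGI's implementation): data idle longer than a configurable window migrates to a spun-down cold tier, a read spins the drive up and promotes the data back to the hot discs, and a pre-fetch call warms the hot tier before an analysis run. The class and method names are all hypothetical.

```python
import time

class TieredStore:
    """Toy two-tier store: 'hot' discs for active data, 'cold'
    spun-down drives for data idle past a configurable policy."""

    def __init__(self, idle_seconds=3600):
        self.hot = {}    # key -> (data, last_access_time)
        self.cold = {}   # key -> data (drive spun down)
        self.idle_seconds = idle_seconds

    def put(self, key, data):
        self.hot[key] = (data, time.time())

    def apply_policy(self, now=None):
        """Migrate anything idle past the policy window to cold."""
        now = now if now is not None else time.time()
        for key in list(self.hot):
            data, last = self.hot[key]
            if now - last > self.idle_seconds:
                self.cold[key] = data
                del self.hot[key]

    def get(self, key):
        """A cold read 'spins up' the drive and copies back to hot."""
        if key in self.cold:
            self.put(key, self.cold.pop(key))
        data, _ = self.hot[key]
        self.hot[key] = (data, time.time())  # refresh access time
        return data

    def prefetch(self, keys):
        """Warm the hot tier before an analysis run."""
        for key in keys:
            if key in self.cold:
                self.put(key, self.cold.pop(key))

store = TieredStore(idle_seconds=3600)
store.put("run-001", b"sensor data")
store.apply_policy(now=time.time() + 7200)  # simulate two idle hours
print("run-001" in store.cold)              # migrated to cold tier
store.prefetch(["run-001"])
print("run-001" in store.hot)               # warmed for analysis
```

In a real system the cold-tier copy would carry the multi-second spin-up latency mentioned above, which is why pre-fetching before a job is worthwhile.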

I’m now putting together the finishing touches to my list of Academic and Research Big Data challenges and solutions. If you’re facing a big data challenge please send me an email and let me know so I can share my research – whether MarkLogic is the best solution or not, I’ll be sure to tell you. adam dot fowler at marklogic dot com.
