
Soundbyte 44: Thunderstruck

20 May 2012

A few weeks back I finally had my Luminis introduction evening. A bit late for an introduction (I started in September last year), but I ended up having some really nice conversations with Hans and Jan Willem over dinner, so it was an evening well spent. Of course we discussed music and started talking about AC/DC, my favorite band of all time. They started in 1973 and are still playing. What I really like is that they didn’t change much over the years, but kept doing what they are great at instead. I was planning on adding a video of one of my favorite songs, “Shoot to Thrill”, but since I’m writing this on our balcony while the thunder is rolling in, “Thunderstruck” seems much more appropriate.

Polyglot data storage

The past two weeks I have been very busy migrating most of the Leren-op-Maat data storage components to MongoDB. When we got started with Leren-op-Maat we knew we would have to integrate with external semantic stores, so we decided to store most of our data in a semantic store as well. Bad choice. Let me explain by first giving some background. In a semantic store you save data in the form of triples, for example:

http://myscheme/paulb hasName "paul"

http://myscheme/paulb worksFor "Luminis"

If you have an object with 5 properties, you will end up with (at least) 5 triples. The other thing to know is that a semantic store is “schemaless”. This is great because you can more easily adapt your models, but it can also be a problem. Let me explain…

In Java we obviously work with objects, and at some point we want to persist those. Doing so in a semantic store means creating a triple for each property of a class. That is not so different from mapping properties to columns in a relational database. When querying, however, you have to add a statement to your SPARQL query for each triple that you want to include in your result. There is no “select * from Person”. That leaves us with a choice. We could query only the properties we really need in each specific situation, but this ripples through the rest of the code base and hurts our object model and interfaces, so that’s not an option. Mapping all triples back to Java objects instead results in very long queries and a lot of mapping code. The worst part is that the queries are not enforced by a schema: a typo in a query will often result in a working query that simply never returns any data. And then the debugging fun starts… Did I make an error inserting my triples, or did I make a typo in my query? In queries that involve a lot of triples this can be a real pain.
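To make that concrete, here is a sketch of the SPARQL needed just to fetch the two example properties from above (the predicate URIs are made up, matching the example triples):

SELECT ?name ?employer
WHERE {
  <http://myscheme/paulb> <http://myscheme/hasName> ?name .
  <http://myscheme/paulb> <http://myscheme/worksFor> ?employer .
}

Every extra property means another triple pattern in the WHERE clause, and if you misspell a predicate URI the query still runs; the pattern simply never matches and you get no results.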

Of course this could be fixed by using an object mapping framework, the only problem being that there aren’t any decent ones for semantic technology. Object mapping is very complex (take a look at the JPA spec), and you don’t write a framework like that on a rainy Sunday afternoon either (JPA took many years), so you’re out of luck there.

OK, so a semantic store isn’t the right solution for internal data storage in a Java application. A relational database is the obvious alternative. With the NoSQL trend going strong this might look like something you really shouldn’t be doing, but I think that idea is incorrect. Relational databases work very well for applications that don’t have to deal with massive scale, which happens to be most of the applications we work on. And although the ORM problem is very complex, the APIs and frameworks are very mature; Martin Fowler recently blogged about the subject in OrmHate. In Leren-op-Maat we do use MySQL with JPA for some components, but there is an even better match.

In Leren-op-Maat the user interface is built completely with HTML5 and JavaScript; we don’t use any server-side web framework. On the server side we only have RESTful web services that produce and accept JSON data. Because almost every object is converted to JSON at some point, our domain model is designed for this: we use a lot of embedding of objects instead of relations between objects, because this fits the JSON format very well. Hello MongoDB. Mongo stores data internally as BSON documents, which is binary JSON. Because our objects are already modeled to be mapped to JSON, they can be stored in Mongo the same way, without the complex mappings that a relational database would require. When querying, you receive those same BSON documents back, which can be converted to our domain objects with little to no mapping code. After replacing all semantic storage components with Mongo implementations we ended up with significantly less and cleaner code. Awesome.
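A minimal sketch of what this looks like with the plain MongoDB Java driver of that era (the database, collection and field names are made up for illustration):

import java.net.UnknownHostException;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

public class PersonStore {

    public static void main(String[] args) throws UnknownHostException {
        // Connect to a local Mongo instance; get a handle to a collection.
        DB db = new Mongo("localhost").getDB("lerenopmaat");
        DBCollection persons = db.getCollection("persons");

        // Embed the address in the person document instead of modeling it as a relation.
        DBObject address = new BasicDBObject("city", "Amsterdam").append("country", "NL");
        DBObject person = new BasicDBObject("name", "paul")
                .append("worksFor", "Luminis")
                .append("address", address);
        persons.insert(person);

        // Querying returns the same document shape back, ready to serialize to JSON.
        DBObject found = persons.findOne(new BasicDBObject("name", "paul"));
        System.out.println(found);
    }
}

Because the document already mirrors the JSON we send to the browser, the step from domain object to stored document is trivial compared to an ORM mapping.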

The lesson learned here is that you need to pick the right tool for the job. For me, this is what NoSQL is about. There are data stores built for massive scaling, there are document stores like Mongo, and relational databases still have an important place too in my opinion. Mix and match what fits your needs and architecture.

But wait… didn’t we choose semantic technology in Leren-op-Maat for a reason, namely integration with other semantic stores? Yes we did, and this requirement hasn’t changed. What we know now is that it will never work to just share all our data; there is way too much Leren-op-Maat-specific data that is really not interesting to anybody else. The solution is that we will extract parts of our data and create triples for them. This is data that follows strict ontologies and is meant to be shared. It does mean we store some data in two places, but not building the rest of our code on the semantic store is well worth that tradeoff. So don’t get me wrong, I’m not saying semantic technology is not usable. It very much is, just not for the internal storage of an application.
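To sketch the idea of that extraction step (the Person class and predicate URIs are hypothetical, reusing the example triples from above; the real data follows strict, shared ontologies):

public class TripleExtractor {

    // Hypothetical domain object, for illustration only.
    static class Person {
        String uri = "http://myscheme/paulb";
        String name = "paul";
        String employer = "Luminis";
    }

    // Emit only the shareable properties of a domain object, as N-Triples.
    static String[] extract(Person p) {
        return new String[] {
            "<" + p.uri + "> <http://myscheme/hasName> \"" + p.name + "\" .",
            "<" + p.uri + "> <http://myscheme/worksFor> \"" + p.employer + "\" ."
        };
    }

    public static void main(String[] args) {
        for (String triple : extract(new Person())) {
            System.out.println(triple); // in reality these would be inserted into the semantic store
        }
    }
}

The extractor decides what is shared; the rest of the application never touches the semantic store.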

Conferences

Bert Ertman and I spoke at JavaOne Russia last month, where we (again) did our Migrating Spring to Java EE talk. We also published the first three parts of an article series about this subject on jboss.org. While in Moscow we were interviewed about the subject for the Java Spotlight podcast, which will be published within a few weeks. The results of the call for papers for JavaOne San Francisco should come in soon, and with a grand total of 8 proposals (by Bert, Marcel, Marc Teutelink and me) this is kind of exciting. The next events are a BEJUG evening session about “Modular Java EE in the cloud” by Bert and me on June 12, and three talks by me at JUDCon and JBoss World at the end of June in Boston.

BndTools

Leren-op-Maat is built using BndTools. A lot has changed in the past year: it went from a highly unstable, buggy tool to something I wouldn’t want to miss anymore when doing OSGi development. If you are doing OSGi, this is the way to go; the development speed compared to doing OSGi with Maven is enormous. A few weeks back I prototyped ACE integration for BndTools, which I demo in this video: you can automatically deploy to ACE during development. Marcel and I will join a BndTools hackathon next month to work on BndTools with a small group of people for a few days, and this is one of the things on the agenda.

OSGi testing

The testing frameworks available for OSGi are all not great, and none of them support testing with the same deployment strategy we use in production (Apache ACE). Together with Marcel, I supervised Luminis Technologies intern Tran for the past few months while he worked on this for his graduation assignment. Last week he delivered the first working release, which is very exciting news. We didn’t exactly give him an easy assignment (e.g. multi-node tests) and kept refining the requirements, because this is a tool we actually want to use. The framework is based on Arquillian and of course uses Apache ACE. We still have a few weeks to wrap things up, and we will write a lot more about it then. Of course it’s all going to be open sourced.

Paul Bakker