Hi folks. I want to share some details about our engine. As you know, it is written in Go. We use a lot of libraries in it, and one of them is mxj, an outstanding library for working with XML.
Now let me briefly explain how our engine's json2xml routine works. First, we convert JSON to a map[string]interface{}, and then feed this object to mxj as follows: xmlValue, err := mxj.AnyXmlIndent(data, "", "", "body"). After that, we fix the self-closed tags and pass the result on.

We used this logic for 3 months, and everything was just fine, but then we suddenly needed to parse much larger volumes of JSON than usual, and that turned out to be a problem: one of the diggers took 8 hours instead of 15 minutes. So we did the necessary research. Processing a single page took 16 minutes, which, for obvious reasons, is unacceptable. It turned out that the page contained 2.5 MB of JSON. Converting it with the mxj library took about 3 minutes, and then some magic happened: the engine went crazy, and it took another 13 minutes to process the XML. Of course, we were not happy with that, and we decided to improve mxj first.
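Here is a minimal Go sketch of the two steps around the mxj call in the json2xml routine: decoding JSON into a map[string]interface{} with encoding/json, and then fixing self-closed tags. It assumes that "fixing" means expanding shorthand such as <image/> into <image></image>, since an HTML parser does not honor "/>" on non-void elements. The regex and the names jsonToMap and expandSelfClosing are hypothetical, and the mxj call itself is replaced by a hand-written sample of XML so the example runs with the standard library alone.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// jsonToMap decodes a JSON object into a generic map[string]interface{},
// the kind of value the routine then hands to mxj.
func jsonToMap(raw []byte) (map[string]interface{}, error) {
	var data map[string]interface{}
	if err := json.Unmarshal(raw, &data); err != nil {
		return nil, err
	}
	return data, nil
}

// selfClosing matches an empty element written in XML shorthand, e.g.
// <image/> or <br class="x" />. Hypothetical pattern: it assumes attribute
// values contain no '<' or '>' characters.
var selfClosing = regexp.MustCompile(`<([A-Za-z_][\w.:-]*)([^<>]*?)\s*/>`)

// expandSelfClosing rewrites <name .../> as <name ...></name>.
func expandSelfClosing(xml string) string {
	return selfClosing.ReplaceAllString(xml, "<${1}${2}></${1}>")
}

func main() {
	data, err := jsonToMap([]byte(`{"title":"Hello","image":""}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(data["title"])

	// In the real routine, data would now go to
	// mxj.AnyXmlIndent(data, "", "", "body"). That call is not made here;
	// a hand-written sample of XML output stands in for its result.
	sample := "<body><image/><title>Hello</title></body>"
	fmt.Println(expandSelfClosing(sample))
}
```

A regex is adequate for post-processing output from a known encoder like this, but it is not a general-purpose XML parser.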
The problem with the mxj library lay in its use of string concatenation. Strings in Go are immutable, so each concatenation allocates a new string and copies both the old contents and the appended part into it; repeated over a large document, this adds up very quickly. We decided to work around it and wrote a few new functions that use bytes.Buffer instead of strings. This simple change alone made XML processing in the mxj library about 180 times faster: the same data set that used to take 3 minutes now takes less than 1 second.
During further research we found where we had made a mistake. Our engine expects HTML, and when we convert JSON, some tags that are self-closing (void) in HTML, such as img or area, may be used in the XML as standard tags with content and a closing tag, which caused problems. So we made another change to the library that lets us replace such tags with safe versions. This solved all the issues we had: the page that previously took 15 minutes to process now takes just 6 seconds.
The repository with the library we modified can be found here.
As a bonus, we wrote a simple converter that lets you load data from MongoDB and convert it to XML. You can get it here.