Friday, November 4, 2011

Integration of JS Deflate Compression with other platforms

This post describes how to integrate the JavaScript Deflate libraries with other platforms, such as Java.

I will describe what needs to be done to compress a string in JavaScript (even one containing non-ASCII characters) and decompress it in Java. So let's start.


JavaScript part:
To support international strings and decompress them correctly, the first step is to convert the string into a UTF-8 byte array. There are plenty of similar functions on the web; below is one example:
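function stringToByteArray(str) {
  // Returns a "binary string": one character per UTF-8 byte (codes 0..255),
  // which is what the deflate libraries, adler32() and btoa() expect.
  var b = [], i, unicode, low;
  for (i = 0; i < str.length; i++) {
    unicode = str.charCodeAt(i);
    // charCodeAt() returns UTF-16 code units: combine a surrogate pair into
    // one code point so that the 4-byte branch below is reachable.
    if (unicode >= 0xd800 && unicode <= 0xdbff && i + 1 < str.length) {
      low = str.charCodeAt(i + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        unicode = 0x10000 + ((unicode - 0xd800) << 10) + (low - 0xdc00);
        i++;
      }
    }
    // 0x00000000 - 0x0000007f -> 0xxxxxxx
    if (unicode <= 0x7f) {
      b.push(String.fromCharCode(unicode));
    // 0x00000080 - 0x000007ff -> 110xxxxx 10xxxxxx
    } else if (unicode <= 0x7ff) {
      b.push(String.fromCharCode((unicode >> 6) | 0xc0));
      b.push(String.fromCharCode((unicode & 0x3f) | 0x80));
    // 0x00000800 - 0x0000ffff -> 1110xxxx 10xxxxxx 10xxxxxx
    } else if (unicode <= 0xffff) {
      b.push(String.fromCharCode((unicode >> 12) | 0xe0));
      b.push(String.fromCharCode(((unicode >> 6) & 0x3f) | 0x80));
      b.push(String.fromCharCode((unicode & 0x3f) | 0x80));
    // 0x00010000 - 0x0010ffff -> 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    } else {
      b.push(String.fromCharCode((unicode >> 18) | 0xf0));
      b.push(String.fromCharCode(((unicode >> 12) & 0x3f) | 0x80));
      b.push(String.fromCharCode(((unicode >> 6) & 0x3f) | 0x80));
      b.push(String.fromCharCode((unicode & 0x3f) | 0x80));
    }
  }
  // Join into a single string so that charCodeAt() works on the result
  return b.join("");
}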
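The hand-written loop can be cross-checked against a shorter idiom that is common in browser code: encodeURIComponent() emits the UTF-8 bytes of a string as %XX escapes, and the legacy unescape() function turns each escape back into one character with that byte value. The helper names utf8BinaryString and byteValues are hypothetical, used only for illustration. Note that encodeURIComponent() throws a URIError on an unpaired surrogate.

```javascript
// Hypothetical helper: encode a JavaScript (UTF-16) string as a UTF-8
// "binary string", i.e. one character per byte with codes 0..255.
function utf8BinaryString(str) {
  // encodeURIComponent -> "%C3%A9" style UTF-8 escapes (plain ASCII stays literal),
  // unescape -> each %XX becomes a single character with that byte value.
  return unescape(encodeURIComponent(str));
}

// Hypothetical helper for inspection: list the byte values of a binary string.
function byteValues(binary) {
  var out = [];
  for (var i = 0; i < binary.length; i++) {
    out.push(binary.charCodeAt(i));
  }
  return out;
}

// "a", "é" (U+00E9) and "€" (U+20AC) take 1, 2 and 3 bytes in UTF-8
console.log(byteValues(utf8BinaryString("a\u00e9\u20ac")));
// → [ 97, 195, 169, 226, 130, 172 ]
```

In newer environments, new TextEncoder().encode(str) returns the same UTF-8 bytes as a Uint8Array, which would still need converting to a binary string before it can be passed to the libraries discussed here.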


To produce a Deflate byte sequence that other platforms accept, a few adjustments are needed. By default, the Deflate libraries we are reviewing are not compatible with the ZLIB format supported in many languages (including java.util.zip.Deflater and Inflater). These JS libraries produce so-called raw Deflate output. Java's Inflater and other ZLIB libraries can decompress this data, but they expect two extra pieces: a header and a checksum. The ZLIB format is specified in RFC 1950.

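Header: During experiments I found that Java accepts the following header bytes: 0x78 0xDA. Java's Inflater accepts them, and Deflater produces them at its best compression level (at the default level it writes 0x78 0x9C instead).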
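The two header bytes are not arbitrary: RFC 1950 defines the first as CMF (compression method and window size) and the second as FLG (flags), with the rule that CMF * 256 + FLG must be a multiple of 31. The sketch below decodes them; describeZlibHeader is a hypothetical helper written for this post, not part of any library.

```javascript
// Hypothetical helper: decode the two ZLIB header bytes as laid out in RFC 1950.
function describeZlibHeader(cmf, flg) {
  return {
    method: cmf & 0x0f,                     // CM: 8 means "deflate"
    windowBits: (cmf >> 4) + 8,             // CINFO + 8: log2 of the LZ77 window size
    hasDictionary: ((flg >> 5) & 1) === 1,  // FDICT: is a preset dictionary required?
    level: (flg >> 6) & 0x03,               // FLEVEL: 0 = fastest ... 3 = maximum compression
    checkOk: ((cmf << 8) | flg) % 31 === 0  // FCHECK makes CMF*256 + FLG a multiple of 31
  };
}

console.log(describeZlibHeader(0x78, 0xda));
```

For 0x78 0xDA this reports method 8 (deflate), a 15-bit (32 KB) window, no preset dictionary, level 3 and a valid check. According to RFC 1950, FLEVEL is not needed for decompression, which is why a fixed header works for any input and any compression level.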

Checksum: the ZLIB trailer is an Adler-32 checksum of the uncompressed data. The code to compute it is quite simple:
Note: before returning, the function converts the checksum into a 4-byte, big-endian byte string so that it can be concatenated with the rest of the data.
function adler32(data) {
  // data is a binary string (one character per byte), e.g. from stringToByteArray()
  var MOD_ADLER = 65521;
  var a = 1, b = 0;
  var index;
  // Process each byte of the data in order
  for (index = 0; index < data.length; ++index) {
    a = (a + data.charCodeAt(index)) % MOD_ADLER;
    b = (b + a) % MOD_ADLER;
  }
  // Adler checksum as integer
  var adler = a | (b << 16);
  // Adler checksum as a 4-byte, big-endian byte string
  return String.fromCharCode(((adler >> 24) & 0xff),
                             ((adler >> 16) & 0xff),
                             ((adler >> 8) & 0xff),
                             ((adler >> 0) & 0xff));
}
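A quick way to validate an Adler-32 port is a known test value: the Adler-32 of the ASCII string "Wikipedia" is 0x11E60398. The sketch below is a variant (adler32Number and uint32ToBytes are hypothetical names) that returns the checksum as an unsigned number and, like zlib, applies the modulo once per block of 5552 bytes instead of once per byte; the result is the same as with the per-byte version.

```javascript
// Hypothetical variant: Adler-32 of a binary string as an unsigned 32-bit number.
function adler32Number(data) {
  var MOD_ADLER = 65521;
  // zlib's NMAX: the largest block after which the sums still fit in 32 bits
  var NMAX = 5552;
  var a = 1, b = 0, i = 0;
  while (i < data.length) {
    var end = Math.min(i + NMAX, data.length);
    for (; i < end; i++) {
      a += data.charCodeAt(i);
      b += a;
    }
    a %= MOD_ADLER;
    b %= MOD_ADLER;
  }
  // b in the high 16 bits, a in the low 16 bits; ">>> 0" keeps the value unsigned
  return ((b << 16) | a) >>> 0;
}

// Big-endian 4-byte representation, as appended after the compressed data.
function uint32ToBytes(n) {
  return String.fromCharCode((n >>> 24) & 0xff, (n >>> 16) & 0xff,
                             (n >>> 8) & 0xff, n & 0xff);
}

console.log(adler32Number("Wikipedia").toString(16)); // → 11e60398
```

The deferred modulo only matters for speed on large inputs; for short strings the simple version is just as good.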


The final code, which produces compressed data that can be decompressed on other platforms, looks like this:
Note: the checksum must be calculated on the original (uncompressed) bytes, not on the compressed output.
var originalString = "some data that should be compressed";
// convert the string to a UTF-8 byte string
var originalBytes = stringToByteArray(originalString);
// generate the ZLIB header bytes 0x78 0xDA
var headerBytes = String.fromCharCode(120, 218);
// compress data; compress() stands for the library call,
// e.g. RawDeflate.deflate(originalBytes, level) with dankogai/js-deflate
var compressedBytes = compress(originalBytes);
// calculate the checksum of the uncompressed bytes
var checksumBytes = adler32(originalBytes);
// create the final byte string: header + raw deflate data + checksum
var resultBytes = headerBytes + compressedBytes + checksumBytes;
// convert it to base64
var base64String = window.btoa(resultBytes);
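The framing can be tested offline before involving Java. The sketch below assumes Node.js and uses its built-in zlib module: zlib.deflateRawSync stands in for the browser library's compress() call (an assumption, not the code used on the page), and zlib.inflateSync plays the role of the receiving side, which rejects the stream if the header or the Adler-32 trailer is wrong. The helper names are hypothetical.

```javascript
// Node.js sketch: check that "header + raw deflate + Adler-32 trailer"
// is a ZLIB stream that a standard zlib inflater accepts.
const zlib = require("zlib");

// Adler-32 of the uncompressed bytes, as an unsigned 32-bit integer.
function adler32Value(buf) {
  let a = 1, b = 0;
  for (const byte of buf) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0;
}

// Wrap raw deflate output the same way the page code does.
function wrapRawDeflate(original) {
  const raw = zlib.deflateRawSync(original, { level: 9 }); // stand-in for compress()
  const header = Buffer.from([0x78, 0xda]);
  const trailer = Buffer.alloc(4);
  trailer.writeUInt32BE(adler32Value(original), 0); // big-endian checksum
  return Buffer.concat([header, raw, trailer]);
}

// UTF-8 bytes of a string with non-ASCII characters.
const original = Buffer.from("some data that should be compressed \u00e9\u20ac", "utf8");
const base64String = wrapRawDeflate(original).toString("base64");

// Roughly what the Java side does: Base64-decode, inflate, UTF-8-decode.
// inflateSync throws if the header or the Adler-32 check is wrong.
const roundTrip = zlib.inflateSync(Buffer.from(base64String, "base64")).toString("utf8");
console.log(roundTrip === original.toString("utf8")); // → true
```

If this round trip succeeds but Java still fails, the problem is more likely in the transport (for example, Base64 text altered in transit) than in the framing.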


Once base64String has been transferred to another platform, it can be decompressed with any zlib library.

Here is an example of a simple Java class that performs the decompression:
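package com.vmykhailyk.compression.deflate;

import org.apache.commons.codec.binary.Base64;

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class DeflateDecompressor {
    private final Inflater decompressor;
    private final ByteArrayOutputStream outputStream;

    public DeflateDecompressor() {
        // The default Inflater expects the ZLIB wrapper (header + Adler-32 trailer)
        decompressor = new Inflater();
        outputStream = new ByteArrayOutputStream();
    }

    // Decodes Base64, inflates the ZLIB stream and returns the UTF-8 text,
    // or null if the data is not a valid ZLIB stream.
    public String decompress(String data) {
        try {
            decompressor.reset();
            outputStream.reset();
            decompressor.setInput(Base64.decodeBase64(data));
            byte[] buffer = new byte[1024];
            while (!decompressor.finished()) {
                int dataLength = decompressor.inflate(buffer);
                if (dataLength == 0 && (decompressor.needsInput() || decompressor.needsDictionary())) {
                    // Truncated stream or preset dictionary required: stop instead of looping forever
                    throw new DataFormatException("Incomplete ZLIB stream");
                }
                outputStream.write(buffer, 0, dataLength);
            }
            return new String(outputStream.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            // Bad header, corrupted data or Adler-32 mismatch
            e.printStackTrace();
        }
        return null;
    }
}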
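When the Java side reports exceptions, as some commenters did, it can help to check the assembled data in the browser before calling btoa(). The sketch below is a hypothetical diagnostic, not part of the original code: it takes resultBytes and reports characters that are not bytes (btoa() throws on characters above 0xFF, which typically means the string was not converted to UTF-8 first) and header fields that Inflater would reject. It does not validate the compressed body itself.

```javascript
// Hypothetical diagnostic: return a list of framing problems found in
// resultBytes (header + raw deflate data + 4-byte Adler-32 trailer).
function checkZlibFraming(resultBytes) {
  var problems = [];
  for (var i = 0; i < resultBytes.length; i++) {
    if (resultBytes.charCodeAt(i) > 0xff) {
      problems.push("char at " + i + " is not a byte; was the input converted to UTF-8?");
      break;
    }
  }
  if (resultBytes.length < 6) {
    problems.push("shorter than 2-byte header + 4-byte Adler-32 trailer");
    return problems;
  }
  var cmf = resultBytes.charCodeAt(0);
  var flg = resultBytes.charCodeAt(1);
  if ((cmf & 0x0f) !== 8) problems.push("compression method is not 8 (deflate)");
  if ((cmf >> 4) > 7) problems.push("window size larger than 32 KB");
  if (flg & 0x20) problems.push("FDICT set: a preset dictionary would be required");
  if (((cmf << 8) | flg) % 31 !== 0) problems.push("header check bits are wrong");
  return problems;
}

// Header 0x78 0xDA, a 2-byte body, and a trailer: no framing problems
console.log(checkZlibFraming("\x78\xda\x03\x00\x00\x00\x00\x01")); // → []
```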



Summary:
Deflate integration is not complicated: it works fine with international strings (as long as you convert the data to a UTF-8 byte array first), and there is plenty of code on every platform to decompress the data. In the next post I will describe how to integrate LZW libraries with other platforms.

Previous Post: Compression performance test
Starting Post: Libraries and Test conditions
Next Post: Integration of LZW compression with other platforms.

8 comments:

  1. Hi Volodymyr,
    Thanks for your post. It's very helpful to me.
    I have a question: where is the "compress()" method from in "var compressedBytes = compress(originalBytes);"? Are you using dankogai/js-deflate or any other js libs?

  2. Hi Ethan,

    compress() is an abstraction of the call to the actual
    compression library.

    In the case of dankogai-js-deflate it will be:
    RawDeflate.deflate(data, level);

    In the case of onicios-deflate:
    zip_deflate(data, level);

  3. Hi,

    Do you have a working example of this? I implemented this end to end, and the Java side is throwing exceptions.

    Thanks!

    1. There is a lot of code I used for testing and benchmarking here:
      https://github.com/volodymyr-mykhailyk/JSCompressionPerformance

      It's not very clean or organized, but the full sources should be there if you poke around.

  4. I have this working, but if I change the text, the decompressor does not work anymore. How did you decide what to put in the header?

    1. The header describes the compression method and window size, the compression level, and check bits for the header itself. More info on the header can be found here: http://www.ietf.org/rfc/rfc1950.txt

  5. Hi,

    var headerBytes = String.fromCharCode(120, 218);

    Can you please explain how this produces the header bytes? That is, will the parameters 120 and 218 ever change?

    1. No, they will not change. They always give you header bytes that Java can use to decompress the stream.
