I will describe what needs to be done to compress a string (even one with non-ASCII characters) in JavaScript and decompress it using Java. So let's start.
JavaScript part:
To support international strings and be able to decompress them, the first step is to convert the string into a sequence of UTF-8 bytes. There are plenty of similar functions on the web. Below is one example:
function stringToByteArray(str) {
    var b = [], i, unicode;
    for (i = 0; i < str.length; i++) {
        // codePointAt (ES2015) returns the full code point of a surrogate pair,
        // so characters above 0xffff reach the 4-byte branch
        unicode = str.codePointAt(i);
        if (unicode > 0xffff) {
            i++; // skip the low surrogate that was just consumed
        }
        // 0x00000000 - 0x0000007f -> 0xxxxxxx
        if (unicode <= 0x7f) {
            b.push(String.fromCharCode(unicode));
        // 0x00000080 - 0x000007ff -> 110xxxxx 10xxxxxx
        } else if (unicode <= 0x7ff) {
            b.push(String.fromCharCode((unicode >> 6) | 0xc0));
            b.push(String.fromCharCode((unicode & 0x3f) | 0x80));
        // 0x00000800 - 0x0000ffff -> 1110xxxx 10xxxxxx 10xxxxxx
        } else if (unicode <= 0xffff) {
            b.push(String.fromCharCode((unicode >> 12) | 0xe0));
            b.push(String.fromCharCode(((unicode >> 6) & 0x3f) | 0x80));
            b.push(String.fromCharCode((unicode & 0x3f) | 0x80));
        // 0x00010000 - 0x001fffff -> 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        } else {
            b.push(String.fromCharCode((unicode >> 18) | 0xf0));
            b.push(String.fromCharCode(((unicode >> 12) & 0x3f) | 0x80));
            b.push(String.fromCharCode(((unicode >> 6) & 0x3f) | 0x80));
            b.push(String.fromCharCode((unicode & 0x3f) | 0x80));
        }
    }
    // join into a binary string (one character per byte) so that
    // charCodeAt-based code such as adler32() can read it
    return b.join("");
}
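On current runtimes the same UTF-8 conversion can be sketched with the built-in TextEncoder instead of hand-written bit shifting. This is only an illustration, assuming a modern browser or Node.js 11+; the helper name `utf8BinaryString` is hypothetical. It returns a binary string (one character per byte), which is the form that charCodeAt-based code such as a checksum routine can consume.

```javascript
// Assumption: TextEncoder is available as a global (modern browsers, Node.js 11+).
// Hypothetical helper: UTF-8 encode a string and return it as a binary string.
function utf8BinaryString(str) {
    var bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
    var out = "";
    for (var i = 0; i < bytes.length; i++) {
        out += String.fromCharCode(bytes[i]); // one character per byte
    }
    return out;
}

// "é" (U+00E9) takes two bytes in UTF-8: 0xc3 0xa9
var encoded = utf8BinaryString("é");
console.log(encoded.length, encoded.charCodeAt(0).toString(16), encoded.charCodeAt(1).toString(16));
```

Characters outside the Basic Multilingual Plane (for example emoji) come out as four bytes here, which is what a UTF-8 decoder on the Java side expects.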
To produce a correct Deflate byte sequence we also need some adjustments. The Deflate libraries we are reviewing are not compatible by default with the ZLIB format available in a lot of languages (including java.util.zip.Deflater). These JS libraries produce output in the so-called RawDeflate form. The Java inflater and any other ZLIB library can decompress this data, but they require additional information: a header and a checksum. Info about ZLIB format
Header: During experiments I discovered that the Java Deflater/Inflater produces and accepts the following header bytes: 0x78 0xDA
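To see what the two header bytes mean, here is a small sketch that decodes them according to the ZLIB specification (RFC 1950). The function name `describeZlibHeader` and the field names of the returned object are made up for this illustration.

```javascript
// Hypothetical helper: decode the two ZLIB header bytes (CMF, FLG) per RFC 1950.
function describeZlibHeader(cmf, flg) {
    return {
        method: cmf & 0x0f,                        // CM: 8 means Deflate
        windowBits: (cmf >> 4) + 8,                // CINFO: log2(window size) - 8
        level: flg >> 6,                           // FLEVEL: 0 (fastest) to 3 (maximum)
        presetDictionary: ((flg >> 5) & 1) === 1,  // FDICT flag
        valid: ((cmf << 8) | flg) % 31 === 0       // FCHECK: 16-bit value must be a multiple of 31
    };
}

console.log(describeZlibHeader(0x78, 0xda));
// method 8, windowBits 15, level 3, presetDictionary false, valid true
```

So 0x78 0xDA announces Deflate with a 32K window, no preset dictionary and the "maximum compression" level hint. The level bits are informational only, which is why the same two bytes can be used for any input.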
Checksum: it should be calculated using the Adler-32 algorithm. The code to compute an Adler-32 checksum is quite simple:
Note: before returning the checksum, the function converts it to four bytes (most significant byte first) to make it compatible with the other data.
function adler32(data) {
    var MOD_ADLER = 65521;
    var a = 1, b = 0;
    var index;
    // Process each byte of the data in order
    for (index = 0; index < data.length; ++index) {
        a = (a + data.charCodeAt(index)) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    // Adler checksum as an integer
    var adler = a | (b << 16);
    // Adler checksum as bytes, most significant byte first
    return String.fromCharCode(((adler >> 24) & 0xff),
                               ((adler >> 16) & 0xff),
                               ((adler >> 8) & 0xff),
                               ((adler >> 0) & 0xff));
}
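A quick way to check an Adler-32 implementation is to run it on a well-known test vector: the ASCII string "Wikipedia" has the checksum 0x11E60398. The sketch below uses a hypothetical variant, `adler32Value`, that returns the checksum as an unsigned integer instead of bytes so that it is easy to print.

```javascript
// Hypothetical helper: Adler-32 of a binary string, returned as an unsigned 32-bit integer.
function adler32Value(data) {
    var MOD_ADLER = 65521;
    var a = 1, b = 0;
    for (var i = 0; i < data.length; i++) {
        a = (a + data.charCodeAt(i)) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    // >>> 0 keeps the result unsigned even when the top bit is set
    return ((b << 16) | a) >>> 0;
}

console.log(adler32Value("Wikipedia").toString(16)); // prints "11e60398"
```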
The final code, which produces compressed data that can be easily decompressed on other platforms, looks like this:
Note: the checksum must be calculated on the original byte data, before compression.
var originalString = "some data that should be compressed";
// convert string to bytes
var originalBytes = stringToByteArray(originalString);
// generate header bytes (0x78 0xDA)
var headerBytes = String.fromCharCode(120, 218);
// compress data; compress() stands for the call into the chosen Deflate library
var compressedBytes = compress(originalBytes);
// calculate checksum on the uncompressed bytes
var checksumBytes = adler32(originalBytes);
// create final byte sequence: header + raw Deflate data + checksum
var resultBytes = headerBytes + compressedBytes + checksumBytes;
// convert it to base64
var base64String = window.btoa(resultBytes);
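The header + raw Deflate + checksum layout can be verified without a browser or a JVM. The sketch below is an assumption-laden stand-in, not the code from this post: it runs on Node.js, uses the built-in `zlib` module in place of `compress()`, and the helper names `adler32Of` and `toZlibBase64` are hypothetical. It assembles the stream by hand and then feeds it to a standard ZLIB inflater.

```javascript
// Assumption: Node.js, with the built-in zlib module standing in for compress().
const zlib = require("zlib");

// Hypothetical helper: Adler-32 over a Buffer, as an unsigned 32-bit integer.
function adler32Of(bytes) {
    let a = 1, b = 0;
    for (const byte of bytes) {
        a = (a + byte) % 65521;
        b = (b + a) % 65521;
    }
    return ((b << 16) | a) >>> 0;
}

// Hypothetical helper: wrap raw Deflate output into a ZLIB stream and base64 it.
function toZlibBase64(text) {
    const original = Buffer.from(text, "utf8");       // UTF-8 bytes of the input
    const header = Buffer.from([0x78, 0xda]);         // ZLIB header
    const raw = zlib.deflateRawSync(original);        // raw Deflate, no wrapper
    const checksum = Buffer.alloc(4);
    checksum.writeUInt32BE(adler32Of(original), 0);   // checksum of the uncompressed bytes
    return Buffer.concat([header, raw, checksum]).toString("base64");
}

// Round trip: a standard ZLIB inflater accepts the hand-assembled stream.
const input = "some data that should be compressed, including żółć";
const encodedStream = toZlibBase64(input);
const decoded = zlib.inflateSync(Buffer.from(encodedStream, "base64")).toString("utf8");
console.log(decoded === input);
```

If the checksum were computed over the compressed bytes instead, or written least significant byte first, the inflater would reject the stream with a data check error.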
After base64String is transferred to another platform, it can be easily decompressed using a zlib library.
Here is an example of a simple Java class that performs the decompression:
package com.vmykhailyk.compression.deflate;

import org.apache.commons.codec.binary.Base64;

import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class DeflateDecompressor {
    private Inflater decompressor;
    private ByteArrayOutputStream outputStream;

    public DeflateDecompressor() {
        decompressor = new Inflater();
        outputStream = new ByteArrayOutputStream();
    }

    public String decompress(String data) {
        try {
            decompressor.reset();
            outputStream.reset();
            decompressor.setInput(Base64.decodeBase64(data));
            byte[] buffer = new byte[1024];
            while (!decompressor.finished()) {
                int dataLength = decompressor.inflate(buffer);
                if (dataLength == 0 && (decompressor.needsInput() || decompressor.needsDictionary())) {
                    // truncated or unsupported input: stop instead of looping forever
                    break;
                }
                outputStream.write(buffer, 0, dataLength);
            }
            return new String(outputStream.toByteArray(), "UTF-8");
        } catch (DataFormatException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}
Summary:
Deflate integration is not complicated. It works fine with international strings (as long as you convert the data to UTF-8 bytes first), and there is plenty of code on all platforms to decompress the data. In the next post I will describe how to integrate LZW libraries with other platforms.
Previous Post: Compression performance test
Starting Post: Libraries and Test conditions
Next Post: Integration of LZW compression with other platforms.
Hi Volodymyr,
Thanks for your post. It's very helpful to me.
I have a question: where is the "compress()" method from in "var compressedBytes = compress(originalBytes);"? Are you using dankogai/js-deflate or any other js libs?
Hi Ethan,
The compress() function is an abstraction of the call to the actual compression library.
In case of dankogai-js-deflate it will be :
RawDeflate.deflate(data, level);
In case of onicios-deflate:
zip_deflate(data, level);
Hi,
Do you have a working example of this? I implemented this end to end and the Java side is throwing exceptions.
Thanks!
There is a lot of code that I used for testing and benchmarking here:
https://github.com/volodymyr-mykhailyk/JSCompressionPerformance
It's not very clean or organized, but the full sources should be there if you poke around.
I have this working, but if I change the text, the decompressor does not work any more. How did you decide what to set the header with?
The header describes the algorithm, a header checksum and the compression level. More info on the header can be found here: http://www.ietf.org/rfc/rfc1950.txt
Hi,
var headerBytes = String.fromCharCode(120, 218);
Can you please explain how this gives the header bytes? I mean, will the parameters 120 and 218 change?
No, they will not change. They always give you the header bytes that Java needs to decompress the stream.