Friday, December 2, 2011

Integration of LZW Compression with other platforms

This post describes how to integrate the JavaScript LZW algorithms with other platforms such as Java.

I will describe what needs to be done to compress a string (including international characters) in JavaScript and decompress it in Java.


All the LZW implementations I found:
  • Work only with characters whose code point is below 256 (no international characters). So you need to convert the original string into a byte representation, as with the previous library.
  • Produce an array of integers as the compression output. So you need to decide how this array will be transferred over XMLHttpRequest.
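A minimal Java sketch of the byte conversion from the first point, assuming UTF-8 is the byte encoding (the Java decoder later in this post decodes with UTF-8). ByteStrings, toByteString and fromByteString are hypothetical names, not an existing library API. Each UTF-8 byte is stored as one char in the range 0-255, so every non-ASCII character becomes two or more such chars.

```java
import java.nio.charset.StandardCharsets;

public class ByteStrings {
    // Hypothetical helper: encode text as UTF-8 and store each byte as one
    // char in the range 0-255, the input an LZW coder with a 256-entry
    // initial dictionary can handle.
    static String toByteString(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF)); // mask to get the unsigned value 0-255
        }
        return sb.toString();
    }

    // Hypothetical inverse: treat each char (0-255) as one byte and decode as UTF-8.
    static String fromByteString(String byteString) {
        byte[] bytes = new byte[byteString.length()];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) byteString.charAt(i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café Київ" written with Unicode escapes to avoid source-encoding issues
        String original = "caf\u00e9 \u041a\u0438\u0457\u0432";
        String bytes = toByteString(original);
        System.out.println(original.length() + " chars -> " + bytes.length() + " byte-chars"); // 9 chars -> 14 byte-chars
        System.out.println(fromByteString(bytes).equals(original)); // true
    }
}
```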

Regarding the second point (transferring the integer array), there are many options:
  • JSON data
  • Bencode
  • Serialization to string with separators
In this example, for simplicity, I serialize the data to a comma-separated string. For example:
// array of integers
var t = [123, 5468, 215, 4, 6543];
// is serialized using
// t.join(",")
// into the following string:
// "123,5468,215,4,6543"
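For reference, the same comma-separated format can be produced and parsed on the Java side. This is a minimal sketch; CodeSerialization, serialize and parse are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CodeSerialization {
    // Join codes with commas, mirroring JavaScript's t.join(",").
    static String serialize(List<Integer> codes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codes.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(codes.get(i));
        }
        return sb.toString();
    }

    // Split on commas and parse each token back into an int.
    static List<Integer> parse(String data) {
        List<Integer> codes = new ArrayList<>();
        for (String token : data.split(",")) {
            codes.add(Integer.parseInt(token.trim()));
        }
        return codes;
    }

    public static void main(String[] args) {
        List<Integer> t = Arrays.asList(123, 5468, 215, 4, 6543);
        String s = serialize(t);
        System.out.println(s);                  // 123,5468,215,4,6543
        System.out.println(parse(s).equals(t)); // true
    }
}
```

Note that String.split drops trailing empty strings, so an input such as "1,2," parses as [1, 2] without complaint.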

JavaScript code
To compress the data on the JavaScript side you need to perform three steps:

1. Convert the string to a byte array. Note: the byte array must also be represented as a single string in which every character is one byte (code: array.join("")), because the LZW libraries accept only strings as input.

2. Compress the data using any LZW library.

3. Serialize the result.
For this test I convert the array to a comma-separated text representation.
Note: in a real-world application this is not the most efficient way to transmit integer arrays.
var originalString = "some data that should be compressed";
//convert string to bytes array
var originalBytes = stringToByteArray(originalString).join("");
//compress data
var compressedData = compress(originalBytes);
//serialize the data
var serializedData = compressedData.join(",");
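If you want to produce test input for the Java decoder without a browser, the three JavaScript steps can be mirrored in Java. This is a minimal sketch that assumes the JavaScript library implements the same classic LZW variant as the decompressData method in the Java code (256 single-byte initial entries, an unbounded dictionary, no bit packing); LZWCompressionSketch and its methods are hypothetical names, not the JavaScript library's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LZWCompressionSketch {
    // Step 1: UTF-8 bytes, one char (0-255) per byte.
    static String toByteString(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Step 2: classic LZW with a 256-entry initial dictionary (assumed variant).
    static List<Integer> compress(String input) {
        Map<String, Integer> dictionary = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dictionary.put(String.valueOf((char) i), i);
        }
        int dictSize = 256;
        String w = "";
        List<Integer> result = new ArrayList<>();
        for (char c : input.toCharArray()) {
            String wc = w + c;
            if (dictionary.containsKey(wc)) {
                w = wc; // keep extending the current phrase
            } else {
                result.add(dictionary.get(w)); // emit the longest known phrase
                dictionary.put(wc, dictSize++);  // learn the new phrase
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) {
            result.add(dictionary.get(w)); // flush the last phrase
        }
        return result;
    }

    // Step 3: serialize as comma-separated text, like compressedData.join(",").
    static String compressAndSerialize(String text) {
        List<Integer> codes = compress(toByteString(text));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codes.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(codes.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints 84,79,66,69,79,82,78,79,84,256,258,260,265,259,261,263
        System.out.println(compressAndSerialize("TOBEORNOTTOBEORTOBEORNOT"));
    }
}
```

On the classic test string "TOBEORNOTTOBEORTOBEORNOT" the 24 input characters become 16 codes, yet the comma-separated text is 54 characters long, which illustrates the note in step 3 about plain-text serialization not being the most efficient transport.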


Java code:
The following extract accepts input in the string format shown above ("123,5468,215,4,6543") and decompresses it into the original UTF-8 string. Use the decompress method as the entry point.
package com.vmykhailyk.compression.lzw;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LZWModuleDecompression {
    /**
     * Decompress a list of LZW output codes to a string in which
     * every character holds one byte (0-255).
     */
    private String decompressData(List<Integer> compressed) {
        // Build the initial dictionary: one entry per single-byte value.
        int dictSize = 256;
        Map<Integer, String> dictionary = new HashMap<Integer, String>();
        for (int i = 0; i < 256; i++)
            dictionary.put(i, "" + (char) i);
        String w = "" + (char) (int) compressed.remove(0);
        // StringBuilder avoids quadratic string concatenation on large inputs.
        StringBuilder result = new StringBuilder(w);
        for (int k : compressed) {
            String entry;
            if (dictionary.containsKey(k))
                entry = dictionary.get(k);
            else if (k == dictSize)
                // The code refers to the entry being built in this very step.
                entry = w + w.charAt(0);
            else
                throw new IllegalArgumentException("Bad compressed code: " + k);
            result.append(entry);
            // Add w+entry[0] to the dictionary.
            dictionary.put(dictSize++, w + entry.charAt(0));
            w = entry;
        }
        return result.toString();
    }

    // Each char of the decompressed string holds one UTF-8 byte.
    private byte[] getCharsAsBytes(String decompressed) {
        byte[] bytes = new byte[decompressed.length()];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) decompressed.charAt(i);
        }
        return bytes;
    }

    public String decompress(String data) {
        String[] intsAsString = data.split(",");
        List<Integer> integers = new ArrayList<Integer>();
        for (String anIntsAsString : intsAsString) {
            integers.add(Integer.parseInt(anIntsAsString.trim()));
        }
        String decompressed = decompressData(integers);
        return new String(getCharsAsBytes(decompressed), StandardCharsets.UTF_8);
    }
}

Note: this code contains no input validation or error handling, but it should give a basic idea of how to decompress the data on the receiving side of the HTTP request. The decompressData method is taken from here.
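One way to add the missing validation is to parse the codes defensively before decompressing them. The sketch below assumes the classic LZW variant used by decompressData (256 initial entries, one new entry per code after the first, no dictionary reset), so the first code must be a literal byte and the code at position i can be at most 255 + i. parseCodes is a hypothetical helper, not part of the original class.

```java
import java.util.ArrayList;
import java.util.List;

public class CodeValidation {
    // Hypothetical validating parser for the "123,5468,215,4,6543" format.
    // Rejects input that decompressData could not process and reports why.
    static List<Integer> parseCodes(String data) {
        if (data == null || data.trim().isEmpty()) {
            throw new IllegalArgumentException("No compressed data");
        }
        String[] tokens = data.split(",");
        List<Integer> codes = new ArrayList<>(tokens.length);
        for (int i = 0; i < tokens.length; i++) {
            int code;
            try {
                code = Integer.parseInt(tokens[i].trim());
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                        "Token " + i + " is not an integer: '" + tokens[i] + "'", e);
            }
            // Position 0 must be a single byte; at position i the decoder's
            // dictionary has 256 + i - 1 entries, and the code may also name
            // the entry being built in that step, so the maximum is 255 + i.
            int maxCode = (i == 0) ? 255 : 255 + i;
            if (code < 0 || code > maxCode) {
                throw new IllegalArgumentException("Code " + code + " at position " + i
                        + " is outside 0.." + maxCode);
            }
            codes.add(code);
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(parseCodes("84,79,256")); // [84, 79, 256]
    }
}
```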


Summary:
LZW implementations are very small, fast and simple. They let you compress the required data quickly and decompress it on another platform without much effort. However, before using them you need to decide how the integer arrays will be transmitted between the compressor and the decompressor. Also, the compression ratio is not as good as Deflate's.


Previous Post: Integration of JS Deflate Compression with other platforms
Starting Post: Libraries and Test conditions
Next Post: TBD: Integration of LZMA compression with other platforms.
