Problem
In my project, we run several queries against a Solr server and return the combined response to the client. Some text fields are very large, so we would like to reduce their size before sending them.

Use ByteArrayOutputStream and Base64 Encoder to Compress String
On the server side, we can use ZipOutputStream to compress the string into a ByteArrayOutputStream. The resulting byte array cannot be sent as text in an HTTP response, so we encode it as Base64, for example with org.apache.commons.codec.binary.Base64.encodeBase64String(). We then add the compressed text as a field in the Solr document (that step is not covered in this post).
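
The compression step described above can be sketched roughly as follows. This is a minimal sketch, not the original code from this post: the class name CompressSketch, the entry name "content", and the choice of UTF-8 are assumptions made for illustration. It also uses the JDK's java.util.Base64 (Java 8+) in place of commons-codec so that it runs without extra dependencies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class; the name is not from the original post.
final class CompressSketch {

  /** Zips the text into an in-memory archive and returns it as Base64. */
  static String compressString(String srcTxt) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    // Closing the ZipOutputStream finishes the archive, so the bytes are
    // only read from the buffer after the try block has ended.
    try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
      // A zip archive needs at least one entry; the name is arbitrary (assumed).
      zos.putNextEntry(new ZipEntry("content"));
      // Assumption: the text is encoded as UTF-8 on both sides.
      zos.write(srcTxt.getBytes(StandardCharsets.UTF_8));
      zos.closeEntry();
    }
    return Base64.getEncoder().encodeToString(buffer.toByteArray());
  }
}
```

Base64 makes the output about one third larger than the zipped bytes, so this approach only pays off for text that compresses well.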

Use Base64 Decoder and ByteArrayInputStream to Uncompress String
On the remote client side, we first read the text response from the stream. For how to read one Solr document at a time using a streaming API, see:
Solr: Use STAX Parser to Read XML Response to Reduce Memory Usage
Solr: Use SAX Parser to Read XML Response to Reduce Memory Usage
Solr: Use JSON(GSon) Streaming to Reduce Memory Usage

Then we use org.apache.commons.codec.binary.Base64.decodeBase64() to decode the Base64 string into a byte array, read that zipped byte array through a ZipInputStream to recover the original string, and add the result to the Solr document as a field.

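import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipInputStream;

import org.apache.commons.codec.binary.Base64;
import org.apache.commons.io.IOUtils;

  /**
   * When the client receives the zipped Base64 string, it first decodes the
   * Base64 string to a byte array, then uses ZipInputStream to turn the byte
   * array back into the original string.
   */
  public static String uncompressString(String zippedBase64Str) throws IOException {
    String result = null;

    // In my Solr project, I use org.apache.solr.common.util.Base64 instead:
    // byte[] bytes =
    // org.apache.solr.common.util.Base64.base64ToByteArray(zippedBase64Str);
    byte[] bytes = Base64.decodeBase64(zippedBase64Str);
    ZipInputStream zi = null;
    try {
      zi = new ZipInputStream(new ByteArrayInputStream(bytes));
      // Position the stream at the first (and only) entry of the archive.
      zi.getNextEntry();
      // Decode explicitly instead of relying on the platform default charset.
      // This assumes the server side wrote the text as UTF-8 bytes.
      result = IOUtils.toString(zi, "UTF-8");
      zi.closeEntry();
    } finally {
      IOUtils.closeQuietly(zi);
    }
    return result;
  }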

Test Code

  public static void main(String... args) throws IOException {
    String source = "-original-file-path-";
    String zippedFile = "-base-64-zip-file-path-";

    // Read the original text.
    FileInputStream fis = new FileInputStream(source);
    String srcTxt = IOUtils.toString(fis, "UTF-8");
    IOUtils.closeQuietly(fis);

    // Compress it and save the Base64 text.
    String str = compressString(srcTxt);
    FileWriter fw = new FileWriter(zippedFile);
    IOUtils.write(str, fw);
    IOUtils.closeQuietly(fw);

    // Read the Base64 text back.
    fis = new FileInputStream(zippedFile);
    String zippedBase64Str = IOUtils.toString(fis, "UTF-8");
    IOUtils.closeQuietly(fis);

    // Uncompress it and write the restored text, to compare with the source.
    String originalStr = uncompressString(zippedBase64Str);
    fw = new FileWriter("-reverted-file-path-");
    IOUtils.write(originalStr, fw);
    IOUtils.closeQuietly(fw);
  }
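
The test above writes the restored text to a file but leaves the comparison to the reader. A possible in-memory round-trip check is sketched below. It is only an illustration under assumptions: the class name ZipBase64RoundTrip, the method names compress and uncompress, the entry name "content", and the sample text are all hypothetical, and it uses only JDK classes (java.util.Base64, Java 8+) rather than commons-codec and commons-io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical class; names are chosen for illustration only.
final class ZipBase64RoundTrip {

  /** Zips the text (as UTF-8) in memory and returns the archive as Base64. */
  static String compress(String text) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
      zos.putNextEntry(new ZipEntry("content")); // arbitrary entry name
      zos.write(text.getBytes(StandardCharsets.UTF_8));
      zos.closeEntry();
    }
    return Base64.getEncoder().encodeToString(buffer.toByteArray());
  }

  /** Decodes the Base64 text and reads the first zip entry back as UTF-8. */
  static String uncompress(String zippedBase64) throws IOException {
    byte[] zipped = Base64.getDecoder().decode(zippedBase64);
    try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipped))) {
      if (zis.getNextEntry() == null) {
        throw new IOException("no zip entry found");
      }
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] chunk = new byte[8192];
      int n;
      // read() returns -1 once the current entry is exhausted.
      while ((n = zis.read(chunk)) != -1) {
        out.write(chunk, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    // Build a large, repetitive sample text (made-up data).
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 1000; i++) {
      sb.append("some repetitive field text ");
    }
    String original = sb.toString();

    String encoded = compress(original);
    String restored = uncompress(encoded);

    System.out.println("round trip ok: " + original.equals(restored));
    System.out.println(original.length() + " chars -> " + encoded.length() + " chars");
  }
}
```

Comparing the strings in memory avoids any charset differences introduced by writing and re-reading files.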

Resource
Solr: Use STAX Parser to Read XML Response to Reduce Memory Usage
Solr: Use SAX Parser to Read XML Response to Reduce Memory Usage
Solr: Use JSON(GSon) Streaming to Reduce Memory Usage
Tips and pitfalls when using Java’s ZipOutputStream

via Blogger http://lifelongprogrammer.blogspot.com/2013/11/java-use-zip-stream-and-base64-to-compress-big-string.html
