My Solr application runs on a user's laptop with the max heap set to 512 MB. It pulls JSON data from a remote proxy that talks to a remote Solr server: 100 documents at a time, committing after every 20 fetches.

Our code reads the whole JSON response into memory, then uses UpdateRequestProcessor.processAdd(AddUpdateCommand) to add each document to the local Solr.

Recently the application started throwing OutOfMemoryError. After analyzing the heap dump with Eclipse Memory Analyzer (MAT), I found the cause: the data returned from the remote proxy is too large. One document is 50-60 KB on average, but some are huge, and 100 of them can add up to 60 MB. This is a rare case, but when it happens the application throws OutOfMemoryError and stops working.

To fix this and reduce memory usage on the client side, I took several measures:
1. Restart the application when OutOfMemoryError happens.
2. Run a thread to monitor free memory: below a 40% threshold, request a GC; below 30%, decrease the fetch size (100 to 50, then 25) and the commit interval (20 to 10); below 50 MB of free memory, restart the application.
3. Enable autoSoftCommit and autoCommit, and reduce the Solr cache sizes.
4. Use streaming JSON. This is the topic of this article.
Read documents one by one from the HTTP input stream and put them into a queue, instead of reading the whole large response into memory. Another thread is responsible for writing the documents to the local Solr.
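Measure 2 above can be sketched as a small watchdog task. This is a minimal sketch under assumed names (MemoryWatchdog, fetchSize, halve are hypothetical, not from my actual code); the thresholds (40%, 30%, 50 MB) and the halving steps (100 to 50 to 25, 20 to 10) are the ones described in the list:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MemoryWatchdog implements Runnable {
  // shared knobs the fetch loop would read before each request
  public static final AtomicInteger fetchSize = new AtomicInteger(100);
  public static final AtomicInteger commitInterval = new AtomicInteger(20);

  // halve a value but never go below the given floor: 100 -> 50 -> 25
  static int halve(int value, int floor) {
    return Math.max(floor, value / 2);
  }

  @Override
  public void run() {
    Runtime rt = Runtime.getRuntime();
    long max = rt.maxMemory();
    long free = max - (rt.totalMemory() - rt.freeMemory());
    if (free < 50L * 1024 * 1024) {
      System.exit(1);                        // let a wrapper script restart us
    } else if (free < max * 0.30) {
      fetchSize.updateAndGet(v -> halve(v, 25));
      commitInterval.updateAndGet(v -> halve(v, 10));
    } else if (free < max * 0.40) {
      System.gc();                           // a hint only; the JVM may ignore it
    }
  }

  public static void start() {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    ses.scheduleAtFixedRate(new MemoryWatchdog(), 10, 10, TimeUnit.SECONDS);
  }
}
```

The AtomicInteger knobs let the fetch loop pick up the reduced sizes without any locking.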

The same approach applies if we use XML: we can use StAX or SAX to read documents one by one.
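To illustrate the StAX variant, here is a minimal sketch assuming a hypothetical `<docs><doc id="..."/></docs>` payload (not the actual proxy format): each `<doc>` is handled as soon as it is parsed, so only one document is ever held in memory.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxDocReader {
  // Pull-parse the stream and handle each <doc> element as it arrives,
  // instead of building a DOM of the whole response.
  public static List<String> readIds(InputStream in) throws Exception {
    List<String> ids = new ArrayList<>();
    XMLStreamReader reader =
        XMLInputFactory.newInstance().createXMLStreamReader(in);
    while (reader.hasNext()) {
      if (reader.next() == XMLStreamConstants.START_ELEMENT
          && "doc".equals(reader.getLocalName())) {
        // this is where one document would be handed off to the import queue
        ids.add(reader.getAttributeValue(null, "id"));
      }
    }
    reader.close();
    return ids;
  }
}
```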

I use Gson; for how to use Gson streaming to read and write JSON, please read Gson Streaming.

The code to read documents one by one from the HTTP stream:

private static ImportedResult handleResponse(SolrQueryRequest request,
      InputStream in) throws UnsupportedEncodingException, IOException {
    ImportedResult importedResult = new ImportedResult();
    JsonReader reader = null;
    try {
      reader = new JsonReader(new InputStreamReader(in, "UTF-8"));
      reader.beginObject();           // outer wrapper object
      reader.nextName();              // e.g. "response"
      reader.beginObject();
      int fetchedSize = 0;
      int numFound = -1, start = -1;
      while (reader.hasNext()) {
        String str = reader.nextName();
        if ("numFound".equals(str)) {
          numFound = reader.nextInt();
        } else if ("start".equals(str)) {
          start = reader.nextInt();
        } else if ("docs".equals(str)) {
          reader.beginArray();
          // read documents one at a time
          while (reader.hasNext()) {
            fetchedSize++;
            readOneDoc(request, reader);
          }
          reader.endArray();
        } else {
          reader.skipValue();         // consume fields we don't care about
        }
      }
      reader.endObject();
      reader.endObject();
      
      importedResult.setFetched(fetchedSize);
      importedResult.setHasMore((fetchedSize + start) < numFound);
      importedResult.setImportedData(fetchedSize != 0);
      return importedResult;
    } finally {
      if (reader != null) {
        reader.close();
      }
    }
}
  
private static void readOneDoc(SolrQueryRequest request, JsonReader reader)
      throws IOException {
    reader.beginObject();
    String id = null, binaryDoc = null;
    while (reader.hasNext()) {
      String str = reader.nextName();
      if ("id".equals(str)) {
        id = reader.nextString();
      } else if ("binaryDoc".equals(str)) {
        binaryDoc = reader.nextString();
      } else {
        reader.skipValue();
      }
    }
    reader.endObject();
    // hand the document off to the importer thread pool
    DataImporter.getInstance().importData(request, id, binaryDoc);
}

The code to write documents to the local Solr:

public class DataImporter {
  private static final DataImporter instance = new DataImporter();
  private final int queueSize = 100;
  private ThreadPoolExecutor executor;

  private DataImporter() {
    // CallerRunsPolicy throttles the producer: when the queue is full, the
    // reading thread runs the task itself instead of fetching more data.
    executor = new ThreadPoolExecutor(2, 5, 500, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(queueSize), new CVThreadFactory(
            "CVImporterThread"), new ThreadPoolExecutor.CallerRunsPolicy());
    executor.allowCoreThreadTimeOut(true);
  }

  public static DataImporter getInstance() {
    return instance;
  }

  public void importData(SolrQueryRequest request, String id,
      String bindoc) {
    executor.submit(new SolrDataImporter(request, id, bindoc));
  }

  private static SolrInputDocument convertToSolrDoc(String id,
      String bindoc) throws IOException {
    byte[] bindata = Base64.base64ToByteArray(bindoc);
    // read the serialized SolrInputDocument back from the zipped binary data
    SolrInputDocument resultDoc = (SolrInputDocument) readZippedFile(bindata);
    resultDoc.setField("id", id);
    return resultDoc;
  }

  private class SolrDataImporter implements Runnable {
    private SolrQueryRequest request;
    private String id, bindoc;

    SolrDataImporter(SolrQueryRequest request, String id, String bindoc) {
      this.request = request;
      this.id = id;
      this.bindoc = bindoc;
    }

    public void run() {
      try {
        UpdateRequestProcessorChain updateChain = request.getCore()
            .getUpdateProcessingChain("mychain");
        SolrInputDocument toSolrServerSolrDoc = convertToSolrDoc(id,
            bindoc);
        bindoc = null; // release the base64 string as early as possible
        AddUpdateCommand command = new AddUpdateCommand(request);
        command.solrDoc = toSolrServerSolrDoc;
        SolrQueryResponse response = new SolrQueryResponse();
        UpdateRequestProcessor processor = updateChain.createProcessor(request,
            response);
        processor.processAdd(command);
      } catch (Exception e) {
        logger.error("Exception happened when importing data, id: " + id, e);
      }
    }
  }
}



via Blogger http://lifelongprogrammer.blogspot.com/2013/10/use-json-streaming-to-reduce-memory.html
