Maven: Invalid maximum heap size- From http://ift.tt/1ajReyV

The Problem
I followed the Apache Stanbol tutorial to build Stanbol. The tutorial sets the Maven options like this:
export MAVEN_OPTS="-Xmx1024M -XX:MaxPermSize=256M"

Because I was building Stanbol on Windows, I changed it to:
set MAVEN_OPTS="-Xmx1024M -XX:MaxPermSize=256M"

Then, when I ran "mvn clean install -Dmaven.test.skip=true", it failed with this error:
Invalid maximum heap size: -Xmx1024M -XX:MaxPermSize=256M
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

The Solution
Check mvn.bat: MAVEN_OPTS is passed to the JVM directly as part of the java command line:
SET MAVEN_JAVA_EXE="%JAVA_HOME%\bin\java.exe"
%MAVEN_JAVA_EXE% %MAVEN_OPTS% -classpath %CLASSWORLDS_JAR% "-Dclassworlds.conf=%M2_HOME%\bin\m2.conf" "-Dmaven.home=%M2_HOME%" %CLASSWORLDS_LAUNCHER% %MAVEN_CMD_LINE_ARGS%

If we run set MAVEN_OPTS="-Xmx1024M -XX:MaxPermSize=256M", the double quotes become part of the variable's value, so the previous command expands to %MAVEN_JAVA_EXE% "-Xmx1024M -XX:MaxPermSize=256M" … and the JVM receives both options as a single argument.

If we run java "-Xmx1024M -XX:MaxPermSize=256M" -version, we see the same error message.
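
To make the failure more concrete, here is a small Java sketch of how a launcher might validate a -Xmx argument. It is only a simplified model, not the JVM's actual option parser; the class name HeapOptionCheck, its method and the size pattern are assumptions made for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Maven or the JDK.
final class HeapOptionCheck {
    // Simplified model of a heap size: digits followed by an optional k, m or g unit.
    private static final Pattern SIZE = Pattern.compile("\\d+[kKmMgG]?");

    // Returns true when the whole argument is a well-formed -Xmx option.
    static boolean isValidXmx(String arg) {
        return arg.startsWith("-Xmx") && SIZE.matcher(arg.substring(4)).matches();
    }

    public static void main(String[] args) {
        // Passed on its own, the heap option is well formed.
        System.out.println(isValidXmx("-Xmx1024M"));
        // Passed as one quoted argument, the size part is "1024M -XX:MaxPermSize=256M".
        System.out.println(isValidXmx("-Xmx1024M -XX:MaxPermSize=256M"));
    }
}
```

In this model the second call is rejected because everything after -Xmx, including the space and the second option, is treated as the heap size.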

The solution is simple. We can put the quotes around the whole assignment:
set "MAVEN_OPTS=-Xmx1024M -XX:MaxPermSize=256M"
or remove the double quotes completely:
set MAVEN_OPTS=-Xmx1024M -XX:MaxPermSize=256M

Either way, the value of MAVEN_OPTS has no surrounding double quotes.

Now we know why the Windows batch file (mvn.bat) failed. But why does the Linux version succeed? On Linux, the shell removes the double quotes when it processes the export command, so they never become part of the value. The Linux shell script, mvn, then passes $MAVEN_OPTS unquoted to its exec command, and the shell splits it into separate arguments:
exec "$JAVACMD" \
  $MAVEN_OPTS \
  -classpath "${M2_HOME}"/boot/plexus-classworlds-*.jar \
  "-Dclassworlds.conf=${M2_HOME}/bin/m2.conf" \
  "-Dmaven.home=${M2_HOME}" \
  ${CLASSWORLDS_LAUNCHER} "$@"

via Blogger http://ift.tt/OpogKr

Customizing Eclipse Project Names With maven-eclipse-plugin- From http://ift.tt/1ajReyV

The Problem
When learning Apache UIMA, I needed to import it into Eclipse. I also needed to switch frequently between UIMA 2.4.2 and the latest (trunk) version.

I can use maven-eclipse-plugin to import the UIMA projects into Eclipse. But to tell the two versions apart, I have to add a prefix (or the version) to the names of these UIMA projects.

The Solution
Check maven-eclipse-plugin: it provides several ways to change the Eclipse project name.
addGroupIdToProjectName
If set to true, the groupId of the artifact is appended to the name of the generated Eclipse project.
Expression: ${eclipse.addGroupIdToProjectName}

addVersionToProjectName
If set to true, the version number of the artifact is appended to the name of the generated Eclipse project. See projectNameTemplate for other options.
Expression: ${eclipse.addVersionToProjectName}

projectNameTemplate
Allows configuring the name of the Eclipse projects. This property, if set, wins over addVersionToProjectName and addGroupIdToProjectName. You can use the [groupId], [artifactId] and [version] variables, e.g. [groupId].[artifactId]-[version].
Expression: ${eclipse.projectNameTemplate}

So now the solution is obvious: we can add -Declipse.addVersionToProjectName=true:
mvn -Declipse.addVersionToProjectName=true eclipse:eclipse
The project name would then become uimaj-core-2.4.2 or uimaj-core-2.5.1, depending on the version.

Or we can add -Declipse.projectNameTemplate=Prefix-[artifactId]-[version] to choose the prefix ourselves.
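
To illustrate how such a template turns into a project name, here is a minimal Java sketch. It is not the plugin's own code: the class ProjectNameTemplate and its expand method are hypothetical, and the groupId value is only an example.

```java
// Hypothetical helper for illustration; not the maven-eclipse-plugin implementation.
final class ProjectNameTemplate {
    // Replaces the [groupId], [artifactId] and [version] placeholders literally.
    static String expand(String template, String groupId, String artifactId, String version) {
        return template
                .replace("[groupId]", groupId)
                .replace("[artifactId]", artifactId)
                .replace("[version]", version);
    }

    public static void main(String[] args) {
        // Example values only; the template is the one from the command line above.
        System.out.println(expand("Prefix-[artifactId]-[version]",
                "org.apache.uima", "uimaj-core", "2.4.2"));
    }
}
```

With these example values, the template Prefix-[artifactId]-[version] yields Prefix-uimaj-core-2.4.2.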

Miscellaneous
Use -DdownloadSources=true to tell Maven to download the source code of the dependencies.

Resource
Maven Eclipse Plugin
Get source jar files attached to Eclipse for Maven-managed dependencies

via Blogger http://ift.tt/1goZbtm

Windows C++: Handling unicode file names- From http://ift.tt/1ajReyV

The Problem
In C++, I used to use ifstream and ofstream to read and write files and to check whether a file exists.

But in a recent project the file name may contain Unicode characters, so we need to use wide strings (wchar_t). I found out that ifstream and ofstream don’t work with Unicode file names.

The program is simple: it checks whether a file exists. If it does not, it writes some text into it; if it does, it does nothing.
The code that doesn’t work

void codeDoesnotwork()
{
 const wchar_t* filePath = L"E:\\tmp\\test.txt";
 ifstream infile(filePath);
 if (!infile.is_open() ||
  !infile.good())
 {
  cout << "file doesn't exist" << endl;
 }
 else
 {
  // it always goes here, even if the file doesn't exist.
  cout << "file exists" << endl;
 }

 // ofstream << or write doesn't work either.
 ofstream ofs(filePath, ios::out | ios::trunc);
 ofs << "hello world!" << endl;
 ofs.write("1", 1);
 ofs.close();
}

This is because ifstream and ofstream don’t work with wide strings.
The code that works

#include <cstdio>
#include <cwchar>
#include <iostream>
using namespace std;

void codeThatWorks()
{
 const wchar_t *installFolder = L"E:\\tmp";
 const wchar_t *fileName = L"test.txt";
 // room for the folder, the separator, the file name and the terminating null
 size_t fileNameBufferSize = wcslen(installFolder) + wcslen(fileName) + 2;

 // the trailing () zero-initializes the buffer
 wchar_t *fileNameBuffer = new wchar_t[fileNameBufferSize]();
 swprintf_s(fileNameBuffer, fileNameBufferSize, L"%s/%s", installFolder, fileName);

 // opening for reading returns NULL when the file doesn't exist
 FILE* oFile = _wfopen(fileNameBuffer, L"r");
 if (oFile == NULL)
 {
  cout << "file doesn't exist" << endl;
  oFile = _wfopen(fileNameBuffer, L"w+");
  if (oFile != NULL)
  {
   fprintf(oFile, "%s", "hello world");
  }
 }
 else
 {
  cout << "file exists" << endl;
 }
 // fclose must not be called with NULL
 if (oFile != NULL)
 {
  fclose(oFile);
 }
 delete[] fileNameBuffer;
}

Write Unicode Content to File
If we need to write Unicode content to a file, we should use _wfopen or wofstream to write wide strings to it. For example, to open a file for writing UTF-16LE text:
f = _wfopen(COMMON_FILE_PATH, L"w, ccs=UTF-16LE");

Use Shlwapi.lib PathFileExists
We can use PathFileExists from Shlwapi.lib to check whether a file exists:
BOOL exist = PathFileExists(fileNameBuffer);
To use Shlwapi.h, we also need to add the Shlwapi.lib library: right-click the project -> “Configuration Properties” -> “Linker” -> “Input” -> “Additional Dependencies” -> “Edit”, then type “Shlwapi.lib” in the text box.

Resource
MSDN fopen, _wfopen
Wrote to a file using std::wofstream

via Blogger http://ift.tt/1fCdGu6

Using Fiddler to Capture Http Requests of a Java Application- From http://ift.tt/1ajReyV

Task1: Using Fiddler as a Proxy to Monitor Request from a Java Application
To configure a Java application to send its web traffic to Fiddler, add the following parameters to the JVM, where proxyHost is the machine running Fiddler and 8888 is the port Fiddler listens on by default:
-Dhttp.proxyHost=proxyHost -Dhttp.proxyPort=8888

The proxy (here, Fiddler) and the Java application can run on different machines. In that case, make sure remote clients are allowed to connect: click “Tools” -> “Fiddler Options” -> “Connections”, check “Allow remote computers to connect”, then restart Fiddler.
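
The JDK's own HttpURLConnection picks these two properties up automatically. For code that needs an explicit java.net.Proxy object (for example to pass to URL.openConnection(Proxy)), the following sketch reads the same properties. The class name FiddlerProxySettings is hypothetical and only uses standard JDK classes.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper for illustration; uses only standard JDK classes.
final class FiddlerProxySettings {
    // Builds an HTTP proxy from -Dhttp.proxyHost / -Dhttp.proxyPort,
    // or returns Proxy.NO_PROXY when either property is missing.
    static Proxy fromSystemProperties() {
        String host = System.getProperty("http.proxyHost");
        String port = System.getProperty("http.proxyPort");
        if (host == null || host.isEmpty() || port == null || port.isEmpty()) {
            return Proxy.NO_PROXY;
        }
        // createUnresolved avoids a DNS lookup at construction time.
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host, Integer.parseInt(port)));
    }

    public static void main(String[] args) {
        System.out.println(fromSystemProperties());
    }
}
```

Running it with -Dhttp.proxyHost=localhost -Dhttp.proxyPort=8888 gives a proxy that points at a local Fiddler instance; without the properties it falls back to a direct connection.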

Use Apache Http Client SystemDefaultHttpClient
If we are using Apache Http Client 4.2 or newer, we can use SystemDefaultHttpClient, which honors the standard system properties.

In 4.3 or newer, we can use HttpClientBuilder. Note that it only honors these system properties when useSystemProperties() is called on the builder:
HttpClient client = HttpClientBuilder.create().useSystemProperties().build();

Task2: Monitor Requests in Web Server
We want to monitor the requests and responses of a Java web server; for example, the web server is running at server1:8080.

Solution: Using Fiddler as a Reverse Proxy
We can use Fiddler as a reverse proxy. First, change Fiddler to listen on port 8080: click “Tools” -> “Fiddler Options” -> “Connections”, change the port number and allow remote computers to connect, then restart Fiddler if prompted.

Then change the web server to run on another port, for example 9090.

Create a FiddlerScript Rule
Then write a custom rule that tells Fiddler to forward requests for server1:8080 (the port Fiddler is listening on) to server1:9090, where the real web server is listening.

Click Rules > Customize Rules.
Inside the OnBeforeRequest handler, add a new line of code:
if (oSession.host.toLowerCase() == "server1:8080") oSession.host = "server1:9090";

Troubleshooting: Enable “Help” -> “Troubleshooting Filters…” 
If for some reason Fiddler is not capturing a request, we can enable the option “Help” -> “Troubleshooting Filters…”. This shows all traffic but strikes out the requests that would be excluded by the filters. It also adds a comment explaining why each request would be hidden.

Resources
The Fiddler Proxy
Using Fiddler as a Reverse Proxy

via Blogger http://ift.tt/1oJLKoB

Regular Expression in Action: Remove or Merge Empty Lines- From http://ift.tt/1ajReyV

The Problem
When I find some interesting code on the internet, I like to copy it into Eclipse or NetBeans to read and run it. Code on the web is often not well formatted: it contains many empty lines, which makes it harder to read.
So I like to remove the empty lines to make the code more compact.

Here regex comes into play.

Remove Empty Lines

Find: ^\s*\n
Replace: (empty)

Merging Empty Lines
Find: ^\s*\n
Replace: \r\n on Windows, or \n otherwise

Next, use Ctrl+Shift+F in Eclipse or Alt+Shift+F in NetBeans to format the code.
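
The same two replacements can also be done in code. Below is a minimal Java sketch using String.replaceAll; the class name EmptyLines and its method names are made up for this example, and the (?m) flag is added so that ^ matches at the start of every line, which is what the editor's search dialog does.

```java
// Hypothetical helper for illustration; only java.lang.String is used.
final class EmptyLines {
    // (?m) = multiline mode, so ^ matches at the beginning of each line.
    private static final String BLANK_RUN = "(?m)^\\s*\\n";

    // Removes every run of empty (or whitespace-only) lines.
    static String remove(String code) {
        return code.replaceAll(BLANK_RUN, "");
    }

    // Collapses every run of empty lines into a single empty line.
    static String merge(String code) {
        return code.replaceAll(BLANK_RUN, "\n");
    }

    public static void main(String[] args) {
        String code = "int a = 1;\n\n\n\nint b = 2;\n";
        System.out.print(remove(code));
        System.out.print(merge(code));
    }
}
```

Because \s also matches line breaks, one match swallows a whole run of consecutive blank lines, which is why replacing it with a single \n merges them.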

Online Regex Tools
http://regex101.com/
It explains the meaning of each part of the regular expression, and we can test the expression there.

  • /^\s*\n/
    • ^ assert position at start of the string
    • \s* match any white space character [\r\n\t\f ]
      • Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    • \n matches a line-feed (newline) character (ASCII 10)

http://ift.tt/15sGP4l
Here we can test our regular expression in almost all languages. It can also generate the Java or .NET string literal, so we don’t have to manually escape special characters such as \ or " in the regex string.

via Blogger http://ift.tt/1fe0sPc

Understanding Jetty Handlers and Why Empty Request Log- From http://ift.tt/1ajReyV

The Problem
I am trying to add a RequestLogHandler to our Jetty server. Following Jetty/Tutorial/RequestLog, I added a RequestLogHandler to jetty.xml and restarted the Jetty server.
The log file is created, but after accessing some web pages it is still empty: nothing appears in the log.
In a Hurry?
Basically, the problem is caused by a misconfigured handler chain and by misunderstanding the difference between HandlerCollection and HandlerList.

Root Cause Analysis
The code is our loyal friend. So I attached a remote debugger and tried to understand Jetty’s code.
Basically, when an HTTP request comes in, Jetty uses its QueuedThreadPool to execute it, and each request goes through a handler chain until it is handled.
The Original Code that Doesn’t Work
The application has some special logic for deploying web applications: our customized Solr war may be called app1-solr.war or app2-solr.war, but we want its context root to be /solr.

So we deploy the wars in the webapps folder manually.

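for (File war : wars) {
  String context = war.getName();
  if (context.endsWith("solr.war")) {
    // app1-solr.war, app2-solr.war, ... are all deployed under /solr
    context = "solr";
  }
  // ... create a WebAppContext for this context, then register it:
  handlerList.addHandler(webappContext);
}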

We also define some handlers in jetty.xml:

<Set name="handler">
  <New id="Handlers" class="org.eclipse.jetty.server.handler.HandlerCollection">
    <Set name="handlers">
      <Array type="org.eclipse.jetty.server.Handler">
        <Item>
          <New id="Contexts" class="org.eclipse.jetty.server.handler.ContextHandlerCollection"/>
        </Item>
        <Item>
          <New id="DefaultHandler" class="org.eclipse.jetty.server.handler.DefaultHandler"/>
        </Item>
        <Item>
          <New id="RequestLog" class="org.eclipse.jetty.server.handler.RequestLogHandler"/>
        </Item>
      </Array>
    </Set>
  </New>
</Set>

The original code that starts embedded jetty looks like below:

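private Server doStartEmbeddedJetty()
{
  HandlerList handlers = new HandlerList();
  for (File war : wars) {
    // same context name logic as in the snippet above
    String context = war.getName();
    if (context.endsWith("solr.war")) {
      context = "solr";
    }
    WebAppContext webappContext = new WebAppContext();
    webappContext.setContextPath("/" + context);
    webappContext.setWar(war.getAbsolutePath());
    handlers.addHandler(webappContext);
  }
  // the handlers defined in jetty.xml end up last in the HandlerList
  Handler oldHandler = server.getHandler();
  if (oldHandler != null) {
    handlers.addHandler(oldHandler);
  }
  server.setHandler(handlers);
  return server;
}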

Its handler chain looks like this. The whole chain is a HandlerList, which contains:
WebAppContext: solr.war
WebAppContext: app1.war
One HandlerCollection (from jetty.xml), which contains: ContextHandlerCollection, DefaultHandler, RequestLogHandler.

When a request comes in, it is served by one of the WebAppContext handlers and marked as handled. The HandlerList then returns immediately, so the RequestLogHandler doesn’t get a chance to run at all.
Check the HandlerList implementation:

// This extension of HandlerCollection will call each contained handler in turn until either an exception is thrown or the request is marked as handled.
public void handle(String target, Request baseRequest, HttpServletRequest request, HttpServletResponse response) 
    throws IOException, ServletException
{
    Handler[] handlers = getHandlers();
    
    if (handlers!=null && isStarted())
    {
        for (int i=0;i<handlers.length;i++)
        {
            handlers[i].handle(target,baseRequest, request, response);
            if ( baseRequest.isHandled())
            // it will return here, RequestLogHandler doesn't get chance to run at all.
                return;
        }
    }
}

The Fix
Now that we know the root cause of the problem, the fix is easy. We honour the handlers defined in jetty.xml and add our customized WebAppContexts into the ContextHandlerCollection.

private Server doStartEmbeddedJetty()
{
  // find the ContextHandlerCollection that jetty.xml already defines
  ContextHandlerCollection contextHandlers = null;
  Handler oldHandler = server.getHandler();
  if (oldHandler instanceof HandlerCollection) {
    Handler[] handlers = ((HandlerCollection) oldHandler).getHandlers();
    for (Handler handler : handlers) {
      if (handler instanceof ContextHandlerCollection) {
        contextHandlers = (ContextHandlerCollection) handler;
        break;
      }
    }
  }
  if (contextHandlers == null) {
    throw new IllegalStateException("No ContextHandlerCollection defined in jetty.xml");
  }
  for (File war : wars) {
    // same context name logic as in the first snippet
    String context = war.getName();
    if (context.endsWith("solr.war")) {
      context = "solr";
    }
    WebAppContext webappContext = new WebAppContext();
    webappContext.setContextPath("/" + context);
    webappContext.setWar(war.getAbsolutePath());
    contextHandlers.addHandler(webappContext);
  }
  return server;
}

Now the handler chain looks like this. The whole chain is the HandlerCollection from jetty.xml, which contains the following parts:
One ContextHandlerCollection, which contains multiple WebAppContexts such as solr.war and app1.war.
The DefaultHandler.
The RequestLogHandler.

The HandlerCollection’s handle implementation looks like below:

//The default implementations calls all handlers in list order, regardless of the response status or exceptions. 
public void handle(String target, Request baseRequest, HttpServletRequest request, HttpServletResponse response) 
{
  for (int i=0;i<_handlers.length;i++)
  {
      _handlers[i].handle(target,baseRequest, request, response);

  }
}

So the ContextHandlerCollection first handles the request via one of its WebAppContexts, then the remaining handlers in the HandlerCollection are called. The DefaultHandler checks whether the request is already handled: if yes, it does nothing; otherwise it responds with a 404 Not Found error. The RequestLogHandler now gets the chance to run and writes the request info into the log.

Alternatively, we could change the HandlerList into a HandlerCollection, but we want to honour the handler order in jetty.xml. If in the future we want to add a handler before the ContextHandlerCollection, we can just add it in jetty.xml, with no need to change the code.
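
The difference between the two dispatch styles can be reproduced with a toy model. The sketch below does not use Jetty at all: the Request, Handler and HandlerChainDemo types are invented for this illustration and only mimic the "call all handlers" and "stop once handled" behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; these types are NOT the Jetty classes.
final class HandlerChainDemo {
    // Minimal stand-in for a request: just a "handled" flag.
    static final class Request { boolean handled; }

    interface Handler { void handle(Request request, List<String> trace); }

    // Like HandlerCollection: always calls every handler in order.
    static void callAll(List<Handler> handlers, Request request, List<String> trace) {
        for (Handler h : handlers) {
            h.handle(request, trace);
        }
    }

    // Like HandlerList: stops as soon as the request is marked as handled.
    static void callUntilHandled(List<Handler> handlers, Request request, List<String> trace) {
        for (Handler h : handlers) {
            h.handle(request, trace);
            if (request.handled) {
                return;
            }
        }
    }

    // A web application that handles the request, followed by a request logger.
    static List<Handler> chain() {
        List<Handler> handlers = new ArrayList<>();
        handlers.add((req, trace) -> { trace.add("webapp"); req.handled = true; });
        handlers.add((req, trace) -> trace.add("requestLog"));
        return handlers;
    }

    // Runs one request through the chain and returns which handlers were called.
    static List<String> run(boolean stopWhenHandled) {
        List<String> trace = new ArrayList<>();
        Request request = new Request();
        if (stopWhenHandled) {
            callUntilHandled(chain(), request, trace);
        } else {
            callAll(chain(), request, trace);
        }
        return trace;
    }
}
```

In this model, run(true) only reaches the web application, while run(false) also reaches the request logger, which matches the empty and the filled request log respectively.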
Resources
Jetty/Tutorial/RequestLog

via Blogger http://ift.tt/1lDWwjb

Part3: Integrate Http Form Post Authentication in Nutch2- From http://ift.tt/1ajReyV

The Problem
Nutch and Nutch2 support NTLM, Basic and Digest authentication to authenticate themselves to websites, but they don’t support HTTP POST form authentication.

Main Steps
Use Apache Http Client to do http post form authentication.
Make http post form authentication work.
Integrate http form authentication in Nutch2.

After the previous two steps, we can now integrate http form authentication into Nutch2.
Define Http Form Post Authentication Properties in httpclient-auth.xml
Nutch uses the http.auth.file property to locate the XML file that defines the credentials; its default value is httpclient-auth.xml. We extend httpclient-auth.xml to include the properties for http form authentication. For the ASP.NET web application from the last post, httpclient-auth.xml looks like this:

<?xml version="1.0"?>
<auth-configuration>
  <credentials authMethod="formAuth" loginUrl="http://localhost:44444/Account/Login.aspx" loginFormId="ctl01" loginRedirect="true">
    <loginPostData>
      <field name="ctl00$MainContent$LoginUser$UserName" value="admin"/>
      <field name="ctl00$MainContent$LoginUser$Password" value="admin123"/>
    </loginPostData>
    <removedFormFields>
      <field name="ctl00$MainContent$LoginUser$RememberMe"/>
    </removedFormFields>
  </credentials>
</auth-configuration>

Read Http Form Post Authentication from Configuration XML File
In Nutch’s protocol-httpclient plugin, change the org.apache.nutch.protocol.httpclient.Http.setCredentials() method to read the authentication info from the configuration file into the variable formConfigurer.
Then change the Http.resolveCredentials() method: if formConfigurer is not null, use HttpFormAuthentication to do the form post login.
package org.apache.nutch.protocol.httpclient;
public class Http extends HttpBase {
 private void resolveCredentials(URL url) {
  if (formConfigurer != null) {
   HttpFormAuthentication formAuther = new HttpFormAuthentication(
     formConfigurer, client, this);
   try {
    formAuther.login();
   } catch (Exception e) {
    throw new RuntimeException(e);
   }
   return;
  }
 }
 private static synchronized void setCredentials()
   throws ParserConfigurationException, SAXException, IOException {

  if (authRulesRead)
   return;

  authRulesRead = true; // Avoid re-attempting to read
  InputStream is = conf.getConfResourceAsInputStream(authFile);
  if (is != null) {
   Document doc = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder().parse(is);

   Element rootElement = doc.getDocumentElement();
   if (!"auth-configuration".equals(rootElement.getTagName())) {
    if (LOG.isWarnEnabled())
     LOG.warn("Bad auth conf file: root element <"
       + rootElement.getTagName() + "> found in "
       + authFile + " - must be <auth-configuration>");
   }

   // For each set of credentials
   NodeList credList = rootElement.getChildNodes();
   for (int i = 0; i < credList.getLength(); i++) {
    Node credNode = credList.item(i);
    if (!(credNode instanceof Element))
     continue;

    Element credElement = (Element) credNode;
    if (!"credentials".equals(credElement.getTagName())) {
     if (LOG.isWarnEnabled())
      LOG.warn("Bad auth conf file: Element <"
        + credElement.getTagName()
        + "> not recognized in " + authFile
        + " - expected <credentials>");
     continue;
    }
    // read http form post auth info
    String authMethod = credElement.getAttribute("authMethod");
    if (StringUtils.isNotBlank(authMethod)) {
     formConfigurer = readFormAuthConfigurer(credElement,
       authMethod);
     continue;
    }
   }
  }
 }
 private static HttpFormAuthConfigurer readFormAuthConfigurer(
   Element credElement, String authMethod) {
  if ("formAuth".equals(authMethod)) {
   HttpFormAuthConfigurer formConfigurer = new HttpFormAuthConfigurer();

   String str = credElement.getAttribute("loginUrl");
   if (StringUtils.isNotBlank(str)) {
    formConfigurer.setLoginUrl(str.trim());
   } else {
    throw new IllegalArgumentException("Must set loginUrl.");
   }
   str = credElement.getAttribute("loginFormId");
   if (StringUtils.isNotBlank(str)) {
    formConfigurer.setLoginFormId(str.trim());
   } else {
    throw new IllegalArgumentException("Must set loginFormId.");
   }
   str = credElement.getAttribute("loginRedirect");
   if (StringUtils.isNotBlank(str)) {
    formConfigurer.setLoginRedirect(Boolean.parseBoolean(str));
   }

   NodeList nodeList = credElement.getChildNodes();
   for (int j = 0; j < nodeList.getLength(); j++) {
    Node node = nodeList.item(j);
    if (!(node instanceof Element))
     continue;

    Element element = (Element) node;
    if ("loginPostData".equals(element.getTagName())) {
     Map<String, String> loginPostData = new HashMap<String, String>();
     NodeList childNodes = element.getChildNodes();
     for (int k = 0; k < childNodes.getLength(); k++) {
      Node fieldNode = childNodes.item(k);
      if (!(fieldNode instanceof Element))
       continue;

      Element fieldElement = (Element) fieldNode;
      String name = fieldElement.getAttribute("name");
      String value = fieldElement.getAttribute("value");
      loginPostData.put(name, value);
     }
     formConfigurer.setLoginPostData(loginPostData);
    } else if ("additionalPostHeaders".equals(element.getTagName())) {
     Map<String, String> additionalPostHeaders = new HashMap<String, String>();
     NodeList childNodes = element.getChildNodes();
     for (int k = 0; k < childNodes.getLength(); k++) {
      Node fieldNode = childNodes.item(k);
      if (!(fieldNode instanceof Element))
       continue;

      Element fieldElement = (Element) fieldNode;
      String name = fieldElement.getAttribute("name");
      String value = fieldElement.getAttribute("value");
      additionalPostHeaders.put(name, value);
     }
     formConfigurer
       .setAdditionalPostHeaders(additionalPostHeaders);
    } else if ("removedFormFields".equals(element.getTagName())) {
     Set<String> removedFormFields = new HashSet<String>();
     NodeList childNodes = element.getChildNodes();
     for (int k = 0; k < childNodes.getLength(); k++) {
      Node fieldNode = childNodes.item(k);
      if (!(fieldNode instanceof Element))
       continue;

      Element fieldElement = (Element) fieldNode;
      String name = fieldElement.getAttribute("name");
      removedFormFields.add(name);
     }
     formConfigurer.setRemovedFormFields(removedFormFields);
    }
   }
   return formConfigurer;
  } else {
   throw new IllegalArgumentException("Unsupported authMethod: "
     + authMethod);
  }
 }  
}  

via Blogger http://ift.tt/MvVs1d

Nutch2 Http Form Authentication-Part2: Make Http Post Form Authentication Work- From http://ift.tt/1ajReyV

The Problem
Nutch and Nutch2 support NTLM, Basic and Digest authentication to authenticate themselves to websites, but they don’t support HTTP POST form authentication.

Main Steps
Use Apache Http Client to do http post form authentication.
Make http post form authentication work.
Integrate form authentication in Nutch2.

This article focuses on how to make http post form authentication work, using a practical example.
Create and Run ASP.NET Web Application
In Visual Studio, create an ASP.NET (MVC2) web application. The default application it generates supports form authentication, so it is a good target for testing our http form login.

Write Test Code
To use HttpFormAuthentication to do http post form authentication, we have to figure out the loginFormId: this can be done by searching for “<form” in the page source. With the “Inspect element” function of Chrome DevTools, we can also easily find the names of the username and password fields. Be sure to use the name attribute, not the id attribute, of the input elements.

Now we can write the test code:

 private static void authTestAspWebApp() throws Exception, IOException {
  HttpFormAuthConfigurer authConfigurer = new HttpFormAuthConfigurer();
  authConfigurer.setLoginUrl("http://localhost:44444/Account/Login.aspx")
    .setLoginFormId("ctl01").setLoginRedirect(true);
  Map<String, String> loginPostData = new HashMap<String, String>();
  loginPostData.put("ctl00$MainContent$LoginUser$UserName", "admin");
  loginPostData.put("ctl00$MainContent$LoginUser$Password", "admin123");
  authConfigurer.setLoginPostData(loginPostData);

  Set<String> removedFormFields = new HashSet<String>();
  removedFormFields.add("ctl00$MainContent$LoginUser$RememberMe");
  authConfigurer.setRemovedFormFields(removedFormFields);

  HttpFormAuthentication example = new HttpFormAuthentication(
    authConfigurer);

  // example.client.getHostConfiguration().setProxy("127.0.0.1", 8888);

  String proxyHost = System.getProperty("http.proxyHost");
  String proxyPort = System.getProperty("http.proxyPort");
  if (StringUtils.isNotBlank(proxyHost)
    && StringUtils.isNotBlank(proxyPort)) {
   example.client.getHostConfiguration().setProxy(proxyHost,
     Integer.parseInt(proxyPort));
  }

  example.login();
  String result = example
    .httpGetPageContent("http://localhost:44444/secret/needlogin.aspx");
  System.out.println(result);
 }

Run the previous test code and check the response code, response headers and response body. We can copy the whole response body to jsbin, where the HTML is much easier to view.

What to Do if it doesn’t Work?
But sometimes things are not that simple, and the previous code may still not work: the user is not logged in, and we can’t access the protected resource.

When this happens, we need to compare the request that Apache Http Client sends with the request that Chrome sends, including the headers and the request body.

We can use Chrome DevTools to get the request headers and post body; we can even copy the request as a cURL command and execute it on the command line.
We can start Fiddler as a proxy, add example.client.getHostConfiguration().setProxy("127.0.0.1", 8888); to the test code, then monitor in Fiddler the requests Apache Http Client sends and the responses it receives.

Compare them and check whether some headers are missing; if so, add them to additionalPostHeaders. Check whether we need to remove some fields; if so, add them to removedFormFields. Check whether we need to add more fields; if so, add them to loginPostData.
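
The header comparison can be done by eye, but a few lines of code help when the lists are long. The following Java sketch is only an illustration: the class HeaderDiff and its method are hypothetical, and the two maps are assumed to have been filled in by hand from Chrome DevTools and from Fiddler.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Nutch or Http Client.
final class HeaderDiff {
    // Returns the names of headers the browser sent but the client did not.
    // HTTP header names are case-insensitive, so they are compared in lower case.
    static Set<String> missingHeaders(Map<String, String> browser, Map<String, String> client) {
        Set<String> sent = new TreeSet<>();
        for (String name : client.keySet()) {
            sent.add(name.toLowerCase(Locale.ROOT));
        }
        Set<String> missing = new TreeSet<>();
        for (String name : browser.keySet()) {
            if (!sent.contains(name.toLowerCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

The names it returns are the candidates to add to additionalPostHeaders; whether each one actually matters for the login still has to be checked by trying it.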

After all this, we should be able to make it work.

via Blogger http://ift.tt/1dBkvYa

Nutch2 Http Form Authentication-Part1: Using Apache Http Client to Do Http Post Form Authentication- From http://ift.tt/1ajReyV

The Problem
Nutch and Nutch2 support NTLM, Basic and Digest authentication to authenticate themselves to websites, but they don’t support HTTP POST form authentication.

Main Steps
Use Apache Http Client to do http post form authentication.
Test http post form authentication.
Integrate with Nutch2.

Use Apache Http Client to Do Http Post Form Authentication
HttpFormAuthConfigurer
First let’s look at the HttpFormAuthConfigurer class. loginUrl and loginFormId need no explanation. loginPostData stores the names and values of the login fields, such as username:user1, password:password1. removedFormFields lists the input fields we want to remove from the form. additionalPostHeaders is used when we have to add additional header names and values to the login post. If loginRedirect is true and the login post returns a redirect code (301 or 302), Http Client automatically follows the redirect.

package org.apache.nutch.protocol.httpclient;
public class HttpFormAuthConfigurer {
	private String loginUrl;
	private String loginFormId;
	private Map<String, String> loginPostData;
	private Set<String> removedFormFields;	
	private Map<String, String> additionalPostHeaders;
	private boolean loginRedirect;
}	
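
The class above omits its accessors. Here is one possible way to fill them in, as a sketch under assumptions: the fluent setters are inferred from how the test code in Part 2 chains setLoginUrl(...).setLoginFormId(...).setLoginRedirect(true), and the empty default collections are my own choice so that calls such as putAll(getAdditionalPostHeaders()) do not hit a null. The class is renamed HttpFormAuthConfigurerSketch to make clear it is not the original source.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: accessor shapes are inferred from how the class is used in this series.
final class HttpFormAuthConfigurerSketch {
    private String loginUrl;
    private String loginFormId;
    // Empty defaults so callers can iterate or putAll without null checks.
    private Map<String, String> loginPostData = new HashMap<>();
    private Set<String> removedFormFields = new HashSet<>();
    private Map<String, String> additionalPostHeaders = new HashMap<>();
    private boolean loginRedirect;

    String getLoginUrl() { return loginUrl; }
    String getLoginFormId() { return loginFormId; }
    Map<String, String> getLoginPostData() { return loginPostData; }
    Set<String> getRemovedFormFields() { return removedFormFields; }
    Map<String, String> getAdditionalPostHeaders() { return additionalPostHeaders; }
    boolean isLoginRedirect() { return loginRedirect; }

    // Each setter returns this so that calls can be chained.
    HttpFormAuthConfigurerSketch setLoginUrl(String loginUrl) {
        this.loginUrl = loginUrl;
        return this;
    }

    HttpFormAuthConfigurerSketch setLoginFormId(String loginFormId) {
        this.loginFormId = loginFormId;
        return this;
    }

    HttpFormAuthConfigurerSketch setLoginPostData(Map<String, String> loginPostData) {
        this.loginPostData = loginPostData;
        return this;
    }

    HttpFormAuthConfigurerSketch setRemovedFormFields(Set<String> removedFormFields) {
        this.removedFormFields = removedFormFields;
        return this;
    }

    HttpFormAuthConfigurerSketch setAdditionalPostHeaders(Map<String, String> additionalPostHeaders) {
        this.additionalPostHeaders = additionalPostHeaders;
        return this;
    }

    HttpFormAuthConfigurerSketch setLoginRedirect(boolean loginRedirect) {
        this.loginRedirect = loginRedirect;
        return this;
    }
}
```

Returning this from the setters keeps the configuration code short; plain void setters would work just as well if chaining is not needed.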

HttpFormAuthentication 
In the login method, it first calls CookieHandler.setDefault(new CookieManager()); so that, if the login succeeds, subsequent requests do not require login again.

Then it sends an http get request to the loginUrl and uses Jsoup.parse(pageContent) to parse the response. It iterates over all input fields in the login form and adds their names and values to List params, skipping the fields listed in removedFormFields. It then sets the values for the username and password fields, which are stored in loginPostData. Finally, it sends a post request to the loginUrl with List params as the data.

The following code uses Apache Http Client 3.x, as Nutch2 still uses this rather old version of the library.

package org.apache.nutch.protocol.httpclient;

public class HttpFormAuthentication {
	private static final Logger LOGGER = LoggerFactory
			.getLogger(HttpFormAuthentication.class);
	private static Map<String, String> defaultLoginHeaders = new HashMap<String, String>();
	static {
		defaultLoginHeaders.put("User-Agent", "Mozilla/5.0");
		defaultLoginHeaders
				.put("Accept",
						"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
		defaultLoginHeaders.put("Accept-Language", "en-US,en;q=0.5");
		defaultLoginHeaders.put("Connection", "keep-alive");
		defaultLoginHeaders.put("Content-Type",
				"application/x-www-form-urlencoded");
	}

	private HttpClient client;
	private HttpFormAuthConfigurer authConfigurer = new HttpFormAuthConfigurer();
	private String cookies;

	public HttpFormAuthentication(HttpFormAuthConfigurer authConfigurer) {
		this.authConfigurer = authConfigurer;
		this.client = new HttpClient();
	}
	public HttpFormAuthentication(HttpFormAuthConfigurer authConfigurer,
			HttpClient client, Http http) {
		this.authConfigurer = authConfigurer;
		this.client = client;
		defaultLoginHeaders.put("Accept", http.getAccept());
		defaultLoginHeaders.put("Accept-Language", http.getAcceptLanguage());
		defaultLoginHeaders.put("User-Agent", http.getUserAgent());
	}
	public void login() throws Exception {
		// make sure cookies are turned on
		CookieHandler.setDefault(new CookieManager());
		String pageContent = httpGetPageContent(authConfigurer.getLoginUrl());
		List<NameValuePair> params = getLoginFormParams(pageContent);
		sendPost(authConfigurer.getLoginUrl(), params);
	}

	private void sendPost(String url, List<NameValuePair> params)
			throws Exception {
		PostMethod post = null;
		try {
			if (authConfigurer.isLoginRedirect()) {
				post = new PostMethod(url) {
					@Override
					public boolean getFollowRedirects() {
						return true;
					}
				};
			} else {
				post = new PostMethod(url);
			}
			// we can't use post.setFollowRedirects(true) as it will throw
			// IllegalArgumentException:
			// Entity enclosing requests cannot be redirected without user
			// intervention
			setLoginHeader(post);
			post.addParameters(params.toArray(new NameValuePair[0]));
			// post.setEntity(new UrlEncodedFormEntity(postParams));

			int rspCode = client.executeMethod(post);
			if (LOGGER.isDebugEnabled()) {
				LOGGER.debug("Sending 'POST' request to URL : " + url);
				LOGGER.debug("Post parameters : " + params);
				LOGGER.debug("Response Code : " + rspCode);

				// log the headers of the response, not of the request
				for (Header header : post.getResponseHeaders()) {
					LOGGER.debug("Response header : " + header);
				}
			}
			String rst = IOUtils.toString(post.getResponseBodyAsStream());
			LOGGER.debug("login post result: " + rst);
		} finally {
			if (post != null) {
				post.releaseConnection();
			}
		}
	}

	private void setLoginHeader(PostMethod post) {
		Map<String, String> headers = new HashMap<String, String>();
		headers.putAll(defaultLoginHeaders);
		// additionalPostHeaders can overwrite value in defaultLoginHeaders
		headers.putAll(authConfigurer.getAdditionalPostHeaders());
		for (Entry<String, String> entry : headers.entrySet()) {
			post.addRequestHeader(entry.getKey(), entry.getValue());
		}
		post.addRequestHeader("Cookie", getCookies());
	}

	private String httpGetPageContent(String url) throws IOException {

		GetMethod get = new GetMethod(url);
		try {
			for (Entry<String, String> entry : authConfigurer
					.getAdditionalPostHeaders().entrySet()) {
				get.addRequestHeader(entry.getKey(), entry.getValue());
			}
			client.executeMethod(get);
      
			Header cookieHeader = get.getResponseHeader("Set-Cookie");
			if (cookieHeader != null) {
				setCookies(cookieHeader.getValue());
			}
			return IOUtils.toString(get.getResponseBodyAsStream());
		} finally {
			get.releaseConnection();
		}
	}

	private List<NameValuePair> getLoginFormParams(String pageContent)
			throws UnsupportedEncodingException {
		List<NameValuePair> params = new ArrayList<NameValuePair>();
		Document doc = Jsoup.parse(pageContent);
		Element loginform = doc.getElementById(authConfigurer.getLoginFormId());
		if (loginform == null) {
			throw new IllegalArgumentException("No form exists: "
					+ authConfigurer.getLoginFormId());
		}
		Elements inputElements = loginform.getElementsByTag("input");

		// skip fields in removedFormFields or loginPostData
		for (Element inputElement : inputElements) {
			String key = inputElement.attr("name");
			String value = inputElement.attr("value");
			if (authConfigurer.getLoginPostData().containsKey(key)
					|| authConfigurer.getRemovedFormFields().contains(key)) {
				continue;
			}
			params.add(new NameValuePair(key, value));
		}
		// add key and value in loginPostData
		for (Entry<String, String> entry : authConfigurer.getLoginPostData()
				.entrySet()) {
			params.add(new NameValuePair(entry.getKey(), entry.getValue()));
		}
		return params;
	}
}
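
The field-merging rule in getLoginFormParams can be tried without Jsoup or HttpClient. The following is only a minimal sketch under stated assumptions: the class name FormParamMergeSketch and the sample field names are hypothetical, and a LinkedHashMap stands in for the List<NameValuePair> of the original, so repeated field names are not preserved here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the original class.
class FormParamMergeSketch {

  // Keep the form's own fields unless they are overridden by loginPostData
  // or listed in removedFormFields, then append everything in loginPostData.
  static Map<String, String> merge(Map<String, String> formFields,
      Map<String, String> loginPostData, Set<String> removedFormFields) {
    Map<String, String> params = new LinkedHashMap<String, String>();
    for (Map.Entry<String, String> field : formFields.entrySet()) {
      String name = field.getKey();
      if (loginPostData.containsKey(name) || removedFormFields.contains(name)) {
        continue; // skipped, same condition as in getLoginFormParams
      }
      params.put(name, field.getValue());
    }
    params.putAll(loginPostData);
    return params;
  }

  public static void main(String[] args) {
    // Fields as they might appear in a login form (made-up values).
    Map<String, String> formFields = new LinkedHashMap<String, String>();
    formFields.put("csrf_token", "abc123");
    formFields.put("username", "");
    formFields.put("password", "");
    formFields.put("remember_me", "on");

    Map<String, String> loginPostData = new LinkedHashMap<String, String>();
    loginPostData.put("username", "alice");
    loginPostData.put("password", "secret");

    Set<String> removed = Collections.singleton("remember_me");
    System.out.println(merge(formFields, loginPostData, removed));
  }
}
```

Hidden fields such as a CSRF token are kept because they are neither overridden nor removed, which is usually why the login page is fetched before posting.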

HTTP Form Authentication in Apache HttpClient 4.x

public class HttpCilentFormLoginExample {
  private static final Logger LOGGER = LoggerFactory
      .getLogger(HttpCilentFormLoginExample.class);
  private DefaultHttpClient client = new DefaultHttpClient();
  private String loginUrl, loginForm;  
  private static Map<String,String> defaultLoginHeaders = new HashMap<String,String>();  
  static {
    defaultLoginHeaders.put("User-Agent", "Mozilla/5.0");
    defaultLoginHeaders.put("Accept",
        "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    defaultLoginHeaders.put("Accept-Language", "en-US,en;q=0.5");
    defaultLoginHeaders.put("Connection", "keep-alive");
    // defaultLoginHeaders.put("Referer",
    // "http://ift.tt/18U6d7n");
    defaultLoginHeaders
        .put("Content-Type", "application/x-www-form-urlencoded");
  }
  private Map<String,String> loginPostData;
  private Map<String,String> additionalPostHeaders;
  private Set<String> removedFormFields;
  private String cookies;
  
  public HttpCilentFormLoginExample(String loginUrl, String loginForm,
      Map<String,String> loginPostData,
      Map<String,String> additionalPostHeaders, Set<String> removedFormFields) {
    this.loginUrl = loginUrl;
    this.loginForm = loginForm;
    this.loginPostData = loginPostData == null ? new HashMap<String,String>()
        : loginPostData;
    this.additionalPostHeaders = additionalPostHeaders == null ? new HashMap<String,String>()
        : additionalPostHeaders;
    this.removedFormFields = removedFormFields == null ? new HashSet<String>()
        : removedFormFields;
  }
    
  public void login() throws Exception {
    client.setRedirectStrategy(new LaxRedirectStrategy());
    // make sure cookie handling is turned on
    CookieHandler.setDefault(new CookieManager());
    String pageContent = httpGetPageContent(loginUrl);
    List<NameValuePair> postParams = getLoginFormParams(pageContent);
    sendPost(loginUrl, postParams);
  }
  
  private void sendPost(String url, List<NameValuePair> postParams)
      throws Exception {
    HttpPost post = new HttpPost(url);
    try {
      setLoginHeader(post);
      post.setEntity(new UrlEncodedFormEntity(postParams));      
      HttpResponse response = client.execute(post);      
      int responseCode = response.getStatusLine().getStatusCode();
      if (LOGGER.isDebugEnabled()) {
        // log at debug level so the output matches the isDebugEnabled() guard
        LOGGER.debug("Sent 'POST' request to URL : " + url);
        LOGGER.debug("Post parameters : " + postParams);
        LOGGER.debug("Response Code : " + responseCode);
        for (Header header : response.getAllHeaders()) {
          LOGGER.debug("Response headers : " + header);
        }
      }
      String rst = IOUtils.toString(response.getEntity().getContent());
      LOGGER.debug("login post result: " + rst);
    } finally {
      post.releaseConnection();
    }
  }
  
  private void setLoginHeader(HttpPost post) {
    Map<String,String> headers = new HashMap<String,String>();
    headers.putAll(defaultLoginHeaders);
    // additionalPostHeaders can overwrite value in defaultLoginHeaders
    headers.putAll(additionalPostHeaders);
    for (Entry<String,String> entry : headers.entrySet()) {
      post.setHeader(entry.getKey(), entry.getValue());
    }
    post.setHeader("Cookie", getCookies());
  }
  
  private String httpGetPageContent(String url) throws IOException {    
    HttpGet get = new HttpGet(url);
    try {
      for (Entry<String,String> entry : additionalPostHeaders.entrySet()) {
        get.setHeader(entry.getKey(), entry.getValue());
      }
      HttpResponse response = client.execute(get);
      Header cookieHeader = response.getFirstHeader("Set-Cookie");
      // use getValue(): toString() would also include the "Set-Cookie: " prefix
      setCookies(cookieHeader == null ? "" : cookieHeader.getValue());
      return IOUtils.toString(response.getEntity().getContent());
    } finally {
      get.releaseConnection();
    }    
  }
  
  private List<NameValuePair> getLoginFormParams(String pageContent)
      throws UnsupportedEncodingException {
    Document doc = Jsoup.parse(pageContent);
    List<NameValuePair> paramList = new ArrayList<NameValuePair>();
    Element loginform = doc.getElementById(loginForm);
    if (loginform == null) {
      throw new IllegalArgumentException("No form exists: " + loginForm);
    }
    Elements inputElements = loginform.getElementsByTag("input");
    // skip fields in removedFormFields or loginPostData
    for (Element inputElement : inputElements) {
      String key = inputElement.attr("name");
      String value = inputElement.attr("value");
      if (loginPostData.containsKey(key) || removedFormFields.contains(key)) {
        continue;
      }
      paramList.add(new BasicNameValuePair(key, value));
    }
    // add key and value in loginPostData
    for (Entry<String,String> entry : loginPostData.entrySet()) {
      paramList.add(new BasicNameValuePair(entry.getKey(), entry.getValue()));
    }
    return paramList;
  }
}
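
Both listings replay the cookie received with the login page in the Cookie header of the POST. A Set-Cookie value also carries attributes (Path, HttpOnly and so on), and a server may send more than one Set-Cookie header. The following is a minimal sketch, assuming we only want the name=value pairs; the class CookieHeaderSketch and its method are hypothetical helpers, not part of the class above or of HttpClient.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class CookieHeaderSketch {

  // Keep only the leading name=value pair of each Set-Cookie value and
  // join the pairs with "; " as expected in a Cookie request header.
  static String toCookieHeader(List<String> setCookieValues) {
    StringBuilder cookie = new StringBuilder();
    for (String value : setCookieValues) {
      int semicolon = value.indexOf(';');
      String pair = (semicolon < 0 ? value : value.substring(0, semicolon)).trim();
      if (pair.isEmpty()) {
        continue; // nothing usable in this header value
      }
      if (cookie.length() > 0) {
        cookie.append("; ");
      }
      cookie.append(pair);
    }
    return cookie.toString();
  }

  public static void main(String[] args) {
    // Made-up header values, as a server might send them.
    List<String> received = Arrays.asList(
        "JSESSIONID=abc; Path=/; HttpOnly",
        "lang=en");
    System.out.println(toCookieHeader(received));
  }
}
```

With HttpClient 4.x the values could be collected from response.getHeaders("Set-Cookie") by calling getValue() on each header.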
Resources
Cookie Handling in Java SE 6
Apache HttpClient – Automate login Google

via Blogger http://ift.tt/1jB4FBu

Run Commands Faster in PowerShell- From http://ift.tt/1ajReyV

In Linux, we can use ! (history expansion) to rerun commands quickly: !! or !-1 (or the up arrow) runs the last command, and !prefix runs the most recent command that starts with the given prefix.

We can do the same thing in PowerShell.

Get-History: alias h
Invoke-History: alias r
Call r to execute the last command.
Call r prefix to execute the most recent command that starts with a specific word:
r ant               Run the last ant command.
r "git push"        Run the last git push command. Notice that if the prefix contains a space, we have to put it in double quotes.

Use Get-History (h) to show the IDs of commands, then run:
r id (for example, r 3)
The Invoke-History cmdlet accepts only a single ID; to run multiple commands, run r 3; r 5
The first token of the last command line (usually the command name):  $^


Run Multiple PowerShell Sessions in Tabs
Use ConEmu to run multiple PowerShell sessions in tabs.
Another option is Console2.

Resources
ConEmu – The Windows Terminal/Console/Prompt we’ve been waiting for?

via Blogger http://ift.tt/OAFe8Z