13. Troubleshooting

This section describes some useful tips for troubleshooting problems with an Inca deployment.

13.1. Log Files

The agent, depot and consumer logs are located in the $INCA_DIST/var directory. Reporter manager logs are located in each manager's install directory under the var directory (e.g. ~/incaReporterManager/var).

Logging is informational by default, but can be adjusted to be more verbose ('info' to 'debug') or less verbose ('info' to 'error') by editing the $INCA_DIST/etc/common/log4j.properties file and then restarting inca components. Note that passwords are logged when 'debug' logging is turned on. Logging for the inca components can be adjusted by editing lines 26 and 27 ("log4j.rootLogger=info, stdout" and "log4j.logger.edu.sdsc.inca=info"). To log the most verbose globus error messages change line 33 in log4j.properties from "log4j.logger.org.globus=error" to "log4j.logger.org.globus=debug".

If a reporter manager is not started on a resource after you have scheduled reporters to run there, it is likely the build on that resource failed. You can confirm by looking for "Unable to stage reporter manager to " in $INCA_DIST/var/agent.log. If found, look for errors on the resource in the 2 build log files from the reporter manager build attempt: ~/incaReporterManager/build.log and ~/incaReporterManager/Inca-ReporterManager-*/build.log. The most common build failure is a bad build of the dependency Net::SSLeay which is required for secure communication; the build for Net::SSLeay will fail if it is unable to find OpenSSL on the resource. The Perl SSL modules call the same functions as "openssl verify".

13.2. Authentication Problems

If you observe that a reporter manager on a resource is having trouble connecting to either the agent or depot, there could be a problem with either the installed SSL libraries or certificates. To test, use the pingClient command as in the example below. Replace $RM_INSTALL_DIR with the path to the directory where the reporter manager is installed. Supply the appropriate hostname and port for either the depot or agent. After pressing return, type the password for the certificates.

% sbin/inca pingClient \
  -c etc/rmcert.pem \
  -k etc/rmkey.pem -P true -t etc/trusted \
  -uri incas://<agent or depot hostname>:<port>
<your password>

If there are no problems contacting either the agent or depot, the exit code will be 0 and nothing will be printed to the screen. If there is an authentication problem, the exit code will be non-zero and a message such as the following will be printed to stderr:

ERROR - Unable to create Inca::IO socket: : IO::Socket::INET configuration failed
Error contacting Inca server 'incas://rocks-101:6325' at bin/inca-ping-client line 137, <STDIN> line 1.

One possible cause of connection problems may occur when the agent does a local lookup of 'localhost' for its hostname but Java doesn't find the fully qualified hostname. An example is when the logged agent hostname is something like 'agent-machine:6323' but should be 'agent-machine.sdsc.edu:6323'. You can override this by adding the fully qualified hostname to your $INCA_DIST/etc/common/inca.properties file on the agent (add 'inca.agent.hostname=agent-machine.sdsc.edu' to inca.properties in this case). Change other properties with 'localhost' values to your fully qualified hostname, e.g.:

details at http://consumer-machine.sdsc.edu:8080/inca/xslt.jsp?xsl=instance.xsl&instanceID=@instanceId@&configID=@configId@&resourceName=@resource@\n\n

After you do this, you may need to remove the reporter manager directories on your Grid machines and re-initialize the configuration with the "bin/inca default" command so that new suite names are created with the correct hostname. If you've already invested a lot in your configuration and don't want to re-initialize, email inca@sdsc.edu for help.

Please email inca@sdsc.edu if you are unable to determine the cause of the authentication problem.