Troubleshooting Java
PyTerrier integrates with several Java-backed engines, including Terrier. It uses PyJnius as a “glue” layer in order to call these Java components. PyTerrier manages this automatically for you (including downloading required JAR files, running the required Java “initialization” step, etc.), and uses reasonable defaults for the way it interacts with Java.
Typically you only need Java installed, but in some cases you may also need to adjust the Java configuration.
Installing Java
On macOS with Homebrew, you can run brew install openjdk
On Ubuntu/Debian Linux, you can run sudo apt install openjdk-11-jdk-headless
For other systems, or if the above commands do not work, we recommend the Java Developer Resources Guide.
Java Configuration
This section describes how to manage Java’s configuration in PyTerrier.
Note
Because these options affect the JVM’s settings, they need to be set before Java starts, for instance at the top of a script or notebook before any Java components are loaded.
Starting Java. PyTerrier will start Java when you use a component that requires it, such as pt.terrier.Retriever. However, sometimes
you might want to start it early by calling pt.java.init().
You can also check if Java has been started (either automatically or by pt.java.init()):
- pyterrier.java.started()
Returns True if pt.java.init() has been called. Otherwise False.
- Return type:
bool
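As a minimal sketch of the early-start pattern described above, the helper below starts Java only if it is not already running and then reports the resulting state. The name ensure_java_started and its java parameter are illustrative and not part of PyTerrier; when java is omitted, the helper falls back to pt.java, which assumes PyTerrier and a Java installation are available.

```python
def ensure_java_started(java=None):
    """Start Java early unless it is already running; return the started state.

    `ensure_java_started` and its `java` argument are hypothetical names used
    for this example only. `java` defaults to the `pt.java` module.
    """
    if java is None:
        import pyterrier as pt  # assumes PyTerrier and Java are installed
        java = pt.java
    if not java.started():
        # Any Java configuration must already have been applied at this point.
        java.init()
    return java.started()
```

Calling ensure_java_started() at the top of a script or notebook, after any configuration calls, should leave pt.java.started() returning True.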
Java Home Path. PyJnius will search the usual places on your machine for a Java installation. If you have problems, you can override the Java home path:
- pyterrier.java.set_java_home(java_home)
Sets the directory to search when loading Java.
Note that you can achieve the same outcome by setting the JAVA_HOME environment variable.
- Parameters:
java_home (str)
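The two routes mentioned above can be sketched as follows. The path is a hypothetical example and the helper name use_java_home is illustrative, not part of PyTerrier; either route needs to run before any Java-backed component is loaded.

```python
import os

# Hypothetical JDK location; substitute the path on your own machine.
EXAMPLE_JAVA_HOME = "/usr/lib/jvm/java-11-openjdk-amd64"

def use_java_home(path):
    """Set JAVA_HOME for this process so that Java is searched for in `path`."""
    os.environ["JAVA_HOME"] = path
    return os.environ["JAVA_HOME"]

# Route 1: the environment variable, set before Java starts.
use_java_home(EXAMPLE_JAVA_HOME)

# Route 2: the PyTerrier call with the same outcome (needs PyTerrier installed):
# import pyterrier as pt
# pt.java.set_java_home(EXAMPLE_JAVA_HOME)
```

Setting the variable from Python only affects the current process; exporting JAVA_HOME in the shell is the usual choice when several scripts share one Java installation.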
Other General Options. The following are other options for configuring Java:
- pyterrier.java.add_package(org_name, package_name, version=None, file_type='jar')
- Parameters:
org_name (str)
package_name (str)
version (str | None)
file_type (str)
- pyterrier.java.set_log_level(level)
Set the logging level. The following string values are allowed, corresponding to Java logging levels:
‘ERROR’: only show error messages
‘WARN’: only show warnings and error messages (default)
‘INFO’: show information, warnings and error messages
‘DEBUG’: show debugging, information, warnings and error messages
Unlike other Java settings, this can be changed either before or after init() has been called.
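Since only the four strings listed above are accepted, it can help to validate user input before passing it on. The following is a sketch under that assumption; normalize_log_level and ALLOWED_LEVELS are hypothetical names, and mapping Python's "WARNING" spelling to "WARN" is a convenience added here rather than PyTerrier behaviour.

```python
# The four values listed above for pt.java.set_log_level.
ALLOWED_LEVELS = ("ERROR", "WARN", "INFO", "DEBUG")

def normalize_log_level(level):
    """Return `level` as one of the accepted strings, or raise ValueError."""
    candidate = level.strip().upper()
    if candidate == "WARNING":  # Python's logging module uses this spelling
        candidate = "WARN"
    if candidate not in ALLOWED_LEVELS:
        raise ValueError(
            f"unsupported log level {level!r}; expected one of {ALLOWED_LEVELS}"
        )
    return candidate

# Usage (needs PyTerrier installed):
# import pyterrier as pt
# pt.java.set_log_level(normalize_log_level("debug"))
```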
Terrier Configuration
These options adjust how the Terrier engine is loaded.
- pyterrier.terrier.set_property(k, v)
Sets a property in Terrier’s global properties configuration. Example:
pt.terrier.set_property("termpipelines", "")
While Terrier has a variety of properties, as discussed in its indexing and retrieval configuration guides, PyTerrier aims to expose Terrier configuration through appropriate methods or arguments. This method should therefore be seen as a safety valve: a way to override Terrier configuration that PyTerrier does not explicitly support.
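When several overrides are needed, they can be kept in one dictionary and applied in a loop. This is a sketch only: apply_terrier_properties is a hypothetical helper, and it assumes pt.terrier.set_property is the setter unless another callable is passed in.

```python
def apply_terrier_properties(properties, set_property=None):
    """Apply each key/value pair through a set_property(k, v) callable.

    `apply_terrier_properties` is a hypothetical helper for this example;
    `set_property` defaults to `pt.terrier.set_property`.
    """
    if set_property is None:
        import pyterrier as pt  # assumes PyTerrier and Java are installed
        set_property = pt.terrier.set_property
    for key, value in properties.items():
        # Values are passed as strings, matching the example above.
        set_property(key, str(value))

# Usage (needs PyTerrier installed), with the property from the example above:
# apply_terrier_properties({"termpipelines": ""})
```

Keeping the overrides in one place makes it easier to see which parts of the Terrier configuration bypass PyTerrier's own methods and arguments.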
Note on Deprecated Java Configuration
Previous versions of PyTerrier required you to run pt.init() (often in an if not pt.started() block)
to configure Java and start using PyTerrier. This function still exists (it calls pt.java.init()
and the associated configuration methods), but it is deprecated and no longer needed.
Instead, PyTerrier now automatically loads Java when a function that needs it is called.