Thursday, November 30, 2017

How to debug the unicode error in Py4J

Many Java-based big data systems, such as SystemML, Spark's MLlib, and CaffeOnSpark, use the Py4J bridge in their Python APIs. They often throw a cryptic error instead of a detailed Java stack trace:
py4j.protocol.Py4JJavaError: < exception str() failed >
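This typically happens when the Java error message contains non-ASCII characters, so Python fails while converting the exception to a string and the real stack trace is never printed. For more details on this issue, please refer to Python's Unicode documentation.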
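
To make the failure mode concrete, here is a minimal sketch that reproduces the symptom without Py4J or a JVM. The class FakeJavaError and the helper render are hypothetical names invented for this illustration; they are not part of Py4J. The sketch assumes the cause is an implicit ASCII encoding of the Java message, as in Python 2, and it only imitates how the interpreter reports an exception whose str() call fails.

```python
class FakeJavaError(Exception):
    """Hypothetical stand-in for py4j.protocol.Py4JJavaError (not the real class)."""

    def __init__(self, java_message):
        super().__init__(java_message)
        self.java_message = java_message

    def __str__(self):
        # Assumption: mimic Python 2's implicit ASCII encoding of unicode text.
        # encode("ascii") raises UnicodeEncodeError on any non-ASCII character.
        return self.java_message.encode("ascii").decode("ascii")


def render(exc):
    """Simplified imitation of how an uncaught exception gets reported."""
    name = type(exc).__name__
    try:
        return "{0}: {1}".format(name, str(exc))
    except Exception:
        # str(exc) itself raised, so the original message is lost.
        return "{0}: < exception str() failed >".format(name)


# ASCII-only message: str() works and the message is shown.
print(render(FakeJavaError("NullPointerException at line 3")))

# Message with curly quotes: str() fails and only the placeholder is shown.
print(render(FakeJavaError("Invalid character \u201cx\u201d in script")))
```

In this sketch the second message is lost entirely, which mirrors why the Java stack trace has to be extracted by some other route than printing the exception.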

Luckily, there is a very simple hack to extract the stack trace in such a situation. Let's assume that a call to SystemML's MLContext, ml.execute(script), is throwing the above Py4J error. In this case, simply wrap that call in the following try/except block: