I am calling a Java jar file from Python:
def extract_words(file_path):
    """
    Extract words and bounding boxes
    Arguments:
        file_path {[str]} -- [Input file path]
    Returns:
        [Document]
    """
    extractor = PDFBoxExtractor(file_path=file_path,
                                jar_path="external/pdfbox-app-2.0.15.jar",
                                class_path="external")
    document = extractor.run()
    return document
And elsewhere in the code:
pipe = subprocess.Popen(['java',
                         '-cp',
                         '.:%s:%s' % (self._jar_path, self._class_path),
                         'PrintTextLocations',
                         self._file_path],
                        stdout=subprocess.PIPE)
output = pipe.communicate()[0].decode()
This works fine. The problem is that the jar is heavy, and every call starts a new Java process that takes 3-4 seconds to load it. When I call it in a loop of 100 iterations, that adds 300-400 seconds to the process.
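For reference, this is a minimal sketch of how the per-launch cost could be measured. `time_launches` is a hypothetical helper, not part of my code, and the command used here is only a stand-in (the Python interpreter itself) so that the snippet runs without Java; the real `java -cp ...` argument list would go in its place.

```python
import subprocess
import sys
import time


def time_launches(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock seconds of each launch."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Each iteration starts a brand-new process, like the loop described above.
        subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in command (assumption): replace with the real
# ['java', '-cp', ..., 'PrintTextLocations', file_path] list to time the jar.
durations = time_launches([sys.executable, "-c", "pass"], runs=3)
print("mean per launch: %.3f s, total: %.3f s"
      % (sum(durations) / len(durations), sum(durations)))
```

If the mean per launch stays roughly constant regardless of the input file, the time is going into starting the process and loading the jar rather than into the extraction itself.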
Is there any way to keep the classpath alive for Java so that the jar file is not loaded every time? What's the best way to do this in a time-optimized manner?