I am calling a Java jar file from Python.

def extract_words(file_path):
    """
    Extract words and bounding boxes

    Arguments:
        file_path {[str]} -- [Input file path]

    Returns:
        [Document]
    """

    extractor = PDFBoxExtractor(file_path=file_path,
                                jar_path="external/pdfbox-app-2.0.15.jar",
                                class_path="external")

    document = extractor.run()
    return document

And somewhere:

pipe = subprocess.Popen(['java',
                         '-cp',
                         '.:%s:%s' % (self._jar_path, self._class_path),
                         'PrintTextLocations',
                         self._file_path],
                        stdout=subprocess.PIPE)
output = pipe.communicate()[0].decode()

This works fine. The problem is that the jar is heavy, and when I call it repeatedly in a loop, it takes 3-4 seconds to load the jar file on every call. Running the loop for 100 iterations adds 300-400 seconds to the process.

Is there any way to keep the classpath alive for Java so that the jar file is not loaded every time? What is the best way to do this in a time-optimized manner?

1 Answer

You can encapsulate your PDFBoxExtractor in a class by making it a class member. Initialize the PDFBoxExtractor in the constructor of the class, like below:

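class WordExtractor:

    def __init__(self):
        # Assumption: PDFBoxExtractor accepts file_path=None here;
        # the actual path is set per call in extract_words().
        self.extractor = PDFBoxExtractor(file_path=None,
                                         jar_path="external/pdfbox-app-2.0.15.jar",
                                         class_path="external")

    def extract_words(self, file_path):
        """
        Extract words and bounding boxes

        Arguments:
            file_path {[str]} -- [Input file path]

        Returns:
            [Document]
        """

        # Assumption: run() reads self._file_path, as the Popen call
        # in the question does.
        self.extractor._file_path = file_path
        document = self.extractor.run()
        return document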

The next step is to create an instance of the WordExtractor class outside the loop.

word_extractor = WordExtractor()

# your loop would go here; file_paths is a placeholder for your list of input files
for file_path in file_paths:
    document = word_extractor.extract_words(file_path)

This is just example code to illustrate the concept. Adapt it to your requirements.
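One caveat: if run() still launches java through subprocess.Popen on every call, as in the question's snippet, each call starts a fresh JVM and loads the jar again, so reusing the Python object alone may not remove the 3-4 second cost. One possible way around this is to start a single long-lived child process and send it one file path per request. The following is only a rough sketch under stated assumptions: PersistentWorker, the "<<END>>" sentinel, and STAND_IN are hypothetical names, and it assumes you write your own Java main class that reads one path per line from stdin, prints its result, and then prints the sentinel line. PrintTextLocations as called in the question is not assumed to behave this way. A small Python child stands in for that Java process here so the example can run on its own.

```python
import subprocess
import sys


class PersistentWorker:
    """Keep one child process alive and talk to it line by line."""

    def __init__(self, command, sentinel="<<END>>"):
        self._sentinel = sentinel
        # Start the child once; the startup cost is paid here only.
        self._proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,
        )

    def request(self, line):
        """Send one request line and collect output up to the sentinel."""
        self._proc.stdin.write(line + "\n")
        self._proc.stdin.flush()
        result = []
        while True:
            out = self._proc.stdout.readline()
            if not out:
                raise RuntimeError("worker exited before sending the sentinel")
            out = out.rstrip("\n")
            if out == self._sentinel:
                return result
            result.append(out)

    def close(self):
        """Close stdin so the child sees end-of-input, then wait for it."""
        self._proc.stdin.close()
        self._proc.wait()
        self._proc.stdout.close()


# Stand-in for the hypothetical Java server: echoes a fake result per path,
# followed by the sentinel line.
STAND_IN = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('words for ' + line.strip(), flush=True)\n"
    "    print('<<END>>', flush=True)\n",
]

if __name__ == "__main__":
    worker = PersistentWorker(STAND_IN)
    for path in ["a.pdf", "b.pdf"]:
        print(worker.request(path))
    worker.close()
```

With a real Java server class of your own, the command would instead be something like ['java', '-cp', '.:external/pdfbox-app-2.0.15.jar:external', 'YourServerClass'], where YourServerClass is a placeholder. The JVM and the jar would then be loaded once, when the worker is created, rather than once per file.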

Hope this helps!
