I am using the Apache POI library in my app, specifically POIshadow-all (ver. 3.17), to read a Word document. I am successfully extracting every paragraph as follows:
What I actually need is to extract every line, as follows:
The code to extract every paragraph is this:
try {
    val fis = FileInputStream(path.path + "/" + document)
    val xdoc = XWPFDocument(OPCPackage.open(fis))
    val paragraphList: MutableList<XWPFParagraph> = xdoc.paragraphs
    private val newParagraph = paragraph.createRun()
    ...
    for (par in paragraphList) {
        val currentParagraph = par.text
        Log.i("TAG", "current: $currentParagraph")
        ...
The variable currentParagraph holds a whole paragraph, as expected. However, I need a variable named currentLine that holds a single line.
I've researched this issue on Stack Overflow and other sites. I found some proposals, but none of them works for me. I also tried getting the data through ctr and using XWPFRun, without any success.
I would be grateful for any recommendation on how to proceed.
Thanks in advance for your help.
A Word document does not store how many lines a paragraph contains, because that depends on how the document is rendered or viewed. With a larger font size, a paragraph wraps onto more lines; with a smaller font size, it wraps onto fewer. The number of lines in a paragraph is therefore not a fixed property of the file but a variable.
However, if your application has a hard requirement for an estimate, you can program some logic like “start a new line after X (a constant) number of characters, rounded to the end of the current word”. The result can still differ from what a viewer shows, because real wrapping depends on screen size, font size, zoom level and so on. My suggestion is therefore to design your application so that it does not explicitly measure the number of lines in a paragraph, and to use the number of words or characters as the yardstick for inserting a line break if absolutely necessary.
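The character-count rule can be sketched roughly as follows. This is only an illustration, not something Apache POI provides: wrapByCharacterCount and maxChars are hypothetical names, and the lines it produces are an estimate that will not necessarily match what Word displays.

```kotlin
// Hypothetical helper (not part of Apache POI): greedy word wrap with a
// fixed character budget per line. Words are never split, so a word longer
// than maxChars ends up alone on a line that exceeds the budget.
fun wrapByCharacterCount(paragraph: String, maxChars: Int): List<String> {
    require(maxChars > 0) { "maxChars must be positive" }
    val lines = mutableListOf<String>()
    val current = StringBuilder()
    // Split on runs of whitespace so repeated spaces do not yield empty words.
    for (word in paragraph.trim().split(Regex("\\s+"))) {
        if (word.isEmpty()) continue
        // Close the current line if this word (plus one space) would not fit.
        if (current.isNotEmpty() && current.length + 1 + word.length > maxChars) {
            lines.add(current.toString())
            current.setLength(0)
        }
        if (current.isNotEmpty()) current.append(' ')
        current.append(word)
    }
    if (current.isNotEmpty()) lines.add(current.toString())
    return lines
}
```

Inside the loop from the question, each estimated line could then be read with something like for (currentLine in wrapByCharacterCount(par.text, 80)), where 80 is an arbitrary budget chosen for illustration.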
Another potential approach would be to split the paragraph into sentences at sentence-ending punctuation, e.g. “start a new sentence after each ‘?’, ‘!’ or ‘.’ character within a paragraph”. This too can get rather tricky, because abbreviations and decimal numbers also contain ‘.’ characters.
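A minimal sketch of the punctuation rule, assuming plain Kotlin with no extra libraries; splitIntoSentences is a hypothetical name, and the regular expression is a deliberately naive choice that shows both the idea and its weakness.

```kotlin
// Hypothetical helper (not part of Apache POI): split after '.', '!' or '?'
// when the punctuation mark is followed by whitespace.
fun splitIntoSentences(paragraph: String): List<String> =
    paragraph.trim()
        // The lookbehind keeps the punctuation attached to its sentence.
        .split(Regex("(?<=[.!?])\\s+"))
        .filter { it.isNotEmpty() }
```

For "Is this a line? Yes! It is." this yields three sentences, but "Dr. Smith arrived." is split after "Dr.", which is the kind of case that makes the approach tricky.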
Therefore, the answer to your question is that Apache POI has no “out of the box” way to detect the number of lines in a given paragraph. You would have to program your own logic (perhaps using one of the approaches outlined above) if absolutely necessary.