Reading a Microsoft Word Document in Java
Java has no built-in classes for reading Microsoft Office Word documents, but the Apache POI package, developed by the Apache Software Foundation, lets you read a Word document from Java. More information on the Apache POI package can be found on the Apache POI project site.
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.extractor.*;
import java.io.*;

public class readDoc {
    public static void main(String[] args) {
        String filesname = "Hello.doc";
        POIFSFileSystem fs = null;
        try {
            // Open the .doc file as a POIFS file system
            fs = new POIFSFileSystem(new FileInputStream(filesname));
            // HWPFDocument represents the Word document itself
            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);
            // One array entry per paragraph
            String[] paragraphs = we.getParagraphText();
            System.out.println("Word Document has " + paragraphs.length + " paragraphs");
            for (int i = 0; i < paragraphs.length; i++) {
                // Strip the trailing line ending from each paragraph
                paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
                System.out.println("Length:" + paragraphs[i].length());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Code Explanation:
- A new POIFSFileSystem object is created from a FileInputStream opened on the Microsoft Word document.
- A new HWPFDocument object is created from it; this class is specifically responsible for handling Microsoft Word documents.
- WordExtractor extracts the text from the Word document.
- getParagraphText() returns that text as an array with one entry per paragraph.
- Finally, the loop strips the trailing line ending from each paragraph and prints its length.
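The cleanup step in the loop can be tried on its own, without POI. In a Java regular expression, \cM is the control character Ctrl-M (a carriage return), so the pattern matches an optional carriage return, another optional carriage return, and then a line feed, and replaceAll removes whatever matched. The sketch below uses hand-written strings as stand-ins for what getParagraphText() might return; the class name ParagraphCleanup and the sample strings are made up for illustration.

```java
class ParagraphCleanup {
    // Same pattern as in the article: optional Ctrl-M, optional CR, then LF.
    static String stripLineEnding(String paragraph) {
        return paragraph.replaceAll("\\cM?\r?\n", "");
    }

    public static void main(String[] args) {
        // Hypothetical paragraphs; real ones would come from WordExtractor
        String[] paragraphs = { "First paragraph\r\n", "Second paragraph\n", "" };
        for (String p : paragraphs) {
            String cleaned = stripLineEnding(p);
            System.out.println("Length:" + cleaned.length());
        }
    }
}
```

Note that a carriage return with no line feed after it is left in place, because the pattern always requires the final line feed.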
Hi,
I am getting the exception below while running this example.
Could you please let me know if I am missing any jars or need to do anything else to run this Java class?
Thanks in advance for your help.
Regards,
Subramanyam.
Hi,
Sorry for the spam; attaching the exception.
I am getting the exception below while running this example.
java.io.IOException: Invalid header signature; read 7021802808062469458, expected -2226271756974174256
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:112)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
at com.general.test.ReadDoc.main(ReadDoc.java:16)
Could you please let me know if I am missing any jars or need to do anything else to run this Java class?
Thanks in advance for your help.
Regards,
Subramanyam.
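The "Invalid header signature" message means the first eight bytes of the file are not the signature that POIFSFileSystem expects, so the input file is a more likely cause than a missing jar. As a rough sketch, the "read" value from the exception can be turned back into the bytes that were at the start of the file, assuming the value is those eight bytes read as a little-endian long. The class and method names here are made up for illustration and are not part of POI.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class SignatureDecoder {
    // "expected" value copied from the exception message above
    static final long EXPECTED_SIGNATURE = -2226271756974174256L;

    // Turns the long from the exception back into the file's first 8 bytes
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(value).array();
    }

    // Renders bytes as ASCII, showing non-printable bytes as '.'
    static String toAscii(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long reported = 7021802808062469458L; // "read" value from the exception above
        if (reported == EXPECTED_SIGNATURE) {
            System.out.println("Signature matches");
        } else {
            System.out.println("File starts with: " + toAscii(toBytes(reported)));
        }
    }
}
```

For the value above, the decoded bytes spell Registra, which suggests that file is plain text rather than a binary Word document.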
After running this code I got the exception below. Please suggest a solution. I have already added the jar, but I still get this exception. One thing I noticed is that EncryptedDocumentException.class is not in the jar.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/poi/EncryptedDocumentException
at ws.WordRead.main(WordRead.java:38)
ERROR: JDWP Unable to get JNI 1.2 environment, jvm->GetEnv() return code = -2
JDWP exit error AGENT_ERROR_NO_JNI_ENV(183): [../../../src/share/back/util.c:820]
Hi Nishikanta,
I have used POI-3.0.2-Final.jar and poi-scratchpad-3.0.2-FINAL-20080204.jar for this code.
After running this code, the exception "java.io.FileNotFoundException: hello.doc (The system cannot find the file specified)" was generated.
So where must I place hello.doc (I created it on my desktop)? Thanks
Hi Slim,
Just place hello.doc where the .class file resides. If you are putting the doc file at another location, then specify the location path in the source code. It will work fine.
Thanks,
Hitesh Agrawal
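As a sketch of the advice above: a bare name such as hello.doc is resolved against the directory the program is started from (the JVM's working directory), which is often, but not always, the folder that holds the .class file. The snippet below prints where a relative name ends up and builds a full path to a file on the desktop instead. The Desktop folder under user.home is an assumption about where the file was saved, and the class name DocLocation is made up for illustration.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

class DocLocation {
    // Builds <user home>/Desktop/<fileName>; the Desktop location is an assumption
    static Path onDesktop(String fileName) {
        return Paths.get(System.getProperty("user.home"), "Desktop", fileName);
    }

    public static void main(String[] args) {
        // A relative name is resolved against the working directory
        File relative = new File("hello.doc");
        System.out.println("Relative name resolves to: " + relative.getAbsolutePath());

        // An absolute path does not depend on where the program is started
        Path desktop = onDesktop("hello.doc");
        System.out.println("Desktop path: " + desktop + " (exists: " + desktop.toFile().exists() + ")");
    }
}
```

The resulting path string can then be passed to new FileInputStream(...) in place of the bare file name.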
hi,
thanks for the answer.
The script works very well.
What is the effect of using paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n",""); ?
thanks
Hi,
Thanks for this post, it’s very useful.
I’m trying to find a word in my Word file after reading the file.
How can I do it?
Thanks a lot
java.io.IOException: Unable to read entire header; 6 bytes read; expected 512 bytes
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:78)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:83)
at org.apache.poi.hwpf.HWPFDocument.verifyAndBuildPOIFS(HWPFDocument.java:133)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:146)
at transactionDB.changeFormat.main(changeFormat.java:45)
This error is displayed. Please tell me what I have to do.
Hello hitesh,
Thanks for sharing this example. I have a different requirement with Word files. I want to add an image to a Word document using POI, but I don’t know how to do this.
Thanks,
Ankur Raiyani
How do I read Word comments and bookmarks using Java? Do you have any sample code? Any help would be appreciated.
hi friends,
Can anyone help me with this? I used this code and I am getting the exception below. I am using the poi-2.5.1-final-20040804.jar and poi-scratchpad-3.5-beta5-20090219.jar files. How do I specify the location path in the source code? I kept the file on the desktop.
java.io.IOException: Invalid header signature; read 85966670672, expected -2226271756974174256
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:88)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:83)
at rb.action.FileRead.main(FileRead.java:15)
Sathish Raja,
Have you fixed the issue? If so, please post the steps.
Hi Hitesh,
Where do I store the POI-3.0.2-Final.jar and poi-scratchpad-3.0.2-FINAL-20080204.jar files? I am just trying to get the example above working. Cheers for the help.
Darren
Hi friends,
On executing this code I am getting the following error. Can anyone tell me how to resolve this problem?
java.io.IOException: Unable to read entire header; -1 bytes read; expected 512 bytes
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:78)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:83)
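Both header errors reported in these comments ("Unable to read entire header" and "Invalid header signature") are raised while the file header is being read, before any Word content is parsed. A minimal sketch of a pre-check that uses only the JDK follows. The class name DocPreCheck and its methods are hypothetical; the 512-byte header size and the expected signature value are taken from the exception messages quoted above, and reading the signature as a little-endian long is an assumption.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class DocPreCheck {
    // Values copied from the exception messages in the comments
    static final long EXPECTED_SIGNATURE = -2226271756974174256L;
    static final int HEADER_SIZE = 512;

    // True if the first 8 bytes, read as a little-endian long, match the signature
    static boolean hasExpectedSignature(byte[] head) {
        if (head.length < 8) {
            return false;
        }
        long first = ByteBuffer.wrap(head, 0, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
        return first == EXPECTED_SIGNATURE;
    }

    // Describes why a file would or would not pass the header checks
    static String describe(File file) throws IOException {
        if (!file.isFile()) {
            return "not found: " + file.getAbsolutePath();
        }
        if (file.length() < HEADER_SIZE) {
            return "too short for a " + HEADER_SIZE + "-byte header: " + file.length() + " bytes";
        }
        byte[] head = new byte[8];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(head);
        }
        return hasExpectedSignature(head) ? "header signature matches" : "header signature does not match";
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "Hello.doc";
        System.out.println(describe(new File(name)));
    }
}
```

An empty or nearly empty file would fall into the "too short" branch, which would be consistent with the "-1 bytes read" and "6 bytes read" messages, though that reading of the messages is an inference.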
Hello! I'm really lost. I am very new to POI, but I have to use it for my project, which is to read a Word doc using Java. How can I "import" the org.apache.poi package? I downloaded poi-3.5-beta6 and it asked me to install Ant and Forrest, and to set the ANT_HOME and FORREST_HOME environment variables. Please help me, I'm confused!
Hi friends,
I am trying to change the font size of a text.
To do this I am writing one HWPF stream to another, and that way I can change the font, but what I actually need is a different font (and/or size) for each word or paragraph, that is, more than one font size in a single Word file.
Can anybody please tell me how to go about doing this?
what I exactly need is …
dgd gedgfe
rbr brbr gbntghth
rghh rtfhtyh bnfgh
that is each word having different font properties
getting error:
java.lang.NoClassDefFoundError: org/apache/poi/hpsf/WritingNotSupportedException
hi,
I have executed your Java program to read a Word document. It works fine, but if the Word document has tables, your code produces a malicious script and the code runs in an infinite loop.
Please tell me whether there are any methods to read data from tables in a Word document.
@Ankur Raiyani
Did you have any luck getting Apache POI to insert images into a Word document? I am trying to do the same thing.
Thank you very much.
Please, I need help quickly: I use two files, a file with a header and a file without a header. When I enter the file without a header, it gives me this error:
java.io.IOException: Invalid header signature; read 0x665C316674725C7B, expected 0xE11AB1A1E011CFD0
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:107)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
at wordtotext.Main.main(Main.java:30)
The second file runs fine. Please help me.