Reading Microsoft Word Document in JAVA
May 9th, 2008
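Java has no built-in classes for reading Microsoft Office Word documents, but the Apache POI package, developed by the Apache Software Foundation, lets you read Word documents from Java. The example below uses POI's HWPF component, which handles the binary Word 97-2003 (.doc) format and ships in the poi-scratchpad jar, so both the poi and poi-scratchpad jars must be on the classpath. More information on the Apache POI package can be found on the Apache POI website.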
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class ReadDoc {
    public static void main(String[] args) {
        // Binary Word 97-2003 file, resolved relative to the working directory
        String fileName = "Hello.doc";
        // try-with-resources closes the input stream even if parsing fails
        try (InputStream in = new FileInputStream(fileName)) {
            // Open the OLE2 container that holds the .doc data
            POIFSFileSystem fs = new POIFSFileSystem(in);
            // HWPFDocument is POI's model of a binary Word document
            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);
            String[] paragraphs = we.getParagraphText();
            System.out.println("Word Document has " + paragraphs.length + " paragraphs");
            for (int i = 0; i < paragraphs.length; i++) {
                // Strip the line break ("\r\n" or "\n") from each paragraph
                paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
                System.out.println("Length: " + paragraphs[i].length());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Code Explanation:
- Create a new POIFSFileSystem object from an input stream over the Word document; this opens the OLE2 container in which a .doc file is stored
- Create a new HWPFDocument object; this class is specifically responsible for handling Microsoft Word documents
- WordExtractor extracts the text from the Word document
- getParagraphText() returns the text as an array, one string per paragraph
- Finally, the loop strips the trailing line break from each paragraph and prints its length

Hi,
I am getting the exception below while running this example:
java.io.IOException: Invalid header signature; read 7021802808062469458, expected -2226271756974174256
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:112)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
at com.general.test.ReadDoc.main(ReadDoc.java:16)
Could you please let me know if I am missing any jars or need to do anything else to run this Java class?
Thanks in advance for your help.
Regards,
Subramanyam.
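"Invalid header signature" means the first 8 bytes of the file are not the OLE2 compound-document signature that POIFSFileSystem expects (the "expected" number is that signature, bytes D0 CF 11 E0 A1 B1 1A E1, read as a little-endian long). This usually happens when the file is not a binary Word 97-2003 .doc at all, for example a .docx (a ZIP archive starting with "PK") or an RTF or HTML file saved with a .doc extension. Similarly, "Unable to read entire header; 6 bytes read; expected 512 bytes" means the file is shorter than the 512-byte OLE2 header. Below is a minimal, JDK-only sketch for checking a file before handing it to POI; the class and method names are hypothetical, and it only looks at the first bytes, so treat its answer as a hint rather than a full validation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: guesses the container format from the first bytes of a file.
class DocSniffer {

    // OLE2 compound document signature used by binary .doc files.
    static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // An OLE2 header block is 512 bytes long.
    static final int OLE2_HEADER_SIZE = 512;

    static String sniff(byte[] head, long fileSize) {
        if (fileSize < OLE2_HEADER_SIZE) {
            return "too short to be an OLE2 .doc (" + fileSize + " bytes)";
        }
        if (head.length >= 8 && Arrays.equals(Arrays.copyOf(head, 8), OLE2_SIGNATURE)) {
            return "OLE2 container (binary .doc)";
        }
        // ZIP archives (and therefore .docx files) start with the bytes 'P', 'K'
        if (head.length >= 2 && head[0] == 0x50 && head[1] == 0x4B) {
            return "ZIP container (possibly .docx)";
        }
        return "unknown format";
    }

    static String sniff(Path path) throws IOException {
        byte[] head = new byte[8];
        int read;
        try (InputStream in = Files.newInputStream(path)) {
            // readNBytes requires Java 9 or later
            read = in.readNBytes(head, 0, head.length);
        }
        return sniff(Arrays.copyOf(head, read), Files.size(path));
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "Hello.doc";
        System.out.println(name + ": " + sniff(Paths.get(name)));
    }
}
```

Run it as `java DocSniffer Hello.doc`. If it reports a ZIP container, the file is most likely a .docx, which HWPF cannot read; newer POI releases handle that format with XWPFDocument from the poi-ooxml module instead.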
After running this code I got the exception below. Please suggest a solution. I have already added the jar, but I still get this exception. Also, I couldn't find EncryptedDocumentException.class in the jar.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/poi/EncryptedDocumentException
at ws.WordRead.main(WordRead.java:38)
ERROR: JDWP Unable to get JNI 1.2 environment, jvm->GetEnv() return code = -2
JDWP exit error AGENT_ERROR_NO_JNI_ENV(183): [../../../src/share/back/util.c:820]
Hi Nishikanta,
I have used POI-3.0.2-Final.jar and poi-scratchpad-3.0.2-FINAL-20080204.jar for this code.
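A NoClassDefFoundError for org/apache/poi/EncryptedDocumentException usually points to mismatched jars: the poi-scratchpad (HWPF) classes were built against a POI release whose core poi jar contains that class, but the poi jar on the classpath does not. Keeping poi and poi-scratchpad at the same version, like the 3.0.2 pair listed above, typically avoids this. The JDK-only sketch below reports whether a class can be loaded and which jar it came from; the class and method names are hypothetical.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic: reports whether a class can be loaded and which jar it comes from.
class ClasspathCheck {

    static String describe(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK classes have no CodeSource, so fall back to a generic label
            URL location = (src == null) ? null : src.getLocation();
            return className + " loaded from " + (location == null ? "the JDK" : location);
        } catch (ClassNotFoundException e) {
            return className + " NOT FOUND on the classpath";
        } catch (LinkageError e) {
            // e.g. NoClassDefFoundError when a class it depends on is missing
            return className + " failed to load: " + e;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.poi.EncryptedDocumentException",
            "org.apache.poi.poifs.filesystem.POIFSFileSystem",
            "org.apache.poi.hwpf.HWPFDocument"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Run it with the same classpath as the failing program; if the POI classes report different jar versions, or one of them is not found, the jars need to be aligned.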
After running this code, the exception "java.io.FileNotFoundException: hello.doc (The system cannot find the file specified)" was generated.
Where should I place hello.doc? (I created it on my desktop.) Thanks
Hi Slim,
Just place hello.doc in the same folder as the .class file and run the program from that folder. If you put the .doc file in another location, then specify its full path in the source code. It will work fine.
Thanks,
Hitesh Agrawal
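A relative name such as "Hello.doc" is resolved against the JVM's current working directory (the user.dir system property), which is usually the directory you launched java from; that is why placing the file next to the .class file works when you run the program from that folder. The small sketch below prints where the program will actually look; the class and method names are hypothetical.

```java
import java.io.File;

// Hypothetical helper: shows where a relative file name will be looked up.
class WhereIsMyFile {

    static File resolve(String fileName) {
        // Relative names are resolved against the current user directory (user.dir);
        // absolute names are returned unchanged.
        return new File(fileName).getAbsoluteFile();
    }

    public static void main(String[] args) {
        String fileName = args.length > 0 ? args[0] : "Hello.doc";
        File file = resolve(fileName);
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println("Looking for: " + file.getPath());
        System.out.println(file.isFile() ? "Found it." : "Not found; move the file there or use an absolute path.");
    }
}
```

Also note that the article's code opens "Hello.doc" while the file here is named hello.doc; on case-sensitive file systems such as those commonly used on Linux, these are two different names.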
Hi,
Thanks for the answer.
The program works very well.
What is the effect of using "paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n","");"?
Thanks
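In Java regular expressions, \cM is the control character Ctrl-M, which is the carriage return (\r). So the pattern "\\cM?\r?\n" matches a line feed optionally preceded by up to two carriage returns, and replaceAll(..., "") removes those line breaks, so that length() counts only the paragraph's text. The standalone sketch below (class name hypothetical, sample strings made up) applies the same pattern from the article's cleanup line to a few inputs; note that a lone "\r" without a following "\n" is not removed.

```java
// Demonstrates what the article's cleanup regex does to some sample strings.
class LineBreakDemo {

    // "\\cM" is the regex escape for Ctrl-M, i.e. a carriage return ('\r').
    static final String LINE_BREAK = "\\cM?\r?\n";

    static String strip(String paragraph) {
        return paragraph.replaceAll(LINE_BREAK, "");
    }

    // Makes \r and \n visible when printing.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String[] samples = { "Hello world\r\n", "Hello world\n", "Hello\r\r\nworld", "Hello world\r" };
        for (String s : samples) {
            String cleaned = strip(s);
            System.out.println(visible(s) + " (" + s.length() + ") -> "
                    + visible(cleaned) + " (" + cleaned.length() + ")");
        }
    }
}
```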
Hi,
Thanks for this post; it's very useful.
I'm trying to find a word in my Word file after reading the file.
How can I do it?
Thanks a lot
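One simple approach, sketched here under the assumption that you already have the String[] returned by we.getParagraphText() as in the article, is to scan each paragraph with a regular expression. The method name and the whole-word, case-insensitive matching are choices made for this sketch, not part of POI; the \b word boundaries work as expected for ordinary words made of letters and digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds whole-word, case-insensitive matches in extracted paragraphs.
class WordSearch {

    /** Returns "paragraph:offset" entries (both 0-based) for each match of word. */
    static List<String> findWord(String[] paragraphs, String word) {
        // \b...\b restricts matches to whole words; Pattern.quote escapes regex metacharacters.
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < paragraphs.length; i++) {
            Matcher m = pattern.matcher(paragraphs[i]);
            while (m.find()) {
                hits.add(i + ":" + m.start());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // In the article's program this array would come from we.getParagraphText().
        String[] paragraphs = { "Hello world\r\n", "Say hello to POI.\r\n", "Othello is a play.\r\n" };
        System.out.println(findWord(paragraphs, "hello")); // prints [0:0, 1:4]
    }
}
```

"Othello" is not reported because the word-boundary anchors reject matches inside longer words; drop the \b anchors if substring matches are wanted.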
java.io.IOException: Unable to read entire header; 6 bytes read; expected 512 bytes
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:78)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:83)
at org.apache.poi.hwpf.HWPFDocument.verifyAndBuildPOIFS(HWPFDocument.java:133)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:146)
at transactionDB.changeFormat.main(changeFormat.java:45)
This error is displayed. What do I have to do? Please tell me.
Hello Hitesh,
Thanks for sharing this example. I have a different requirement for Word files: I want to add an image to a Word document using POI, but I don't know how to do this.
Thanks,
Ankur Raiyani
How do I read Word comments and bookmarks using Java? Do you have sample code? Any help would be appreciated.