Reading Microsoft Word Document in JAVA
Java has no built-in classes for reading Microsoft Office Word documents, but the Apache POI package, developed by the Apache Software Foundation, lets you read Word documents from Java. More information on the package can be found at the Apache POI site (http://poi.apache.org/).
import java.io.*;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.extractor.*;
import org.apache.poi.poifs.filesystem.*;

public class readDoc {
    public static void main(String[] args) {
        String filesname = "Hello.doc";
        POIFSFileSystem fs = null;
        try {
            // Open the Word file as a POIFS (OLE2) file system
            fs = new POIFSFileSystem(new FileInputStream(filesname));
            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);

            // One array entry per paragraph
            String[] paragraphs = we.getParagraphText();
            System.out.println("Word Document has " + paragraphs.length + " paragraphs");

            for (int i = 0; i < paragraphs.length; i++) {
                // Strip the line ending from the paragraph before measuring it
                paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
                System.out.println("Length:" + paragraphs[i].length());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Code Explanation:
- Create a new POIFSFileSystem object and pass it an input stream on the Microsoft Word document
- Create a new HWPFDocument object; this class is specifically responsible for handling Microsoft Word documents
- WordExtractor extracts the text from the Word document
- getParagraphText() returns the text paragraph by paragraph
- Finally, we strip the line ending from each paragraph and print its length
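The last step is the least obvious one, so here is a minimal sketch of what the replaceAll pattern does, using only the standard library so it runs without POI. The class and method names (ParagraphCleaner, stripLineEnd) are made up for this illustration. In a Java regular expression, \cM is the control-M character (carriage return), so the pattern matches an optional carriage return followed by a line feed.

```java
import java.util.regex.Pattern;

class ParagraphCleaner {
    // Same pattern as in the listing: optional control-M, optional CR, then LF.
    private static final Pattern LINE_END = Pattern.compile("\\cM?\r?\n");

    // Hypothetical helper: removes every match of the pattern from the text.
    static String stripLineEnd(String paragraph) {
        return LINE_END.matcher(paragraph).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = "First paragraph\r\n";
        String clean = stripLineEnd(raw);
        // prints "17 -> 15"
        System.out.println(raw.length() + " -> " + clean.length());
    }
}
```

Note that the pattern requires a line feed, so a paragraph that ends in a bare carriage return is left unchanged by it.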
Hi,
sorry for the spam. attaching exception.
I am getting below exception while running this example.
java.io.IOException: Invalid header signature; read 7021802808062469458, expected -2226271756974174256
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:112)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
at com.general.test.ReadDoc.main(ReadDoc.java:16)
Could you please let me know if I am missing any jars or need to do anything else to run this Java class?
Thanks in advance for your help.
Regards,
Subramanyam.
After running this code I got the exception below. Please suggest a solution. I have already added the jar, but I still get this exception. One thing: I could not find EncryptedDocumentException.class in the jar.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/poi/EncryptedDocumentException
at ws.WordRead.main(WordRead.java:38)
ERROR: JDWP Unable to get JNI 1.2 environment, jvm->GetEnv() return code = -2
JDWP exit error AGENT_ERROR_NO_JNI_ENV(183): [../../../src/share/back/util.c:820]
Hi Nishikanta,
I have used the POI-3.0.2-Final.jar and poi-scratchpad-3.0.2-FINAL-20080204.jar packages for this code.
After running this code, the exception "java.io.FileNotFoundException: hello.doc (The system cannot find the file specified)" was generated.
So where must I place hello.doc? (I created it on my desktop.) Thanks.
Hi Slim,
Just place hello.doc where the .class file resides. If you are putting the doc file at another location, then specify the location path in the source code. It will work fine.
Thanks,
Hitesh Agrawal
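For anyone unsure where a relative name such as hello.doc is actually looked up, here is a small diagnostic sketch (not part of the original post; the class name LocateDoc is made up). A relative path is resolved against the JVM's working directory, which is often, but not always, the folder holding the .class file, for example when running from an IDE.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LocateDoc {
    // Hypothetical helper: turns a relative name into the absolute path
    // that the JVM's default (working) directory implies.
    static Path resolve(String fileName) {
        return Paths.get(fileName).toAbsolutePath();
    }

    public static void main(String[] args) {
        String fileName = args.length > 0 ? args[0] : "hello.doc";
        Path resolved = resolve(fileName);
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println("Looking for: " + resolved);
        System.out.println("Exists: " + Files.exists(resolved));
    }
}
```

If "Exists" prints false, either move the file to the directory shown or pass the full path (for example the desktop path) as the file name in the source code.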
hi,
thanks for the answer.
The script works very well.
What's the effect of using paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");?
thanks
Hi,
Thanks for this post, it’s very useful.
I'm trying to find a word in my Word file after reading the file.
How can I do it?
Thanks a lot
java.io.IOException: Unable to read entire header; 6 bytes read; expected 512 bytes
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:78)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:83)
at org.apache.poi.hwpf.HWPFDocument.verifyAndBuildPOIFS(HWPFDocument.java:133)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:146)
at transactionDB.changeFormat.main(changeFormat.java:45)
This error is displayed. Please tell me what I have to do.
Hello hitesh,
Thanks for sharing this example. I have a different requirement with Word files. I want to add an image to a Word document using POI, but I don't know how to do this.
Thanks,
Ankur Raiyani
How do I read Word comments and bookmarks using Java? Do you have sample code? Any help would be appreciated.
hi friends,
Can anyone help me with this? I used this code and I am getting the exception below. I am using the poi-2.5.1-final-20040804.jar and poi-scratchpad-3.5-beta5-20090219.jar files. How do I specify the location path in the source code? I kept the file on the desktop.
java.io.IOException: Invalid header signature; read 85966670672, expected -2226271756974174256
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:88)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:83)
at rb.action.FileRead.main(FileRead.java:15)
Sathish Raja,
Have you fixed the issue? If so, please post the steps.
Hi Hitesh,
Where do I store the POI-3.0.2-Final.jar and poi-scratchpad-3.0.2-FINAL-20080204.jar files? I am just trying to get the example above working. Cheers for the help.
Darren
Hi friends,
On executing this code I am getting the following error. Can anyone tell me how to resolve this problem?
java.io.IOException: Unable to read entire header; -1 bytes read; expected 512 bytes
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:78)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:83)
Hello! I'm really lost. I am very new to POI, but I have to use it for my project, which is to read a Word doc using Java. How can I "import" the org.apache.poi package? I downloaded poi-3.5-beta6 and it asked me to install Ant and Forrest, and to set the ANT_HOME and FORREST_HOME environment variables. Please help me, I'm confused!
Hi friends,
I am trying to change the font size of a text.
To do this I am writing one HWPF stream to another, and that way I can change the font, but what I actually need is a different font (and/or size) for each word or paragraph. Basically, I need more than one font size in a single Word file.
Can anybody please tell me how to go about doing this?
What I need exactly is:
dgd gedgfe
rbr brbr gbntghth
rghh rtfhtyh bnfgh
That is, each word having different font properties.
I am getting this error:
java.lang.NoClassDefFoundError: org/apache/poi/hpsf/WritingNotSupportedException
hi,
I have run your Java program to read a Word document. It works fine, but if the Word document has tables, your code produces garbled output and runs in an infinite loop.
Please tell me whether there are any methods to read data from tables in a Word document.
@Ankur Raiyani
Did you have any luck getting Apache POI to insert images into a Word document? I am trying to do the same thing.
Thank you very much.
Please help, I need this quickly. I use two files: one with a header and one without. When I pass in the file without a header, I get this error:
java.io.IOException: Invalid header signature; read 0x665C316674725C7B, expected 0xE11AB1A1E011CFD0
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:107)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
at wordtotext.Main.main(Main.java:30)
The second file runs fine. Please help me.
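The "Invalid header signature" errors in this thread mean the file does not start with the 8-byte OLE2 signature that POIFSFileSystem expects (0xE11AB1A1E011CFD0 read as a little-endian long, i.e. the bytes D0 CF 11 E0 A1 B1 1A E1). Decoded the same way, the value 0x665C316674725C7B above is the ASCII text {\rtf1\f, which suggests an RTF file saved with a .doc extension, and the 85966670672 reported earlier starts with PK, the ZIP signature used by .docx files. The sketch below (not from the original post; the class name WordFormatSniffer and its labels are made up) checks the first bytes of a file with plain Java before handing it to POI.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WordFormatSniffer {
    // OLE2 compound document signature (binary .doc, .xls, .ppt).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header, used by .docx and other OOXML files.
    private static final byte[] ZIP = {'P', 'K', 3, 4};
    // Opening of an RTF document.
    private static final byte[] RTF = "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Classifies the leading bytes of a file; labels are illustrative only.
    static String classify(byte[] head) {
        if (head.length < 8) return "TOO_SHORT";
        if (startsWith(head, OLE2)) return "OLE2";
        if (startsWith(head, ZIP)) return "ZIP";
        if (startsWith(head, RTF)) return "RTF";
        return "UNKNOWN";
    }

    // Reads up to 8 bytes from the stream; returns fewer if the stream ends early.
    static byte[] readHead(InputStream in) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) break;
            n += r;
        }
        return Arrays.copyOf(head, n);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java WordFormatSniffer <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(args[0] + ": " + classify(readHead(in)));
        }
    }
}
```

Only a file reported as OLE2 is a candidate for HWPFDocument. A ZIP result usually points to a .docx file, which POI handles through its separate XWPF classes, and TOO_SHORT matches the "Unable to read entire header" errors reported above for empty or nearly empty files.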
Thank you for the code.
System.out.println(paragraphs[ i ].toString()); // to print the paragraphs
Can anyone please provide Java code with which I can insert an image into an MS Word file at any location, given that the file already has some content in it? Please reply.
Please let me know how to insert an image into a Word doc file.
Please let me know how we can read the images of a .doc file along with the text using Java.
Excellent.
Thank you very much.
I am a beginner in Java. When I compile this example I get 9 errors.
Help me please…
package org.apache.poi.poifs.filesystem does not exist
import org.apache.poi.poifs.filesystem.*;
package org.apache.poi.hwpf does not exist
import org.apache.poi.hwpf.*;
package org.apache.poi.hwpf.extractor does not exist
import org.apache.poi.hwpf.extractor.*;
cannot find symbol
symbol : class POIFSFileSystem
location: class readDoc
POIFSFileSystem fs = null;
cannot find symbol
symbol : class POIFSFileSystem
location: class readDoc
fs = new POIFSFileSystem(new FileInputStream(filesname));
cannot find symbol
symbol : class HWPFDocument
location: class readDoc
HWPFDocument doc = new HWPFDocument(fs);
cannot find symbol
symbol : class HWPFDocument
location: class readDoc
HWPFDocument doc = new HWPFDocument(fs);
cannot find symbol
symbol : class WordExtractor
location: class readDoc
WordExtractor we = new WordExtractor(doc);
cannot find symbol
symbol : class WordExtractor
location: class readDoc
WordExtractor we = new WordExtractor(doc);
9 errors
Hi UJJAL,
You will have to add the Apache POI libraries to your classpath to make it work. You can download the Apache POI packages from http://poi.apache.org/. Since you are trying to read Microsoft Word documents in Java, you will also need the HWPF libraries: http://poi.apache.org/hwpf/index.html
Thanks,
Hitesh Agarwal
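A quick way to confirm whether the jars are really on the classpath at run time is to ask the class loader for the classes the example imports. This is only a sketch (the class name PoiClasspathCheck is made up); it relies on nothing but Class.forName, so it compiles even when POI is missing. The jar comments reflect the two jars mentioned in this thread and are an assumption about your POI version.

```java
class PoiClasspathCheck {
    // Returns true if the named class can be located without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, PoiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] needed = {
            "org.apache.poi.poifs.filesystem.POIFSFileSystem", // main POI jar
            "org.apache.poi.hwpf.HWPFDocument",                // scratchpad jar
            "org.apache.poi.hwpf.extractor.WordExtractor"      // scratchpad jar
        };
        for (String name : needed) {
            System.out.println((isOnClasspath(name) ? "found:   " : "MISSING: ") + name);
        }
    }
}
```

Run it with the same classpath you use for the example. Any line marked MISSING means the corresponding jar was not picked up, which is also what the "package ... does not exist" compile errors and the NoClassDefFoundError reports above indicate.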
Please anyone help me…
Let me know how to do the basic job of reading from a document.
Very nice information.
Is it possible to edit .doc and/or .docx documents with POI? I’d like to be able to replace certain text fragments in several Word documents and then save updated documents to disk.
This code reads a .doc file paragraph by paragraph.
How can I read this file sentence by sentence?
Thanks in advance.
How can I read a doc with text and images?
And how can I read text with its style?
Hi,
How to replace one string for another in .doc documents?
I think there are a lot of serious bugs in the implementation of HWPF format, e.g. the following:
HWPFDocument doc = new HWPFDocument(inputStream);
doc.write(outputStream);
turns .doc files into something that cannot be opened with Word anymore.
Hitesh,
Thanks for this. Excellent post..saved me a ton of searching.
How do I identify the headings of a .doc file?
Please send me the code.
How do I identify the headings of a .doc file using Apache POI?
Please send me the code.
Hi
Can you please tell me how to read a doc file that has images in it?
Please post some code if possible.
@Subramanyam
hi
I want to read a doc file using the POI interface, but I am getting an error on the package and on WordExtractor. Please help me.
Thank you in advance.
Hi
I want to load some information from a database table and put that data into a Word document. In the end, I want a way to load the database's data into a Word document with a single button click in my Java application. Thank you.
Need help as soon as possible…
Thanks and best regards,
Shehan.