Reading a Microsoft Word Document in Java
Java does not have any built-in classes for reading Microsoft Office Word documents, but the Apache POI package, developed by the Apache Software Foundation, lets you read Word documents from Java. The example here uses POI's HWPF classes, which handle the binary .doc format. More information on the package can be found on the Apache POI website.
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.extractor.*;
import java.io.*;

public class ReadDoc {
    public static void main(String[] args) {
        String filesname = "Hello.doc";
        POIFSFileSystem fs = null;
        try {
            // Open the .doc file as a POI file system
            fs = new POIFSFileSystem(new FileInputStream(filesname));
            // HWPFDocument represents the Word document itself
            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);
            // One array entry per paragraph
            String[] paragraphs = we.getParagraphText();
            System.out.println("Word Document has " + paragraphs.length + " paragraphs");
            for (int i = 0; i < paragraphs.length; i++) {
                // Strip the trailing line ending from each paragraph
                paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
                System.out.println("Length:" + paragraphs[i].length());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Code Explanation:
- A new POIFSFileSystem object is created, and the Microsoft Word document is passed to it as an input stream
- A new HWPFDocument object is created; this class is specifically responsible for handling Microsoft Word documents
- WordExtractor extracts the text from the Word document
- getParagraphText() returns that text paragraph by paragraph
- Finally, we loop over the paragraphs, strip the line ending from each one, and print its length
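The line-ending cleanup in the last step can be tried on its own, without Apache POI on the classpath. The following is only a minimal sketch: the class name ParagraphCleaner, the method clean, and the sample strings are made up for illustration, and the regular expression is the same one used in the listing.

```java
final class ParagraphCleaner {

    // Same pattern as in the listing: \cM is Control-M (a carriage return),
    // so this matches an optional CR, a second optional CR, then a line feed.
    private static final String LINE_END = "\\cM?\r?\n";

    // Removes CR/LF line endings from one paragraph (hypothetical helper).
    public static String clean(String paragraph) {
        return paragraph.replaceAll(LINE_END, "");
    }

    public static void main(String[] args) {
        // Made-up stand-ins for what getParagraphText() might return
        String[] paragraphs = { "First paragraph\r\n", "Second paragraph\n", "\r\n" };
        for (int i = 0; i < paragraphs.length; i++) {
            paragraphs[i] = clean(paragraphs[i]);
            System.out.println("Length:" + paragraphs[i].length());
        }
    }
}
```

Note that the pattern requires a line feed, so a paragraph ending in a bare carriage return with no "\n" after it would be left unchanged.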
Is there any way to read a table embedded inside a Word document? I can get the text from the document, but there is no way of determining which table cell the text came from. Is there a class in POI that can help me with table extraction?
Got the answer.
If anyone needs it, see:
http://www.coderanch.com/t/473792/open-source/Reading-text-table-word-document
How can I read a particular page in a Word or PDF document using Java? Please reply soon.
Really useful stuff, thanks. Can you please post how to find an underlined string in the content read from a Word document? Thanks in advance.
Thanks a lot. It helped me.
Hi,
I want to read an MS Word document that can contain different headings as well as images, then encode everything I have read and write it to a file with any extension, using Java/Android.
If any of you has a solution, please share it with me so I can get help from it.
Regards,
I like it.
Hi,
I want to read and write images along with the text content of a Word document in Java.
The program I wrote only reads the file's text content, not the images.
If any of you has a solution, please share it with me so I can get help from it.
I need the same example too. Please give it to me if you have got it.