Reading a Microsoft Word Document in Java

Java has no built-in classes for reading Microsoft Office Word documents, but the Apache POI library, developed by the Apache Software Foundation, can read them. The example below uses POI's HWPF classes, which handle the older binary .doc format. More information on the Apache POI package can be found at Apache POI.

import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import java.io.FileInputStream;

public class ReadDoc
{
	public static void main( String[] args )
	{
		String fileName = "Hello.doc";

		// try-with-resources closes the input stream even if reading fails
		try ( FileInputStream in = new FileInputStream( fileName ) )
		{
			// Open the OLE2 container that holds the .doc file
			POIFSFileSystem fs = new POIFSFileSystem( in );

			// Parse the Word document stored in that container
			HWPFDocument doc = new HWPFDocument( fs );

			// Extract the document text, one array entry per paragraph
			WordExtractor we = new WordExtractor( doc );
			String[] paragraphs = we.getParagraphText();

			System.out.println( "Word Document has " + paragraphs.length + " paragraphs" );
			for( int i = 0; i < paragraphs.length; i++ ) {
				// Strip the line terminator from each paragraph
				paragraphs[i] = paragraphs[i].replaceAll( "\\cM?\r?\n", "" );
				System.out.println( "Length:" + paragraphs[i].length() );
			}
		}
		catch( Exception e ) {
			e.printStackTrace();
		}
	}
}


 


Code Explanation:

  • Create a POIFSFileSystem object and pass it an input stream for the Microsoft Word document
  • Create an HWPFDocument object; this class is responsible for handling Microsoft Word documents
  • WordExtractor extracts the text from the Word document
  • getParagraphText() returns that text paragraph by paragraph, as an array of strings
  • Finally, the loop strips the line breaks from each paragraph and prints its length
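
The cleanup step in the loop can be tried on its own, without POI. The sketch below is a minimal illustration that uses only the standard library: the class name ParagraphCleanup, the helper stripLineBreaks, and the sample strings are made up for this example and merely stand in for whatever getParagraphText() might return. In a Java regular expression, \cM is the control character Ctrl-M, which is the carriage return, so the pattern matches a line feed preceded by up to two carriage returns.

```java
public class ParagraphCleanup {

    // Hypothetical helper: applies the same pattern as the loop in the example.
    // It removes every "\n" together with up to two "\r" characters directly before it.
    static String stripLineBreaks(String paragraph) {
        return paragraph.replaceAll("\\cM?\r?\n", "");
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for extracted paragraphs
        String[] paragraphs = { "Hello\r\n", "World\n", "\r\n" };

        for (int i = 0; i < paragraphs.length; i++) {
            paragraphs[i] = stripLineBreaks(paragraphs[i]);
            System.out.println("Length:" + paragraphs[i].length());
        }
        // prints Length:5, Length:5 and Length:0
    }
}
```

Note that replaceAll removes every match, not only one at the end of the string, so a line break in the middle of a paragraph would be removed as well. A carriage return that is not followed by a line feed is left in place.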


Categories: Java
  1. G_Skya
    February 3rd, 2011 at 22:23 | #1

    Is there any way to read table embedded inside a word document.. I can’t get the text from the Doc. But there is no way of determining the text cell location. Is there class in POI that can help me out with the table Extraction

  2. G_Skya
    February 4th, 2011 at 00:48 | #2
  3. Maneesh
    February 4th, 2011 at 01:54 | #3

    How to read a particular page in a word and pdf document using java? Pls reply soon…………..

  4. naresh
    February 10th, 2011 at 02:30 | #4

    Really useful stuff. thanks. can u please post how to find a underlined string from the content read from word document. thanks in advance

  5. Ammu
    July 25th, 2011 at 05:28 | #5

    Thanks alot…It helped me…

  6. Aijaz Ali
    January 4th, 2012 at 07:13 | #6

    HI,
    Dear I want to read a MS Word document which can have different headings and images as well and then I want to encode all that read stuff and write on a file with any extension using java/android.
    If anyone of you have any solution then please share with me as I can get help from it.
    Regards;

  7. Md.mahidur
    March 1st, 2012 at 21:49 | #7

    I like it……………..

  8. magesh
    March 2nd, 2012 at 00:55 | #8

    HI
    I want to read and write Image with contents in word document in java,

    I write the program its only read file contents not image,
    If anyone of you have any solution then please share with me as I can get help from it.

  9. vishal
    March 14th, 2012 at 06:56 | #9

    Aijaz Ali :HI, Dear I want to read a MS Word document which can have different headings and images as well and then I want to encode all that read stuff and write on a file with any extension using java/android. If anyone of you have any solution then please share with me as I can get help from it. Regards;

    I need the same example too…Plz give it to me if u have got it…
