
Reading a Microsoft Word Document in Java

Java has no built-in classes for reading Microsoft Office Word documents, but the Apache POI package, developed by the Apache Software Foundation, lets you read a Word document from Java. More information on the Apache POI package can be found at Apache POI.




import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.extractor.*;
import java.io.*;

public class readDoc
{
	public static void main( String[] args )
	{
		String filesname = "Hello.doc";
		POIFSFileSystem fs = null;
		try
		{
			// Open the .doc file as an OLE2 (POIFS) container
			fs = new POIFSFileSystem(new FileInputStream(filesname));

			// Parse the container as a Word 97-2003 document
			HWPFDocument doc = new HWPFDocument(fs);

			// The extractor gives access to the document text
			WordExtractor we = new WordExtractor(doc);

			// One array entry per paragraph
			String[] paragraphs = we.getParagraphText();

			System.out.println( "Word Document has " + paragraphs.length + " paragraphs" );
			for( int i=0; i<paragraphs.length; i++ ) {
				// Strip the line break before measuring the paragraph
				paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n","");
				System.out.println( "Length:"+paragraphs[ i ].length());
			}
		}
		catch(Exception e) {
			e.printStackTrace();
		}
	}
}

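Code Explanation:

The program opens Hello.doc through a POIFSFileSystem, which reads the container format used by .doc files, and hands it to HWPFDocument, the POI class that represents a Word document. A WordExtractor built on that document returns the text, and getParagraphText() gives it back as an array with one string per paragraph. The loop then strips the line break from each paragraph with replaceAll and prints the length of what is left. Any exception, such as a missing file, is caught and its stack trace printed.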
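The replaceAll call in the loop is the least obvious line, and it can be tried on its own without POI. The sketch below applies the same pattern to a few made-up strings. The class name ParagraphCleanup, the helper stripLineBreaks and the sample strings are my own illustration, not part of the listing or of POI. In a Java regular expression \cM is the control character Ctrl-M, which is a carriage return, so the pattern matches an optional carriage return, another optional carriage return, and then a line feed.

```java
class ParagraphCleanup {
    // Hypothetical helper: applies the same pattern as the listing.
    // "\\cM?" is an optional Ctrl-M (carriage return), "\r?" another
    // optional carriage return, and "\n" the line feed that must follow.
    static String stripLineBreaks(String paragraph) {
        return paragraph.replaceAll("\\cM?\r?\n", "");
    }

    public static void main(String[] args) {
        // Made-up sample paragraphs with different line endings
        String[] samples = { "Hello World\r\n", "Second paragraph\n", "No line break" };
        for (String s : samples) {
            String cleaned = stripLineBreaks(s);
            System.out.println("Length before: " + s.length() + ", after: " + cleaned.length());
        }
    }
}
```

Note that the pattern only matches when a line feed is present, so a string that ends in a carriage return alone is left unchanged.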


Comments

I use JodConverter and the OpenOffice.org tools to manipulate it.

Much easier and much more flexible, IMHO.

Typo :(

I meant OdtConverter. It's not that much easier if you just need to read the file, but it's great for modifications.
