List the unique words in a Microsoft Word file in C#

The example "Use regular expressions and LINQ to list the unique words contained in a text file in C#" shows how to list the unique words in a text file. This example shows how to do the same thing for a Microsoft Word file.

Before you start, add a reference to the Microsoft Word 12.0 Object Library (or whatever version you have installed on your system). Then add the following using directive to make working with the Word namespace easier. The "Word =" part defines "Word" as an alias for the namespace, so you can write Word._Document instead of Microsoft.Office.Interop.Word._Document.

using Word = Microsoft.Office.Interop.Word;

The following code shows how the program gets the words from a Word file.

// Read the text contents of a Word file.
private string GrabWordFileWords(string file_name)
{
    // Get the Word application object.
    Word._Application word_app = new Word.ApplicationClass();

    // Keep Word hidden (set this to true to watch it work).
    word_app.Visible = false;

    // Open the file read-only, letting Word detect the format (0).
    object filename = file_name;
    object confirm_conversions = false;
    object read_only = true;
    object add_to_recent_files = false;
    object format = 0;
    object missing = System.Reflection.Missing.Value;

    Word._Document word_doc =
        word_app.Documents.Open(ref filename, ref confirm_conversions,
            ref read_only, ref add_to_recent_files,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref format, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing);

    // Get the document's text.
    string result = word_doc.Content.Text;

    // Close the document and Word without saving or prompting.
    object save_changes = false;
    word_doc.Close(ref save_changes, ref missing, ref missing);
    word_app.Quit(ref save_changes, ref missing, ref missing);

    // Return the result.
    return result;
}

The code first creates a Word application server. It sets the server's Visible property to false so Word stays hidden while the program works. Set it to true if you want to watch what Word is doing.
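If you build this with a newer Visual Studio where the Word reference has its Embed Interop Types property set to True, the compiler refuses to create Word.ApplicationClass directly. In that case you can probably create the server through the Application interface instead, as in the following sketch. The class name WordAppSketch and the method name StartHiddenWord are made up for illustration and are not part of the original program.

```csharp
using Word = Microsoft.Office.Interop.Word;

public static class WordAppSketch
{
    // Hypothetical helper: start a hidden Word server.
    public static Word._Application StartHiddenWord()
    {
        // Word.Application is an interface; the compiler maps "new"
        // to the underlying COM class, so this works with embedded
        // interop types where "new Word.ApplicationClass()" does not.
        Word._Application word_app = new Word.Application();

        // Keep the server hidden.
        word_app.Visible = false;
        return word_app;
    }
}
```

The caller is still responsible for calling Quit on the returned object when it is done with Word.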

Next the program opens the desired Word document. It then uses the document's Content.Text property to get the file's text.
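The long list of "ref missing" arguments is needed only in older versions of C#. Assuming you are using C# 4 or later, the same steps can be sketched with named arguments, and with a try-finally block so Word closes even if opening or reading the file throws. The names WordTextSketch and ReadWordText are hypothetical; this is a variation on the method above, not the original program's code.

```csharp
using Word = Microsoft.Office.Interop.Word;

public static class WordTextSketch
{
    // Hypothetical variant of GrabWordFileWords for C# 4 or later.
    public static string ReadWordText(string file_name)
    {
        Word._Application word_app = new Word.Application();
        Word._Document word_doc = null;
        try
        {
            // Name only the arguments we care about. The omitted
            // optional parameters are passed as missing values.
            word_doc = word_app.Documents.Open(
                FileName: file_name,
                ConfirmConversions: false,
                ReadOnly: true,
                AddToRecentFiles: false);

            // Return the document's text.
            return word_doc.Content.Text;
        }
        finally
        {
            // Close the document and Word without saving,
            // even if something above failed.
            if (word_doc != null) word_doc.Close(SaveChanges: false);
            word_app.Quit(SaveChanges: false);
        }
    }
}
```

The variables are declared with the _Application and _Document interfaces, as in the original code, so the calls to Quit and Close refer to the methods rather than to the events of the same names.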

The method finishes by closing the file and Word server, and returning the file's text. The rest of the code is similar to the code used by the previous example to process text files. See that example for details.
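For reference, the word-listing step might look something like the following sketch, which uses a regular expression to find the words and LINQ to remove duplicates and sort them. The previous example's exact code is not reproduced here, so the class name, the method name GetUniqueWords, and the word pattern are assumptions for illustration only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UniqueWordsSketch
{
    // Hypothetical helper: return the distinct words in the text,
    // converted to lowercase and sorted.
    public static string[] GetUniqueWords(string text)
    {
        // Treat each run of letters and apostrophes as a word.
        return Regex.Matches(text, @"[A-Za-z']+")
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())
            .Distinct()
            .OrderBy(w => w, StringComparer.Ordinal)
            .ToArray();
    }

    public static void Main()
    {
        // Stand-in for the string returned by GrabWordFileWords.
        string text = "The cat saw the other cat.\rThe end.";

        // Prints: cat, end, other, saw, the
        Console.WriteLine(string.Join(", ", GetUniqueWords(text)));
    }
}
```

Because the pattern matches only letters and apostrophes, punctuation and the paragraph marks in the document's text act as word separators and need no special handling.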

   

 
