Lightweight solution to the mshtml interop DLL distribution - David Airapetyan on C# and .NET

Jun. 7th, 2007 10:04 am Lightweight solution to the mshtml interop DLL distribution

When accessing Internet Explorer's object model from .NET, we often use the mshtml interop assembly, Microsoft.mshtml.dll, which comes with an installation of Visual Studio. However, there is a common problem: when redistributing an application written against Microsoft.mshtml.dll, the entire 8MB DLL has to be included with your app, because otherwise it will fail to execute on machines that do not have VS installed.

If you do not wish to redistribute the entire interop DLL, there is another approach. After all, the whole point of the interop assembly is to tell .NET how to call into unmanaged COM objects, so there is no need to ship the entire assembly when most of its interfaces and methods go unused!

Here are the steps to move from mshtml to a leaner version of it:

1.       Start by writing your code using the mshtml interop assembly

2.       Under the debugger, inspect the runtime classes of the mshtml objects

3.       Using reflection, find out the dispatch definitions of those classes

4.       Create a dispatch interface for each of those, removing all properties/methods you do not need (I use the miniMshtml namespace for those)

5.       Replace the mshtml casts with your miniMshtml ones
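
Steps 2 and 3 can also be done in code instead of through the debugger. The following is only a rough sketch: DispatchInterfaceDumper and FakeDocument are hypothetical names made up for illustration, and the stand-in DispHTMLDocument is declared locally so that the sample compiles without a reference to Microsoft.mshtml.dll. The helper lists the dispatch interfaces a runtime type implements and the DispId of each member.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.InteropServices;

// Stand-in dispinterface using the GUID and DispId quoted in this post,
// declared here only so the sketch compiles without the mshtml reference.
[InterfaceType(ComInterfaceType.InterfaceIsIDispatch)]
[Guid("3050F55F-98B5-11CF-BB82-00AA00BDCE0B")]
public interface DispHTMLDocument
{
    [DispId(1087)]
    IEnumerable getElementsByTagName(string v);
}

// Hypothetical stand-in for a runtime class such as HTMLDocumentClass.
public class FakeDocument : DispHTMLDocument
{
    public IEnumerable getElementsByTagName(string v) { return new string[0]; }
}

// Hypothetical helper; not part of mshtml or the .NET Framework.
public static class DispatchInterfaceDumper
{
    // Returns the interfaces of runtimeType that are marked as dispatch-only.
    public static List<Type> FindDispatchInterfaces(Type runtimeType)
    {
        List<Type> result = new List<Type>();
        foreach (Type candidate in runtimeType.GetInterfaces())
        {
            InterfaceTypeAttribute kind = (InterfaceTypeAttribute)
                Attribute.GetCustomAttribute(candidate, typeof(InterfaceTypeAttribute));
            if (kind != null && kind.Value == ComInterfaceType.InterfaceIsIDispatch)
                result.Add(candidate);
        }
        return result;
    }

    // Returns one "[DispId(n)] name" line per method or property.
    public static List<string> DescribeMembers(Type dispInterface)
    {
        List<string> lines = new List<string>();
        foreach (MemberInfo member in dispInterface.GetMembers())
        {
            // Skip get_/set_ accessors; the property itself is listed instead.
            MethodBase method = member as MethodBase;
            if (method != null && method.IsSpecialName)
                continue;

            DispIdAttribute dispId = (DispIdAttribute)
                Attribute.GetCustomAttribute(member, typeof(DispIdAttribute));
            if (dispId != null)
                lines.Add(string.Format("[DispId({0})] {1}", dispId.Value, member.Name));
        }
        return lines;
    }
}
```

While the mshtml reference is still in place, calling DispatchInterfaceDumper.FindDispatchInterfaces(documentObject.GetType()) should give the candidate interfaces, assuming GetType() reports the same interop class the debugger shows. The real classes may expose more than one dispatch interface, so pick the Disp* one and pass it to DescribeMembers to get the attributes to copy.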

Here is a walkthrough of a typical example. In this example, we search for all SPAN elements with non-empty title attributes and set their inner text:

IHTMLDocument3 htmlDocument = (IHTMLDocument3)documentObject;
foreach (IHTMLElement element in htmlDocument.getElementsByTagName("span"))
{
  if (!String.IsNullOrEmpty((string)element.getAttribute("title", 0)))
  {
    element.innerText = "Test";
  }
}

 
In the debugger, step through the code and find out the runtime types for htmlDocument and element. It turns out those types are:

HTMLDocumentClass
HTMLSpanElementClass

For each of those classes, find out their corresponding dispatch interfaces. The easiest way to do it is with the built-in metadata view in Visual Studio 2005: simply type HTMLDocumentClass and hit F12 (Go To Definition) with the cursor positioned on it. When you do this, VS will generate a class definition that looks something like this:

public class HTMLDocumentClass : DispHTMLDocument, HTMLDocument, HTMLDocumentEvents_Event, IHTMLDocument2, IHTMLDocument3...

What you are interested in is DispHTMLDocument. Hit F12 on it again and you will get the dispatch interface definition:

[InterfaceType(2)]
[Guid("3050F55F-98B5-11CF-BB82-00AA00BDCE0B")]
[TypeLibType(4112)]
public interface DispHTMLDocument
{
  [DispId(1005)]
  IHTMLElement activeElement { get; }
  [DispId(1022)]
  object alinkColor { get; set; }
  [DispId(1003)]
  IHTMLElementCollection all { get; }

  ...
}

 
Now copy the definition into your own code and remove every member that you are not interested in. In our case, we only want the getElementsByTagName method:

// Needs: using System.Collections; using System.Runtime.InteropServices;
[InterfaceType(2)]
[Guid("3050F55F-98B5-11CF-BB82-00AA00BDCE0B")]
[TypeLibType(4112)]
public interface DispHTMLDocument
{
  [DispId(1087)]
  IEnumerable getElementsByTagName(string v);
}

 
Notice a trick: in the original interface, getElementsByTagName returns IHTMLElementCollection, but declaring that type does not buy you anything here because the collection also implements IEnumerable. The enumeration is all we care about, so we change the return type.

 
By doing the same thing for HTMLSpanElementClass, we end up with a trimmed version of DispHTMLSpanElement as well:

[InterfaceType(2)]
[Guid("3050F548-98B5-11CF-BB82-00AA00BDCE0B")]
[TypeLibType(4112)]
public interface DispHTMLSpanElement
{
  [DispId(-2147417085)]
  string innerText { get; set; }
  [DispId(-2147417610)]
  object getAttribute(string strAttributeName, int lFlags);
}

 
Finally, we can replace the original mshtml implementation with our miniMshtml:

DispHTMLDocument htmlDocument = (DispHTMLDocument)documentObject;

foreach (DispHTMLSpanElement element in htmlDocument.getElementsByTagName("span"))
{
  if (!String.IsNullOrEmpty((string)element.getAttribute("title", 0)))
  {
    element.innerText = "Test";
  }
}

Now we can remove the reference to the Microsoft.mshtml assembly and we're good to go!


Comments:

From:(Anonymous)
Date:December 28th, 2007 07:42 am (UTC)

I would be glad to receive a sample project of this

Can you post a sample project? Thanks
From:csharpcoder
Date:December 29th, 2007 12:37 am (UTC)

Re: I would be glad to receive a sample project of this

Sure, why not...

http://www.davidair.com/misc/DumpCellText.zip

This project hosts an embedded instance of IE and allows dumping the contents of TD elements for a user-specified page.
From:(Anonymous)
Date:February 28th, 2010 10:33 pm (UTC)

Thanks!

This saved me from distributing the mshtml primary interop assembly (PIA) or having to use vs90_piaredist.exe on the clients.