Monday, January 25, 2010

Microsoft Word Document Automation using XSL and XML

Here is my first technical blog post!

First of all, what is document automation?

In simple terms, it means generating a Word document without using Microsoft Office.

Why is it required?

Although Microsoft Office is easy to use and very user friendly, many people (I am one of them) find it difficult, at least when a document involves complex formatting.

Consider the example of an invoice. It contains customer information, product information, the date, company information, etc. Every invoice shares some common things, such as the design and the company information; only the date, customer name, and product information differ. Now consider a business that generates thousands of invoices in a day. If a user has to create a new Word document and enter the data each time, it becomes a time-consuming task.

This is where document automation comes into the picture. What if the user had software in which only the data, such as the customer name and product information, has to be entered, and the document is then generated automatically? That is much easier for the user. There are many real-world cases where document automation is useful. Basically, we can use it wherever the format of a document stays the same and only its content changes.

Background

I have worked on document automation projects. The requirement was to automate course plan and MOM (Minutes of Meeting) documents. A course plan document contains all the course information, such as the course title, the prerequisites, session details, the objective of each session, and the points to be covered in each session. A MOM document contains meeting information, such as the invitees, the meeting time, and the agenda. These documents contain complex formatting.

The first step was to create an application in which the user can enter data. I created an ASP.NET application where the user enters data through a GUI. The data is stored in SQL Server so that the user can regenerate a document at any time. The main part of the project was to create a Word document from the data that the user had entered and that was stored in SQL Server. I decided to use XML and XSL.

Introduction to XML

XML stands for Extensible Markup Language. It is a markup language, like HTML. XML is designed for storing and exchanging data. Unlike HTML, XML has no predefined tags; the author of an XML document defines his or her own tags. This is what makes XML self-descriptive: the tag names describe the data they contain.

Consider the following example of an XML document.

<companyinformation>
<companyname>ABC Pvt. Ltd.</companyname>
<companyaddress>XYZ Road, Andheri, Mumbai</companyaddress>
<numberofemployees>20</numberofemployees>
</companyinformation>


From the XML above, anyone can tell that it is the company information of ABC Pvt. Ltd., which is in Mumbai and has 20 employees.

XML simplifies data sharing. Nowadays many RDBMS products are available, and sharing data between them is difficult because each one stores data in its own format; SQL Server, for example, uses .mdf files. With XML, stored data is independent of any particular product, which also makes data transport easy. XML is now used in many places, such as RSS (my next topic), WSDL, Office Open XML, etc.

Introduction to XSL

XSL stands for Extensible Stylesheet Language. XSLT stands for XSL Transformations. XSLT is used to convert XML documents to other formats such as HTML, Word, etc.
There are three major parts of XSL.

1) XSLT - to transform XML documents
2) XPath - to navigate through XML documents
3) XSL-FO - to format XML documents

XSLT is the most important part for us. It is used to transform an XML document into another type of document, such as HTML. XSLT 1.0 is supported by all major browsers.

Consider how CSS (Cascading Style Sheets) is used to style an HTML document: it gives a look and feel to HTML tags. Consider the example below.


span
{
background-color:red;
}

The example above gives a red background color to every span tag. XSLT is used in a similar way for XML documents. Consider the company information XML above. If we view this XML file directly in a browser, it does not look good. We can use XSLT to transform it. For example, to display the company name in bold and align it to the center, we can use the following XSLT code.


<!-- XML names are case-sensitive, so match must use the same case as the element in the XML. -->
<xsl:template match="companyname">
<tr align="center">
<th>
<xsl:apply-templates/>
</th>
</tr>
</xsl:template>

It displays the company name in bold and aligns it to the center. We can also reference the XSL file from within the XML file, so that a browser applies the stylesheet when the XML file is opened.
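A minimal sketch of such a reference, using the standard xml-stylesheet processing instruction. The file name company.xsl is only a placeholder of mine and not a file from the project; the href should point at wherever the stylesheet actually lives.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "company.xsl" is a hypothetical file name; point href at your own stylesheet. -->
<?xml-stylesheet type="text/xsl" href="company.xsl"?>
<companyinformation>
  <companyname>ABC Pvt. Ltd.</companyname>
  <companyaddress>XYZ Road, Andheri, Mumbai</companyaddress>
  <numberofemployees>20</numberofemployees>
</companyinformation>
```

The processing instruction has to appear before the root element. It only matters when the XML file is opened directly in a browser; a transformation run from code names its stylesheet explicitly.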


Some useful XSLT elements

1) <xsl:template>

It is used to define a template. It has several attributes. Its match attribute specifies which XML element the template applies to; in the example above, the template matches the company name element. match="/" matches the root of the document.

2) <xsl:apply-templates>

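It tells the processor to process the child nodes of the current element by applying whichever templates match them. In the example above, it places the text of the company name inside the table row and cell.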

3) <xsl:value-of>

It is used to get the value of an XML element and use it in the transformation.

4) <xsl:for-each>

It is used to loop through XML elements.

5) <xsl:if>

It is used to specify a condition while transforming.

Many more elements are available. For more information, you can visit the w3schools website.
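To see these elements working together, here is a small sketch of a complete stylesheet. It assumes a hypothetical invoice XML (shown in the comment at the top) that is not part of my project; the element names invoice, customer, item, name and quantity are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input this stylesheet expects (not from the original project):
     <invoice>
       <customer>A. Customer</customer>
       <item><name>Pen</name><quantity>10</quantity></item>
       <item><name>Notebook</name><quantity>0</quantity></item>
     </invoice>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- match="/" selects the document root, so this template runs first. -->
  <xsl:template match="/">
    <html>
      <body>
        <!-- xsl:value-of copies the text of the selected element. -->
        <h2>Invoice for <xsl:value-of select="invoice/customer"/></h2>
        <table border="1">
          <tr><th>Item</th><th>Quantity</th></tr>
          <!-- xsl:for-each repeats its body once per item element. -->
          <xsl:for-each select="invoice/item">
            <!-- xsl:if emits the row only when the quantity is positive. -->
            <xsl:if test="quantity &gt; 0">
              <tr>
                <td><xsl:value-of select="name"/></td>
                <td><xsl:value-of select="quantity"/></td>
              </tr>
            </xsl:if>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input in the comment, the table gets a row for the pen but none for the notebook, because its quantity is 0. Note that the greater-than sign in the test is written as &gt; to keep the attribute readable as XML.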

So in my project I wrote an XSL file with the XSL transformations needed to get the required formatting, and I created an XML file from the data stored in SQL Server.

The .NET Framework 2.0 provides a class for this in the System.Xml.Xsl namespace: XslCompiledTransform. Using it, you can transform an XML document as follows.


' Requires "Imports System.Xml.Xsl" at the top of the code file.
' Load the stylesheet once, then write the same XML out as HTML and as a Word file.
Dim xslt As New XslCompiledTransform()
xslt.Load(Server.MapPath("~/exp.xsl"))
xslt.Transform(Server.MapPath("~/testing.xml"), Server.MapPath("~/testing.html"))
xslt.Transform(Server.MapPath("~/testing.xml"), Server.MapPath("~/testing.doc"))


The Load method loads the XSL stylesheet. The Transform method used here takes two arguments: the first is the XML file to be transformed, and the second is the output file.
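The contents of exp.xsl are not shown in this post. As a rough sketch only, a stylesheet for the company information XML above might look like the following; the HTML table layout is an assumption made for illustration. Because the same stylesheet is used for both Transform calls, testing.html and testing.doc receive identical markup, and Word can generally open HTML that has been saved with a .doc extension.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative stand-in for exp.xsl; the real stylesheet is not shown in the post. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Start at the document root and build the page skeleton. -->
  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <!-- Hand each child of the root element to its matching template. -->
          <xsl:apply-templates select="*/*"/>
        </table>
      </body>
    </html>
  </xsl:template>

  <!-- The company name becomes a centered, bold header row. -->
  <xsl:template match="companyname">
    <tr align="center">
      <th><xsl:apply-templates/></th>
    </tr>
  </xsl:template>

  <!-- Every other child element becomes a plain row. -->
  <xsl:template match="*">
    <tr>
      <td><xsl:value-of select="."/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```

The more specific companyname template takes precedence over the catch-all match="*" template, so only the address and the employee count fall through to the plain rows.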

This is how I implemented Word document automation. It has one limitation: we cannot create an Office 2007 Word file, that is, a .docx file, this way. In Office 2007 Microsoft introduced a totally new format, Office Open XML, in which a .docx file is a ZIP package of several XML parts rather than a single file that one transformation can write. I will discuss that in future blog posts.

3 comments:

  1. Hello Sir, Actually I am trying to learn new kind of architecture of documentation i.e. DITA - Darwin Information Typing Architecture. It is totally based on XML. Can you help me learn it??

    Thanks in Advance...

  2. Some days ago I was typing in my working word file and something happened. Luckily for me I by accident found out - word recover, which in my opinion be good at shows helpful capacities.

  3. how can i get common xsl file for all word documents....

    so for this I can create a valid XSLFO file ...and then i can make it to pdf file.....??????

    plz help ...how to create a xsl file for all word document in a xml format.....
