# grep the openoffice writer file



## jotawski (Feb 16, 2011)

Hi,

I want to use a sed/grep combination on an OpenOffice template file to change just a few keywords. But when I ran `grep -a TIME work1.doc`, nothing was printed. Even without the -a flag, the result is the same.

work1.doc is a Thai-language OpenOffice template file for Writer.

Any help and hints are welcome.


----------



## Fred (Feb 16, 2011)

An openoffice file is actually just a ZIP archive. You will thus need to extract it, edit the content (probably content.xml), and rebuild the ZIP archive.

*tar -xf* can extract a ZIP archive, but cannot rebuild it; you will need something like *archivers/zip*.
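The extract, edit, rebuild cycle described above can also be done in one pass. The following is a minimal Python sketch, assuming the document really is an ODF archive whose text lives in `content.xml`. The function name `replace_in_odt` and the file names in the usage comment are hypothetical. It also assumes each placeholder appears as one contiguous run of text in the XML, which Writer does not guarantee if the word was typed with mixed formatting.

```python
import zipfile
from xml.sax.saxutils import escape


def replace_in_odt(src_path, dst_path, replacements):
    """Copy the archive at src_path to dst_path, substituting text in content.xml.

    replacements maps a placeholder string to its plain-text value.
    """
    with zipfile.ZipFile(src_path) as zin, zipfile.ZipFile(dst_path, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                for placeholder, value in replacements.items():
                    # Escape &, < and > so the value cannot break the XML.
                    text = text.replace(placeholder, escape(value))
                data = text.encode("utf-8")
            # Reusing the original ZipInfo keeps the entry order and the
            # per-entry compression method (ODF wants "mimetype" first
            # and stored uncompressed).
            zout.writestr(item, data)


# Hypothetical usage:
# replace_in_odt("work1.odt", "filled.odt", {"TIME": "10:00"})
```

Writing to a second file rather than editing in place keeps the original template intact if something goes wrong.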


----------



## jotawski (Feb 16, 2011)

Fred said:

> An openoffice file is actually just a ZIP archive. You will thus need to extract it, edit the content (probably content.xml), and rebuild the ZIP archive.
> 
> *tar -xf* can extract a ZIP archive, but cannot rebuild it; you will need something like *archivers/zip*.



Would you please demonstrate? I cannot find any XML file in the .doc file.


----------



## rambetter (Feb 16, 2011)

What Fred seems to think is that your document is a ZIP file, so you would do:


```
unzip work1.doc
```

And that would result in some files being extracted from your document.

However, I don't personally believe that the .doc is a ZIP file.

Generally, .doc files are binary files, meaning the formatting and text are encoded in some kind of binary code that is not generally known to the public.  The .doc is not a plain text file, and so you cannot grep or replace text in it from the command line.  It may be possible to save your document to some kind of XML in which case such greps and replacements may in fact be possible.

If you want to read your .doc file as raw text (or bytes), I suggest you open it with less or vi.  That may shed some light as to the format it's in.  Who knows, it may indeed be a Zip file.


----------



## wblock@ (Feb 17, 2011)

file(1) can identify many files:
`% file work1.doc`
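For the two formats discussed in this thread, the same question can be answered by looking at the first few bytes of the file. This is only a rough Python sketch and far cruder than file(1). The function names are hypothetical, and the two signatures are the usual ones for a non-empty ZIP archive and for the OLE2 compound file container used by binary Word documents.

```python
ZIP_MAGIC = b"PK\x03\x04"  # local file header of a non-empty ZIP archive
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound file header


def sniff_container(header):
    """Classify a file from its leading bytes as 'zip', 'ole2' or 'unknown'."""
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


def sniff_file(path):
    """Read the first 8 bytes of the file at path and classify them."""
    with open(path, "rb") as f:
        return sniff_container(f.read(8))
```

A result of `"zip"` would be consistent with an ODT (or .docx) file, and `"ole2"` with a binary Word .doc, whatever the file extension says.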


----------



## kpedersen (Feb 17, 2011)

.doc is definitely not a .zip file.

It is a binary container for macros and viruses 

AFAIK you might be thinking of .docx


----------



## jotawski (Feb 17, 2011)

jotawski said:

> Would you please demonstrate? I cannot find any XML file in the .doc file.



My apologies, there are XML tags in that file too.
`[~] % grep -n -a xml work1.doc`

```
81:<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3.1.1-111">
82:   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
84:            xmlns:dc="http://purl.org/dc/elements/1.1/">
88:            xmlns:xap="http://ns.adobe.com/xap/1.0/">
95:            xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/"
96:            xmlns:stRef="http://ns.adobe.com/xap/1.0/sType/ResourceRef#">
105:            xmlns:tiff="http://ns.adobe.com/tiff/1.0/">
113:            xmlns:exif="http://ns.adobe.com/exif/1.0/">
120:            xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/">
[~] %
```

But that is not what I'm interested in. I simply want to replace some words like TIME, CUSTOMERS, ISOTOPES with the real values entered later.

Many thanks indeed for all the help and hints.
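The substitution itself, replacing words such as TIME or CUSTOMERS with real values, can be sketched in Python once the document text is available as a plain string. This is an illustration under assumptions, not a solution for the binary .doc format: the function name `fill_placeholders` is hypothetical, and it works on text that has already been extracted. It matches placeholders as whole ASCII words, so TIME inside TIMER is left alone, while a placeholder directly next to Thai characters is still found.

```python
import re


def fill_placeholders(text, values):
    """Replace each whole-word placeholder in text with its value.

    values maps placeholder names (e.g. "TIME") to replacement strings.
    """
    if not values:
        return text
    # Longest names first, so a longer placeholder is preferred in the alternation.
    names = sorted(values, key=len, reverse=True)
    # re.ASCII makes \b treat only ASCII letters, digits and "_" as word
    # characters, so adjacent Thai text still counts as a boundary.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b", re.ASCII
    )
    # A single pass: inserted values are never scanned again.
    return pattern.sub(lambda m: values[m.group(1)], text)
```

Doing all replacements in one pass avoids the surprise of one value accidentally containing another placeholder name.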


----------



## Fred (Feb 17, 2011)

You mentioned an "open office template", and there is an "openoffice 3.1.0" tag on this thread, so I assumed that you were dealing with ODT files, which are definitely ZIP archives containing (amongst others) an XML file with the text of your document.

If you are instead talking about DOC files produced by Word, then, as you discovered and as others pointed out, they are not ZIP archives, and this is going to be harder. You may want to look into the Win32::Word::* packages for Perl, or the equivalent in whatever your favourite language is.


----------



## jotawski (Feb 18, 2011)

Thanks indeed for your hints. I am now reading, or more specifically studying, http://search.cpan.org/~dami/MsOffice-Word-HTML-Writer-0.07/lib/MsOffice/Word/HTML/Writer.pm which your link led me to.

The story is that a girl prepares MS Word documents for her boss, and she complains that she has to write almost exactly the same things for every customer. I offered to assist her by using my little knowledge of grep/sed to replace just a few variables like DATE, CUSTOMERS, PRICE and so on. I got REAL.doc and turned it into work1.doc with OpenOffice.

But the real world is not so simple, and that's why I am asking.

Many thanks indeed for all the help and hints. More suggestions are welcome.


----------



## jotawski (Feb 20, 2011)

jotawski said:

> Thanks indeed for your hints. I am now reading, or more specifically studying, http://search.cpan.org/~dami/MsOffice-Word-HTML-Writer-0.07/lib/MsOffice/Word/HTML/Writer.pm which your link led me to.



That turned out to be slightly off the path to solving my problem, but I am now looking into the other links provided by Fred. Many thanks indeed.

Any help and hints are needed and welcome.


----------

