# Viewing docx files



## mdg (Feb 8, 2011)

Any suggestions on a docx viewer?  Just want to view them, not edit.
OpenOffice is too much and Abiword seems to have trouble with docx.


----------



## treepython (Feb 8, 2011)

Hello,
LibreOffice could solve your problem. It is just a fork of OpenOffice, but you don't need Java to use it.


----------



## Beastie (Feb 8, 2011)

If all you want to do is read them, then uncompress them and read the contents using any text editor.


----------



## DutchDaemon (Feb 9, 2011)

See if docs.google.com knows what to do with them?


----------



## dennylin93 (Feb 9, 2011)

LibreOffice is quite big as well, but it somehow manages to compile a lot faster than OpenOffice.org, so it might be worth a try.


----------



## rhish (Aug 28, 2012)

This turned out to be great advice! Thanks!


----------



## Deleted member 9563 (Jun 16, 2014)

Beastie said:

> If all you want to do is read them, then uncompress them and read the contents using any text editor.


What command would I use to do that?


----------



## bsdkeith (Jun 16, 2014)

This site gives some insight into the docx format.
http://pcsupport.about.com/od/fileexten ... cxfile.htm


----------



## Beastie (Jun 16, 2014)

OJ said:

> Beastie said:
>
> ...


`tar xf file.docx` should do.
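
For anyone on a system whose `tar` does not read ZIP archives, here is a minimal Python sketch of the same idea. It relies only on the fact that a .docx is a ZIP archive. The function names `list_docx_parts` and `read_docx_part` are made up for this illustration.

```python
import zipfile


def list_docx_parts(path):
    """Return the names of the files stored inside a .docx (a ZIP archive)."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def read_docx_part(path, part="word/document.xml"):
    """Return the raw bytes of one member; the main text lives in word/document.xml."""
    with zipfile.ZipFile(path) as archive:
        return archive.read(part)


# Hypothetical usage, assuming a file named file.docx exists:
#   print(list_docx_parts("file.docx"))
#   print(read_docx_part("file.docx")[:200])
```

What comes out is still raw XML, so this only replaces the unpacking step.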


----------



## Deleted member 9563 (Jun 17, 2014)

Thanks @Beastie! That's really useful. Unfortunately the content shows up as document.xml. So what does one do with that? A text editor shows garbage. Konqueror reads it, but runs some words together. Not bad though, and will do in a pinch. Firefox just displays a mess of markup. I wonder if there's a simple (command line is best) program to convert .xml to plain text.


----------



## cpm@ (Jun 17, 2014)

OJ said:

> I wonder if there's a simple (command line is best) program to convert .xml to plain text.


You can use xmlto(1) for that purpose, as follows:
`% xmlto txt document.xml`


----------



## Deleted member 9563 (Jun 17, 2014)

cpm said:

> You can use xmlto(1) for that purpose, as follows:
> `% xmlto txt document.xml`



I tried that with several documents from two different sources and the result is this:

```
Document /home/ole/tmp/word/document.xml does not validate
```


----------



## cpm@ (Jun 17, 2014)

OJ said:

> cpm said:
>
> ...



You need to pass the `--skip-validation` option, or fix the document syntax.


----------



## Deleted member 9563 (Jun 18, 2014)

cpm said:

> You need to pass the `--skip-validation` option, or fix the document syntax.


Oops, sorry I forgot to mention that I already tried that. Perhaps Microsoft has their own proprietary format for XML since that just gives a .txt file with a great pile of markup. Like this:


```
<w:document><w:body><w:p><w:pPr><w:jc></w:jc><w:rPr><w:b></w:b><w:i></w:i>
<w:sz></w:sz><w:szCs></w:szCs><w:u></w:u></w:rPr></w:pPr><w:r><w:rPr><w:b></
w:b><w:i></w:i><w:sz></w:sz><w:szCs></w:szCs><w:u></w:u></w:rPr><w:t>Attn:
Residents of </w:t></w:r><w:proofErr></w:proofErr><w:r><w:rPr><w:b></w:b><w:i>
</w:i><w:sz></w:sz><w:szCs></w:szCs><w:u></w:u></w:rPr><w:t>Coalmont</w:t></
```


----------



## cpm@ (Jun 18, 2014)

You can strip all the XML tags out of word/document.xml by having `unzip` write the file to standard output with `-p` and piping it through sed, e.g. `% unzip -p document.docx word/document.xml | sed 's#</w:p>#\n\n#g;s#<[^>]*>##g'`
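
If the sed in use does not turn `\n` in the replacement into a newline, the same tag-stripping idea can be sketched in Python with only the standard library. This is an illustration under assumptions, not a full converter: the function name `docx_to_text` is made up, it reads only `word/document.xml`, and it ignores headers, footnotes, tabs and explicit line breaks.

```python
import zipfile
from xml.etree import ElementTree

# Namespace bound to the w: prefix in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the text of a .docx, one blank line between paragraphs."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ElementTree.fromstring(xml_bytes)

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each <w:t> holds a text run; joining them without a separator
        # keeps words intact when a paragraph is split into several runs.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n\n".join(paragraphs)


# Hypothetical usage, assuming a file named document.docx exists:
#   print(docx_to_text("document.docx"))
```

Using a real XML parser also decodes entities such as `&amp;`, which the sed one-liner leaves as they are.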


----------

