# file command



## Sylhouette (Apr 19, 2010)

Hello all, I am running MailScanner on my FreeBSD systems.

This is on 7.2 and FreeBSD 8.

The problem I have is with the file command.
On FreeBSD 8 it detects most files correctly.
But on FreeBSD 7.x some of my customers' .txt and .htm files are detected by file as MPEG files.
And MailScanner does not allow MPEG files.

Can someone please tell me how I can make FreeBSD 7 detect htm files as htm and not MPEG?

regards,
Sylhouette


----------



## Beastie (Apr 19, 2010)

They're detected as MPEG most probably because they _are_ MPEG files... with *.txt* and *.html* extensions.

Try `% hexdump -C -n 256 filename`.

If you get something like (this page)

```
00000000  3c 21 44 4f 43 54 59 50  45 20 68 74 6d 6c 20 50  |<!DOCTYPE html P|
00000010  55 42 4c 49 43 20 22 2d  2f 2f 57 33 43 2f 2f 44  |UBLIC "-//W3C//D|
00000020  54 44 20 58 48 54 4d 4c  20 31 2e 30 20 54 72 61  |TD XHTML 1.0 Tra|
00000030  6e 73 69 74 69 6f 6e 61  6c 2f 2f 45 4e 22 20 22  |nsitional//EN" "|
00000040  68 74 74 70 3a 2f 2f 77  77 77 2e 77 33 2e 6f 72  |http://www.w3.or|
00000050  67 2f 54 52 2f 78 68 74  6d 6c 31 2f 44 54 44 2f  |g/TR/xhtml1/DTD/|
00000060  78 68 74 6d 6c 31 2d 74  72 61 6e 73 69 74 69 6f  |xhtml1-transitio|
00000070  6e 61 6c 2e 64 74 64 22  3e 0d 0a 3c 68 74 6d 6c  |nal.dtd">..<html|
00000080  20 78 6d 6c 6e 73 3d 22  68 74 74 70 3a 2f 2f 77  | xmlns="http://w|
00000090  77 77 2e 77 33 2e 6f 72  67 2f 31 39 39 39 2f 78  |ww.w3.org/1999/x|
000000a0  68 74 6d 6c 22 20 64 69  72 3d 22 6c 74 72 22 20  |html" dir="ltr" |
000000b0  6c 61 6e 67 3d 22 65 6e  22 3e 0d 0a 3c 68 65 61  |lang="en">..<hea|
000000c0  64 3e 0d 0a 09 3c 6d 65  74 61 20 68 74 74 70 2d  |d>...<meta http-|
000000d0  65 71 75 69 76 3d 22 43  6f 6e 74 65 6e 74 2d 54  |equiv="Content-T|
000000e0  79 70 65 22 20 63 6f 6e  74 65 6e 74 3d 22 74 65  |ype" content="te|
000000f0  78 74 2f 68 74 6d 6c 3b  20 63 68 61 72 73 65 74  |xt/html; charset|
```
it's really an HTML document

If you get something like

```
00000000  00 00 01 ba 21 00 01 00  09 80 19 53 00 00 01 bb  |...º!......S...»|
00000010  00 0c 80 19 53 06 e1 ff  e0 e0 4a c0 c0 20 00 00  |....S.áÿààJÀÀ ..|
00000020  01 be 07 dc 0f ff ff ff  ff ff ff ff ff ff ff ff  |.¾.Ü.ÿÿÿÿÿÿÿÿÿÿÿ|
00000030  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ|
```
it must be MPEG video.
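
The same check can be scripted. Below is a minimal sketch, assuming only the two signatures visible in the dumps above: the `00 00 01 ba` start code at offset 0 of the MPEG dump and a leading `<` in the HTML dump. The helper name `sniff` is hypothetical; it is not part of file(1) or MailScanner, and it knows nothing about other formats.

```python
def sniff(head: bytes) -> str:
    """Rough guess at the content type from the first bytes of a file."""
    # 00 00 01 ba is the pack start code seen at offset 0 of the MPEG dump.
    if head.startswith(b"\x00\x00\x01\xba"):
        return "mpeg"
    # Markup normally starts with '<', possibly after some ASCII whitespace.
    if head.lstrip().startswith(b"<"):
        return "markup"
    return "unknown"


# First bytes of the two dumps shown above.
print(sniff(b"<!DOCTYPE html P"))          # HTML dump
print(sniff(b"\x00\x00\x01\xba\x21\x00"))  # MPEG dump
```

To run it on a real file, read the first 256 bytes in binary mode (`open(path, "rb").read(256)`) and pass them to `sniff`.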

If your mail scanner accepts archived/compressed files, then you can try to fool it this way...


----------



## MG (Apr 19, 2010)

Can you show an example of a htm/txt file recognized as MPEG?


----------



## DutchDaemon (Apr 19, 2010)

I've never seen this on any FreeBSD version I ran MailScanner on (4.x onwards). Are you sure it's FreeBSD's file command that's causing the problems, and not some settings in either filename.rules.conf and/or filetype.rules.conf?

And yes, Beastie is right, sometimes people try to escape these rules by changing file extensions and other tricks.


----------



## Sylhouette (Apr 20, 2010)

Well, on my 8.x systems the file is detected as an htm document.
On all the 7.0 and 7.1 systems it is detected as MPEG.

Here is the output of the file command on both systems (same file).

On the FreeBSD 8.0 system

```
ms02 ~ # file test.htm
test.htm: HTML document text
ms02 ~ # file -i test.htm
test.htm: text/html; charset=utf-16le
```

The hexdump command on the FreeBSD 8 system:

```
ms03 ~ # hexdump -C -n 256 test.htm
00000000  ff fe 3c 00 48 00 54 00  4d 00 4c 00 3e 00 3c 00  |..<.H.T.M.L.>.<.|
00000010  48 00 45 00 41 00 44 00  3e 00 0a 00 3c 00 53 00  |H.E.A.D.>...<.S.|
00000020  54 00 59 00 4c 00 45 00  3e 00 0a 00 20 00 2e 00  |T.Y.L.E.>... ...|
00000030  50 00 41 00 47 00 45 00  31 00 20 00 7b 00 20 00  |P.A.G.E.1. .{. .|
00000040  68 00 65 00 69 00 67 00  68 00 74 00 3a 00 20 00  |h.e.i.g.h.t.:. .|
00000050  30 00 32 00 38 00 38 00  6d 00 6d 00 3b 00 77 00  |0.2.8.8.m.m.;.w.|
00000060  69 00 64 00 74 00 68 00  3a 00 20 00 30 00 32 00  |i.d.t.h.:. .0.2.|
00000070  38 00 38 00 6d 00 6d 00  3b 00 20 00 7d 00 0a 00  |8.8.m.m.;. .}...|
00000080  20 00 2e 00 50 00 41 00  47 00 45 00 4e 00 20 00  | ...P.A.G.E.N. .|
00000090  7b 00 20 00 68 00 65 00  69 00 67 00 68 00 74 00  |{. .h.e.i.g.h.t.|
000000a0  3a 00 20 00 30 00 32 00  38 00 38 00 6d 00 6d 00  |:. .0.2.8.8.m.m.|
000000b0  3b 00 77 00 69 00 64 00  74 00 68 00 3a 00 20 00  |;.w.i.d.t.h.:. .|
000000c0  30 00 32 00 38 00 38 00  6d 00 6d 00 3b 00 70 00  |0.2.8.8.m.m.;.p.|
000000d0  61 00 67 00 65 00 2d 00  62 00 72 00 65 00 61 00  |a.g.e.-.b.r.e.a.|
000000e0  6b 00 2d 00 62 00 65 00  66 00 6f 00 72 00 65 00  |k.-.b.e.f.o.r.e.|
000000f0  3a 00 20 00 61 00 6c 00  77 00 61 00 79 00 73 00  |:. .a.l.w.a.y.s.|
00000100
```


On the FreeBSD 7.1 system


```
ms01 ~ # file test.htm
test.htm: MPEG ADTS, layer I, v1,  96 kBits, Stereo
ms01 ~ # file -i test.htm
test.htm: audio/mpeg
```


```
ms01 ~ # hexdump -C -n 256 test.htm
00000000  ff fe 3c 00 48 00 54 00  4d 00 4c 00 3e 00 3c 00  |..<.H.T.M.L.>.<.|
00000010  48 00 45 00 41 00 44 00  3e 00 0a 00 3c 00 53 00  |H.E.A.D.>...<.S.|
00000020  54 00 59 00 4c 00 45 00  3e 00 0a 00 20 00 2e 00  |T.Y.L.E.>... ...|
00000030  50 00 41 00 47 00 45 00  31 00 20 00 7b 00 20 00  |P.A.G.E.1. .{. .|
00000040  68 00 65 00 69 00 67 00  68 00 74 00 3a 00 20 00  |h.e.i.g.h.t.:. .|
00000050  30 00 32 00 38 00 38 00  6d 00 6d 00 3b 00 77 00  |0.2.8.8.m.m.;.w.|
00000060  69 00 64 00 74 00 68 00  3a 00 20 00 30 00 32 00  |i.d.t.h.:. .0.2.|
00000070  38 00 38 00 6d 00 6d 00  3b 00 20 00 7d 00 0a 00  |8.8.m.m.;. .}...|
00000080  20 00 2e 00 50 00 41 00  47 00 45 00 4e 00 20 00  | ...P.A.G.E.N. .|
00000090  7b 00 20 00 68 00 65 00  69 00 67 00 68 00 74 00  |{. .h.e.i.g.h.t.|
000000a0  3a 00 20 00 30 00 32 00  38 00 38 00 6d 00 6d 00  |:. .0.2.8.8.m.m.|
000000b0  3b 00 77 00 69 00 64 00  74 00 68 00 3a 00 20 00  |;.w.i.d.t.h.:. .|
000000c0  30 00 32 00 38 00 38 00  6d 00 6d 00 3b 00 70 00  |0.2.8.8.m.m.;.p.|
000000d0  61 00 67 00 65 00 2d 00  62 00 72 00 65 00 61 00  |a.g.e.-.b.r.e.a.|
000000e0  6b 00 2d 00 62 00 65 00  66 00 6f 00 72 00 65 00  |k.-.b.e.f.o.r.e.|
000000f0  3a 00 20 00 61 00 6c 00  77 00 61 00 79 00 73 00  |:. .a.l.w.a.y.s.|
00000100
```


It is the same file copied using scp.

It is an htm document; I can open it in my browser.

Also, on the 8.0 system I found this in the /usr/share/misc/magic file:


```
# MPA, M1A
# updated by Joerg Jenderek
# GRR the original test are too common for many DOS files, so test 32 <= kbits <= 448
# GRR this test is still too general as it catches a BOM of UTF-16 files (0xFFFE)
# FIXME: Almost all little endian UTF-16 text with BOM are clobbered by these entries
#0      beshort&0xFFFE          0xFFFE
#>2     ubyte&0xF0      >0x0F
#>>2    ubyte&0xF0      <0xE1           MPEG ADTS, layer I, v1
## rate
#>>>2      byte&0xF0       0x10           \b,  32 kbps
#>>>2      byte&0xF0       0x20           \b,  64 kbps
#>>>2      byte&0xF0       0x30           \b,  96 kbps
#>>>2      byte&0xF0       0x40           \b, 128 kbps
#>>>2      byte&0xF0       0x50           \b, 160 kbps
#>>>2      byte&0xF0       0x60           \b, 192 kbps
#>>>2      byte&0xF0       0x70           \b, 224 kbps
#>>>2      byte&0xF0       0x80           \b, 256 kbps
#>>>2      byte&0xF0       0x90           \b, 288 kbps
#>>>2      byte&0xF0       0xA0           \b, 320 kbps
#>>>2      byte&0xF0       0xB0           \b, 352 kbps
#>>>2      byte&0xF0       0xC0           \b, 384 kbps
#>>>2      byte&0xF0       0xD0           \b, 416 kbps
#>>>2      byte&0xF0       0xE0           \b, 448 kbps
## timing
#>>>2      byte&0x0C       0x00           \b, 44.1 kHz
#>>>2      byte&0x0C       0x04           \b, 48 kHz
#>>>2      byte&0x0C       0x08           \b, 32 kHz
## channels/options
#>>>3      byte&0xC0       0x00           \b, Stereo
#>>>3      byte&0xC0       0x40           \b, JntStereo
#>>>3      byte&0xC0       0x80           \b, 2x Monaural
#>>>3      byte&0xC0       0xC0           \b, Monaural
##>1     byte            ^0x01          \b, Data Verify
##>2     byte            &0x02          \b, Packet Pad
##>2     byte            &0x01          \b, Custom Flag
##>3     byte            &0x08          \b, Copyrighted
##>3     byte            &0x04          \b, Original Source
##>3     byte&0x03       1              \b, NR: 50/15 ms
##>3     byte&0x03       3              \b, NR: CCIT J.17
```

I guess this line is what I am running into:

`# FIXME: Almost all little endian UTF-16 text with BOM are clobbered by these entries`

See the output of the file -i command on 8.

On FreeBSD 7 the entry above is not commented out.
But if I do comment it out, it still sees the file as MPEG.
I do not know how to make the file command recognize the file for what it is.
This goes beyond my knowledge.
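
To see why the UTF-16 BOM trips this entry, the tests can be walked through by hand on the first four bytes of test.htm (`ff fe 3c 00`). The sketch below re-implements the magic lines quoted above in Python, with the comment markers ignored. It assumes the active entry on 7.x behaves like the commented-out 8.0 entry, which the thread does not show; `match_mpeg_layer1` is a hypothetical name, and the labels come from the quoted 8.0 text ("kbps"), not from the 7.1 output ("kBits").

```python
def match_mpeg_layer1(head: bytes):
    """Apply the quoted 'MPEG ADTS, layer I, v1' magic tests to 4 bytes."""
    if len(head) < 4:
        return None
    # 0     beshort&0xFFFE   0xFFFE
    if int.from_bytes(head[0:2], "big") & 0xFFFE != 0xFFFE:
        return None
    # >2    ubyte&0xF0  >0x0F   and   >>2   ubyte&0xF0  <0xE1
    hi = head[2] & 0xF0
    if not (0x0F < hi < 0xE1):
        return None
    parts = ["MPEG ADTS, layer I, v1"]
    # rate table: 0x10 -> 32 kbps, 0x20 -> 64 kbps, ... 0xE0 -> 448 kbps
    parts.append(f"{(hi >> 4) * 32} kbps")
    # timing: only three values are listed, 0x0C has no entry
    timing = {0x00: "44.1 kHz", 0x04: "48 kHz", 0x08: "32 kHz"}
    if head[2] & 0x0C in timing:
        parts.append(timing[head[2] & 0x0C])
    # channels/options, taken from the byte at offset 3
    channels = {0x00: "Stereo", 0x40: "JntStereo",
                0x80: "2x Monaural", 0xC0: "Monaural"}
    parts.append(channels[head[3] & 0xC0])
    return ", ".join(parts)


# BOM ff fe followed by '<' encoded as UTF-16LE (3c 00), as in test.htm.
print(match_mpeg_layer1(bytes.fromhex("fffe3c00")))
```

The BOM passes the first test, and `0x3c & 0xF0 = 0x30` selects the 96 kbps line. `0x3c & 0x0C = 0x0C` matches none of the timing lines, which fits the missing kHz field in the 7.1 output above, and the zero byte at offset 3 gives Stereo.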

Here is the file.

View attachment test.zip

regards.
Sylhouette


----------



## aragon (Apr 20, 2010)

Yup, you're right.  The UTF-16 BOM is clashing with that MPEG file type pattern.  After you comment out that entry are you recompiling the magic file?


```
cd /usr/share/misc && file -C
```


----------



## DutchDaemon (Apr 20, 2010)

Interesting though 


```
$ file test.htm 
test.htm: HTML document text

$ less test.htm 
"test.htm" may be a binary file.  See it anyway? 
<FF><FE><^@H^@T^@M^@L^@>^@<^@H^@E^@A^@D^@>^@
^@<^@S^@T^@Y^@L^@E^@>^@
```


----------



## Sylhouette (Apr 26, 2010)

Thanks all for the help. I had not run the file -C command, and I needed to do that.

That did the trick.

@ DutchDaemon
Yes, very interesting: it opens normally in Windows, but on 7 it looks like a binary file.
These files are generated by Exact Software.
It has something to do with the encoding, I guess.
Also, on FreeBSD I cannot read the file with less or vi, but at least the system now detects the file for what it is.
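
The encoding guess fits the dumps earlier in the thread: the file starts with the UTF-16 little-endian BOM (`ff fe`) and every ASCII character is followed by a zero byte, which is what less shows as `^@`. As a rough sketch, assuming the whole file is valid UTF-16 with a BOM, it can be converted to UTF-8 so that ordinary text tools can read it. The helper name `utf16_to_utf8` is made up for this example.

```python
import codecs


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode BOM-prefixed UTF-16 data as UTF-8; pass other data through."""
    if not data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    return data.decode("utf-16").encode("utf-8")


# The first 28 bytes of test.htm, copied from the hexdump above.
head = bytes.fromhex(
    "ff fe 3c 00 48 00 54 00 4d 00 4c 00 3e 00 3c 00"
    "48 00 45 00 41 00 44 00 3e 00 0a 00"
)
print(utf16_to_utf8(head).decode("utf-8"))
```

This only helps with reading the file locally; it does not change what the file command or MailScanner see in the original attachment.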

regards,
Sylhouette


----------

