# ZIL in ZFS. How does it work?



## stassik (Sep 7, 2011)

Hello!

Please tell me, how does the ZIL work in ZFS? The L2ARC works as a READ cache layer between main memory and the disk storage pool. It holds non-dirty ZFS data and is currently intended to improve the performance of random READ workloads, or streaming READ workloads as well (the l2arc_noprefetch option):
ARC <-> L2ARC <-> Disk Storage Pool.

The ZIL works as a WRITE cache layer between main memory and the disk storage pool. But how does it work? Is the ZIL currently intended to improve the performance of random or streaming WRITE workloads? When does the ZIL send the ZFS data to the disk storage pool? When the ZIL is full?

If l2arc_noprefetch is enabled, the L2ARC reads data from the disk storage pool only when the data is not found in the L2ARC. How often does the ZIL write data to the disk storage pool?


----------



## Sylhouette (Sep 8, 2011)

I did some googling.

Here we go!

ZIL (ZFS Intent Log) drives can be added to a ZFS pool to speed up the write capabilities of any level of ZFS RAID. The log records incoming synchronous writes and their metadata on a very fast SSD drive to increase the write throughput of the system. When the physical spindles have a moment, that data is then flushed to the spinning media and the process starts over. We have observed significant performance increases by adding ZIL drives to our ZFS configuration.

One thing to keep in mind is that the ZIL should be mirrored to protect the speed of the ZFS system. If the ZIL is not mirrored, and the drive that is being used as the ZIL drive fails, the system will revert to writing the data directly to the disk, severely hampering performance.
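As a sketch, a mirrored log can be attached to an existing pool like this (the pool name and device names here are hypothetical, pick your own):

```shell
# Attach a mirrored pair of SSDs as a dedicated intent-log device.
zpool add tank log mirror ada1 ada2

# Check the layout; the log devices show up under a separate
# "logs" section in the status output.
zpool status tank
```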



> When does the ZIL send the ZFS data to the disk storage pool? When the ZIL is full?


Answer: no.
If I got the story right, it will dump the ZIL data to the disks when they are ready to take the burst.
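To add a sketch of the timing side: the pool commits its dirty data from RAM on a fixed transaction-group interval, so the log normally holds only a few seconds of synchronous writes and is only ever read back after a crash. Assuming a FreeBSD box, you can inspect (and tune) that interval with sysctl; the exact sysctl name can vary between versions:

```shell
# Show the transaction-group commit interval, in seconds.
# Dirty data in RAM is flushed to the pool at least this often;
# the separate log device is only *read* during crash recovery.
sysctl vfs.zfs.txg.timeout

# Example: commit a transaction group every 5 seconds.
# sysctl vfs.zfs.txg.timeout=5
```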


http://blogs.oracle.com/perrin/entry/the_lumberjack

http://blogs.oracle.com/realneel/entry/the_zfs_intent_log

regards
Johan


----------



## stassik (Sep 8, 2011)

Sylhouette said:

> If the ZIL is not mirrored, and the drive that is being used as the ZIL drive fails, the system will revert to writing the data directly to the disk, severely hampering performance.



I have read that ZFS v18 and older cannot import a pool with a failed log vdev. Thus, all log vdevs must be created as mirrors...



			
Sylhouette said:

> If i got the story right, it will dump the zil data to the disks when they are ready to take the burst.


Does this mean that after a long period without any write commands, the ZIL is completely empty?

P.S. Sorry for my Russian-English ^)


----------



## Goose997 (Sep 8, 2011)

stassik said:

> I have read that ZFSv18 and older cannot import a pool with a failed log vdev. Thus, all log vdevs must be created as mirrors...



If the ZIL is not mirrored, the system will revert to writing directly to the pool disks.



			
stassik said:

> Does this mean that after a long period without any write commands, the ZIL is completely empty?



That is correct.  Here is a good link to read on ZIL (sizing, usage, etc.): http://www.nexenta.com/corp/content/view/274/119/

Whether you benefit from the ZIL will very much depend on the type of workload your server has.

regards
Malan


----------



## stassik (Sep 8, 2011)

Goose997 said:

> Whether you benefit from the ZIL will very much depend on the type of workload your server has.



I want to use the L2ARC and ZIL on a server for booting diskless clients over the
iSCSI protocol (100+ WinXP clients)...


----------



## Goose997 (Sep 8, 2011)

stassik said:

> I want to use the L2ARC and ZIL on a server for booting diskless clients over the
> iSCSI protocol (100+ WinXP clients)...



I am also trying to find out when a ZIL makes sense.  From what I have read, the ZIL does make a difference on NFS shares, but not on Samba file shares.  I have not found an explanation for this, so I could be wrong.

The L2ARC should make a difference if you are booting a lot of clients off the server.  Look at the *zfs clone* command.  ZFS will only store the differences from the original for each clone so once you have winXP in the cache each client should read it very fast.  I have not tried this, I am speculating that this will work from what I have read.  Look at the drawings here: http://forums.virtualbox.org/viewtopic.php?f=6&t=25107
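A minimal sketch of that clone approach, assuming a pool named tank and one zvol per client (all names here are hypothetical):

```shell
# Create a master WinXP image as a zvol, install the OS into it,
# then snapshot it as the golden image.
zfs create -V 20G tank/winxp-master
zfs snapshot tank/winxp-master@golden

# Each diskless client boots from its own clone of the snapshot.
# Clones share unmodified blocks with the master, so one warm copy
# in ARC/L2ARC serves the reads for every client.
zfs clone tank/winxp-master@golden tank/client001
zfs clone tank/winxp-master@golden tank/client002
```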

regards
Malan


----------



## rusty (Sep 9, 2011)

Perhaps zilstat runs on FreeBSD? - http://www.richardelling.com/Home/scripts-and-programs-1/zilstat


----------



## Sebulon (Sep 9, 2011)

Goose997 said:

> I am also trying to find out when a ZIL makes sense.



Tell me about it!
NFS write performance with mirrored ZIL

As far as I can say, an SSD as ZIL helps with IOPS, not throughput.

Unless you have two or more of the fastest, most expensive SSDs on the market, you won't even hit 100 MB/s. If you have enough disks in your pool and add an SSD, you can actually get worse throughput.

And about Samba and iSCSI: most installations of Samba use AIO (asynchronous I/O), which bypasses the ZIL but in turn speeds up throughput. I have also tried not using AIO on Samba with an Intel X25-E as ZIL, and even then it did not use the ZIL; the disk was just idle. As far as I know, iSCSI doesn't have any AIO module, but it sends data async anyway. Even with the X25-E, when sending data and watching gstat, the X25-E was idle most of the time. Neither of these tests resulted in greater throughput. All of the transfers took exactly the same amount of time with an X25-E as ZIL as they did without one, on both Samba and iSCSI.
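If your pool version supports the sync property (v28 and later), one way to check whether a workload ever touches the log device is to force every write through it and watch the SSD with gstat while the transfer runs; the dataset name here is hypothetical:

```shell
# Force all writes on this dataset to be synchronous, so anything
# written must pass through the ZIL (and the log SSD, if present).
zfs set sync=always tank/test

# ...run the workload and watch the log device with: gstat

# Restore the default (honour each application's own sync/async choice).
zfs set sync=standard tank/test
```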

Also worth mentioning is that even NFS can be told to send async:

```
# mount -o async server:/share /foo/bar
```
It then becomes the fastest protocol I've ever tested on a short-distance LAN. It maxes out 1GigE without any tweaking whatsoever.

In short, if you want serious *synchronous* speed with an SSD, prepare to open your wallet.

/Sebulon


----------



## Goose997 (Sep 9, 2011)

hi Sebulon



			
Sebulon said:

> Tell me about it!
> 
> Unless you have two or more of the fastest, most expensive SSDs on the market, you won't even hit 100 MB/s. If you have enough disks in your pool and add an SSD, you can actually get worse throughput.
> 
> /Sebulon



It seems the only way to do this is to benchmark your own configuration and see if it makes a difference.  I have dedup enabled and get around 11 MB/s consistent write speed with 16 GB RAM.  With 8 GB RAM I got peaks and valleys.  This is over a 1 Gb LAN with Samba shares.

I have a Corsair 60GB SSD which seems to have impressive read and write specs.  I have not installed it yet but might try different combinations of ZIL and L2ARC.

Unfortunately I must first replace a failed 2 TB HDD before I can play with this x(

regards
Malan


----------



## Sebulon (Sep 9, 2011)

Well, since Samba doesn't use the ZIL anyway, I think you have different underlying issues to deal with as well. Try sharing over NFS instead, and be sure to mount the share with:

```
mount -o async server:/share /foo/bar
```
so it acts just like Samba does, and measure the difference in throughput. If NFS is faster than 11 MB/s, you need to tweak Samba to speed up its performance.

/Sebulon


----------



## Goose997 (Sep 9, 2011)

Sebulon said:

> Well, since Samba doesn't use the ZIL anyway, I think you have different underlying issues to deal with as well.
> 
> /Sebulon



hi

Thanks for the tip.  I had a look and saw I have a LAN cable with a connector that does not clip in properly; the LAN was only running at 100 Mb on the machine I copy from.  With that fixed, I get around 80 MB/s over Samba to non-compressed/non-dedup shares.  With dedup and compression I get around 12 MB/s.  In parallel, I am resilvering a replaced disk.

Maybe I don't need the ZIL now :e

regards
Malan


----------

