# ULE in 8.1-STABLE and CPU topology



## jkcarrol (Nov 3, 2010)

Hi All,

I wanted to ask if anyone here knows how CPU topology detection works in ULE. Specifically, I'm running 8.1-STABLE:


```
FreeBSD pflog.net 8.1-STABLE FreeBSD 8.1-STABLE #0 r208898M: Sat Oct 30 16:59:12 PDT 2010     
root@pflog.net:/usr/obj/usr/src/sys/PFLOG  amd64
```
My KERNCONF is just GENERIC with a few additions:


```
device          pf
device          pflog
device          coretemp
device          uchcom
device          sound
device          snd_hda
option          NETATALK
option          ALTQ
option          ALTQ_CBQ
option          ALTQ_HFSC
option          ALTQ_NOPCC
option          ALTQ_PRIQ
option          ALTQ_RED
option          ALTQ_RIO
option          COMPAT_LINUX32
option          GEOM_MIRROR
option          LIBICONV
option          LIBMCHAIN
option          NETSMB
option          NULLFS
option          SMBFS
option          UDF
```

I don't have mptable in there, as you can see, as I wasn't sure if it was required for optimal performance.

For the sake of this discussion, my particular processor is an Intel Q9550, which has two 6 MB L2 caches, each shared by two cores. However, the reported topology looks to me as if the scheduler thinks there is a single cache shared by all four cores:


```
% sysctl kern.sched.topology_spec
kern.sched.topology_spec: <groups>
 <group level="1" cache-level="0">
  <cpu count="4" mask="0xf">0, 1, 2, 3</cpu>
  <children>
   <group level="2" cache-level="2">
    <cpu count="4" mask="0xf">0, 1, 2, 3</cpu>
   </group>
  </children>
 </group>
</groups>
```

Correct me if I'm wrong, but that means it thinks the L2 cache is shared by all 4 cores, right?
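To make the grouping easier to read, here is a minimal sketch that flattens the XML into one line per group. The function name `topology_groups` is made up for illustration, and it assumes the layout shown above, with each `<group>` and `<cpu>` element on its own line.

```shell
# Hypothetical helper: summarize kern.sched.topology_spec output.
# Reads the XML on stdin and prints the cache level and CPU list of each group.
topology_groups() {
    awk '
        /<group / {
            # Remember the cache-level attribute of the group just opened.
            if (match($0, /cache-level="[0-9]+"/)) {
                level = substr($0, RSTART + 13, RLENGTH - 14)
            }
        }
        /<cpu / {
            # Keep only the text between <cpu ...> and </cpu>.
            line = $0
            sub(/^[^>]*>/, "", line)
            sub(/<\/cpu>.*$/, "", line)
            gsub(/ /, "", line)
            print "cache-level " level ": cpus " line
        }
    '
}

# Usage on a FreeBSD box:
#   sysctl -n kern.sched.topology_spec | topology_groups
```

If the summary shows a single `cache-level 2` line listing all four CPUs, the reported topology has one L2 group covering every core, which matches the reading above.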

Currently the kern.smp.topology sysctl is 0, which I guess means auto?

Can someone help me with any ULE tunings I should set for this specific CPU? I have to believe a multi-threaded application will run better when ULE knows there are two separate caches each shared by 2 cores vs. 1 single shared L2 cache.

So I'm looking for what sysctl.conf/loader.conf tunables I should set and/or if I need mptable or anything else in my kernel for it to take advantage of the CPU above what I already have here.

Any tips would be greatly appreciated!


----------



## phoenix (Nov 3, 2010)

You need to update your system.  Intel CPU topology detection support was just recently committed to -STABLE.  You need to have at least revision *r214621*.

You're approx 6,000 commits behind.


----------



## jkcarrol (Nov 4, 2010)

phoenix said:

> You need to update your system.  Intel CPU topology detection support was just recently committed to -STABLE.  You need to have at least revision *r214621*.
> 
> You're approx 6,000 commits behind.



Hmm, I just csup'd again yesterday and built a kernel but it's still reporting the same version:


```
FreeBSD pflog.net 8.1-STABLE FreeBSD 8.1-STABLE #0 r208898M: Tue Nov  2 17:47:23 PDT 2010     
root@pflog.net:/usr/obj/usr/src/sys/PFLOG  amd64
```

Is it somehow picking this up from something in SVN? I guess I'll try moving /usr/src out of the way and re-fetching it completely and rebuilding to see what revision shows up.

Or perhaps the cvsup mirror I'm using is out of date? Though I do remember seeing csup fetch some files that, according to the commit log, were part of the Intel topology MFC.


----------



## jkcarrol (Nov 4, 2010)

Ok, I snagged the 8.1 snapshot cd1 ISO from October, mounted it via mdconfig, installed the src from that, then csup'd /usr/src with the RELENG_8 tag from cvsup10.us.freebsd.org.

I then diff'd my old /usr/src with this new one. Not a single diff. However, I must have used svn to check out the src at some point, because the diff does show .svn directories in every sub-dir of /usr/src, which I think is where it's picking up that revision number.

I'm going to rebuild the kernel, reboot, see what uname shows, and re-run my tests to check the performance. I think I am in fact running the latest 8.1-STABLE; the uname is just wrong because it picked up stale revision info from the .svn metadata.
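A quick way to check that suspicion is to see whether the tree still carries .svn metadata and what `svnversion` reports for it. This is only a sketch: the function name `src_revision_source` is made up, and it assumes the revision in uname is derived from that metadata at build time.

```shell
# Hypothetical helper: report whether a source tree carries Subversion metadata.
# Takes the tree path as its argument and defaults to /usr/src.
src_revision_source() {
    src=${1:-/usr/src}
    if [ -d "$src/.svn" ]; then
        if command -v svnversion >/dev/null 2>&1; then
            # A trailing "M" in the output means the working copy has local modifications.
            echo "svn metadata present, svnversion reports: $(svnversion "$src")"
        else
            echo "svn metadata present, but svnversion is not installed"
        fi
    else
        echo "no .svn directory in $src"
    fi
}

# Usage:
#   src_revision_source /usr/src
```

If the reported revision matches the stale number in uname while the files themselves are current, the metadata rather than the source is out of date.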


----------



## phoenix (Nov 4, 2010)

jkcarrol said:

> Hmm, I just csup'd again yesterday and built a kernel but it's still reporting the same version:



The svn2cvs exporter is down right now, won't be back online until this evening.  Until then, using csup/cvsup to update the source tree will complete with 0 updates.

This could be a good time to switch to svn for updating the source tree.


----------



## jkcarrol (Nov 4, 2010)

phoenix said:

> The svn2cvs exporter is down right now, won't be back online until this evening.  Until then, using csup/cvsup to update the source tree will complete with 0 updates.
> 
> This could be a good time to switch to svn for updating the source tree.



It did update quite a few things when I ran csup, though. So it brought files to newer versions than were in the 201010 8.1-STABLE snapshot.

Perhaps not getting all the updates would explain why the kernel I built last night was causing NFS to fail. nfsd was unkillable and I couldn't mount any of the NFS mounts from a Mac, except for one, which I thought was odd. I reverted to the kernel I built on 11/2 at ~5:45 pm PST and it was fine, with no NFS issues.

I'll create a new /usr/src by using SVN and compare it to my existing /usr/src which was created as I mentioned above (extracted src 8.1-stable snapshot ISO, then csup'd from cvsup10.freebsd.org). I'll report back any diffs I find.

Thanks!


----------



## jkcarrol (Nov 5, 2010)

The only diffs were in the $FreeBSD headers. But I built the kernel after the SVN update and now uname reports: r208898M, and kern.smp.topology is still 0, with kern.sched.topology_spec showing the same:


```
kern.sched.topology_spec: <groups>
 <group level="1" cache-level="0">
  <cpu count="4" mask="0xf">0, 1, 2, 3</cpu>
  <children>
   <group level="2" cache-level="2">
    <cpu count="4" mask="0xf">0, 1, 2, 3</cpu>
   </group>
  </children>
 </group>
</groups>
```

I did try setting kern.smp.topology to 5, which from what I can tell is the proper setting for my processor (two 2-way caches), but performance didn't change.

I've unfortunately hit upon a nasty nfsd bug now, so I'm going to have to put this scheduler/topology stuff on hold while I debug that.


----------



## phoenix (Nov 5, 2010)

You may want to take this up on the -stable (and/or -current) mailing list, then.


----------



## jkcarrol (Nov 5, 2010)

phoenix said:

> You may want to take this up on the -stable (and/or -current) mailing list, then.



Yep, thanks, I did and Rick already responded with a patch as a workaround, and it's working for me so far.  Hopefully that patch (or an alternative fix) is put in place before the code freeze.

Anyway, back on topic to my original question, can someone else with an Intel Core 2 Quad (or any SMP system for that matter) running 8.1-STABLE or CURRENT reply here with the values of the two sysctl MIBs below? I used SVN yesterday to fetch /usr/src, and here's the uname after building a new kernel with those sources:


```
FreeBSD pflog.net 8.1-STABLE FreeBSD 8.1-STABLE #0 r214807M: Fri Nov  5 10:06:34 PDT 2010     
root@pflog.net:/usr/obj/usr/src/sys/PFLOG  amd64
```

And it is still showing the "wrong" topology.

I'm interested to see the output of the following two sysctl MIBs along with what processor(s) you have in the system and their configuration so I can correlate this topology output with the actual architecture of the CPU. Please reply with the output from:

`sysctl kern.smp.topology`

and

`sysctl kern.sched.topology_spec`

In my case, unless I override it in /boot/loader.conf, I get the following:


```
floyd@pflog:~% sysctl kern.smp.topology
kern.smp.topology: 0
```

and


```
floyd@pflog:~% sysctl kern.sched.topology_spec
kern.sched.topology_spec: <groups>
 <group level="1" cache-level="0">
  <cpu count="4" mask="0xf">0, 1, 2, 3</cpu>
  <children>
   <group level="2" cache-level="2">
    <cpu count="4" mask="0xf">0, 1, 2, 3</cpu>
   </group>
  </children>
 </group>
</groups>
```

If I set kern.smp.topology=5 in /boot/loader.conf, then the topology looks correct. I don't have the output at hand, but I can set it on the next reboot and post it here. Basically, instead of a single child group, I get two child groups, which is what I'd expect for my CPU's architecture.
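For anyone who wants to try the same override, here is a rough sketch that appends the tunable to a loader.conf file only if it is not already there. The function name `set_loader_tunable` is made up, and the value 5 is simply the one that worked on my box, so treat it as an assumption for other CPUs.

```shell
# Hypothetical helper: add name="value" to a loader.conf-style file once.
# Arguments: file path, tunable name, value.
set_loader_tunable() {
    file=$1
    name=$2
    value=$3
    # Exact prefix match on "name=", so dots in the name are not treated as wildcards.
    if [ -f "$file" ] && awk -v key="$name=" \
        'index($0, key) == 1 { found = 1 } END { exit !found }' "$file"; then
        echo "$name is already set in $file; edit it by hand"
        return 0
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$file"
    echo "added $name=\"$value\" to $file"
}

# Usage (as root, followed by a reboot):
#   set_loader_tunable /boot/loader.conf kern.smp.topology 5
```

After the reboot, `sysctl kern.smp.topology` should report the new value, and `sysctl kern.sched.topology_spec` shows the resulting groups.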

I'm basically wondering if the topology_spec is just printing the wrong thing since I'm not explicitly setting the kern.smp.topology MIB, or if internally the scheduler truly thinks I have a single 12MB 4-way cache instead of the two 6MB 2-way caches.

Thanks!


----------



## Galactic_Dominator (Nov 6, 2010)

```
FreeBSD vbox.galacticdominator.com 8.1-STABLE FreeBSD 8.1-STABLE #1: Thu Oct  7 06:02:07 CDT 2010     
adam@vbox.galacticdominator.com:/usr/obj/usr/src/sys/GENERIC  amd64
CPU: AMD Phenom(tm) II X4 965 Processor (3411.64-MHz K8-class CPU)
kern.smp.topology: 0
kern.sched.topology_spec: <groups>
 <group level="1" cache-level="0">
  <cpu count="4" mask="0xf">0, 1, 2, 3</cpu>
  <children>
   <group level="2" cache-level="2">
    <cpu count="4" mask="0xf">0, 1, 2, 3</cpu>
   </group>
  </children>
 </group>
</groups>
```


```
FreeBSD galacticdominator.com 8.1-STABLE FreeBSD 8.1-STABLE #1: Mon Oct  4 14:22:18 CDT 2010     
adam@galacticdominator.com:/usr/obj/usr/src/sys/GALACTICDOMINATOR  amd64
CPU: Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (2940.64-MHz K8-class CPU)
kern.smp.topology: 0
kern.sched.topology_spec: <groups>
 <group level="1" cache-level="0">
  <cpu count="8" mask="0xff">0, 1, 2, 3, 4, 5, 6, 7</cpu>
  <children>
   <group level="2" cache-level="2">
    <cpu count="8" mask="0xff">0, 1, 2, 3, 4, 5, 6, 7</cpu>
    <children>
     <group level="3" cache-level="1">
      <cpu count="2" mask="0x3">0, 1</cpu>
      <flags><flag name="THREAD">THREAD group</flag><flag name="SMT">SMT group</flag></flags>
     </group>
     <group level="3" cache-level="1">
      <cpu count="2" mask="0xc">2, 3</cpu>
      <flags><flag name="THREAD">THREAD group</flag><flag name="SMT">SMT group</flag></flags>
     </group>
     <group level="3" cache-level="1">
      <cpu count="2" mask="0x30">4, 5</cpu>
      <flags><flag name="THREAD">THREAD group</flag><flag name="SMT">SMT group</flag></flags>
     </group>
     <group level="3" cache-level="1">
      <cpu count="2" mask="0xc0">6, 7</cpu>
      <flags><flag name="THREAD">THREAD group</flag><flag name="SMT">SMT group</flag></flags>
     </group>
    </children>
   </group>
  </children>
 </group>
</groups>
```


----------

