# ports-mgmt/synth: Upgrade version 1.43 => 1.50



## marino (Aug 31, 2016)

```
This release improves robustness and activates the watchdog.
It leverages the procctl functionality to ensure all processes spawned
from a builder are reaped, which in turn ensures that tmpfs mounts can
be dismounted.  Previously stuck processes could prevent those dismounts,
trapping them as new mounts get placed on top.

This also finally enables the watchdog that will kill runaway builds.
The watchdog has a specific time limit per build phase where it will
kill the build if the log doesn't grow over the previous X minutes.

No activity timeout limits per phase are:

  check_sanity    :   1 minute
  pkg_depends     :   3 minutes
  fetch           : 480 minutes
  checksum        : 480 minutes (fetches if required)
  extract_depends :   3 minutes
  extract         :  30 minutes
  patch_depends   :   3 minutes
  patch           :   3 minutes
  build_depends   :   5 minutes
  build           :  20 minutes
  run_depends     :  10 minutes
  stage           :  20 minutes
  check_plist     :   3 minutes
  pkg_package     : 120 minutes
  install_mtree   :   3 minutes
  install         :  10 minutes
  deinstall       :  10 minutes

A minor change regarding the swap display: If there is no swap installed,
it will now display "n/a" instead of "100%"
```
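The no-activity rule in the commit message above can be sketched as follows. This is a minimal illustration in Python, not Synth's actual Ada code: the `LogWatchdog` class, its method names, and the strict "greater than" comparison are assumptions made for the example. Only the per-phase minute values come from the commit message.

```python
# Base no-activity limits in minutes, copied from the commit message above.
BASE_LIMIT_MINUTES = {
    "check_sanity": 1, "pkg_depends": 3, "fetch": 480, "checksum": 480,
    "extract_depends": 3, "extract": 30, "patch_depends": 3, "patch": 3,
    "build_depends": 5, "build": 20, "run_depends": 10, "stage": 20,
    "check_plist": 3, "pkg_package": 120, "install_mtree": 3,
    "install": 10, "deinstall": 10,
}


class LogWatchdog:
    """Hypothetical helper that tracks one builder's log size for one phase."""

    def __init__(self, phase, now):
        self.phase = phase
        self.last_size = 0
        self.last_growth = now  # seconds; the time the log last grew

    def observe(self, log_size, now):
        """Record the current log size; return True if the build looks stalled."""
        if log_size > self.last_size:
            # Any growth at all resets the timer.
            self.last_size = log_size
            self.last_growth = now
            return False
        limit_seconds = BASE_LIMIT_MINUTES[self.phase] * 60
        # Assumption: the limit must be strictly exceeded before a kill.
        return now - self.last_growth > limit_seconds
```

A caller would poll `observe()` periodically with the current log length and kill the builder when it returns `True`.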


----------



## mpeterma (Sep 25, 2016)

Hi, is there a way to configure or tune these time limits (apart from changing the code)?
I just observed that the build phase of devel/llvm37 exceeded the limit on a busy machine.
Thanks & kind regards,
Matthias


----------



## marino (Sep 25, 2016)

No.  Are you using v1.51 ?
Chances are the machine is overloaded if there's no output in 20 - 60 minutes.
What's the ncpu, how much RAM do you have, and what's your #builders and #jobs per builder?

I suspect your ram is low for the #builders + #jobs per builder and your machine is swapping hard.

FYI, I may bump the time limits of build and stage phases by 20% for usual suspects.


----------



## mpeterma (Sep 25, 2016)

Hi John, thanks for the quick response. The setup is as follows:

* AMD FX(tm)-6300 Six-Core Processor (3515.62-MHz K8-class CPU)
* 16 GB RAM
* Number_of_builders= 4
* Max_jobs_per_builder= 3
* FreeBSD 11-RELEASE
* ZFS root
* Swap 16 GB, but not used
* Synth 1.51

Do you recommend reducing the number of builders?
Furthermore, does the 20 min limit mean that the watchdog kills the process if the whole build phase takes 20 min, or if the build log doesn't get a new line in 20 min?

Best regards,
Matthias


----------



## marino (Sep 25, 2016)

wow, it looks like if anything your setup is conservative.   You shouldn't be having any issues.
"20 minutes" in this case means that the log wasn't incremented over a 20 minute period.  If it gets a single new line, the timer resets.
That is just the base time.  If your system is loaded (e.g. 5-minute average is > 12 with a ncpu=6 system) then that time limit grows (to 40 minutes in that example).

honestly I think your configuration is fine.  It's hard to imagine your machine being so loaded that it tripped the watchdog.

by the way, check `synth version` to make sure you're on v1.51 and not v1.50.
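The load scaling described above (a 20-minute base limit growing to 40 minutes when the 5-minute load average is 12 on a 6-CPU machine) can be illustrated with a small Python sketch. The formula here is an assumption chosen only to reproduce that one example; it is not taken from Synth's source, and the function names are hypothetical.

```python
import os


def scaled_limit(base_minutes, load_avg_5min, ncpu, max_multiplier=5.0):
    """Return a no-activity limit in minutes, stretched under load.

    Assumed rule (not Synth's real formula): the multiplier is the load
    per CPU, never below 1 and never above max_multiplier.
    """
    multiplier = min(max(load_avg_5min / ncpu, 1.0), max_multiplier)
    return base_minutes * multiplier


def current_limit(base_minutes):
    """Apply scaled_limit() to this machine's 5-minute load average (Unix only)."""
    _, five_minute, _ = os.getloadavg()
    return scaled_limit(base_minutes, five_minute, os.cpu_count() or 1)
```

With these assumptions, `scaled_limit(20, 12.0, 6)` gives 40 minutes, which matches the example in the post, and an idle machine keeps the 20-minute base.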


----------



## mpeterma (Sep 25, 2016)

Hi John,

yes - I am using Synth 1.51. I tried to isolate the issue further: I set the number of builders to 1 and started another build run. Even with one builder, the devel/llvm37 build fails with the timeout. Attached you will find the log file. Unfortunately there are not many timestamps in it, but maybe it gives some hint?

Best regards,
Matthias


----------



## marino (Sep 25, 2016)

it doesn't make any sense.  A 16 GB 6-core machine using -j3 shouldn't have trouble compiling a single c++ file like that.
If you want to turn the watchdog off, you can rebuild synth after changing the "hangmonitor" variable on line 704 of src/portscan-buildcycle.adb from True to False.

maybe it really does take that long to compile c++ files on K8's regardless of available memory...


----------



## mpeterma (Sep 26, 2016)

Hello John, thanks for this helpful hint. Turning off the watchdog will be my next step. Before that, I'll do a supervised build without Synth and log the output with timestamps. It would also be interesting to see what performs so badly on my system.

Best regards,
Matthias


----------



## mpeterma (Sep 26, 2016)

Hello John,

to my surprise, when I perform the build of devel/llvm37 via


```
make -DBATCH | gawk '{ print strftime("[%Y-%m-%d %H:%M:%S]"), $0 }' > /home/admin/devel___llvm37.build.log 2>&1
```

the whole build takes less than 30 minutes, with no single line exceeding the 20-minute limit. Do you have an idea what I could try next?
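A log written with the gawk prefix above can be checked for its longest silent stretch with a short Python sketch like the one below. The function name `longest_gap` is hypothetical; the timestamp format is the one from the `strftime` call in the command.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "[%Y-%m-%d %H:%M:%S]"
STAMP_LENGTH = 21  # length of a stamp such as "[2016-09-26 10:00:00]"


def longest_gap(lines):
    """Return (gap, line) for the largest pause between consecutive stamped lines.

    `line` is the first line written after the pause. Lines without a
    leading timestamp are skipped.
    """
    previous = None
    worst = (timedelta(0), "")
    for line in lines:
        try:
            stamp = datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
        except ValueError:
            continue  # not a timestamped line
        if previous is not None and stamp - previous > worst[0]:
            worst = (stamp - previous, line.rstrip("\n"))
        previous = stamp
    return worst
```

Passing an open log file to `longest_gap()` and comparing the returned gap against 20 minutes shows whether any single pause would have tripped the build-phase watchdog.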

Best regards,
Matthias


----------



## marino (Sep 26, 2016)

I've no idea beyond turning off the hangmonitor and just trying to build it in Synth again.  Maybe observe during the build and see if you can see where it's getting hung up.  Maybe it really is hanging and the watchdog is doing its job.


----------



## mpeterma (Sep 27, 2016)

Hi John, after turning off the watchdog as described, the build of llvm37 completed after about an hour. Unfortunately I had changed Number_of_builders= back to 4 beforehand, so the measurement is not very reliable.
Anyway, once my batch build is done, I will trigger another build with Number_of_builders= 1. My expectation is that it will complete in ~30 minutes, which is the time it took when building straight from ports without Synth.

Best regards,
Matthias


----------



## marino (Sep 27, 2016)

you expect -j1 to take 30 minutes while -j4 takes twice as long?  Usually the opposite occurs.

It still doesn't explain the 20+ minute stall in the middle of the build...

What would really help is leaving the watchdog on and changing line 1146 of src/portscan-buildcycle.adb from 20 to 25
(and if that fails, from 20 to 30).

I'm trying to figure out how far to loosen the limits to avoid false positives while still being reasonable.


----------



## marino (Sep 27, 2016)

to clarify, the number of builders doesn't matter if it's just one package, it's the number of jobs per builder that affects the build time (again, assuming it's the only thing building).


----------



## mpeterma (Sep 27, 2016)

Hello John,

thanks for the clarification. Actually we are on the same page. 
I meant that with Number_of_builders=1 I ensured that only one port is built at a time (as I am building with a port.list).
Before my last test, I set it back to 4, and that's why I think the 1 hour is OK, as other ports were building in parallel.
I will follow your recommendations to drill down to the root cause.

Best regards,
Matthias


----------



## marino (Sep 27, 2016)

before this post, I had opened an issue on this:
https://github.com/jrmarino/synth/issues/57

You can help validate the 25% bump from 20 minutes to 25 minutes.

I'd rather fix the limits than have people think they need to turn off the watchdog.


----------



## mpeterma (Sep 27, 2016)

Yes, will consider that when I have my results.
Have you thought about having these limits calculated dynamically, i.e. by taking into account some environmental conditions (CPU, IO bandwidth)? Just an idea; I'm not sure whether this can realistically be achieved.
In any case I'd find it useful to have the limits configuration exposed as environment settings / config in the synth.ini so that they can be tweaked without rebuilding synth.

Best regards,
Matthias


----------



## marino (Sep 27, 2016)

it is dynamically calculated.
20 minutes is the BASE limit.
If the machine is loaded, a multiplier from 1.1 to 5 can be applied (22 to 100 minutes).

The issue is that this multiplier doesn't get applied until the machine is already loaded.




> In any case I'd find it useful to have the limits configuration exposed as environment settings / config in the synth.ini so that they can be tweaked without rebuilding synth.



This is exactly what I'm trying to avoid.  Too many options.  It confuses users.


----------



## marino (Sep 27, 2016)

there are cues that this watchdog issue only happens when building in text mode (i.e. not ncurses mode).  It may be a bug.  Stand by ...


----------



## marino (Sep 28, 2016)

mpeterma@, version 1.52 should solve your use case


```
ports-mgmt/synth: Upgrade version 1.51 => 1.52

Fix regression in text-mode caused by activation of watchdog.
The watchdog is checking the lengths of the build logs to figure out if
a builder has stalled.  It turns out that the logs were only being
inspected in ncurses display mode, so any port that took longer than
20 minutes to build would be aborted by the watchdog.

While here, bump the *BASE* time limit for the build phase from 20 to
25 minutes based on extreme cases (normally involving gcc or tex ports)
and also bump the check-plist phase limit from 3 minutes to 10 minutes.
Some ports have tens of thousands of files in them which takes a long
time to check under test mode, especially if the server is loaded.
```


----------



## marino (Oct 5, 2016)

Some people might welcome this point release:


```
ports-mgmt/synth: Upgrade version 1.52 => 1.53

Major bug fix: ncurses display resize hang fixed

  Until now, resizing the window while synth is running in ncurses mode
  caused synth to hang (it would finish the builds it was working on
  but the display wouldn't update and no new jobs would start).  This
  was due to an unhandled exception thrown by the ncurses binding as a
  result of the resize event, and now these are handled.

Minor fix: Ports with @info in pkg-plist now pass in test mode

  The mtree exclusion file was improved to allow these leftover info
  directories to be ignored (as is done in poudriere).  Before, only
  info/dir was ignored, but the presence of "dir" prevented "info" from
  being removed by pkg(8) upon deinstallation.

enhancement: Augment text mode (requested)

  Now when a builder starts on a new package, the port origin will be
  shown in the running log (before only the completion was logged.)
```


----------



## xtaz (Oct 6, 2016)

Thanks so much for getting to the bottom of this. I quite often start a build on my laptop and then check up on it on my phone or vice versa. The two different screen sizes quite often caused this hang. Nice to see it's fixed now.


----------



## marino (Oct 7, 2016)

There were still some quirks with resizing including more possible hangs, but hopefully that's really been addressed now:

```
ports-mgmt/synth: Upgrade version 1.53 => 1.54

Handles remaining resizing exceptions and improves display handling.

Yesterday's work handled most of the common display exceptions, but others
were still possible.  Now all possible exceptions are handled.

Several improvements were made to the display:
  1) lines no longer wrap if the size width is resized too narrow; they
     get truncated as always intended
  2) Elements such as the elapsed timer don't get displayed in the wrong
     place when the screen is too narrow (they just don't show)
  3) The dashes now get restored if the screen is sized small and then
     big again (or started small and then expanded).  In many cases those
     lines just never came back before.
  4) The "full" refresh frequency was increased from a period of 30 seconds to
     a period of 4 seconds.  This has a side benefit to text-mode watchdog
     as well since that's the same timer for the log inspection.
  5) The history window height ranges from 10 to 50 rows.  If the xterm
     window starts small, the history will be 10 lines.  If it starts
     big, the number of lines will be dictated by the original size of
     the xterm window.  Making the screen small and then bigger again will
     reveal the full number of log lines.
```


----------



## xtaz (Oct 8, 2016)

1.53 worked fine for me, but 1.54 causes the ncurses screen to be completely garbled and unreadable. It's not a tmux thing; it does it in a plain terminal as well. I'm using PuTTY on Windows as the terminal (112x34).


----------



## marino (Oct 8, 2016)

can you install bitvise (https://www.bitvise.com/ssh-client) and see how that works?
or try using "xterm" protocol if you aren't already?


----------



## marino (Oct 8, 2016)

hmm, i just saw this on FreeBSD console in virtualbox vm.  what's going on there?
It seems to be specific to freebsd.   (I see it in bitvise too)


----------



## marino (Oct 8, 2016)

xtaz@ this is really weird.  I think the problem lies in the ports version of ncurses.  I forcibly compiled adacurses and synth to use the base ncurses and it worked fine then on FreeBSD.
The ports version of ncurses works fine on DragonFly.  I'm not sure what's going on here.


----------



## marino (Oct 9, 2016)

xtaz@ this testcase proves that ports ncurses is broken on FreeBSD:
https://leaf.dragonflybsd.org/~marino/testcurses.shar

you can try it yourself.  with `make`, it will build "hello" program with base ncurses, and it works fine.
with `make WITH_PORTS=1`, the hello program will be linked with ports ncurses.

On FreeBSD, the latter shows nothing.  On DragonFly it works as expected.
Unfortunately the ncurses port is unmaintained!


----------



## marino (Oct 9, 2016)

xtaz@ okay, it seems that the testcase failed because the base ncurses headers were used to build it but it was linked with the ports ncurses library.
The same thing might be going on here somewhere.


----------



## marino (Oct 9, 2016)

xtaz@ (and everyone else),
I think I have this fixed.

1. update ports tree
2. rebuild devel/adacurses
3. rebuild ports-mgmt/synth

Steps 2 and 3 can be done within synth (maybe move to text mode if you want), and then `pkg rem synth` followed by `pkg add <path/to/packages>/All/synth-1.54_1.txz`


----------



## xtaz (Oct 9, 2016)

Hi! Sorry I didn't reply until now. I missed your first post and then saw that you had reproduced it yourself on the second. I can confirm that the latest update fixes the problem. Everything is back to normal. Thanks! I had just switched it to text mode temporarily so that I could see what was going on, but it's back to ncurses mode now that it's rebuilt itself.


----------



## Uniballer (Oct 9, 2016)

I like Synth.  Good job.

Is it possible, or do you have any plans to make it possible, to cross-build ports from, say, amd64 to armv6?  What would it take to accomplish that?


----------



## marino (Oct 9, 2016)

I guess it would be possible if it mimicked poudriere (search for "qemu" here: https://github.com/freebsd/poudrier...fa9517ae2/src/share/poudriere/common.sh#L1621)
There's a guy that figured out how to make things faster here: http://phaq.phunsites.net/2015/10/1...ce-optimization-for-poudriere/comment-page-1/

It seems to me what ports really needs is to support cross-compiling natively.  Often the problem is that a port generates a tool that generates a source file, and that tool can't run on the host system.  It's an interesting problem.

Given how slow QEMU is anyway, though, with today's tree, there's not any benefit from using synth over poudriere for ARM cross compiling.  Is there?  Both are going to be slow compared to host building, so one might as well use poudriere for this specific use case.

gdelmatto is on the right track though, true cross compiling would be the answer.  I don't have any machines to cross-compile to ATM.


----------



## marino (Oct 9, 2016)

Uniballer said:


> I like Synth.  Good job.
> 
> Is it possible, or do you have any plans to make it possible, to cross-build ports from, say, amd64 to armv6?  What would it take to accomplish that?



Uniballer@, check out this 2-year old presentation by bapt: http://www.slideshare.net/eurobsdcon/baptiste-daroussin-crosscompiling-ports
I never saw it before, but it's really informative about what's needed for native cross compiling.  I suppose it's just a state-of-the-tree briefing because my guess is nobody is working to finish the "to do" work on it.


----------

