Discussion: LXC, criu and cgroups...
Dirk Geschke
2015-04-05 09:31:48 UTC
Hi all,

I was just playing with lxd and tried the move command but it
failed with

error: checkpoint failed

I think this is not a problem of lxd but of criu with cgroups.

The running container is semi-unprivileged; I can start it without
problems.

But if I use cgmanager, then criu fails with:

Error (mount.c:624): 94:./sys/fs/cgroup/cgmanager doesn't have a proper root mount

If I do not use cgmanager but instead mount /sys/fs/cgroup:

mount -t cgroup cgroup /sys/fs/cgroup/

I get this error:

Error (mount.c:624): 74:./sys/fs/cgroup/cpuset,cpu,cpuacct,blkio,memory,devices,freezer,net_cls,perf_event,net_prio,hugetlb,debug/lxc/wheezy2 doesn't have a proper root mount

This is with criu 1.5.1, with version 1.3.1 I get the same errors,
but at least there are *.img files created (and deleted).

Does anyone have an idea what is going wrong?

It is lxd-0.5, lxc-1.1.1 (the git repository says still 1.1.0?) and
kernel 4.0-rc6 with criu-1.5.1. The kernel is adjusted to work with
criu, except:

# criu check
Warn (cr-check.c:581): Dirty tracking is OFF. Memory snapshot will not work.
Looks good.

There is a kernel patch for this, but I did not find it. I don't
think this causes the problem, though.
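
(For reference: the "Dirty tracking is OFF" warning usually just means the kernel was built without soft-dirty page tracking, which criu needs for memory snapshots; on a 4.0 kernel this is a config option rather than an out-of-tree patch. A minimal check, assuming the symbol is CONFIG_MEM_SOFT_DIRTY and a config file for the running kernel is available:

grep CONFIG_MEM_SOFT_DIRTY /boot/config-$(uname -r)
zgrep MEM_SOFT_DIRTY /proc/config.gz     # alternative, if the kernel exposes its config

It should say =y on a kernel built for memory tracking.)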

Does anyone have an idea what my mistake is?

BTW: I started the container without lxd and tried lxc-checkpoint and
got the same errors. So I think it is not related to lxd but probably
to criu?

Best regards

Dirk
--
+----------------------------------------------------------------------+
| Dr. Dirk Geschke / Plankensteinweg 61 / 85435 Erding |
| Telefon: 08122-559448 / Mobil: 0176-96906350 / Fax: 08122-9818106 |
| ***@geschke-online.de / ***@lug-erding.de / ***@lug-erding.de |
+----------------------------------------------------------------------+
Tycho Andersen
2015-04-06 15:00:31 UTC
Hi Dirk,
Post by Dirk Geschke
Hi all,
I was just playing with lxd and tried the move command but it
failed with
error: checkpoint failed
I think, this is not a problem of lxd but of criu with cgroups.
The running container is semi-unprivileged, I can start it without
problems.
We currently cannot migrate unprivileged containers, so any container
you want to migrate needs to be a privileged container.
Post by Dirk Geschke
Error (mount.c:624): 94:./sys/fs/cgroup/cgmanager doesn't have a proper root mount
This should be fixed by a patch I posted to the mailing list a few
weeks ago (it finally got acked last Friday, and I think the plan is
to push it today, so you might re-try with the daily build tomorrow).
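
(For anyone building from source rather than waiting for the daily packages, picking up the fix is roughly a matter of rebuilding liblxc from git; a sketch of the usual autotools steps, assuming a git checkout of lxc:

cd lxc && git pull
./autogen.sh && ./configure && make && sudo make install
)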

Once these are pushed, I will do a blog post on exactly the
configuration and versions of things needed to checkpoint and restore
containers with LXD.

Tycho
Post by Dirk Geschke
mount -t cgroup cgroup /sys/fs/cgroup/
74:./sys/fs/cgroup/cpuset,cpu,cpuacct,blkio,memory,devices,freezer,net_cls,perf_event,net_prio,hugetlb,debug/lxc/wheezy2 doesn't have a proper root mount
This is with criu 1.5.1, with version 1.3.1 I get the same errors,
but at least there are *.img files created (and deleted).
Has anyone an idea what is going wrong?
It is lxd-0.5, lxc-1.1.1 (the git repository says still 1.1.0?) and
kernel 4.0-rc6 with criu-1.5.1. The kernel is adjusted to work with
# criu check
Warn (cr-check.c:581): Dirty tracking is OFF. Memory snapshot will not work.
Looks good.
There is a kernel patch for this, but I did not find it. I don't
think, this causes the problem.
Has anyone an idea what is my mistake?
BTW: I started the container without lxd and tried lxc-checkpoint and
got the same errors. So I think it is not related to lxd but probably
to criu?
Best regards
Dirk
Dirk Geschke
2015-04-06 15:28:23 UTC
Hi Tycho,
Post by Tycho Andersen
Post by Dirk Geschke
Hi all,
I was just playing with lxd and tried the move command but it
failed with
error: checkpoint failed
I think, this is not a problem of lxd but of criu with cgroups.
The running container is semi-unprivileged, I can start it without
problems.
We currently cannot migrate unprivileged containers, so any container
you want to migrate needs to be a privileged container.
:-/
Post by Tycho Andersen
Post by Dirk Geschke
Error (mount.c:624): 94:./sys/fs/cgroup/cgmanager doesn't have a proper root mount
This should be fixed by a patch I posted to the mailing list a few
weeks ago (it finally got acked last Friday, and I think the plan is
to push it today, so you might re-try with the daily build tomorrow).
Ah, that sounds good. I'm excited to test it.

I guess this is a patch for lxc-1.1.x and not lxd-0.5 or criu?
Post by Tycho Andersen
Once these are pushed, I will do a blog post on exactly the
configuration and versions of things needed to checkpoint and restore
containers with LXD.
Cool, I'm looking forward to reading it!

Thanks for your response, and enjoy the rest of the Easter holidays!

Best regards

Dirk
Tycho Andersen
2015-04-06 15:31:53 UTC
Post by Dirk Geschke
Hi Tycho,
Post by Tycho Andersen
Post by Dirk Geschke
Hi all,
I was just playing with lxd and tried the move command but it
failed with
error: checkpoint failed
I think, this is not a problem of lxd but of criu with cgroups.
The running container is semi-unprivileged, I can start it without
problems.
We currently cannot migrate unprivileged containers, so any container
you want to migrate needs to be a privileged container.
:-/
Post by Tycho Andersen
Post by Dirk Geschke
Error (mount.c:624): 94:./sys/fs/cgroup/cgmanager doesn't have a proper root mount
This should be fixed by a patch I posted to the mailing list a few
weeks ago (it finally got acked last Friday, and I think the plan is
to push it today, so you might re-try with the daily build tomorrow).
Ah, that sounds good. I'm excited to test it.
I guess this is a patch for lxc-1.1.x and not lxd-0.5 or criu?
Yes, it's a liblxc patch (or rather, a series of patches).

Tycho
Post by Dirk Geschke
Post by Tycho Andersen
Once these are pushed, I will do a blog post on exactly the
configuration and versions of things needed to checkpoint and restore
containers with LXD.
Cool, I'm looking forward to read it!
Thanks for your response and enjoy the rest of the easter days!
Best regards
Dirk
Dirk Geschke
2015-04-07 10:26:42 UTC
Hi Tycho,
Post by Tycho Andersen
Post by Dirk Geschke
Ah, that sounds good. I'm excited to test it.
I guess this is a patch for lxc-1.1.x and not lxd-0.5 or criu?
Yes, it's a liblxc patch (or rather, a series of patches).
Hmm, I think this patch is now part of lxc; at least a git pull
installed a lot of files, and the log mentions things like:

c/r: teach criu about cgmanager's socket

But it still fails. An lxc-checkpoint of an unprivileged container
results in

Error (mount.c:636): 141:./sys/fs/cgroup/cgmanager is overmounted

The same happens for lxd and the move command; with cgmanager I get:

Found cgmanager mapping for ./sys/fs/cgroup/cgmanager mountpoint
Found cgmanager mapping for ./sys/fs/cgroup/cgmanager mountpoint
Error (mount.c:636): 92:./sys/fs/cgroup/cgmanager is overmounted

and if I mount cgroup on /sys/fs/cgroup:

Error (mount.c:624): 74:./sys/fs/cgroup/cpuset,cpu,cpuacct,blkio,memory,devices,freezer,net_cls,perf_event,net_prio,hugetlb,debug/lxc/ubuntix doesn't have a proper root mount

BTW: What is the right setup with lxd? Using cgmanager, or mounting
cgroup directly on /sys/fs/cgroup?
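
(Either way, a quick way to see which cgroup layout a running container actually ended up with is to look at its mount table from the host; a sketch, assuming the lxc-* tools can see the LXD-managed container via -P /var/lib/lxd/lxc:

lxc-attach -P /var/lib/lxd/lxc -n ubuntix -- grep cgroup /proc/self/mountinfo
)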

If I enable memory tracking with kernel 4.0.0-rc6, I now get:

# criu check
Looks good.

Am I missing something?

Best regards

Dirk
Tycho Andersen
2015-04-07 13:37:26 UTC
Hi Dirk,
Post by Dirk Geschke
Hi Tycho,
Post by Tycho Andersen
Post by Dirk Geschke
Ah, that sounds good. I'm excited to test it.
I guess this is a patch for lxc-1.1.x and not lxd-0.5 or criu?
Yes, it's a liblxc patch (or rather, a series of patches).
hmm, I think this patch is now part of lxc, at least a git pull
c/r: teach criu about cgmanager's socket
But it still fails. An lxc-checkpoint of an unprivileged container
results in
Checkpointing of unprivileged containers won't work right now. Do you
have the same problems with privileged containers?

Tycho
Post by Dirk Geschke
Error (mount.c:636): 141:./sys/fs/cgroup/cgmanager is overmounted
Found cgmanager mapping for ./sys/fs/cgroup/cgmanager mountpoint
Found cgmanager mapping for ./sys/fs/cgroup/cgmanager mountpoint
Error (mount.c:636): 92:./sys/fs/cgroup/cgmanager is overmounted
Error (mount.c:624): 74:./sys/fs/cgroup/cpuset,cpu,cpuacct,blkio,memory,devices,freezer,net_cls,perf_event,net_prio,hugetlb,debug/lxc/ubuntix doesn' t have a proper root mount
BTW: What is the right one with lxd? Using cgmanager oder direct mount
cgroup to /sys/fs/cgroup?
# criu check
Looks good.
Do I miss something?
Best regards
Dirk
Dirk Geschke
2015-04-07 14:33:07 UTC
Hi Tycho,
Post by Tycho Andersen
Post by Dirk Geschke
hmm, I think this patch is now part of lxc, at least a git pull
c/r: teach criu about cgmanager's socket
But it still fails. An lxc-checkpoint of an unprivileged container
results in
Checkpointing of unprivileged containers won't work right now. Do you
have the same problems with privileged containers?
lxc is running as root but with subuid and subgid. If I remove
the subuid/subgid mappings, I can't launch a container:

Creating container...error: shared's user has no subuids

And lxd says:

2015/04/07 16:31:38 error reading idmap: User "root" has no subuids.
2015/04/07 16:31:38 operations requiring idmap will not be available

So how do I start a privileged container with lxd?

Best regards

Dirk
Dirk Geschke
2015-04-07 19:03:30 UTC
Hi Tycho,

I'm now one step ahead: I had a file 00-lxcfs.conf left over from
playing around with lxcfs, but it is not needed with lxc-1.1.x.
As a consequence, cgroup was bind-mounted twice in the container.

Now lxc-checkpoint at least no longer has a problem with the
cgroup mounts. Instead I end up with:

Command "save" is unknown, try "ip addr help".
Error (util.c:580): exited, status=255
Error (net.c:420): IP tool failed on addr save
Error (namespaces.c:802): Namespaces dumping finished with error 65280
Error (cr-dump.c:1979): Dumping FAILED.

Sounds like ip is missing the save subcommand for addr. Maybe the
namespace error is a follow-up.

Ah, with the iproute2 package from jessie recompiled on wheezy, this
seems to work.
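
(The missing piece in wheezy's iproute2 is the save/restore support in "ip addr" that criu calls; a quick check of the installed tool, just as a sketch:

ip addr help 2>&1 | grep -w save     # no match means criu's "ip addr save" will fail as above
)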

So I can now checkpoint an unprivileged container on Debian wheezy
with lxc-checkpoint. But still no luck with LXD so far; it is still
complaining:

Found cgmanager mapping for ./sys/fs/cgroup/cgmanager mountpoint
Found cgmanager mapping for ./sys/fs/cgroup/cgmanager mountpoint
Error (mount.c:636): 92:./sys/fs/cgroup/cgmanager is overmounted

And there is no 00-lxcfs.conf on this system. In the container I
can see two cgroup mounts:

cgroup 4 0 4 0% /sys/fs/cgroup
cgroup 12 0 12 0% /sys/fs/cgroup/cgmanager

This is similar to the non-LXD case.

So actually I'm a little bit clueless here...

Best regards

Dirk
Tycho Andersen
2015-04-07 19:20:39 UTC
Hi Dirk,
Post by Dirk Geschke
Hi Tycho,
I'm now one step ahead, I head a file 00-lxcfs.conf from playing
around with lxcfs. But this one is not needed with lxc-1.1.x
As a consequence, cgroup was bind-mounted twice in the container.
Now at least lxc-checkpoint does not have a problem with the
Command "save" is unknown, try "ip addr help".
Error (util.c:580): exited, status=255
Error (net.c:420): IP tool failed on addr save
Error (namespaces.c:802): Namespaces dumping finished with error 65280
Error (cr-dump.c:1979): Dumping FAILED.
Sounds like ip is missing the save option for addr. Maybe the
namespace-error is a follow up.
Ah, with a recompiled package iproute2 from jessie on wheezy this
seems to work.
So I can now checkpoint an unprivileged container on debian wheezy
with lxc-checkpoint. But still no luck with LXD so far, it is still
Found cgmanager mapping for ./sys/fs/cgroup/cgmanager mountpoint
Found cgmanager mapping for ./sys/fs/cgroup/cgmanager mountpoint
Error (mount.c:636): 92:./sys/fs/cgroup/cgmanager is overmounted
I guess you have lxcfs installed? We currently can't checkpoint
containers with lxcfs, exactly because of this overmounting issue. If
you `sudo apt-get remove lxcfs` and restart the container, does
checkpointing work?

Tycho
Post by Dirk Geschke
And there is no 00-lxcfs.conf on this system. In the container I
cgroup 4 0 4 0% /sys/fs/cgroup
cgroup 12 0 12 0% /sys/fs/cgroup/cgmanager
This is similar to the non-LXD case.
So actually I'm a little bit clueless here...
Best regards
Dirk
Dirk Geschke
2015-04-07 19:40:26 UTC
Hi Tycho,
Post by Tycho Andersen
Post by Dirk Geschke
So I can now checkpoint an unprivileged container on debian wheezy
with lxc-checkpoint. But still no luck with LXD so far, it is still
Found cgmanager mapping for ./sys/fs/cgroup/cgmanager mountpoint
Found cgmanager mapping for ./sys/fs/cgroup/cgmanager mountpoint
Error (mount.c:636): 92:./sys/fs/cgroup/cgmanager is overmounted
I guess you have lxcfs installed? We currently can't checkpoint
containers with lxcfs, exactly because of this overmounting issue. If
you `sudo apt-get remove lxcfs` and restart the container, does
checkpointing work?
Hmm, I did a "make uninstall" in the lxcfs directory, and after a
reboot I get a step further: now I find a lot of *.img files on
the one host in /var/tmp/lxd_migration_610963748 (I've set TMPDIR
to /var/tmp) and an empty directory /var/tmp/lxd_migration_727837215
on the target host. But the move command hangs; nothing more seems
to happen, except that I now have a STOPPED container ubuntix on both
systems.

Maybe some kind of deadlock? Although lxd is still responding
to requests on both machines.

I aborted it and tried once more; no luck again. But there are
sockets like /var/tmp/lxd_rsync_629730355 on the source host
but not on the target host. So maybe this is an rsync problem?

Do you have an idea how to debug this?

Best regards

Dirk

PS: I was not aware that simply installing lxcfs will start it;
I did not even see it running. But probably it installs some
lxc hooks for the bind mounts...
Tycho Andersen
2015-04-07 19:44:53 UTC
Post by Dirk Geschke
Hi Tycho,
Post by Tycho Andersen
Post by Dirk Geschke
So I can now checkpoint an unprivileged container on debian wheezy
with lxc-checkpoint. But still no luck with LXD so far, it is still
Found cgmanager mapping for ./sys/fs/cgroup/cgmanager mountpoint
Found cgmanager mapping for ./sys/fs/cgroup/cgmanager mountpoint
Error (mount.c:636): 92:./sys/fs/cgroup/cgmanager is overmounted
I guess you have lxcfs installed? We currently can't checkpoint
containers with lxcfs, exactly because of this overmounting issue. If
you `sudo apt-get remove lxcfs` and restart the container, does
checkpointing work?
hmm, I did a "make uninstall" in the lxcfs directory and after a
reboot, I get a step further: Now I find a lot of *img files on
the one host in /var/tmp/lxd_migration_610963748 (I've set TMPDIR
to /var/tmp) and an empty diretory /var/tmp/lxd_migration_727837215
on the target host. But the move command hungs, nothing more seems
to happen, except I have now a STOPPED container ubuntix on both
systems.
Can you paste the --debug logs from lxd? Also, the output of `ps auxf`
on the target host might be instructive as well.
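
(For reference, the daemons in this thread are already started by hand with debugging enabled, as the process listings further down show, e.g.:

lxd --debug --group lxcuser --tcp 192.168.1.233:8443

so the debug output simply ends up on that terminal.)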
Post by Dirk Geschke
Maybe some kind of dead lock? Although, lxd is still respondig
to requests on both machines.
I will abort it and try once more, no luck again. But there are
sockets like these /var/tmp/lxd_rsync_629730355 on the source host
but not on the target host. So maybe this is an rsync problem?
Do you have an idea, how to debug this?
Best regards
Dirk
PS: I was not aware, that simply installing lxcfs will start it,
I did not even see it running. But probably it installs some
lxc hooks for the bind mounts...
Yep, that's exactly right. Unfortunately, there are a couple of
problems with lxcfs: criu doesn't understand overmounting, and criu
doesn't understand how to dump fuse filesystems. So for now, we can't
c/r containers with lxcfs.

Tycho
Dirk Geschke
2015-04-07 20:11:18 UTC
Hi Tycho,

I'm one step further again. rsync seems to use nc -U, but this
option is not part of netcat.traditional; netcat.openbsd, however,
knows it. So with this package I'm one step ahead.
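
(On Debian both netcat implementations register /bin/nc through the alternatives system, so a minimal sketch of making sure the OpenBSD variant answers to a plain "nc", assuming the usual package names:

apt-get install netcat-openbsd
update-alternatives --display nc     # should point at /bin/nc.openbsd
)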

I guess one should react to errors if rsync fails...

Now it stops at:

2015/04/07 21:51:45 client cert != key for 192.168.1.233
2015/04/07 21:51:45 allowing untrusted GET to /1.0/operations/70d96f6d-5dd0-49dc-b86c-c5e33702a01d/websocket?secret=LhfAenvHyoM2p4ycwkvYdc5pU6cX9XuLIh%2FLr8b6luTEkBqn4gjf81KEnbbEKXLdLMxGWKTbBQaPTaP8hwMPAx89rLI0FQLDK%2FfvX45RlcvbUHRuMGfW3PH9ijRmdLOEzHhm8g%3D%3D
2015/04/07 21:51:47 got error getting next reader websocket: close 1005, &{{%!s(*net.netFD=&{{10 0 0} 15 1 1 false unix 0xc2080e1b60 0xc2080e1b80 {139714654178008}})}}

The first two messages are confusing, but it seems to work. On
the target host, I can find the img files in /var/tmp/lxd_migration_...

But then, nothing else happens. After a reboot of the source host, nothing
changes. On the source host I see again:

2015/04/07 22:06:51 got error getting next reader websocket: close 1005 , &{{%!s(*net.netFD=&{{10 0 0} 14 1 1 false unix 0xc20811d780 0xc20811d7a0 {140705565387480}})}}

and on the target system:

2015/04/07 22:06:51 got error getting next reader read tcp 192.168.1.233:8443: use of closed network connection, &{%!s(*os.File=&{0xc2080d6ae0}) {{%!s(int32=0) %!s(uint32=0)} %!s(uint32=0)} <nil>}

strange...
Post by Tycho Andersen
Post by Dirk Geschke
PS: I was not aware, that simply installing lxcfs will start it,
I did not even see it running. But probably it installs some
lxc hooks for the bind mounts...
Yep, that's exactly right. Unfortunately, there are a couple of
problems with lxcfs: criu doesn't understand overmounting, and criu
doesn't understand how to dump fuse filesystems. So for now, we can't
c/r containers with lxcfs.
Ah, okay, systemd in a container doesn't seem to be a good idea at all.
It takes many times longer to start than plain old SysVInit...

Best regards

Dirk
Tycho Andersen
2015-04-07 20:22:44 UTC
Post by Dirk Geschke
Hi Tycho,
I'm one step further again. rsync seems to use nc -U, but this
option is not part of netcat.traditional, but netcat.openbsd
knows it. So with this packages I'm one step ahead.
Ah, actually it is LXD's rsync wrapper that is doing the nc -U; sounds
like we should add a dependency on netcat-openbsd to the packaging.
Thanks for pointing that out.
Post by Dirk Geschke
I guess, one should react on errors if rsync fails...
2015/04/07 21:51:45 client cert != key for 192.168.1.233
2015/04/07 21:51:45 allowing untrusted GET to /1.0/operations/70d96f6d-5dd0-49dc-b86c-c5e33702a01d/websocket?secret=LhfAenvHyoM2p4ycwkvYdc5pU6cX9XuLIh%2FLr8b6luTEkBqn4gjf81KEnbbEKXLdLMxGWKTbBQaPTaP8hwMPAx89rLI0FQLDK%2FfvX45RlcvbUHRuMGfW3PH9ijRmdLOEzHhm8g%3D%3D
2015/04/07 21:51:47 got error getting next reader websocket: close 1005, &{{%!s(*net.netFD=&{{10 0 0} 15 1 1 false unix 0xc2080e1b60 0xc2080e1b80 {139714654178008}})}}
The first two messages are irritating, but it seems to work. On
the target host, I can find the img-files in /var/tmp/lxd_migration_...
But then, nothting else. After a reboot of the source host, nothing
2015/04/07 22:06:51 got error getting next reader websocket: close 1005 , &{{%!s(*net.netFD=&{{10 0 0} 14 1 1 false unix 0xc20811d780 0xc20811d7a0 {140705565387480}})}}
2015/04/07 22:06:51 got error getting next reader read tcp 192.168.1.233:8443: use of closed network connection, &{%!s(*os.File=&{0xc2080d6ae0}) {{%!s(int32=0) %!s(uint32=0)} %!s(uint32=0)} <nil>}
strange...
Can you show the output of `ps auxf`? Is it still hung rsyncing
something?

Tycho
Post by Dirk Geschke
Post by Tycho Andersen
Post by Dirk Geschke
PS: I was not aware, that simply installing lxcfs will start it,
I did not even see it running. But probably it installs some
lxc hooks for the bind mounts...
Yep, that's exactly right. Unfortunately, there are a couple of
problems with lxcfs: criu doesn't understand overmounting, and criu
doesn't understand how to dump fuse filesystems. So for now, we can't
c/r containers with lxcfs.
Ah, okay, systemd in a container seems to be no good idea at all.
It takes multiple times more to start then plain old SysVInit...
Best regards
Dirk
Dirk Geschke
2015-04-07 20:30:39 UTC
Hi Tycho,

The process list is interesting. On the source host I see:

root 4653 0.7 0.6 93844 14248 pts/1 Sl+ 22:06 0:06 lxd --debug --group lxcuser --tcp 192.168.1.233:8443
root 4725 1.8 0.5 108416 12196 pts/1 S+ 22:06 0:16 rsync -arvPz --devices --partial /var/lib/lxd/lxc/ubuntix/rootfs/ localhost:/tmp/foo -e sh -c "nc -U /var/tmp/lxd_rsync_915745248"
root 4727 0.0 0.0 4192 508 pts/1 S+ 22:06 0:00 sh -c nc -U /var/tmp/lxd_rsync_915745248 localhost rsync --server -vlogDtprze.iLsf --partial . /tmp/foo
root 4728 0.0 0.0 8724 584 pts/1 S+ 22:06 0:00 nc -U /var/tmp/lxd_rsync_915745248

and on the target host:

root 13155 1.0 0.4 109140 14180 pts/1 Sl+ 22:06 0:10 lxd --debug --group lxcuser --tcp 192.168.1.234:8443
root 13234 0.0 0.1 78268 3820 pts/1 S+ 22:06 0:00 rsync --server -vlogDtprze.iLsfx --devices --partial . /var/lib/lxd/lxc/ubuntix/rootfs/
root 13235 0.3 0.1 100140 3748 pts/1 S+ 22:06 0:03 rsync --server -vlogDtprze.iLsfx --devices --partial . /var/lib/lxd/lxc/ubuntix/rootfs/

And this has been hanging for several minutes now.

I'm not sure, but maybe it is related to the rowhammer fix?

Apr 7 22:06:04 otto kernel: [ 205.054216] The pagemap bits 55-60 has changed their meaning! See the linux/Documentation/vm/pagemap.txt for details.
Apr 7 22:06:04 otto kernel: [ 205.054269] Bits 55-60 of /proc/PID/pagemap entries are about to stop being page-shift some time soon. See the linux/Documentation/vm/pagemap.txt for details.

Best regards

Dirk
Tycho Andersen
2015-04-07 20:34:06 UTC
Post by Dirk Geschke
Hi Tycho,
root 4653 0.7 0.6 93844 14248 pts/1 Sl+ 22:06 0:06 lxd --debug --group lxcuser --tcp 192.168.1.233:8443
root 4725 1.8 0.5 108416 12196 pts/1 S+ 22:06 0:16 rsync -arvPz --devices --partial /var/lib/lxd/lxc/ubuntix/rootfs/ localhost:/tmp/foo -e sh -c "nc -U /var/tmp/lxd_rsync_915745248"
root 4727 0.0 0.0 4192 508 pts/1 S+ 22:06 0:00 sh -c nc -U /var/tmp/lxd_rsync_915745248 localhost rsync --server -vlogDtprze.iLsf --partial . /tmp/foo
root 4728 0.0 0.0 8724 584 pts/1 S+ 22:06 0:00 nc -U /var/tmp/lxd_rsync_915745248
root 13155 1.0 0.4 109140 14180 pts/1 Sl+ 22:06 0:10 lxd --debug --group lxcuser --tcp 192.168.1.234:8443
root 13234 0.0 0.1 78268 3820 pts/1 S+ 22:06 0:00 rsync --server -vlogDtprze.iLsfx --devices --partial . /var/lib/lxd/lxc/ubuntix/rootfs/
root 13235 0.3 0.1 100140 3748 pts/1 S+ 22:06 0:03 rsync --server -vlogDtprze.iLsfx --devices --partial . /var/lib/lxd/lxc/ubuntix/rootfs/
And this hungs now for several minutes.
Is it doing I/O? Looks to me like it's trying to send the rootfs.
Adding the f argument to ps (i.e. something like `ps auxf`) will show
it in tree form, so you can easily figure out which rsync server
corresponds to which lxd.
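
(A quick way to check whether a given rsync is actually moving data, using the PIDs from the listing above and assuming /proc/<pid>/io is enabled in the kernel:

cat /proc/4725/io                    # read_bytes/write_bytes should grow between two samples
strace -p 4725 -e trace=read,write   # shows whether it is blocked or transferring
)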
Post by Dirk Geschke
I'm not sure, but maybe it is related to the rowhammer fix?
Apr 7 22:06:04 otto kernel: [ 205.054216] The pagemap bits 55-60 has changed their meaning! See the linux/Documentation/vm/pagemap.txt for details.
Apr 7 22:06:04 otto kernel: [ 205.054269] Bits 55-60 of /proc/PID/pagemap entries are about to stop being page-shift some time soon. See the linux/Documentation/vm/pagemap.txt for details.
I wouldn't think so. It looks like rsync is just hung (or sending
things really slowly).

Tycho
Post by Dirk Geschke
Best regards
Dirk
Dirk Geschke
2015-04-07 21:02:42 UTC
Hi Tycho,
Post by Tycho Andersen
Is it doing i/o? Looks to me like it's (trying) to send the rootfs.
Adding the f argument to ps (i.e. something like `ps auxf`) will show
it in tree form, so you can figure out which rsync server corresponds
to which lxd easily.
The processes follow the PIDs (the tree order matches the listing I posted), and no, there was no traffic.

I killed the rsync processes, restarted it all, and now I got further.
The error messages are still there, but traffic was sent to port 8443
across the network. But finally I got a second error.

On the source host:

2015/04/07 22:49:29 got error getting next reader websocket: close 1005 , &{{%!s(*net.netFD=&{{10 0 0} 18 1 1 false unix 0xc2080e39e0 0xc2080e3a00 {140705565387096}})}}
2015/04/07 22:50:24 got error getting next reader websocket: close 1005 , &{{%!s(*net.netFD=&{{10 0 0} 18 1 1 false unix 0xc2080e2720 0xc2080e2740 {140705565387288}})}}
2015/04/07 22:50:25 operation %!s(func() shared.OperationResult=0x4d0120) finished: { restore failed}

and on the target host:

2015/04/07 22:49:29 got error getting next reader websocket: close 1005 , &{%!s(*os.File=&{0xc20809f860}) {{%!s(int32=0) %!s(uint32=0)} %!s(uint32=1)} <nil>}
2015/04/07 22:50:24 got error getting next reader websocket: close 1005 , &{%!s(*os.File=&{0xc20809e840}) {{%!s(int32=0) %!s(uint32=0)} %!s(uint32=1)} <nil>}
2015/04/07 22:50:26 operation %!s(func() shared.OperationResult=0x4ccba0) finished: { restore failed}

But now the move command does fail with an error:

***@karl:~$ lxc move otto:ubuntix local:ubuntix
error: restore failed

And now I have a migration-restore log; it ends with:

(00.004890) Warn (cr-restore.c:1016): Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,because not all kernels support such clone flags combinations!
(00.004899) Forking task with 1 pid (flags 0x7c028000)
(00.004919) Saved netns fd for links restore
(00.005335) Wait until namespaces are created
(00.006008) UNS: Daemon started
(00.006873) Running setup-namespaces scripts
(00.006900) [/usr/local/share/lxc/lxc-restore-net]
(00.027303) 1: Restoring namespaces 1 flags 0x7c028000
(00.027375) 1: Error (image.c:255): Unable to open netdev-8.img: Permission denied
(00.040629) UNS: calling 0x456140 (-1, 1)
(00.040671) UNS: daemon calls 0x456140 (-1, 1)
(00.040681) UNS: `- daemon exits w/ 0
(00.040938) UNS: daemon stopped
(00.040946) Error (cr-restore.c:1879): Restoring FAILED.

So this looks like a problem with lxc-restore-net. But why is
it not able to read netdev-8.img? The file is cleaned up afterwards,
but leftovers from older, failed syncs show that everyone can read it:

-rw-r--r-- 1 root root 53 Apr 7 21:51 lxd_migration_231938523/netdev-8.img

But we are making progress...

Any ideas how to debug this? Hmm, lxc-restore-net seems to be
a shell script...
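
(Since it is a shell script, one crude way to trace what it does during the restore, purely as a sketch: add near the top of /usr/local/share/lxc/lxc-restore-net

set -x                                  # print every command as it runs
exec 2>>/tmp/lxc-restore-net.trace      # keep the trace somewhere that survives the cleanup

and re-run the move.)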

Best regards

Dirk
Tycho Andersen
2015-04-07 21:26:27 UTC
Post by Dirk Geschke
Hi Tycho,
Post by Tycho Andersen
Is it doing i/o? Looks to me like it's (trying) to send the rootfs.
Adding the f argument to ps (i.e. something like `ps auxf`) will show
it in tree form, so you can figure out which rsync server corresponds
to which lxd easily.
the processes followed the PIDs, and no, there was no traffic.
I killed the rsync processs, restarted it all and now I got further.
The error messages are still there, but traffic was send to port 8443
across the network. But finally I got a second error.
2015/04/07 22:49:29 got error getting next reader websocket: close 1005 , &{{%!s(*net.netFD=&{{10 0 0} 18 1 1 false unix 0xc2080e39e0 0xc2080e3a00 {140705565387096}})}}
2015/04/07 22:50:24 got error getting next reader websocket: close 1005 , &{{%!s(*net.netFD=&{{10 0 0} 18 1 1 false unix 0xc2080e2720 0xc2080e2740 {140705565387288}})}}
2015/04/07 22:50:25 operation %!s(func() shared.OperationResult=0x4d0120) finished: { restore failed}
2015/04/07 22:49:29 got error getting next reader websocket: close 1005 , &{%!s(*os.File=&{0xc20809f860}) {{%!s(int32=0) %!s(uint32=0)} %!s(uint32=1)} <nil>}
2015/04/07 22:50:24 got error getting next reader websocket: close 1005 , &{%!s(*os.File=&{0xc20809e840}) {{%!s(int32=0) %!s(uint32=0)} %!s(uint32=1)} <nil>}
2015/04/07 22:50:26 operation %!s(func() shared.OperationResult=0x4ccba0) finished: { restore failed}
error: restore failed
(00.004890) Warn (cr-restore.c:1016): Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,because not all kernels support such clone flags combinations!
(00.004899) Forking task with 1 pid (flags 0x7c028000)
(00.004919) Saved netns fd for links restore
(00.005335) Wait until namespaces are created
(00.006008) UNS: Daemon started
(00.006873) Running setup-namespaces scripts
(00.006900) [/usr/local/share/lxc/lxc-restore-net]
(00.027303) 1: Restoring namespaces 1 flags 0x7c028000
(00.027375) 1: Error (image.c:255): Unable to open netdev-8.img: Permission denied
(00.040629) UNS: calling 0x456140 (-1, 1)
(00.040671) UNS: daemon calls 0x456140 (-1, 1)
(00.040681) UNS: `- daemon exits w/ 0
(00.040938) UNS: daemon stopped
(00.040946) Error (cr-restore.c:1879): Restoring FAILED.
So it looks here like a problem wiht lxc-restore-net. But why is
it not able to read netdev-8.img? The file is cleaned up, but
some old, failing syncs show that everyone can read it?
This means you're still trying to c/r unprivileged containers, which
won't work. You need to set security.privileged in the
container config:

lxc config set <container> security.privileged true
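
(A quick way to verify the setting took effect, assuming "lxc config show" behaves as expected:

lxc config show <container>     # should now list security.privileged among the config keys
)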

Tycho
Post by Dirk Geschke
-rw-r--r-- 1 root root 53 Apr 7 21:51 lxd_migration_231938523/netdev-8.img
But we are stepping ahead...
Any ideas how to debug this? Hmm, lxc-restore-net seems to be
a shell script...
Best regards
Dirk
Dirk Geschke
2015-04-07 21:37:47 UTC
Hi Tycho,
Post by Tycho Andersen
Post by Dirk Geschke
So it looks here like a problem wiht lxc-restore-net. But why is
it not able to read netdev-8.img? The file is cleaned up, but
some old, failing syncs show that everyone can read it?
This means you're still trying to c/r unprivileged containers, which
won't work. You need to set security.privileged in the
lxc config set <container> security.privileged true
Ah, OK, I did not know about this setting so far. I will give it a
try tomorrow. But at least I'm sure unprivileged containers may work
soon, too; it seems like only a few steps are missing...

Maybe one has to move the img-files to the namespace first?

Or one should simply skip the tests for the right flags in criu?

Best regards

Dirk
Tycho Andersen
2015-04-07 21:40:46 UTC
Post by Dirk Geschke
Hi Tycho,
Post by Tycho Andersen
Post by Dirk Geschke
So it looks here like a problem wiht lxc-restore-net. But why is
it not able to read netdev-8.img? The file is cleaned up, but
some old, failing syncs show that everyone can read it?
This means you're still trying to c/r unprivileged containers, which
won't work. You need to set security.privileged in the
lxc config set <container> security.privileged true
ah, ok, I did not know this setting so far. I will give it a
try tomorrow. But at least, I'm sure unprivileged may work soon,
too. There seems like only some steps are missing...
Maybe one has to move the img-files to the namespace first?
There is a lot of work to be done to support this (both in the kernel
and in criu): http://criu.org/User_namespace

It will work eventually, but for now it doesn't.

Tycho
Post by Dirk Geschke
Or one should simply skip the tests for the right flags in criu?
Best regards
Dirk
Dirk Geschke
2015-04-08 09:15:49 UTC
Hi Tycho,
Post by Tycho Andersen
This means you're still trying to c/r unprivileged containers, which
won't work. You need to set security.privileged in the
lxc config set <container> security.privileged true
I just tested it this way and it seems to work! I had some problems
with the now-privileged container due to the old subuid/subgid
mapping, but I can move this container back and forth between the
hosts. It takes a few seconds, but it works.
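
(So, as a sketch with the container and remote names used earlier in the thread, the working sequence boils down to:

lxc config set ubuntix security.privileged true     # on the source host
lxc move otto:ubuntix local:ubuntix                 # run from the target host, as before
)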

Now I dream of this feature for an unprivileged container...

One wish so far: Can you file a feature request for lxd to check

+ right version of nc

+ result of rsync

+ more verbosity in debug mode?

I found the failing rsync in an strace, but lxd did not react to it.
I suspect that even a missing nc would result in a hanging lxd
process, too? Checking for this would prevent others from running
into the same problems I did...

Best regards

Dirk
Tycho Andersen
2015-04-08 14:47:52 UTC
Post by Dirk Geschke
Hi Tycho,
Post by Tycho Andersen
This means you're still trying to c/r unprivileged containers, which
won't work. You need to set security.privileged in the
lxc config set <container> security.privileged true
just tested it this way and this seems to work! I had some problems
with the now privieleged container due to the old subuid/subgid
mapping. But I can move this container around the hosts. It takes
a few seconds, but it works.
Now I dream of this feature for an unprivileged container...
One whish so far: Can you file a feature request for lxd to check
+ right version of nc
I made a change to the packaging to depend on the right version so
hopefully that will be sorted.
Post by Dirk Geschke
+ result of rsync
We do check the result of rsync,

https://github.com/lxc/lxd/blob/master/lxd/migration/rsync.go#L31
https://github.com/lxc/lxd/blob/master/lxd/migration/rsync.go#L115

However in your case it looked like rsync was simply hanging. There's
not really much we can do to detect a situation like that,
unfortunately.
Post by Dirk Geschke
+ more verbosity in debug mode?
This we could do (at least print "fs sync success" and "images sync
success" or something).
Post by Dirk Geschke
I found the failing rsync in an strace, but lxd did not react on
this. I suspect, that even a missing nc would result in a hanging
lxd process, too? This would prevent others to run in the same
problems as I did...
Can you paste the strace? I'd be curious to see what was happening,
since as far as I can see our code should notice when rsync fails
(exits nonzero).

Tycho
Post by Dirk Geschke
Best regards
Dirk
Dirk Geschke
2015-04-08 20:42:31 UTC
Hi Tycho,
Post by Tycho Andersen
Post by Dirk Geschke
+ more verbosity in debug mode?
This we could do (at least print "fs sync success" and "images sync
success" or something).
Or the background commands which are executed, like criu, nc, and rsync?
Post by Tycho Andersen
Post by Dirk Geschke
I found the failing rsync in an strace, but lxd did not react on
this. I suspect, that even a missing nc would result in a hanging
lxd process, too? This would prevent others to run in the same
problems as I did...
Can you paste the strace? I'd be curious to see what was happening,
since as far as I can see our code should notice when rsync fails
(exits nonzero).
No, I have overwritten it with a new one. But if I remember correctly,
I started at the end, where there was an error from rsync. A few lines
above that, I found the netcat error message complaining about the
unknown -U flag...
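
(For a future run, one way to capture the whole exec trail so a failing nc or rsync stands out; <lxd-pid> is just a placeholder here:

strace -f -tt -s 200 -o /tmp/lxd.strace -p <lxd-pid>
grep -E 'execve|exited with' /tmp/lxd.strace
)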

The netcat.openbsd package fixed the problem, but before that the
errors happened without any notification from lxd...

Best regards

Dirk
Dirk Geschke
2015-04-07 21:33:50 UTC
Hi Tycho,

Is there a way to keep the files instead of removing them?

I can see the img files transferred to /var/tmp/lxd_migration...

Then I see it creating the rootfs and filling it with files.
But then it fails and everything is removed :-/

I tried it a third time, changed to /var/lib/lxd/lxc/ubuntix/rootfs/
and now it worked somehow?

At least, the container is now on the target system, but not
running:

***@karl:~$ lxc list
+---------+---------+------+------+
| NAME | STATE | IPV4 | IPV6 |
+---------+---------+------+------+
| ubuntix | STOPPED | | |
+---------+---------+------+------+

And the log file complains about cgmanager:

lxc 1428442166.209 WARN lxc_cgmanager - cgmanager.c:cgm_get:963 - do_cgm_get exited with error

If I start it manually, the container seems to run now.

Strange...

Ok, I will shut down the systems and try it again tomorrow...

Best regards and many thanks for your help

Dirk
Tycho Andersen
2015-04-07 21:38:07 UTC
Post by Dirk Geschke
Hi Tycho,
is there a way to keep the files instead of removing them?
I can see the img-files transfered to /var/tmp/lxd_migration...
Then I see it is creating the rootfs and filling it with files.
But then it fails and all is removed :-/
Right, when it fails it tries to clean up after itself. I suppose we
could add a mode where it didn't do that, but I'd have to be convinced
:)
Post by Dirk Geschke
I tried it a third time, changed to /var/lib/lxd/lxc/ubuntix/rootfs/
and now it worked somehow?
Sorry, what changed to /var...?
Post by Dirk Geschke
At least, the container is now on the target system, but not
+---------+---------+------+------+
| NAME | STATE | IPV4 | IPV6 |
+---------+---------+------+------+
| ubuntix | STOPPED | | |
+---------+---------+------+------+
So the restore failed on the target but lxd didn't report this?
Can you paste the restore.log and the full debug output?
Post by Dirk Geschke
lxc 1428442166.209 WARN lxc_cgmanager - cgmanager.c:cgm_get:963 - do_cgm_get exited with error
The full log would be useful, thanks.

Tycho
Post by Dirk Geschke
If I start it manually, the container seems to run now.
Strange...
Ok, I will shut down the systems and try it again tomorrow...
Best regards and many thanks for your help
Dirk
Dirk Geschke
2015-04-07 21:50:46 UTC
Hi Tycho,
Post by Tycho Andersen
Post by Dirk Geschke
I tried it a third time, changed to /var/lib/lxd/lxc/ubuntix/rootfs/
and now it worked somehow?
Sorry, what changed to /var...?
Ah, sorry: that was just my shell on this system changing into that directory...
Post by Tycho Andersen
Post by Dirk Geschke
At least, the container is now on the target system, but not
+---------+---------+------+------+
| NAME | STATE | IPV4 | IPV6 |
+---------+---------+------+------+
| ubuntix | STOPPED | | |
+---------+---------+------+------+
So the restore failed and the target but it lxd didn't report this?
No, it didn't fail this time. But the container is not running after the
move command.
Post by Tycho Andersen
Can you paste the restore.log and the full debug output?
Hmm, I just shut down the system. But there was no log file of
the restore in /var/log/lxd/ubuntix, only the log file of lxd.

This is what is actually on my display for the target system:

2015/04/07 23:29:10 operation %!s(func() shared.OperationResult=0x4ccba0) finished: { <nil>}
2015/04/07 23:29:26 handling GET /1.0
2015/04/07 23:29:26 handling GET /1.0/containers
2015/04/07 23:29:26 handling GET /1.0/containers/ubuntix
2015/04/07 23:29:26 applying raw.lxc: lxc.tty=0
lxc.console=none
lxc.cgroup.devices.deny=c 5:1 rwm
2015/04/07 23:29:26 Configured device eth0
2015/04/07 23:29:26 handling GET /1.0/containers/ubuntix/snapshots
2015/04/07 23:32:36 handling GET /1.0
2015/04/07 23:32:36 handling PUT /1.0/containers/ubuntix/state
2015/04/07 23:32:36 applying raw.lxc: lxc.tty=0
lxc.console=none
lxc.cgroup.devices.deny=c 5:1 rwm
2015/04/07 23:32:36 Configured device eth0
2015/04/07 23:32:36 handling GET /1.0/operations/1f4329ca-7fac-4d61-8797-bc7bd1e2edcc/wait
2015/04/07 23:32:36 operation %!s(func() shared.OperationResult=0x56a290) finished: { <nil>}
2015/04/07 23:32:39 handling GET /1.0
2015/04/07 23:32:39 handling GET /1.0/containers
2015/04/07 23:32:39 handling GET /1.0/containers/ubuntix
2015/04/07 23:32:39 applying raw.lxc: lxc.tty=0
lxc.console=none
lxc.cgroup.devices.deny=c 5:1 rwm
2015/04/07 23:32:39 Configured device eth0
2015/04/07 23:32:39 handling GET /1.0/containers/ubuntix/snapshots
2015/04/07 23:34:43 handling GET /1.0
2015/04/07 23:34:43 handling GET /1.0/containers/ubuntix
2015/04/07 23:34:43 applying raw.lxc: lxc.tty=0
lxc.console=none
lxc.cgroup.devices.deny=c 5:1 rwm
2015/04/07 23:34:43 Configured device eth0

and this for the source system:

2015/04/07 23:29:10 operation %!s(func() shared.OperationResult=0x4d0120) finished: { <nil>}
2015/04/07 23:29:12 found cert for 192.168.1.233
2015/04/07 23:29:12 handling GET /1.0
2015/04/07 23:29:12 found cert for 192.168.1.233
2015/04/07 23:29:12 found cert for 192.168.1.233
2015/04/07 23:29:12 handling GET /1.0/containers/ubuntix
2015/04/07 23:29:12 applying raw.lxc: lxc.tty=0
lxc.console=none
lxc.cgroup.devices.deny=c 5:1 rwm
2015/04/07 23:29:12 Configured device eth0
2015/04/07 23:29:12 found cert for 192.168.1.233
2015/04/07 23:29:12 handling DELETE /1.0/containers/ubuntix
2015/04/07 23:29:12 found cert for 192.168.1.233
2015/04/07 23:29:12 handling GET /1.0/operations/df5b2dff-b433-4954-904b-68d15c233f53/wait
2015/04/07 23:29:13 operation %!s(func() shared.OperationResult=0x56a290) finished: { <nil>}
2015/04/07 23:30:45 handling GET /1.0
2015/04/07 23:30:45 handling GET /1.0/containers
2015/04/07 23:30:45 handling GET /1.0/containers/wheezy1
2015/04/07 23:30:45 applying raw.lxc: lxc.tty=0
lxc.console=none
lxc.cgroup.devices.deny=c 5:1 rwm
2015/04/07 23:30:45 Configured device eth0
2015/04/07 23:30:45 handling GET /1.0/containers/wheezy2
2015/04/07 23:30:45 applying raw.lxc: lxc.tty=0
lxc.console=none
lxc.cgroup.devices.deny=c 5:1 rwm
2015/04/07 23:30:45 Configured device eth0

The last messages on the target system resulted from a

lxc start ubuntix

And on the source machine there was an lxc list command to check
that ubuntix was really moved.
Post by Tycho Andersen
Post by Dirk Geschke
lxc 1428442166.209 WARN lxc_cgmanager - cgmanager.c:cgm_get:963 - do_cgm_get exited with error
The full log would be useful, thanks.
OK, I restarted the machine; the lxc.log is now attached.

Best regards

Dirk
Dirk Geschke
2015-04-08 08:53:02 UTC
Hi Tycho,
Post by Tycho Andersen
Post by Dirk Geschke
I tried it a third time, changed to /var/lib/lxd/lxc/ubuntix/rootfs/
and now it worked somehow?
Sorry, what changed to /var...?
I just tested it again. There seems to be a race condition;
sometimes it fails, sometimes it works.

Where "works" means: it gets stopped on the source host, is
moved to the target host, and remains there in the stopped state.

So I think there are two problems:

1. moving via lxd: there seems to be a race condition that makes it
fail about every second time

2. restoring the unprivileged container with criu; but as you
already mentioned, that is not supported yet.

Now I will start to see whether I have more luck migrating a
privileged container...

Best regards

Dirk