Discussion:
[lxc-users] Unprivileged containers and Linux Capabilities
Michele Giacomoli
2016-05-17 08:32:18 UTC
Hi all,

I have an Ubuntu 14.04 host with lxc 1.0.3-0ubuntu3. I created an
unprivileged container. The following capabilities are dropped by the
/usr/share/lxc/config/ubuntu.common.conf template:
lxc.cap.drop = sys_module mac_admin mac_override sys_time
This is the configuration for the container:

lxc.include = /usr/share/lxc/config/ubuntu.common.conf
lxc.include = /usr/share/lxc/config/ubuntu.userns.conf
lxc.arch = x86_64

lxc.id_map = u 0 123456 65536
lxc.id_map = g 0 123456 65536
lxc.rootfs = /mypath/
lxc.utsname = mycontainer

# Network configuration
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = mylink
lxc.network.name = eth0
lxc.network.hwaddr = my:ma:ca:dd:re:ss

A really basic config file.

I installed a program inside this container which reports a failure when
calling pthread_setschedparam. This function should be permitted when
the CAP_SYS_NICE capability is not dropped (and this seems to be the
case). I also had the same problem in the past when trying to let a
guest change the system clock (that time I removed sys_time from the
dropped capabilities).
My questions are: are capabilities taken into consideration when dealing
with unprivileged containers? Do I have to do something more so that I
can use these functions inside an unprivileged container?
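
A minimal sketch for reproducing the failure described above, assuming
Python 3 is available inside the container. It asks for the SCHED_FIFO
real-time policy through os.sched_setscheduler, which is a different
entry point from pthread_setschedparam but is subject to the same kind
of permission check. The helper name try_realtime is made up for this
illustration.

```python
import errno
import os


def try_realtime(priority: int = 1):
    """Try to switch the calling process to SCHED_FIFO.

    Returns (True, "ok") on success, otherwise (False, <errno name>).
    try_realtime is a hypothetical helper, not part of LXC.
    """
    # Remember the current policy so that a successful attempt can be undone.
    old_policy = os.sched_getscheduler(0)
    old_param = os.sched_getparam(0)
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError as exc:
        # EPERM is the usual result when the kernel refuses the request.
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    # The request was allowed: restore the previous scheduling policy.
    os.sched_setscheduler(0, old_policy, old_param)
    return True, "ok"
```

Calling try_realtime() on the host and again inside the container
should show whether the refusal is specific to the container.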

Best Regards
Michele
Serge E. Hallyn
2016-05-17 14:43:51 UTC
Capabilities are targeted to a user namespace. If modifying a
resource can adversely affect the host, then you'll need the
related capability targeted at the initial user namespace, rather
than your own. (In the kernel source this is the difference between
capable(CAP_SYS_NICE) and ns_capable(ns, CAP_SYS_NICE), where
capable(x) expands to ns_capable(&init_user_ns, x).)

So the feature you're trying to set in the container likely requires
the capability against the initial user ns. Your container cannot
have that.
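
To see this distinction from inside a container, one rough approach is
to compare the effective capability mask with the user namespace
mapping. The sketch below assumes a Linux /proc filesystem. The function
names are made up for this illustration, and the uid_map comparison is
only a heuristic (a full identity mapping is treated as the initial
user namespace). The capability numbers are the ones defined in
linux/capability.h.

```python
CAP_SYS_NICE = 23  # from linux/capability.h
CAP_SYS_TIME = 25  # from linux/capability.h


def parse_cap_eff(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask: int, cap: int) -> bool:
    """Return True if bit number `cap` is set in the capability mask."""
    return bool((mask >> cap) & 1)


def in_initial_userns(uid_map_text: str) -> bool:
    """Heuristic: the initial user namespace maps every uid onto itself."""
    rows = [line.split() for line in uid_map_text.splitlines() if line.strip()]
    return rows == [["0", "0", "4294967295"]]


def report() -> str:
    """Summarise CAP_SYS_NICE and the namespace of the calling process."""
    with open("/proc/self/status") as f:
        mask = parse_cap_eff(f.read())
    with open("/proc/self/uid_map") as f:
        initial = in_initial_userns(f.read())
    where = "initial user namespace" if initial else "non-initial user namespace"
    return f"CAP_SYS_NICE={has_cap(mask, CAP_SYS_NICE)} in {where}"
```

With a mapping such as "0 123456 65536", report() can show CAP_SYS_NICE
as present while also showing a non-initial user namespace. In that case
the capability only counts for checks made against the container's own
namespace.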

-serge
Michele Giacomoli
2016-05-18 07:57:11 UTC
Thank you, Serge.

Is there a way to manage user namespace capabilities and add the needed
capabilities to the initial user namespace?

Best regards
Michele
_______________________________________________
lxc-users mailing list
http://lists.linuxcontainers.org/listinfo/lxc-users
Serge E. Hallyn
2016-05-18 14:49:48 UTC
No. You cannot give a non-initial user ns capabilities against
the initial user ns. The kernel simply doesn't support it. You could
leave a channel open for the container to make requests of a daemon
which runs on the host. That's how the cgmanager proxy worked,
talking over a unix socket to the cgmanager on the host.
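
A rough sketch of that request channel, not the cgmanager protocol
itself. The request format, the set_rt_priority operation, the allowed
range and all function names are invented for this illustration. For
the sake of a self-contained demo both ends live in one process and are
joined by socket.socketpair. In a real setup the host daemon would
presumably listen on a unix socket whose path is made visible inside the
container.

```python
import socket
import threading

# Hypothetical allowlist: operation name -> check for its integer argument.
ALLOWED = {"set_rt_priority": lambda n: 1 <= n <= 50}


def handle_request(line: str) -> str:
    """Validate one request of the form '<op> <int>' and build the reply."""
    parts = line.split()
    if len(parts) != 2 or parts[0] not in ALLOWED:
        return "DENIED unknown request"
    try:
        value = int(parts[1])
    except ValueError:
        return "DENIED bad argument"
    if not ALLOWED[parts[0]](value):
        return "DENIED out of range"
    # A real daemon would carry out the privileged operation here, on the host.
    return f"OK {parts[0]} {value}"


def serve_once(conn: socket.socket) -> None:
    """Host side: read one request, send one reply, then close."""
    with conn:
        data = conn.recv(1024).decode("utf-8", "replace")
        conn.sendall(handle_request(data).encode("utf-8"))


def request(conn: socket.socket, line: str) -> str:
    """Container side: send one request and return the reply."""
    with conn:
        conn.sendall(line.encode("utf-8"))
        return conn.recv(1024).decode("utf-8")


def demo(line: str) -> str:
    """Run one request/reply round trip over a connected unix socket pair."""
    host_end, container_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    worker = threading.Thread(target=serve_once, args=(host_end,))
    worker.start()
    reply = request(container_end, line)
    worker.join()
    return reply
```

The host side should validate every request, as handle_request does
here, because the container is the less trusted party and the daemon
acts with host privileges on its behalf.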
Michele Giacomoli
2016-05-18 16:10:02 UTC
Thank you.
So, as a result, there is no way to keep capabilities for unprivileged
containers, and lxc.cap.drop/keep are pretty useless in this case. Am I
right?

Best Regards
Michele
Serge E. Hallyn
2016-05-19 04:09:42 UTC
There's no way to keep capabilities targeted at the host. If for
whatever reason you want to drop capabilities toward the container
itself, you can still use lxc.cap.*, but I don't know of anyone
doing that.

(It could in fact be a way to avoid exposing some of the otherwise
increased kernel surface area.)
Michele Giacomoli
2016-05-19 07:24:39 UTC
OK, I got it. Thank you very much for your answer, Serge.