Discussion:
lxc scalability problem
yaozhicheng
2013-06-06 08:44:37 UTC
Hi all

My OS crashes when I start more than 20 containers.



OS: CentOS 6.4

LXC version: 0.8.0

uname -a:

Linux lxc 2.6.32-358.6.2.el6.x86_64 #1 SMP Thu May 16 20:59:36 UTC 2013
x86_64 x86_64 x86_64 GNU/Linux





Part of vmcore-dmesg.txt:



<6>br0: port 73(vethu6B7Bf) entering forwarding state

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<6>br0: port 64(vethT6IQft) entering forwarding state

<7>SELinux: initialized (dev devpts, type devpts), uses transition SIDs

<6>br0: port 65(vethknYQMF) entering forwarding state

<6>br0: port 66(vethm89XzF) entering forwarding state

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<1>BUG: unable to handle kernel NULL pointer dereference at 0000000000000008

<1>IP: [<ffffffff81448834>] process_backlog+0x74/0x100

<4>PGD 0

<4>Oops: 0002 [#1] SMP

<4>last sysfs file: /sys/devices/virtual/net/vethu6B7Bf/flags

<4>CPU 70

<4>Modules linked in: veth autofs4 sunrpc cpufreq_ondemand acpi_cpufreq
freq_table mperf bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
xt_state nf_conntrack ip6table_filter ip6_tables ipv6 microcode mlx4_core
igb dca ptp pps_core i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support sg
i7core_edac edac_core ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif
pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log
dm_mod [last unloaded: scsi_wait_scan]

<4>

<4>Pid: 0, comm: swapper Not tainted 2.6.32-358.6.2.el6.x86_64 #1 Supermicro
X8OBN/X8OBN

<7>SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs

<4>

<4>RIP: 0010:[<ffffffff81448834>] [<ffffffff81448834>]
process_backlog+0x74/0x100

<4>RSP: 0018:ffff8800282c3e20 EFLAGS: 00010002

<4>RAX: ffff8800282d3ef0 RBX: ffff8800282d3f08 RCX: 0000000000000000

<4>RDX: 0000000000000000 RSI: ffff88401e1e8c80 RDI: ffff88401e1e8d80

<4>RBP: ffff8800282c3e60 R08: 0000000000000000 R09: ffff8840259ed038

<4>R10: ffff8a0022126e20 R11: ffff894022d45200 R12: ffff8800282d3e80

<4>R13: 0000000000000014 R14: ffff8800282d3ef0 R15: ffff8800282d3f04

<4>FS: 0000000000000000(0000) GS:ffff8800282c0000(0000)
knlGS:0000000000000000

<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b

<4>CR2: 0000000000000008 CR3: 0000000001a85000 CR4: 00000000000007e0



Container config:

lxc.tty = 4

lxc.pts = 1024

lxc.cgroup.devices.deny = a

lxc.network.type = veth

lxc.network.ipv4 = 172.16.40.153/16

lxc.network.flags = up

lxc.network.link = br0



# /dev/null and zero

lxc.cgroup.devices.allow = c 1:3 rwm

lxc.cgroup.devices.allow = c 1:5 rwm

# consoles

lxc.cgroup.devices.allow = c 5:1 rwm

lxc.cgroup.devices.allow = c 5:0 rwm

lxc.cgroup.devices.allow = c 4:0 rwm

lxc.cgroup.devices.allow = c 4:1 rwm

# /dev/{,u}random

lxc.cgroup.devices.allow = c 1:9 rwm

lxc.cgroup.devices.allow = c 1:8 rwm

lxc.cgroup.devices.allow = c 136:* rwm

lxc.cgroup.devices.allow = c 5:2 rwm

# rtc

lxc.cgroup.devices.allow = c 254:0 rwm



# mount points

lxc.mount.entry=proc /usr/local/var/lib/lxc/lxc-tpl-server/rootfs/proc proc
nodev,noexec,nosuid 0 0

lxc.mount.entry=sysfs /usr/local/var/lib/lxc/lxc-tpl-server/rootfs/sys sysfs
defaults 0 0

lxc.utsname = idleserver00

lxc.rootfs = /usr/local/var/lib/lxc/idleserver00/rootfs



lxc.cgroup.blkio.throttle.read_bps_device = 8:0 104857600

lxc.cgroup.blkio.throttle.write_bps_device = 8:0 104857600
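# (8:0 = /dev/sda; 104857600 bytes/s = 100 MiB/s read and write cap)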



I have searched but cannot find the cause or a way to fix it.



Thanks,



yaozhicheng at emails.bjut.edu.cn

Serge Hallyn
2013-06-06 20:15:31 UTC
Post by yaozhicheng
Hi all
My OS crashes when I start more than 20 containers.
Seems like an SELinux labeling related problem. If you boot without
SELinux enabled, do you still have this problem? (I'm not suggesting
that as a workaround - only to verify that the problem is SELinux
using too much memory for the dev tmpfs labeling, so people know
where to look.)

-serge
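A minimal sketch of how to run that test on CentOS 6 (assuming the stock
/etc/selinux/config and grub.conf locations; permissive mode is not enough
here, since labeling still happens):

  getenforce                                          # show the current mode
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
  reboot                                              # come back up with SELinux disabled

  # or, for a one-off boot, append selinux=0 to the kernel line in
  # /boot/grub/grub.conf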
lhffjzh
2013-06-07 01:16:38 UTC
Hi Serge,

Do you know what might cause the following errors? It seems cgroup creation
failed, but I checked and the cgroup directory looks fine. I have listed the
OS and configuration details below; please help.

[root@Dev_2_A ProjectCode]# cat Cmd/LXC_Cmd/LXC_out.log
lxc-execute: Error creating cgroups
lxc-execute: failed to spawn 'bitlxc'

[root@Dev_2_A ProjectCode]# uname -a
Linux Dev_2_A 2.6.32-71.el6.x86_64 #1 SMP Fri May 20 03:51:51 BST 2011
x86_64 x86_64 x86_64 GNU/Linux

[root@Dev_2_A ProjectCode]# ls /cgroup/
blkio cpu cpuacct cpuset devices freezer memory net_cls
[root@Dev_2_A ProjectCode]# /etc/init.d/cgconfig status
Running
[root@Dev_2_A ProjectCode]#

[root@Dev_2_A ProjectCode]# cat Cmd/LXC_Cmd/LXC_cfg.in
lxc.utsname = bitlxc
lxc.pts = 1024

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = bitbr2
lxc.network.name = eth2
lxc.network.mtu = 9000

# Traffic passes through the FortiGate via a route
lxc.network.ipv4 = 10.16.1.2/24
lxc.network.ipv4.gateway = 10.16.1.1




Thanks and Regards,
Haifeng
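A quick way to sanity-check the cgroup setup that lxc-execute depends on
(a sketch; the "lxc-test" name is just an example):

  lxc-checkconfig                 # LXC's own view of kernel/cgroup support
  grep cgroup /proc/mounts        # where the controllers are actually mounted
  cat /proc/cgroups               # which controllers the kernel enables
  mkdir /cgroup/cpu/lxc-test && rmdir /cgroup/cpu/lxc-test   # manual create/remove test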
lhffjzh
2013-06-07 01:26:47 UTC
Hi Serge,

Here are the kernel messages/complaints when I run "lxc-execute", FYI.

tail -f /var/log/messages

Jun 7 09:23:42 Dev_2_A kernel: device vethKTg9Vz entered promiscuous mode
Jun 7 09:23:42 Dev_2_A kernel: ADDRCONF(NETDEV_UP): vethKTg9Vz: link is not
ready
Jun 7 09:23:42 Dev_2_A kernel: lo: Disabled Privacy Extensions
Jun 7 09:23:42 Dev_2_A kernel: sit0: Disabled Privacy Extensions
Jun 7 09:23:42 Dev_2_A kernel: bitbr2: port 2(vethKTg9Vz) entering disabled
state
Jun 7 09:23:42 Dev_2_A kernel: bitbr2: port 2(vethKTg9Vz) entering disabled
state


Thanks and Regards,
Haifeng

lhffjzh
2013-06-07 01:53:44 UTC
Hi Serge,

I just found the reason: this Linux system was built on a virtual machine by a
colleague, which is why it failed. I think it works fine on a physical machine.



Thanks and Regards,
Haifeng

yaozhicheng
2013-06-13 01:07:45 UTC
-----Original Message-----
From: yaozhicheng [mailto:yaozhicheng at emails.bjut.edu.cn]
Sent: June 8, 2013 23:32
To: 'Serge Hallyn'
Subject: Re: [Lxc-users] lxc scalability problem

Dear Serge,
Sorry for my delayed reply :)
The problem still persists with SELinux disabled. It is probably not a memory
problem, because the machine has 2 TB of memory and 8 x E7-8830 CPUs.
The OS in the containers is Debian 6.0. I have run filebench and lmbench in 8
containers simultaneously and they performed very well.
More than 100 containers can be started when the physical eth2 is removed from br0.
Maybe it is a veth/bridge networking problem?
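For reference, the bridge change for that test is just (assuming bridge-utils,
as normally used to manage br0):

  brctl delif br0 eth2   # take the physical NIC out of the bridge
  brctl addif br0 eth2   # put it back afterwards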

This is part of the vmcore-dmesg.txt (SELinux disabled):

<6>br0: port 5(vethOaLJ1U) entering forwarding state
<6>br0: port 4(vethZZ0xuP) entering forwarding state
<6>br0: port 8(vethjAE1Pb) entering forwarding state
<6>br0: port 9(veth0SqVmc) entering forwarding state
<6>br0: port 16(vethtVuHLb) entering forwarding state
<6>br0: port 7(vethtTiLzY) entering forwarding state
<6>br0: port 15(vethxWoVMd) entering forwarding state
<6>br0: port 12(vethvSfzsf) entering forwarding state
<6>br0: port 13(vethw78ho7) entering forwarding state
<6>br0: port 11(vethrf5HMe) entering forwarding state
<6>br0: port 6(veth4Vm0m0) entering forwarding state
<6>br0: port 10(vethho7Oae) entering forwarding state
<6>br0: port 20(vethIsYsGj) entering forwarding state
<6>br0: port 14(vethXaQOHa) entering forwarding state
<6>br0: port 17(veth8rD1je) entering forwarding state
<4>------------[ cut here ]------------
<2>kernel BUG at mm/slab.c:3069!
<4>invalid opcode: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/virtual/net/vetha5mnvF/flags
<4>CPU 70
<4>Modules linked in: veth autofs4 sunrpc cpufreq_ondemand acpi_cpufreq
freq_table mperf bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
xt_state nf_conntrack ip6table_filter ip6_tables ipv6 microcode mlx4_core
igb dca ptp pps_core i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support sg
i7core_edac edac_core ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif
pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log
dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 0, comm: swapper Not tainted 2.6.32-358.6.2.el6.x86_64 #1 Supermicro
X8OBN/X8OBN
<4>RIP: 0010:[<ffffffff81167354>] [<ffffffff81167354>]
cache_alloc_refill+0x1e4/0x240
<4>RSP: 0018:ffff8800282c3ad0 EFLAGS: 00010096
<4>RAX: 000000000000003c RBX: ffff8940274f0140 RCX: 00000000ffffffcc
<4>RDX: 000000000000003c RSI: 0000000000000000 RDI: ffff884026fa7800
<4>RBP: ffff8800282c3b30 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000000 R11: 0000000000000000 R12: ffff884026fa7800
<4>R13: ffff884026fa0440 R14: 000000000000003c R15: ffff88402455c000
<4>FS: 0000000000000000(0000) GS:ffff8800282c0000(0000)
knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 00007f5599065000 CR3: 0000000001a85000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process swapper (pid: 0, threadinfo ffff88c027c06000, task
ffff8880283d1500)
<4>Stack:
<4> ffff8800282c3b10 000000048144caec ffff884026fa0480 000412201096c080
<4><d> ffff884026fa0460 ffff884026fa0450 ffff8940259fa6e8 0000000000000000
<4><d> 0000000000000020 ffff8940274f0140 0000000000000020 0000000000000246
<4>Call Trace:
<4> <IRQ>
<4> [<ffffffff8116840f>] kmem_cache_alloc+0x15f/0x190
<4> [<ffffffffa02726e0>] ? __br_forward+0x0/0xd0 [bridge]
<4> [<ffffffff8143f15f>] skb_clone+0x6f/0xb0
<4> [<ffffffffa02726e0>] ? __br_forward+0x0/0xd0 [bridge]
<4> [<ffffffffa0272320>] deliver_clone+0x30/0x60 [bridge]
<4> [<ffffffffa0272549>] br_flood+0x79/0xd0 [bridge]
<4> [<ffffffffa02725b5>] br_flood_forward+0x15/0x20 [bridge]
<4> [<ffffffffa02736ee>] br_handle_frame_finish+0x27e/0x2a0 [bridge]
<4> [<ffffffffa02738ba>] br_handle_frame+0x1aa/0x250 [bridge]
<4> [<ffffffff81448599>] __netif_receive_skb+0x529/0x750
<4> [<ffffffff8143da41>] ? __alloc_skb+0x81/0x190
<4> [<ffffffff8144a8f8>] netif_receive_skb+0x58/0x60
<4> [<ffffffff8143da41>] ? __alloc_skb+0x81/0x190
<4> [<ffffffff8144a8f8>] netif_receive_skb+0x58/0x60
<4> [<ffffffff8144aa00>] napi_skb_finish+0x50/0x70
<4> [<ffffffff8144cfa9>] napi_gro_receive+0x39/0x50
<4> [<ffffffffa015045c>] igb_poll+0x7ec/0xc70 [igb]
<4> [<ffffffff81033ef7>] ? native_apic_msr_write+0x37/0x40
<4> [<ffffffff8144851b>] ? __netif_receive_skb+0x4ab/0x750
<4> [<ffffffff810a7b05>] ? tick_dev_program_event+0x65/0xc0
<4> [<ffffffff81012bb9>] ? read_tsc+0x9/0x20
<4> [<ffffffff8144d0c3>] net_rx_action+0x103/0x2f0
<4> [<ffffffff81076fb1>] __do_softirq+0xc1/0x1e0
<4> [<ffffffff810e1720>] ? handle_IRQ_event+0x60/0x170
<4> [<ffffffff8100c1cc>] call_softirq+0x1c/0x30
<4> [<ffffffff8100de05>] do_softirq+0x65/0xa0
<4> [<ffffffff81076d95>] irq_exit+0x85/0x90
<4> [<ffffffff81517145>] do_IRQ+0x75/0xf0
<4> [<ffffffff8100b9d3>] ret_from_intr+0x0/0x11
<4> <EOI>
<4> [<ffffffff812d39ae>] ? intel_idle+0xde/0x170
<4> [<ffffffff812d3991>] ? intel_idle+0xc1/0x170
<4> [<ffffffff81415277>] cpuidle_idle_call+0xa7/0x140
<4> [<ffffffff81009fc6>] cpu_idle+0xb6/0x110
<4> [<ffffffff81506fcc>] start_secondary+0x2ac/0x2ef
<4>Code: 89 ff e8 70 1c 12 00 eb 99 66 0f 1f 44 00 00 41 c7 45 60 01 00 00 00 4d 8b 7d 20 4c 39 7d c0 0f 85 f2 fe ff ff eb 84 0f 0b eb fe <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 eb f4 8b 55 ac 8b 75 bc 31
<1>RIP [<ffffffff81167354>] cache_alloc_refill+0x1e4/0x240
<4> RSP <ffff8800282c3ad0>

Sorry for my poor English :)

Sincerely,
-yao

Serge Hallyn
2013-06-13 12:43:31 UTC
Uh, yeah, wow. This looks like a bug in the kernel.

I'd suggest trying to reproduce this with a small shellscript
so you can report it.

-serge
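A minimal reproduction script along those lines might look like the following
(a sketch only - the container names and count are assumptions, not from this
thread):

  #!/bin/sh
  # Start containers one at a time and check the kernel log after each start.
  N=${1:-30}
  for i in $(seq 1 "$N"); do
      name=$(printf 'idleserver%02d' "$i")
      lxc-start -n "$name" -d || { echo "failed to start $name"; break; }
      sleep 2
      dmesg | tail -n 3
  done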
Post by yaozhicheng
-----????-----
???: yaozhicheng [mailto:yaozhicheng at emails.bjut.edu.cn]
????: 2013?6?8? 23:32
???: 'Serge Hallyn'
??: ??: [Lxc-users] lxc scalability problem
Dear serge,
Sorry for my delayed reply :)
Problem Still persists with selinux disabled. Maybe it is not the memory
problem because that my machine has 2TB memory and 8 x E7-8830 cpus.
The OS in the containers is debian6.0. I have run filebench and lmbench in 8
containers simultaneously, they performed very well.
100 more containers can be started when physical eth2 removed out from br0.
May be the veth network problem?
<6>br0: port 5(vethOaLJ1U) entering forwarding state
<6>br0: port 4(vethZZ0xuP) entering forwarding state
<6>br0: port 8(vethjAE1Pb) entering forwarding state
<6>br0: port 9(veth0SqVmc) entering forwarding state
<6>br0: port 16(vethtVuHLb) entering forwarding state
<6>br0: port 7(vethtTiLzY) entering forwarding state
<6>br0: port 15(vethxWoVMd) entering forwarding state
<6>br0: port 12(vethvSfzsf) entering forwarding state
<6>br0: port 13(vethw78ho7) entering forwarding state
<6>br0: port 11(vethrf5HMe) entering forwarding state
<6>br0: port 6(veth4Vm0m0) entering forwarding state
<6>br0: port 10(vethho7Oae) entering forwarding state
<6>br0: port 20(vethIsYsGj) entering forwarding state
<6>br0: port 14(vethXaQOHa) entering forwarding state
<6>br0: port 17(veth8rD1je) entering forwarding state <4>------------[ cut
here ]------------ <2>kernel BUG at mm/slab.c:3069!
<4>invalid opcode: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/virtual/net/vetha5mnvF/flags
<4>CPU 70
<4>Modules linked in: veth autofs4 sunrpc cpufreq_ondemand acpi_cpufreq
freq_table mperf bridge stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
xt_state nf_conntrack ip6table_filter ip6_tables ipv6 microcode mlx4_core
igb dca ptp pps_core i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support sg
i7core_edac edac_core ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif
pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log
dm_mod [last unloaded: scsi_wait_scan] <4>
<4>Pid: 0, comm: swapper Not tainted 2.6.32-358.6.2.el6.x86_64 #1 Supermicro
X8OBN/X8OBN
<4>RIP: 0010:[<ffffffff81167354>] [<ffffffff81167354>]
cache_alloc_refill+0x1e4/0x240
<4>RSP: 0018:ffff8800282c3ad0 EFLAGS: 00010096
<4>RAX: 000000000000003c RBX: ffff8940274f0140 RCX: 00000000ffffffcc
<4>RDX: 000000000000003c RSI: 0000000000000000 RDI: ffff884026fa7800
<4>RBP: ffff8800282c3b30 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000000 R11: 0000000000000000 R12: ffff884026fa7800
<4>R13: ffff884026fa0440 R14: 000000000000003c R15: ffff88402455c000
<4>FS: 0000000000000000(0000) GS:ffff8800282c0000(0000)
knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 00007f5599065000 CR3: 0000000001a85000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process swapper (pid: 0, threadinfo ffff88c027c06000, task
ffff8880283d1500)
<4> ffff8800282c3b10 000000048144caec ffff884026fa0480 000412201096c080
<4><d> ffff884026fa0460 ffff884026fa0450 ffff8940259fa6e8 0000000000000000
<4><d> 0000000000000020 ffff8940274f0140 0000000000000020 0000000000000246
<4> <IRQ>
<4> [<ffffffff8116840f>] kmem_cache_alloc+0x15f/0x190 <4>
[<ffffffffa02726e0>] ? __br_forward+0x0/0xd0 [bridge] <4>
[<ffffffff8143f15f>] skb_clone+0x6f/0xb0 <4> [<ffffffffa02726e0>] ?
__br_forward+0x0/0xd0 [bridge] <4> [<ffffffffa0272320>]
deliver_clone+0x30/0x60 [bridge] <4> [<ffffffffa0272549>] br_flood+0x79/0xd0
[bridge] <4> [<ffffffffa02725b5>] br_flood_forward+0x15/0x20 [bridge] <4>
[<ffffffffa02736ee>] br_handle_frame_finish+0x27e/0x2a0 [bridge] <4>
[<ffffffffa02738ba>] br_handle_frame+0x1aa/0x250 [bridge] <4>
[<ffffffff81448599>] __netif_receive_skb+0x529/0x750 <4>
[<ffffffff8143da41>] ? __alloc_skb+0x81/0x190 <4> [<ffffffff8144a8f8>]
netif_receive_skb+0x58/0x60 <4> [<ffffffff8143da41>] ?
__alloc_skb+0x81/0x190 <4> [<ffffffff8144a8f8>] netif_receive_skb+0x58/0x60
<4> [<ffffffff8144aa00>] napi_skb_finish+0x50/0x70 <4> [<ffffffff8144cfa9>]
napi_gro_receive+0x39/0x50 <4> [<ffffffffa015045c>] igb_poll+0x7ec/0xc70
[igb] <4> [<ffffffff81033ef7>] ? native_apic_msr_write+0x37/0x40 <4>
[<ffffffff8144851b>] ? __netif_receive_skb+0x4ab/0x750 <4>
[<ffffffff810a7b05>] ? tick_dev_program_event+0x65/0xc0 <4>
[<ffffffff81012bb9>] ? read_tsc+0x9/0x20 <4> [<ffffffff8144d0c3>]
net_rx_action+0x103/0x2f0 <4> [<ffffffff81076fb1>] __do_softirq+0xc1/0x1e0
<4> [<ffffffff810e1720>] ? handle_IRQ_event+0x60/0x170 <4>
[<ffffffff8100c1cc>] call_softirq+0x1c/0x30 <4> [<ffffffff8100de05>]
do_softirq+0x65/0xa0 <4> [<ffffffff81076d95>] irq_exit+0x85/0x90 <4>
[<ffffffff81517145>] do_IRQ+0x75/0xf0 <4> [<ffffffff8100b9d3>]
ret_from_intr+0x0/0x11 <4> <EOI> <4> [<ffffffff812d39ae>] ?
intel_idle+0xde/0x170 <4> [<ffffffff812d3991>] ? intel_idle+0xc1/0x170 <4>
[<ffffffff81415277>] cpuidle_idle_call+0xa7/0x140 <4> [<ffffffff81009fc6>]
cpu_idle+0xb6/0x110 <4> [<ffffffff81506fcc>] start_secondary+0x2ac/0x2ef
<4>Code: 89 ff e8 70 1c 12 00 eb 99 66 0f 1f 44 00 00 41 c7 45 60 01 00 00
00 4d 8b 7d 20 4c 39 7d c0 0f 85 f2 fe ff ff eb 84 0f 0b eb fe <0f> 0b 66 2e
0f 1f 84 00 00 00 00 00 eb f4 8b 55 ac 8b 75 bc 31 <1>RIP
[<ffffffff81167354>] cache_alloc_refill+0x1e4/0x240 <4> RSP
<ffff8800282c3ad0>
Sorry for my poor English :)
Sincere
-yao
-----????-----
???: Serge Hallyn [mailto:serge.hallyn at ubuntu.com]
????: 2013?6?7? 4:16
???: yaozhicheng
??: lxc-users at lists.sourceforge.net
??: Re: [Lxc-users] lxc scalability problem
Post by yaozhicheng
Hi all
My OS get crashed when I start more then 20 containers.
Seems like an selinux labeling related problem. If you boot without selinux
enabled do you still have this problem? (I'm not suggesting that as a
workaround - only to verify that the problem is with selinux using too much
memory for the dev tmpfs labeling, so ppl know where to look)
-serge
------------------------------------------------------------------------------
Build for Windows Store.
http://p.sf.net/sfu/windows-dev2dev
_______________________________________________
Lxc-users mailing list
Lxc-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-users
yaozhicheng
2013-06-14 03:18:10 UTC
Yeah, it may be a kernel bug.
The problem has been solved with the updated kernel 2.6.32-358.11.1.el6.x86_64 and the latest igb network card driver, 4.2.16.
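For anyone hitting the same crash, the upgrade steps look roughly like this
(a sketch; the igb tarball name and build procedure follow Intel's usual
out-of-tree packaging and are assumptions, not quoted from this thread):

  yum update kernel && reboot        # pick up 2.6.32-358.11.1.el6.x86_64

  # build the out-of-tree igb driver (needs kernel-devel and gcc)
  tar xzf igb-4.2.16.tar.gz
  cd igb-4.2.16/src
  make install
  rmmod igb && modprobe igb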

Thanks and Regards
-yao