- 26 Jul, 2022 1 commit
-
-
Lucas Kanashiro authored
This setting is required for transport=knet (which is the default) to work in an unprivileged container.
-
- 06 Mar, 2022 2 commits
-
-
Ferenc Wágner authored
-
Ferenc Wágner authored
Closes: #998785
-
- 02 Feb, 2022 2 commits
-
-
Janitor authored
Changes-By: lintian-brush
Fixes: lintian: renamed-tag
See-also: https://lintian.debian.org/tags/renamed-tag.html
-
Janitor authored
Changes-By: lintian-brush
-
- 22 Nov, 2021 4 commits
-
-
Ferenc Wágner authored
-
Ferenc Wágner authored
-
Ferenc Wágner authored
Update to upstream version '3.1.6' with Debian dir ea79c2522a0a012997b8e23d73b38c0b5f79be2c
-
Ferenc Wágner authored
-
- 10 Nov, 2021 1 commit
-
-
Jan Friesse authored
Don't rely on implicit symbol finding (cs_strerror being the most prominent example) but rather use an explicit one. This makes current Debian experimental happy (compile source).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
-
- 03 Nov, 2021 1 commit
-
-
Jan Friesse authored
Commit 92e0f9c7 added switching of totempg buffers in the sync phase. But because the buffers got switched too early, there was a problem when delivering recovered messages (messages got corrupted and/or lost). The solution is to switch buffers after the recovered messages have been delivered.

I think it is worth describing the complete history with reproducers so it doesn't get lost.

It all started with 40263892 (more info about the original problem is in https://bugzilla.redhat.com/show_bug.cgi?id=820821). That patch solved a problem which can be reproduced as follows:
- 2 nodes
- Both nodes running corosync and testcpg
- Pause node 1 (SIGSTOP of corosync)
- On node 1, send some messages by testcpg (it's not answering but this doesn't matter; simply hitting the ENTER key a few times is enough)
- Wait till node 2 detects that node 1 left
- Unpause node 1 (SIGCONT of corosync)

On node 1, newly mcasted cpg messages got sent before the sync barrier, so node 2 logs "Unknown node -> we will not deliver message".

The solution was to add switching of the totemsrp new messages buffer.

That patch was not enough, so a new one (92e0f9c7) was created. The reproducer was similar, just cpgverify was used instead of testcpg. Occasionally, when node 1 was unpaused, it hung in the sync phase because there was a partial message in the totempg buffers. The new sync message had a different frag cont, so it was thrown away and never delivered.

After many years, a problem was found which is solved by this patch (original issue described in https://github.com/corosync/corosync/issues/660). The reproducer is more complex:
- 2 nodes
- Node 1 is rate-limited (using a script on the hypervisor side):
  ```
  iface=tapXXXX
  # ~0.1MB/s in bit/s
  rate=838856
  # 1mb/s
  burst=1048576
  tc qdisc add dev $iface root handle 1: htb default 1
  tc class add dev $iface parent 1: classid 1:1 htb rate ${rate}bps \
    burst ${burst}b
  tc qdisc add dev $iface handle ffff: ingress
  tc filter add dev $iface parent ffff: prio 50 basic police rate \
    ${rate}bps burst ${burst}b mtu 64kb "drop"
  ```
- Node 2 is running corosync and cpgverify
- Node 1 keeps restarting corosync and running cpgverify in a cycle
  - Console 1: while true; do corosync; sleep 20; \
      kill $(pidof corosync); sleep 20; done
  - Console 2: while true; do ./cpgverify;done

From time to time (usually reproduced in less than 5 minutes) cpgverify reports a corrupted message.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
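The `rate` value in the reproducer can be re-derived with integer arithmetic. This is a minimal sketch, assuming "0.1MB/s" means one tenth of 1048576 bytes per second; the helper name `rate_bits` is hypothetical and not part of the commit.

```sh
# Hypothetical helper: convert a rate given in tenths of a MiB/s
# into bits per second, using integer arithmetic only.
rate_bits() {
    tenths=$1
    bytes=$(( 1048576 * tenths / 10 ))   # bytes per second, rounded down
    echo $(( bytes * 8 ))                # bits per second
}

rate_bits 1    # prints 838856, the value used in the reproducer
```

Note that tc(8) documents the `bps` suffix as bytes per second and `bit` as bits per second, so it may be worth confirming the effective limit on the interface when reusing the script.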
-
- 25 Oct, 2021 1 commit
-
-
Christine Caulfield authored
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
-
- 18 Oct, 2021 1 commit
-
-
miharahiro authored
The token timeout has been changed from 1000 to 3000. Since the consensus timeout is 1.2 * token_timeout, change the consensus timeout as well.

Signed-off-by: miharahiro <hmihara@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
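The relationship stated in this commit message can be checked with a quick calculation. A minimal sketch, assuming both timeouts are in milliseconds; the function name `consensus_timeout` is hypothetical.

```sh
# Hypothetical helper: consensus timeout as 1.2 * token timeout.
# Integer arithmetic: multiply by 12, then divide by 10.
consensus_timeout() {
    token=$1
    echo $(( token * 12 / 10 ))
}

consensus_timeout 1000   # prints 1200
consensus_timeout 3000   # prints 3600
```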
-
- 02 Oct, 2021 1 commit
-
-
Ferenc Wágner authored
-
- 01 Oct, 2021 2 commits
-
-
Ferenc Wágner authored
-
Ferenc Wágner authored
This reverts commit e710e1f5. dh_installsystemd apparently does not find the service files under /usr/lib, and consequently does not insert the maintainer script snippets necessary to start the services on installation (for example).
-
- 30 Sep, 2021 7 commits
-
-
Ferenc Wágner authored
-
Ferenc Wágner authored
-
Ferenc Wágner authored
-
Ferenc Wágner authored
-
Ferenc Wágner authored
-
Ferenc Wágner authored
Update to upstream version '3.1.5' with Debian dir c04c11deae4a91095678c568430f72e60f7b7262
-
Ferenc Wágner authored
-
- 13 Sep, 2021 1 commit
-
-
Jan Friesse authored
Thanks to Ryan Cai <ycaibb@gmail.com> for reporting the problem.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
-
- 20 Aug, 2021 1 commit
-
-
Jan Friesse authored
Previously, the existence of retransmit messages canceled holding of the token (and never allowed the representative to enter the token hold state). This makes the token rotate at maximum speed and keeps the processor resending messages over and over again, overloading the network and reducing the chance of successfully delivering the messages.

Also, there were reports of various antivirus / IPS / IDS products which slow down delivery of packets of certain sizes (packets bigger than the token), which makes Corosync retransmit messages over and over again.

The proposed solution is to allow the representative to enter the token hold state when there are only retransmit messages. This allows the network to handle the overload and/or gives the antivirus/IPS/IDS enough time to scan and deliver packets without corosync entering the "FAILED TO RECEIVE" state and adding more load to the network.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
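The change in behavior can be pictured with a deliberately simplified model. This is not the totemsrp implementation: the function names are hypothetical, and the real decision takes more state into account than the two counters assumed here.

```sh
# Simplified model, not the totemsrp code.
# $1 = number of new messages waiting, $2 = number of retransmit messages.
# Exit status 0 means "the representative may hold the token".

# Before the change: any pending retransmit cancels the hold.
may_hold_old() {
    [ "$1" -eq 0 ] && [ "$2" -eq 0 ]
}

# After the change: retransmits alone no longer cancel the hold.
may_hold_new() {
    [ "$1" -eq 0 ]
}

if may_hold_new 0 3 && ! may_hold_old 0 3; then
    echo "only retransmits pending: hold is now allowed"
fi
```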
-
- 04 Aug, 2021 2 commits
-
-
Jan Friesse authored
Knet limits the maximum node id to a 16-bit type. This was not enforced in corosync, so it was possible to set nodeid to a value >= 65536, and (surprisingly) most things worked quite well because of overflow.

corosync-cmapctl -m stats contains the knet nodeid in the stats.knet. subtree, so for nodeid 65536 the result was:
Can't get value of stats.knet.node0.link0.connected. Error CS_ERR_NOT_EXIST

This commit implements checking of nodeid and limits it to the KNET_MAX_HOST value when knet is used.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
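The overflow described in this commit message can be illustrated with a short sketch. It assumes KNET_MAX_HOST is 65536 and that valid node ids run from 1 to KNET_MAX_HOST - 1; both function names are hypothetical, and the actual check in corosync may differ in detail.

```sh
# Assumption: KNET_MAX_HOST is 65536, so knet node ids fit in 16 bits.
KNET_MAX_HOST=65536

# What a 16-bit node id field keeps of a larger value.
knet_truncated_id() {
    echo $(( $1 & 0xFFFF ))
}

# Hypothetical check: accept only ids that a 16-bit field can hold
# (the lower bound of 1 is an assumption).
nodeid_ok_for_knet() {
    [ "$1" -ge 1 ] && [ "$1" -lt "$KNET_MAX_HOST" ]
}

knet_truncated_id 65536   # prints 0, matching "node0" in the error message
knet_truncated_id 65537   # prints 1
```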
-
Jan Friesse authored
Nodeid is required by knet for every node. Right now, the existence of nodeid is checked only for the local node, so broaden the test.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
-
- 02 Aug, 2021 6 commits
-
-
Jan Friesse authored
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
-
Jan Friesse authored
Show 'n' also for the first localhost link, so all localhost links are marked consistently with the non-brief display.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
-
Jan Friesse authored
Needed for having the correct index of localhost.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
-
Jan Friesse authored
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
-
Jan Friesse authored
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
-
Jan Friesse authored
totem.nodeid is a relic from the times when nodelist was not required and totemsrp was sending the whole membership with IP addresses. With Corosync 3, IP addresses are no longer sent, so it is not possible to find the "next" node's IP address to send the token to (because only the nodeid is sent) without having information about all of the nodes stored locally.

When totem.nodeid was configured, it was used only partly, while other parts (most notably totemudpu_token_target_set) were using the autogenerated nodeid. As a result, it was not possible to create even a single-node membership.

The solution is to ignore totem.nodeid completely (and display a warning when it is set).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
-
- 29 Jul, 2021 1 commit
-
-
Christine Caulfield authored
Currently, if there is a gap in the links (e.g. link0 is missing), corosync-cfgtool -s will still display the links as 0,1,2,3... even if they are 1,2,5,6...

Also display the KNET transport type with the link in corosync-cfgtool -s & -n.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
-
- 23 Jul, 2021 1 commit
-
-
Jan Friesse authored
Support for cgroup v2 is very similar to cgroup v1, just checking (and writing) a different file. Because of all the problems with cgroup v2 described later, a new "auto" mode (the new default) is added. This mode first tries to set rr scheduling and moves Corosync to the root cgroup only if that fails.

Testing this feature is a bit harder than with cgroup v1, so it's probably worth noting in this commit message.

1. Copy some service file (I've used the httpd service) and set CPUQuota=30% in the [Service] section.
2. Check /sys/fs/cgroup/cgroup.subtree_control - there should be no "cpu"
3. Start the modified service
4. Check /sys/fs/cgroup/cgroup.subtree_control - there should be "cpu"
5. Start corosync - it should be able to get rt priority

When move_to_root_cgroup is disabled (applies only to kernels with CONFIG_RT_GROUP_SCHED enabled), behavior differs:
- If corosync is started before the modified service, so there is no "cpu" in /sys/fs/cgroup/cgroup.subtree_control, corosync starts without problem and gets rt priority. Starting the modified service later will never add "cpu" into /sys/fs/cgroup/cgroup.subtree_control (because corosync is holding rt priority and it is placed in the non-root cgroup by systemd).
- When corosync is started after the modified service, so "cpu" is in /sys/fs/cgroup/cgroup.subtree_control, corosync is not able to get RT priority.

It's worth noting the problems when cgroup v2 is used together with systemd logging, described in the corosync.conf(5) man page.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
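The "check /sys/fs/cgroup/cgroup.subtree_control" steps in this commit message can be scripted. A minimal sketch, assuming the file holds a space-separated list of controller names; the function name `has_cpu_controller` is hypothetical.

```sh
# Return success if the "cpu" controller is listed in the given
# cgroup.subtree_control file. Whole-word comparison, so "cpuset"
# alone does not count as "cpu".
has_cpu_controller() {
    file=${1:-/sys/fs/cgroup/cgroup.subtree_control}
    for ctrl in $(cat "$file" 2>/dev/null); do
        [ "$ctrl" = "cpu" ] && return 0
    done
    return 1
}

if has_cpu_controller; then
    echo "cpu controller enabled in the root cgroup"
else
    echo "cpu controller not enabled in the root cgroup"
fi
```

Running it before and after starting the modified service should show the controller state changing, as in steps 2 and 4.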
-
- 05 Jul, 2021 2 commits
-
-
Ferenc Wágner authored
-
Ferenc Wágner authored
Cherry-picked from v3.1.4. Thanks: Christine Caulfield
-
- 03 Jun, 2021 1 commit
-
-
Christine Caulfield authored
The libqb map API leaves 'ownership' of the data with the caller but does its own lifetime management, so it can easily happen that map_rm() is called and the data deleted by the caller. But if an iterator is running over that item, then the map entry will not get removed (leaving dangling pointers) until later.

libqb has a hack-y callback that tells the owner when it is safe to delete the allocated memory, so we hook into that. icmap is already using this.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
-
- 02 Jun, 2021 1 commit
-
-
Jan Friesse authored
Internally, knet uses just one link for localhost, so for a single-node configuration knet_link_get_link_list returns only one entry. This is propagated to `corosync-cfgtool -s`.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
-
- 21 May, 2021 1 commit
-
-
Jan Friesse authored
This reverts commit 57e6b86b. We are in the process of finding a better solution, so reverting for now.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
-