Description of problem: We are testing upgrades of clusters with a "Number of Bricks: 2 x 2 = 4" configuration from Gluster 5 to Gluster 9 (CentOS 7). In roughly 90% of cases we see the problem below. After upgrading the first node we trigger a manual heal and wait until everything is healed. But if we then access files on the upgraded node (ls -l), the "Number of entries in heal pending" goes up again. Sometimes this happens on both bricks of the upgraded node, sometimes on only one of them.

When we look at the details of an affected file on the upgraded and not-yet-upgraded nodes, we see the following:

  1. On the upgraded node the AFR metadata bit is set and trusted.glusterfs.mdata is present (both xattrs are decoded in the sketch after the comparison below):
fa05.sg05# getfattr -d -m . -e hex /data1/gluster/20200922-upg-004/many_files/562_4.log; stat /data1/gluster/20200922-upg-004/many_files/562_4.log
getfattr: Removing leading '/' from absolute path names
# file: data1/gluster/20200922-upg-004/many_files/562_4.log
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.shared-client-1=0x000000000000000100000000
trusted.gfid=0x3bf30f76e2c94a67a0878bea1bd8db97
trusted.gfid2path.a9c4d7d45775ec60=0x36343866643935342d653534622d343762302d386631392d6339333636306561303838632f3536325f342e6c6f67
trusted.glusterfs.mdata=0x01000000000000000000000000632c3c3c000000000d6470f900000000632c3c3c000000000d552e6800000000632c3c3c000000000d36a947

The file metadata looks like this:

  File: ‘/data1/gluster/20200922-upg-004/many_files/562_4.log’
  Size: 4096      	Blocks: 8          IO Block: 4096   regular file
Device: 820h/2080d	Inode: 29306144911  Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-09-22 10:43:08.221686087 +0000
Modify: 2022-09-22 10:43:08.223686248 +0000
Change: 2022-09-22 10:44:18.690378905 +0000
 Birth: -
  2. On the not-yet-upgraded node the AFR metadata bits are clear, trusted.glusterfs.mdata is absent, and the metadata looks like this:
fa06.sg05# getfattr -d -m . -e hex /data1/gluster/20200922-upg-004/many_files/562_4.log; stat /data1/gluster/20200922-upg-004/many_files/562_4.log
getfattr: Removing leading '/' from absolute path names
# file: data1/gluster/20200922-upg-004/many_files/562_4.log
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x3bf30f76e2c94a67a0878bea1bd8db97
trusted.gfid2path.a9c4d7d45775ec60=0x36343866643935342d653534622d343762302d386631392d6339333636306561303838632f3536325f342e6c6f67
  File: ‘/data1/gluster/20200922-upg-004/many_files/562_4.log’
  Size: 4096      	Blocks: 8          IO Block: 4096   regular file
Device: 810h/2064d	Inode: 37482767283  Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-09-22 10:43:08.225285090 +0000
Modify: 2022-09-22 10:43:08.227285247 +0000
Change: 2022-09-22 10:43:08.227285247 +0000
 Birth: -
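
For reference, the two xattrs above can be decoded as follows. The trusted.afr.shared-client-1 value is three big-endian 32-bit counters (data, metadata, entry), so 0x000000000000000100000000 means one pending metadata heal against the other brick. For trusted.glusterfs.mdata we assume the on-disk layout is a 1-byte version, an 8-byte flags field, and then ctime/mtime/atime stored as big-endian (seconds, nanoseconds) int64 pairs; the helper below is a hypothetical decoding sketch, not a gluster tool:

decode_mdata() {
    local hex=${1#0x}
    local body=${hex:18}                   # skip version (2 hex chars) and flags (16 hex chars)
    local names=(ctime mtime atime)
    local i sec nsec
    for i in 0 1 2; do
        sec=$((16#${body:i*32:16}))        # seconds, big-endian 64-bit
        nsec=$((16#${body:i*32+16:16}))    # nanoseconds, big-endian 64-bit
        printf '%s: %s.%09d UTC\n' "${names[i]}" "$(date -u -d "@$sec" '+%F %T')" "$nsec"
    done
}

decode_mdata 0x01000000000000000000000000632c3c3c000000000d6470f900000000632c3c3c000000000d552e6800000000632c3c3c000000000d36a947
# ctime: 2022-09-22 10:43:08.224686329 UTC
# mtime: 2022-09-22 10:43:08.223686248 UTC
# atime: 2022-09-22 10:43:08.221686087 UTC

Decoded this way, the stored mtime and atime match the stat output on the upgraded brick, while the stored ctime is still 10:43:08, not the later 10:44:18 Change time that stat reports there.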

We see that the Change field differs, and the change time on the upgraded node corresponds to the time when the ls -l operation was performed:

fa05.sg05# date; ls -l /shared/20200922-upg-004/many_files/ | wc -l; date
Thu Sep 22 10:44:17 UTC 2022
10229
Thu Sep 22 10:44:35 UTC 2022
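
The correlation can be confirmed with a hypothetical before/after check on the upgraded node (same file and brick path as above):

F=/data1/gluster/20200922-upg-004/many_files/562_4.log
stat -c '%z' "$F"; getfattr -d -m . -e hex "$F"            # before: Change time and xattrs on the brick
ls -l /shared/20200922-upg-004/many_files/ > /dev/null     # access the directory via the FUSE mount
stat -c '%z' "$F"; getfattr -d -m . -e hex "$F"            # after: Change time is updated and trusted.afr.* now shows the metadata bit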

It is not clear to us why an attempt to access a file on the shared mount should lead to any 'change'.

The logs show the following:

fa05.sg05# grep "10:44:1" /var/log/glusterfs/shared.log | more
[2022-09-22 10:44:17.755722 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:843:client4_0_setxattr_cbk] 0-shared-client-1: remote operation failed. [{errno=107}, {error=Transport endpoint is not connected
}] 

followed by a lot of these:

[2022-09-22 10:44:17.756221 +0000] W [dict.c:1532:dict_get_with_ref] (-->/usr/lib64/glusterfs/9.6/xlator/cluster/replicate.so(+0x2482a) [0x7f971c2b082a] -->/lib64/libglusterfs.so.0(dict_get_bin+0x3b) [0x
7f972b50b1eb] -->/lib64/libglusterfs.so.0(dict_get_with_ref+0x8e) [0x7f972b50925e] ) 0-dict: dict OR key (dht-get-iatt-in-xattr) is NULL [Invalid argument]

There is nothing in glfsheal-shared.log, glustershd.log or bricks/data2-gluster.log, and nothing on the second (not-yet-upgraded) node. The "Transport endpoint is not connected" message is also completely unexpected, because all commands report OK status, and such messages do not appear on the cluster at any other time. Could running different Gluster versions on different nodes lead to this? We also tried upgrading from 5 to 8 and observed the same issue.

Because of this we get a high I/O load after the upgrade of the first node, and it can take a few hours for all heals to complete. After that we upgrade the second node, and that part always goes fine. It is not clear to us why metadata changes happen on ls -l (or other attempts to access files on the upgraded node). Could you please help with this?

The exact commands to reproduce the issue (a consolidated sketch follows the list):

  1. Have a cluster of 2 nodes with glusterfs 5.13 on both. The first node is prepared for upgrade and traffic goes through the second node.
  2. Upgrade the first node to glusterfs 9.6 (following the upgrade instructions).
  3. Run a manual heal and wait until it completes.
  4. Access files on the shared mount (ls -l).
  5. Observe the problem.
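
Consolidated, steps 3-5 on the upgraded node look roughly like this (volume name and paths as in this report; the wait loop and sleep interval are only illustrative):

gluster volume heal shared                                 # step 3: trigger a manual heal
while gluster volume heal shared info summary | grep -q 'heal pending: [1-9]'; do
    sleep 60                                               # wait until no entries are pending
done
ls -l /shared/20200922-upg-004/many_files/ | wc -l         # step 4: access files via the FUSE mount
gluster volume heal shared info summary                    # step 5: the heal-pending count rises again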

The full output of the command that failed: N/A

Expected results: Metadata should not change on an attempt to access files (ls -l), and files should not end up in the heal list.

Mandatory info: - The output of the gluster volume info command:

Volume Name: shared
Type: Distributed-Replicate
Volume ID: 7a20113b-a8f1-4e9f-93b3-80cd26aaa20f
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: fa05.gl:/data1/gluster
Brick2: fa06.gl:/data1/gluster
Brick3: fa05.gl:/data2/gluster
Brick4: fa06.gl:/data2/gluster
Options Reconfigured:
cluster.entry-self-heal: on
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.self-heal-daemon: on
performance.cache-refresh-timeout: 60
performance.cache-size: 2048MB
performance.write-behind-window-size: 128MB
performance.io-thread-count: 32
performance.flush-behind: on
auth.allow: ,,127.*
transport.address-family: inet
nfs.disable: off
performance.client-io-threads: off
storage.fips-mode-rchecksum: off

- The output of the gluster volume status command:

Status of volume: shared
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick fa05.gl:/data1/gluster                49156     0          Y       430  
Brick fa06.gl:/data1/gluster                49152     0          Y       28769
Brick fa05.gl:/data2/gluster                49157     0          Y       450  
Brick fa06.gl:/data2/gluster                49153     0          Y       28775
Self-heal Daemon on localhost               N/A       N/A        Y       467  
NFS Server on fa06.gl                       N/A       N/A        N       N/A  
Self-heal Daemon on fa06.gl                 N/A       N/A        Y       28784
 
Task Status of Volume shared
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:

fa05.sg05# gluster volume heal shared info summary
Brick fa05.gl:/data1/gluster
Status: Connected
Total Number of entries: 5387
Number of entries in heal pending: 5387
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick fa06.gl:/data1/gluster
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick fa05.gl:/data2/gluster
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick fa06.gl:/data2/gluster
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

- Provide logs present on following locations of client and server nodes - /var/log/glusterfs/

- Is there any crash? Provide the backtrace and coredump: N/A

Additional info:

- The operating system / glusterfs version: Client Information:

	fa05.sg05# uname -r; cat /etc/issue
	5.15.31
	\S
	Kernel \r on an \m
	
	fa06.sg05# uname -r; cat /etc/issue
	5.15.31
	\S
	Kernel \r on an \m

	fa05.sg05# df -Th
	Filesystem        Type            Size  Used Avail Use% Mounted on
	devtmpfs          devtmpfs        126G     0  126G   0% /dev
	tmpfs             tmpfs           126G  4.0K  126G   1% /dev/shm
	tmpfs             tmpfs           126G  2.7G  124G   3% /run
	tmpfs             tmpfs           126G     0  126G   0% /sys/fs/cgroup
	/dev/sdb3         xfs             542G  5.8G  536G   2% /
	/dev/sdb1         xfs            1014M  192M  823M  19% /boot
	tmpfs             tmpfs            26G     0   26G   0% /run/user/1025
	/dev/sdc          xfs              22T   16T  6.2T  72% /data1
	/dev/sdd          xfs              22T   16T  6.3T  72% /data2
	tmpfs             tmpfs            26G     0   26G   0% /run/user/189
	localhost:/shared fuse.glusterfs   44T   33T   12T  75% /shared

	fa06.sg05# df -Th
	Filesystem        Type            Size  Used Avail Use% Mounted on
	devtmpfs          devtmpfs        126G     0  126G   0% /dev
	tmpfs             tmpfs           126G  4.0K  126G   1% /dev/shm
	tmpfs             tmpfs           126G  2.9G  123G   3% /run
	tmpfs             tmpfs           126G     0  126G   0% /sys/fs/cgroup
	/dev/sda3         xfs             542G  5.4G  536G   1% /
	/dev/sda1         xfs            1014M  192M  823M  19% /boot
	/dev/sdb          xfs              22T   16T  6.8T  69% /data1
	/dev/sde          xfs              22T   17T  5.6T  75% /data2
	tmpfs             tmpfs            26G     0   26G   0% /run/user/1025
	localhost:/shared fuse.glusterfs   44T   33T   12T  75% /shared
	tmpfs             tmpfs            26G     0   26G   0% /run/user/189

Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration


Attachments: NotUpgradedNode.txt, UpgradedNode.txt (outputs of debug commands from both nodes)
