submitted 11 months ago* (last edited 11 months ago) by [email protected] to c/[email protected]

In my dmesg logs I get the following errors a lot:

[232671.710741] BTRFS warning (device nvme0n1p2): csum failed root 257 ino 2496314 off 946159616 csum 0xb7eb9798 expected csum 0x3803f9f6 mirror 1
[232671.710746] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 19297, gen 0
[232673.984324] BTRFS warning (device nvme0n1p2): csum failed root 257 ino 2496314 off 946159616 csum 0xb7eb9798 expected csum 0x3803f9f6 mirror 1
[232673.984329] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 19298, gen 0
[232673.988851] BTRFS warning (device nvme0n1p2): csum failed root 257 ino 2496314 off 946159616 csum 0xb7eb9798 expected csum 0x3803f9f6 mirror 1
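
Each warning above names a subvolume root (257), an inode (2496314) and a byte offset within that file, which is enough to look up the affected file. Below is a minimal sketch, not an official tool: it parses one warning line and builds (but does not run) a `btrfs inspect-internal inode-resolve` command. The helper names `parse_csum_warning` and `resolve_command` are made up for illustration, and using `/home` as the path assumes root 257 is the subvolume mounted there.

```python
import re

# Regex for the "csum failed" warning format quoted in this post.
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>[^)]+)\): csum failed "
    r"root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<got>0x[0-9a-f]+) expected csum (?P<want>0x[0-9a-f]+) "
    r"mirror (?P<mirror>\d+)"
)


def parse_csum_warning(line):
    """Return the fields of one csum warning as a dict, or None if no match."""
    m = CSUM_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("root", "ino", "off", "mirror"):
        fields[key] = int(fields[key])
    return fields


def resolve_command(fields, mountpoint):
    """Build (not run) the command that maps the inode to a file path.

    mountpoint must be a path inside the subvolume named by fields["root"];
    that mapping is an assumption the caller has to check.
    """
    return ["btrfs", "inspect-internal", "inode-resolve",
            str(fields["ino"]), mountpoint]


line = ("[232671.710741] BTRFS warning (device nvme0n1p2): csum failed root 257 "
        "ino 2496314 off 946159616 csum 0xb7eb9798 expected csum 0x3803f9f6 mirror 1")
info = parse_csum_warning(line)
print(info["root"], info["ino"], info["off"])        # 257 2496314 946159616
print(" ".join(resolve_command(info, "/home")))
```

Since all three warnings repeat the same root, inode and offset, they most likely point at a single block of one file being read over and over rather than at many damaged files.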

I've run btrfs scrub start -Bd /home as described here. The report afterwards claims everything is fine.

btrfs scrub status /home
UUID:             145c0d63-05f8-43a2-934b-7583cb5f6100
Scrub started:    Fri Aug  4 11:35:19 2023
Status:           finished
Duration:         0:07:49
Total to scrub:   480.21GiB
Rate:             1.02GiB/s
Error summary:    no errors found
[-] [email protected] 7 points 11 months ago

Are you sure you selected the correct mount point? You can also give it the partition directly.

[-] [email protected] 2 points 11 months ago

Yes, I'm sure.

root@archiso /mnt/arch # cat ./etc/fstab 
# Static information about the filesystems.
# See fstab(5) for details.

# <file system> <dir> <type> <options> <dump> <pass>
# /dev/nvme0n1p2
UUID=145c0d63-05f8-43a2-934b-7583cb5f6100	/         	btrfs     	rw,relatime,ssd,discard=async,space_cache=v2,subvolid=256,subvol=/@	0 0

# /dev/nvme0n1p2
UUID=145c0d63-05f8-43a2-934b-7583cb5f6100	/.snapshots	btrfs     	rw,relatime,ssd,discard=async,space_cache=v2,subvolid=260,subvol=/@.snapshots	0 0

# /dev/nvme0n1p1
UUID=4BF3-12AA      	/boot     	vfat      	rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro	0 2

# /dev/nvme0n1p2
UUID=145c0d63-05f8-43a2-934b-7583cb5f6100	/home     	btrfs     	rw,relatime,ssd,discard=async,space_cache=v2,subvolid=257,subvol=/@home	0 0

# /dev/nvme0n1p2
UUID=145c0d63-05f8-43a2-934b-7583cb5f6100	/var/cache/pacman/pkg	btrfs     	rw,relatime,ssd,discard=async,space_cache=v2,subvolid=259,subvol=/@pkg	0 0

# /dev/nvme0n1p2
UUID=145c0d63-05f8-43a2-934b-7583cb5f6100	/var/log  	btrfs     	rw,relatime,ssd,discard=async,space_cache=v2,subvolid=258,subvol=/@log	0 0
[-] [email protected] 5 points 11 months ago

Could you do an offline btrfs-check? (no --repair!)

[-] [email protected] 2 points 11 months ago
root@archiso ~ # lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0   673M  1 loop /run/archiso/airootfs
sda           8:0    0 476.9G  0 disk 
└─sda1        8:1    0 476.9G  0 part 
sdb           8:16   0 119.2G  0 disk 
└─sdb1        8:17   0 119.2G  0 part 
sdc           8:32   1  14.4G  0 disk 
├─sdc1        8:33   1   778M  0 part 
└─sdc2        8:34   1    15M  0 part 
nvme0n1     259:0    0 931.5G  0 disk 
├─nvme0n1p1 259:1    0   511M  0 part 
└─nvme0n1p2 259:2    0   931G  0 part 
root@archiso ~ # btrfs check /dev/nvme0n1p2
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p2
UUID: 145c0d63-05f8-43a2-934b-7583cb5f6100
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 514161029120 bytes used, no error found
total csum bytes: 496182240
total tree bytes: 1464221696
total fs tree bytes: 813809664
total extent tree bytes: 57655296
btree space waste bytes: 248053148
file data blocks allocated: 4385471590400
 referenced 512920408064
btrfs check /dev/nvme0n1p2  4.15s user 1.66s system 62% cpu 9.316 total
[-] [email protected] 3 points 11 months ago

What RAID profile are you using on your filesystem? Checksum failures on RAID1/5/6/10 aren't fatal because the block can be read from another copy or rebuilt from parity.
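
To make that point concrete, here is a rough sketch of which block group profiles leave btrfs a second source to repair a block from after a checksum failure. The function name `can_self_heal` is made up for illustration, and the table only lists the common profile names; treat it as a summary, not as something btrfs itself exposes.

```python
# Copies kept per block for each profile (keys upper-cased for lookup).
COPIES = {
    "SINGLE": 1,
    "RAID0": 1,
    "DUP": 2,       # two copies on the same device
    "RAID1": 2,
    "RAID10": 2,
    "RAID1C3": 3,
    "RAID1C4": 4,
}
# RAID5/RAID6 keep one copy plus parity, which also allows a rebuild.
PARITY = {"RAID5", "RAID6"}


def can_self_heal(profile):
    """Return True if a bad block can be rebuilt from a copy or parity."""
    name = profile.upper()
    if name in PARITY:
        return True
    if name not in COPIES:
        raise ValueError(f"unknown profile: {profile}")
    return COPIES[name] > 1


for profile in ("single", "DUP", "RAID1", "RAID5"):
    print(profile, can_self_heal(profile))
```

With a `single` data profile there is no second copy, so a checksum failure on file data surfaces as a read error instead of being repaired silently.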

[-] [email protected] 2 points 11 months ago

RAID? How can I check? I'm not using RAID as far as I know.

[-] [email protected] 2 points 11 months ago

You'd probably know unless you didn't set it up yourself!

btrfs device usage /mountpoint

[-] [email protected] 2 points 11 months ago

Could you show us the raid type and info with btrfs fi us /home, and then the errors encountered on each disk with btrfs dev sta /home?

[-] [email protected] 3 points 11 months ago
root@archiso /mnt/arch # btrfs fi us .
Overall:
    Device size:		 931.01GiB
    Device allocated:		 526.02GiB
    Device unallocated:		 404.99GiB
    Device missing:		     0.00B
    Device slack:		     0.00B
    Used:			 480.21GiB
    Free (estimated):		 447.51GiB	(min: 245.02GiB)
    Free (statfs, df):		 447.51GiB
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 512.00MiB	(used: 0.00B)
    Multiple profiles:		        no

Data,single: Size:520.01GiB, Used:477.49GiB (91.82%)
   /dev/nvme0n1p2	 520.01GiB

Metadata,DUP: Size:3.00GiB, Used:1.36GiB (45.45%)
   /dev/nvme0n1p2	   6.00GiB

System,DUP: Size:8.00MiB, Used:80.00KiB (0.98%)
   /dev/nvme0n1p2	  16.00MiB

Unallocated:
   /dev/nvme0n1p2	 404.99GiB

root@archiso /mnt/arch # btrfs device stats .
[/dev/nvme0n1p2].write_io_errs    0
[/dev/nvme0n1p2].read_io_errs     0
[/dev/nvme0n1p2].flush_io_errs    0
[/dev/nvme0n1p2].corruption_errs  19317
[/dev/nvme0n1p2].generation_errs  0
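
For anyone who wants to watch these counters over time, here is a small sketch that parses the `btrfs device stats` text shown above and keeps only the non-zero counters. The function names are hypothetical, and the parser assumes the plain `[device].counter value` layout from this output.

```python
def parse_device_stats(text):
    """Parse `btrfs device stats` text into {device: {counter: value}}."""
    stats = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("["):
            continue  # skip prompts and anything that is not a counter line
        key, value = line.rsplit(None, 1)
        device, counter = key[1:].split("].", 1)
        stats.setdefault(device, {})[counter] = int(value)
    return stats


def nonzero_counters(stats):
    """Keep only the counters above zero, dropping devices with none."""
    result = {}
    for device, counters in stats.items():
        bad = {name: count for name, count in counters.items() if count > 0}
        if bad:
            result[device] = bad
    return result


sample = """\
[/dev/nvme0n1p2].write_io_errs    0
[/dev/nvme0n1p2].read_io_errs     0
[/dev/nvme0n1p2].flush_io_errs    0
[/dev/nvme0n1p2].corruption_errs  19317
[/dev/nvme0n1p2].generation_errs  0
"""
print(nonzero_counters(parse_device_stats(sample)))
# {'/dev/nvme0n1p2': {'corruption_errs': 19317}}
```

Only `corruption_errs` is non-zero here, which matches the `corrupt` field in the dmesg lines; the read, write and flush counters are all zero.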
[-] [email protected] 1 points 11 months ago

A few possibilities here:

Could be something wrong with the SSD - is it a Samsung one by any chance? There was a firmware issue that caused the SSD lifespan to degrade at a higher rate than normal. This article only covers the 980 Pro, but I believe a few other models were affected:

https://www.tomshardware.com/news/samsung-980-pro-ssd-failures-firmware-update

It also could be that whatever files were corrupted have been deleted (maybe browser cache files etc.) or the allocated block is corrupted but contains no files within it. After running a scrub, the names of files within a corrupted block are shown in dmesg - if there are none then I think you're fine, but strongly consider replacing the SSD, updating its firmware, or checking its SMART diagnostic data to see if it's OK.

The error counter can be reset with btrfs dev sta --reset /home to see if these errors pop up again after trying a resolution.

this post was submitted on 04 Aug 2023
25 points (90.3% liked)
