RayDude Advocate
Joined: 29 May 2004 Posts: 2066 Location: San Jose, CA
Posted: Thu May 30, 2019 4:44 pm Post subject: How come EXT4 slows my ssd so much? |
Code: | server /mnt/backup/root # hdparm -tT /dev/nvme0n1
/dev/nvme0n1:
Timing cached reads: 22260 MB in 2.00 seconds = 11144.34 MB/sec
Timing buffered disk reads: 8146 MB in 3.00 seconds = 2715.05 MB/sec
server /mnt/backup/root # hdparm -tT /dev/nvme0n1p4
/dev/nvme0n1p4:
Timing cached reads: 20114 MB in 2.00 seconds = 10068.90 MB/sec
Timing buffered disk reads: 3356 MB in 3.00 seconds = 1118.66 MB/sec |
This bugs me. I mean, I really don't notice the performance difference, but it seems wrong for ext4 to create such an incredible overhead.
Is this normal? Is this expected? _________________ Some day there will only be free software. |
Ant P. Watchman
Joined: 18 Apr 2009 Posts: 6920
Posted: Thu May 30, 2019 4:46 pm Post subject: |
Is the partition correctly aligned? |
tmcca Tux's lil' helper
Joined: 24 May 2019 Posts: 120
Posted: Thu May 30, 2019 5:30 pm Post subject: |
I was going to say the same thing: make sure it is aligned. Also, use fstrim instead of the discard mount option on root. You can use discard on /boot; I think that is the correct approach.
How did you partition the drive? Did you use parted? |
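A minimal sketch of the periodic-trim approach tmcca describes (the path and weekly schedule here are my assumptions, not something stated in the thread) is a cron drop-in script like:

```shell
#!/bin/sh
# /etc/cron.weekly/fstrim (hypothetical path) -- discard unused blocks on
# the root filesystem once a week instead of mounting with 'discard'.
# -v reports how many bytes were trimmed.
exec fstrim -v /
```

Requires root; systems with systemd can enable the bundled fstrim.timer instead.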
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
Posted: Thu May 30, 2019 5:33 pm Post subject: |
Is ext4's lazy inode table zeroing still running? See: 'man mkfs.ext4', option 'lazy_itable_init'. |
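One quick way to check (my suggestion, not something mike155 spelled out) is to look for the kernel's ext4lazyinit worker thread, which exists only while inode tables are still being zeroed:

```shell
# The kernel spawns an "ext4lazyinit" thread while lazy inode-table
# zeroing is in progress; once it disappears, the zeroing has finished.
# The bracket in the pattern stops grep from matching its own process.
if ps ax | grep -q '[e]xt4lazyinit'; then
    echo "lazy itable init still running"
else
    echo "no ext4lazyinit thread found"
fi
```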
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54254 Location: 56N 3W
Posted: Thu May 30, 2019 6:23 pm Post subject: |
RayDude,
Code: | # hdparm -tT /dev/nvme0n1 | does raw sequential reads from the block device.
The contents of the blocks read are ignored. That is, the read speed returned by hdparm does not depend on the filesystem, if any. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
RayDude Advocate
Joined: 29 May 2004 Posts: 2066 Location: San Jose, CA
Posted: Thu May 30, 2019 8:30 pm Post subject: |
Thanks for the quick replies.
I used gparted to partition the disk so the alignment should be correct.
I'll put fstrim on root and see if that makes a difference.
I'll check the lazy itable feature as well. _________________ Some day there will only be free software. |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54254 Location: 56N 3W
Posted: Thu May 30, 2019 8:51 pm Post subject: |
RayDude,
fstrim is about erasing previously used, now-free space in good time before you want to reuse it.
It will make no difference to the read speed.
Boot from a liveCD and rerun the tests when you are sure the partitions are not in use.
Don't even mount them. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
RayDude Advocate
Joined: 29 May 2004 Posts: 2066 Location: San Jose, CA
Posted: Thu May 30, 2019 9:42 pm Post subject: |
NeddySeagoon wrote: | RayDude,
fstrim is about erasing used but free space in good time before you want to reuse it.
It will make no difference to the read speed.
Boot from a liveCD and rerun the tests when you are sure the partitions are not in use.
Don't even mount them. |
Thanks Neddy, I'll try that. _________________ Some day there will only be free software. |
Naib Watchman
Joined: 21 May 2004 Posts: 6051 Location: Removed by Neddy
Posted: Fri May 31, 2019 1:13 pm Post subject: |
What is the IO scheduler being used? _________________
Quote: | Removed by Chiitoo |
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3345 Location: Rasi, Finland
Posted: Fri May 31, 2019 4:16 pm Post subject: |
If you want to test filesystem performance, then use some other tool, like fio for example.
As Neddy said, hdparm "skips" the filesystem. With hdparm you can test whole-disk performance or (apparently) raw partition performance. As to why the partition is that much slower on an SSD, I have no clue. It would make sense if it were an HDD you were testing...
Maybe it's about the IO scheduler as Naib was questioning.
I want to see how this ends up... _________________ ..: Zucca :..
Gentoo IRC channels reside on Libera.Chat.
--
Quote: | I am NaN! I am a man! |
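As an example of Zucca's suggestion (a sketch only: the target file, size, and parameters are my assumptions; point fio at a scratch file, not a raw system partition, unless you know what you are doing), a simple sequential-read job file for fio might look like:

```ini
; seqread.fio -- sequential 1 MiB reads with the page cache bypassed
; (direct=1), roughly comparable to hdparm's buffered-read test.
; Run with: fio seqread.fio
[seqread]
filename=/tmp/fio-testfile
size=1G
rw=read
bs=1M
direct=1
ioengine=libaio
```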
Naib Watchman
Joined: 21 May 2004 Posts: 6051 Location: Removed by Neddy
Posted: Fri May 31, 2019 4:20 pm Post subject: |
Also note that hdparm expects PATA/SATA-type devices; NVMe is not that, so it might mis-report. nvme-tools provides a means to do block reads _________________
Quote: | Removed by Chiitoo |
Pearlseattle Apprentice
Joined: 04 Oct 2007 Posts: 162 Location: Switzerland
Posted: Fri May 31, 2019 10:44 pm Post subject: Re: How come EXT4 slows my ssd so much? |
RayDude wrote: | Code: | server /mnt/backup/root # hdparm -tT /dev/nvme0n1
/dev/nvme0n1:
Timing cached reads: 22260 MB in 2.00 seconds = 11144.34 MB/sec
Timing buffered disk reads: 8146 MB in 3.00 seconds = 2715.05 MB/sec
server /mnt/backup/root # hdparm -tT /dev/nvme0n1p4
/dev/nvme0n1p4:
Timing cached reads: 20114 MB in 2.00 seconds = 10068.90 MB/sec
Timing buffered disk reads: 3356 MB in 3.00 seconds = 1118.66 MB/sec |
This bugs me. I mean I really don't notice the performance difference, but it seem wrong for ext4 to create such an incredible overhead.
Is this normal? Is this expected? |
I thought that the tests done by hdparm did not involve the specific filesystem used on the partition at all? |
Hu Moderator
Joined: 06 Mar 2007 Posts: 21639
Posted: Sat Jun 01, 2019 12:39 am Post subject: |
That is what NeddySeagoon and Zucca both said, yes. The hdparm tests should be usable even on a device with no filesystem at all.
RayDude: please post the actual alignment so we can review whether the alignment is correct. The smartctl -a output could also be interesting. Hide any identifying data (such as serial numbers). We only need general model information. |
RayDude Advocate
Joined: 29 May 2004 Posts: 2066 Location: San Jose, CA
Posted: Sat Jun 01, 2019 12:42 am Post subject: |
Update: I ran hdparm from a system-restore boot flash on an unmounted /dev/nvme0n1p4 and got the same results.
Thanks for telling me about fio, I'll try it.
I just checked and my kernel is configured for no IO Scheduler. How is that possible?
There are three choices: MQ deadline, Kyber, and BFQ. Which should I select?
What does it use if none is selected? I seriously wonder how I did this...
Update: none is apparently good for NVME: https://wiki.ubuntu.com/Kernel/Reference/IOSchedulers
Edit: since I'm using a raid6 array, it looks like I should use deadline... _________________ Some day there will only be free software. |
Pearlseattle Apprentice
Joined: 04 Oct 2007 Posts: 162 Location: Switzerland
Posted: Sat Jun 01, 2019 9:36 pm Post subject: |
Quote: | Edit: since I'm using a raid6 array, it looks like I should use deadline... |
What do you mean, RayDude? I think that you previously posted tests done directly against an NVMe device and not against a RAID... |
RayDude Advocate
Joined: 29 May 2004 Posts: 2066 Location: San Jose, CA
Posted: Sun Jun 02, 2019 5:03 pm Post subject: |
Pearlseattle wrote: | Quote: | Edit: since I'm using a raid6 array, it looks like I should use deadline... |
What do you mean RayDude? I think that you previously posted tests done directy against an nvme device and not against a raid... . |
The system boots off an NVMe drive, but also has a RAID6 array. To optimize the kernel for both the NVMe drive and the RAID6 array, it's best for me to use the deadline I/O scheduler. deadline doesn't slow the NVMe much, but it improves the performance of the HD array. _________________ Some day there will only be free software. |
RayDude Advocate
Joined: 29 May 2004 Posts: 2066 Location: San Jose, CA
Posted: Sun Jun 02, 2019 5:09 pm Post subject: |
Here's the partition table, according to parted:
Code: | server ~ # parted /dev/nvme0n1
GNU Parted 3.2
Using /dev/nvme0n1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: Unknown (unknown)
Disk /dev/nvme0n1: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 3146kB 2097kB BIOSBOOT bios_grub
2 3146kB 213MB 210MB fat16 EFI msftdata
3 213MB 8803MB 8590MB linux-swap(v1) SWAP
4 8803MB 1000GB 991GB ext4 SERVER |
Here's smartctl -a:
Code: | server ~ # smartctl -a /dev/nvme0n1
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.1.5-gentoo] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: CT1000P1SSD8
Serial Number: XXXXXXXXXXX
Firmware Version: P3CR010
PCI Vendor/Subsystem ID: 0xc0a9
IEEE OUI Identifier: 0x000000
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Sun Jun 2 10:06:43 2019 PDT
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0016): Format Frmw_DL Self_Test
Optional NVM Commands (0x005e): Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 70 Celsius
Critical Comp. Temp. Threshold: 80 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W - - 0 0 0 0 5 5
1 + 4.60W - - 1 1 1 1 30 30
2 + 3.80W - - 2 2 2 2 30 30
3 - 0.0500W - - 3 3 3 3 1000 1000
4 - 0.0040W - - 4 4 4 4 6000 8000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 40 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 2,925,784 [1.49 TB]
Data Units Written: 3,735,578 [1.91 TB]
Host Read Commands: 16,841,519
Host Write Commands: 25,212,969
Controller Busy Time: 844
Power Cycles: 12
Power On Hours: 198
Unsafe Shutdowns: 2
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 41 Celsius
Temperature Sensor 2: 39 Celsius
Temperature Sensor 5: 59 Celsius
Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged |
Thanks for your help, everyone! _________________ Some day there will only be free software. |
molletts Tux's lil' helper
Joined: 16 Feb 2013 Posts: 119
Posted: Mon Jun 03, 2019 12:58 pm Post subject: |
RayDude wrote: | The system boots off an NVMe drive, but also has a RAID6 array. To optimize the kernel for both the NVMe drive and the RAID6 array, it's best for me to use the deadline I/O scheduler. deadline doesn't slow the NVMe much, but it improves the performance of the HD array. |
You can use different schedulers on different devices if you like.
Put a line like this into /etc/udev/rules.d/10-ioscheduler.rules:
Code: | ACTION=="add|change", KERNEL=="nvme*", ATTR{queue/scheduler}="none" |
and the system should automatically use the 'none' (no-op) scheduler for all NVMe devices, and whatever you select as the default scheduler (e.g. deadline) for all other devices.
You can check which is being used for each device with something like:
Code: | cat /sys/block/nvme0n1/queue/scheduler |
substituting the device name as appropriate. It will show a list of available schedulers with the selected one bracketed.
(If you want to try out different schedulers, you can also echo the name of a scheduler that is available in your kernel to the file to change it on the fly.)
Hope this helps,
Stephen |
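To pair the rule above with deadline on the spinning disks, a second rule along these lines should work (a sketch, not tested here; the `mq-deadline` name assumes a blk-mq kernel, as in RayDude's 5.1):

```
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"
```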
Anon-E-moose Watchman
Joined: 23 May 2008 Posts: 6098 Location: Dallas area
Posted: Mon Jun 03, 2019 3:59 pm Post subject: |
hdparm works on devices, not partitions, and (I don't think) arrays.
If you want file system performance, then something like iozone would be more what you need.
Edit to add: I'm not sure why there's a performance difference in your first post. It should make no difference whether you point to the whole device or a partition of it; it still uses the whole device, because it talks to the controller (if I'm not mistaken).
https://ssd.userbenchmark.com/SpeedTest/607339/CT1000P1SSD8
If running in an NVMe/PCIe Gen3 x4 slot, the device is supposed to hit ~2000 MB/s for reads and ~1700 MB/s for writes.
If it's not a Gen3 slot, it will be slower, especially if that slot is shared with other cards, which is common on many motherboards. _________________ PRIME x570-pro, 3700x, 6.1 zen kernel
gcc 13, profile 17.0 (custom bare multilib), openrc, wayland |
Hu Moderator
Joined: 06 Mar 2007 Posts: 21639
Posted: Tue Jun 04, 2019 1:03 am Post subject: |
Please post the partition table without rounding. sgdisk --print can do this. The parted output is not clear whether the partitions are aligned to any of the commonly important boundaries. |
RayDude Advocate
Joined: 29 May 2004 Posts: 2066 Location: San Jose, CA
Posted: Sat Jun 08, 2019 3:51 pm Post subject: |
Hu wrote: | Please post the partition table without rounding. sgdisk --print can do this. The parted output is not clear whether the partitions are aligned to any of the commonly important boundaries. |
Update: found it:
Code: | server ~ # sgdisk --print /dev/nvme0n1
Disk /dev/nvme0n1: 1953525168 sectors, 931.5 GiB
Model: CT1000P1SSD8
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 1A547616-F8A0-485F-B15F-B6723E76FF7C
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 3437 sectors (1.7 MiB)
Number Start (sector) End (sector) Size Code Name
1 2048 6143 2.0 MiB EF02 BIOSBOOT
2 6144 415743 200.0 MiB 0700 EFI
3 415744 17192959 8.0 GiB 8200 SWAP
4 17192960 1953523711 923.3 GiB 8300 SERVER
|
(I couldn't find sgdisk at first, so here's the output from fdisk as well:)
Code: | server ~ # fdisk /dev/nvme0n1
Welcome to fdisk (util-linux 2.33.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): p
Disk /dev/nvme0n1: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000P1SSD8
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 1A547616-F8A0-485F-B15F-B6723E76FF7C
Device Start End Sectors Size Type
/dev/nvme0n1p1 2048 6143 4096 2M BIOS boot
/dev/nvme0n1p2 6144 415743 409600 200M Microsoft basic data
/dev/nvme0n1p3 415744 17192959 16777216 8G Linux swap
/dev/nvme0n1p4 17192960 1953523711 1936330752 923.3G Linux filesystem |
_________________ Some day there will only be free software. |
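For what it's worth, the start sectors above check out: with 512-byte logical sectors, a partition is 1 MiB-aligned when its start sector is divisible by 2048, and all four values from the fdisk output pass that test:

```shell
# Start sectors taken from the fdisk output above; each should be a
# multiple of 2048 (i.e. 1 MiB with 512-byte logical sectors).
for start in 2048 6144 415744 17192960; do
    if [ $((start % 2048)) -eq 0 ]; then
        echo "$start: 1 MiB aligned"
    else
        echo "$start: NOT aligned"
    fi
done
```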