Repair a thin pool

I use Proxmox in my home lab; it has been really helpful for spinning up VMs as needed for tests and experiments.

I recently ran into an issue with the LVM thin pool used by Proxmox: the metadata space was completely full, with lvs -a reporting 99.99% usage.
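Nothing was watching this before it filled up. As a hedge for others, here is a minimal sketch of the kind of check that would have caught it early (not something from my original setup; the sample output is baked in, and on a real host you would feed it the output of lvs --noheadings -o lv_name,metadata_percent pve instead):

```shell
# Sketch only: flag any LV whose metadata usage exceeds 90%.
# Replace the sample text with real output from:
#   lvs --noheadings -o lv_name,metadata_percent pve
lvs_output='
  data           99.99
  vm-100-disk-5   0.00
'
echo "$lvs_output" | awk 'NF == 2 && $2 + 0 > 90 {
    printf "WARNING: %s metadata is %s%% full\n", $1, $2
}'
```

Run from cron, something like this would have warned long before the pool hit 99.99%.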

After a quick search, I noticed I was not the first to run into this. It seems some felt the default metadata pool size in LVM2 was not large enough.

I came up with steps to fix the issue, starting with resizing the metadata space:

root@pve1:/# lvresize --poolmetadatasize +1G pve/data

Although lvs -a showed the additional space, I was still experiencing issues. Assuming the metadata was corrupted, I tried:

root@pve1:/# lvconvert --repair pve/data

This did not resolve the issue. Since the root of the tree had already been lost, lvconvert --repair was not able to recover anything, and I was left with no metadata at all. lvs -a still showed the thin volumes, but they remained unavailable:

root@pve1:/# lvchange -ay pve/vm-100-disk-5
device-mapper: reload ioctl on failed: No data available

I tried running vgmknodes -vvv pve but noticed those volumes got marked NODE_DEL:

Processing LV vm-100-disk-5 in VG pve.
   dm mknodes pve-vm--100--disk--5  NF   [16384] (*1)
   pve-vm--100--disk--5: Stacking NODE_DEL
   Syncing device names
   pve-vm--100--disk--5: Processing NODE_DEL

I reached out to Zdenek Kabelac and Ming-Hung Tsai, who are both extremely knowledgeable about LVM thin pools, and both provided much-needed and very useful assistance. Following advice from Ming-Hung, I grabbed the source code of thin-provisioning-tools from GitHub. To compile it on Proxmox I had to install a number of packages:

apt-get install git autoconf make g++ libexpat1 libexpat1-dev libaio1 libaio-dev libboost1.55-all-dev

Using this new set of tools (built with ./configure --enable-dev-tools and make, since the thin_ll_* tools are not built by default), I started poking around with thin_check, thin_scan and thin_ll_dump:

root@pve1:/# ./pdata_tools thin_check /dev/mapper/pve-data_meta2
examining superblock
examining devices tree
examining mapping tree
  missing all mappings for devices: [0, -]
    bad checksum in btree node (block 688)
root@pve1:/# ./pdata_tools thin_scan /dev/mapper/pve-data_meta2 -o /tmp/thin_scan_meta2.xml
root@pve1:/# ./pdata_tools thin_ll_dump /dev/mapper/pve-data_meta2 -o /tmp/thin_ll_dump_meta2.xml

pve-data_meta2 was the oldest backup of the metadata created by lvconvert --repair and the most likely to contain my metadata. But thin_check showed that all mappings were missing, because the root node was gone.

To fix this with thin_ll_restore, I needed to find the correct nodes. In the thin_ll_dump output created above, I was able to find the data-mapping-root:

root@pve1:/# grep "key_begin=\"5\" key_end=\"8\"" /tmp/thin_ll_dump_meta2.xml
  <node blocknr="6235" flags="2" key_begin="5" key_end="8" nr_entries="4" value_size="8"/>
  <node blocknr="20478" flags="2" key_begin="5" key_end="8" nr_entries="4" value_size="24"/>

In the thin_scan xml file created above, I was able to find the device-details-root:

root@pve1:# grep value_size=\"24\" /tmp/thin_scan_meta2.xml
<single_block type="btree_leaf" location="20477" blocknr="20477" ref_count="0" is_valid="1" value_size="24"/>
<single_block type="btree_leaf" location="20478" blocknr="20478" ref_count="1" is_valid="1" value_size="24"/>
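In grep terms, the selection logic boils down to the sketch below. This is a reconstruction, not my exact session: dev_ids 5-8 are the four thin volumes, so candidate roots span keys 5..8; device-details entries are 24-byte structs while mapping-tree pointers are 8-byte values (per the on-disk format, to the best of my understanding), which is why value_size separates the two roots. The sample lines are copied from the dumps above:

```shell
# Reconstruction using sample node lines from the dumps above.
cat > /tmp/sample_nodes.txt <<'EOF'
<node blocknr="6235" flags="2" key_begin="5" key_end="8" nr_entries="4" value_size="8"/>
<node blocknr="20478" flags="2" key_begin="5" key_end="8" nr_entries="4" value_size="24"/>
EOF
# data-mapping-root candidates: keys span the dev_ids (5..8) and the
# values are 8-byte pointers to per-device mapping subtrees
grep 'key_begin="5" key_end="8"' /tmp/sample_nodes.txt | grep 'value_size="8"'
# device-details-root candidates: values are 24-byte device-details structs
grep 'value_size="24"' /tmp/sample_nodes.txt
```

On a real system you would run the same greps against the thin_ll_dump and thin_scan XML files, then test the surviving candidates with thin_ll_dump as shown below.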

I used the 6235 and 20477 pair to start, which produced good metadata and far fewer orphans than before:

root@pve1:/# ./pdata_tools thin_ll_dump /dev/mapper/pve-data_meta2 --device-details-root=20477 --data-mapping-root=6235 -o /tmp/thin_ll_dump2.xml
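The /tmp/tmeta.bin used in the next command is simply a raw dd copy of the pool's metadata LV; as I mention in my replies in the comments below, it was created with something like dd if=/dev/mapper/pve-data_meta of=/tmp/tmeta.bin bs=4K. The same pattern is shown here against a scratch file, so it is safe to run anywhere:

```shell
# Same dd pattern used for /tmp/tmeta.bin, but against a scratch file
# instead of the real metadata LV (/dev/mapper/pve-data_meta).
dd if=/dev/zero of=/tmp/fake_meta bs=4K count=256 2>/dev/null   # stand-in "LV"
dd if=/tmp/fake_meta of=/tmp/tmeta_copy.bin bs=4K 2>/dev/null   # raw copy
cmp /tmp/fake_meta /tmp/tmeta_copy.bin && echo "bit-identical copy"
```

Always take such a copy before experimenting: every tool can then be pointed at the file rather than the live device.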

root@pve1:/# ./pdata_tools thin_ll_dump /tmp/tmeta.bin --device-details-root=20478 --data-mapping-root=6235
<superblock blocknr="0" data_mapping_root="6235" device_details_root="20478">
  <device dev_id="5">
    <node blocknr="7563" flags="1" key_begin="0" key_end="708527" nr_entries="6" value_size="8"/>
  </device>
  <device dev_id="6">
    <node blocknr="171" flags="1" key_begin="0" key_end="799665" nr_entries="51" value_size="8"/>
  </device>
  <device dev_id="7">
    <node blocknr="20413" flags="1" key_begin="0" key_end="1064487" nr_entries="68" value_size="8"/>
  </device>
  <device dev_id="8">
    <node blocknr="19658" flags="1" key_begin="0" key_end="920291" nr_entries="17" value_size="8"/>
  </device>
</superblock>
<orphans>
  <node blocknr="564" flags="2" key_begin="0" key_end="0" nr_entries="0" value_size="8"/>
  <node blocknr="677" flags="1" key_begin="0" key_end="1848" nr_entries="23" value_size="8"/>
  <node blocknr="2607" flags="1" key_begin="0" key_end="708527" nr_entries="6" value_size="8"/>
  <node blocknr="20477" flags="2" key_begin="5" key_end="8" nr_entries="4" value_size="24"/>
  <node blocknr="3020" flags="1" key_begin="370869" key_end="600885" nr_entries="161" value_size="8"/>
  <node blocknr="20472" flags="2" key_begin="379123" key_end="379268" nr_entries="126" value_size="8"/>
  <node blocknr="20476" flags="2" key_begin="379269" key_end="401330" nr_entries="127" value_size="8"/>
</orphans>
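For anyone decoding these dumps: to the best of my understanding of the kernel's persistent-data btree format, flags="1" marks an internal btree node and flags="2" a leaf, so the orphan list above mixes both kinds. A quick way to classify sample lines (a throwaway helper, not part of pdata_tools):

```shell
# flags="1" = internal btree node, flags="2" = leaf (persistent-data format)
cat > /tmp/orphans.txt <<'EOF'
<node blocknr="564" flags="2" key_begin="0" key_end="0" nr_entries="0" value_size="8"/>
<node blocknr="677" flags="1" key_begin="0" key_end="1848" nr_entries="23" value_size="8"/>
EOF
# split on double quotes: field 2 = blocknr, field 4 = flags
awk -F'"' '{ print $2, ($4 == "1" ? "internal" : "leaf") }' /tmp/orphans.txt
```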

Armed with this modified XML file, and after making sure nothing was active or using the thin pool metadata, I was able to attempt a restore:

root@pve1:/# dmsetup remove pve-data-tpool
root@pve1:/# dmsetup remove pve-data_tdata
root@pve1:/# ./pdata_tools thin_ll_restore -i /tmp/thin_ll_dump_meta2_root_6235.xml -E /tmp/tmeta.bin -o /dev/mapper/pve-data_tmeta

Following the restore, ALL of my thin volumes came back and I was able to activate every single one.

I learned a lot about LVM thin pools in the process, AND learned to be more careful with metadata space. Proxmox creates a very small metadata space by default; when deploying a new server, the pool metadata size (--poolmetadatasize) should always be increased, or at the very least checked and monitored.
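For the monitoring side, stock LVM can also auto-extend a thin pool before it fills. This is not Proxmox-specific and not something from my original setup; these are the standard knobs documented in lvm.conf(5), and they require dmeventd monitoring (the default), so check that your LVM version supports them:

```text
# /etc/lvm/lvm.conf (activation section) -- standard LVM settings
activation {
    # auto-extend a monitored thin pool once it is 80% full...
    thin_pool_autoextend_threshold = 80
    # ...growing it by 20% each time; metadata is extended by the same policy
    thin_pool_autoextend_percent = 20
}
```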


Also published on Medium.

9 thoughts on “Repair a thin pool”

  1. Thanks for documenting this; it seems very promising for helping me with a similar problem!

    Note for others who find this: I had to compile thin-provisioning-tools with

    ./configure --enable-dev-tools
    make

    Without that, the thin_ll tools weren’t available.

    Charles, could you clarify how you generated your /tmp/tmeta.bin file? I assume it’s a binary dump of your pool’s tmeta, but I’m not sure how to create that. So I’m stuck at trying to run thin_ll_restore, which requires the -E source-metadata input.

    And if it's convenient for you to share any more pointers on how you identified the correct entries for data-mapping-root and device-details-root, that'd be great. My situation may be different from yours: lvconvert --repair allows me to access 75% of my volumes, with only a few orphans shown in thin_ll_dump, but I'd like to experiment with data-mapping-root and device-details-root to see whether I can restore them all.

    Thanks!

    1. /tmp/tmeta.bin was simply a dd copy of the metadata partition. I used something like dd if=/dev/mapper/pve-data_meta of=tmeta.bin bs=4K to transfer the binary content to a file.

  2. Charles, it's really unbelievable! They (the Proxmox team) shouldn't use such buggy technology at all!
    Now I'm experiencing the same issue, and I'm just going to format my LVM disk back to plain ext4 and keep the images in the filesystem!
    Thank God that in my case it was just a backup VM… unbelievable…

  3. Hello, I ran into the same problem as you and tried to follow your approach, but got stuck partway through. My version of thin-provisioning-tools is different and uses thin_dump, and at this step I can't go on; please give me more advice.
    #thin_dump /dev/mapper/volumes-pool_meta0 -o /Temp/thin_dump_meta0.xml
    which fails with: bad checksum in metadata index block
    Thanks!!!

    1. Sorry, I missed some of these comments because of a WordPress issue. /tmp/tmeta.bin was simply a dd copy of the metadata partition. I used something like dd if=/dev/mapper/pve-data_meta of=tmeta.bin bs=4K to transfer the binary content to a file.

  4. Hi

    I am trying to install thin-provisioning-tools, but I am getting errors during make:

    [CXX] thin-provisioning/shared_library_emitter.cc
    [LD] bin/pdata_tools
    /usr/bin/ld: cannot find -lz
    collect2: error: ld returned 1 exit status
    make: *** [Makefile:240: bin/pdata_tools] Error 1

    Can anybody help me?
    Thanks a lot.
