I use ProxMox in my home lab; it has been really helpful for spinning up VMs as needed for tests and experiments.

I recently ran into an issue with the LVM thin pool used by ProxMox: the metadata space was completely full, with lvs -a reporting 99.99% metadata usage.
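For anyone wanting to check this on their own pool, lvs can report the data and metadata usage explicitly; a quick check along these lines (my volume group is pve) shows the relevant columns:

root@pve1:/# lvs -a -o lv_name,lv_size,data_percent,metadata_percent pve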

After a quick search, I noticed I was not the first one running into this. It seems others felt the default pool metadata size in LVM2 was not large enough.

I came up with steps to fix the issue, starting with resizing the metadata space:

root@pve1:/# lvresize --poolmetadatasize +1G pve/data
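The extra metadata space has to come from free extents in the volume group, so it is worth confirming there is room first and then checking the new metadata size afterwards; something like:

root@pve1:/# vgs pve
root@pve1:/# lvs -a -o lv_name,lv_size,lv_metadata_size,metadata_percent pve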

Although lvs -a showed the additional space, I was still experiencing issues. Assuming the metadata was corrupted, I tried:

root@pve1:/# lvconvert --repair pve/data
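One thing worth knowing about lvconvert --repair on a thin pool: it swaps in freshly repaired metadata and keeps the previous metadata LV around as pve/data_metaN, which is where the pve-data_meta2 device used later in this post comes from. The leftover copies can be listed with something like:

root@pve1:/# lvs -a pve | grep data_meta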

This did not resolve the issue. Since the root of the tree had already been lost, lvconvert --repair was not able to recover anything, and I was left with no metadata and none of the thin volumes available. lvs -a still listed the thin volumes, but they remained unavailable:

root@pve1:/# lvchange -ay pve/vm-100-disk-5
device-mapper: reload ioctl on failed: No data available

I tried running vgmknodes -vvv pve but noticed those volumes got marked NODE_DEL:

Processing LV vm-100-disk-5 in VG pve.
   dm mknodes pve-vm--100--disk--5  NF   [16384] (*1)
   pve-vm--100--disk--5: Stacking NODE_DEL
   Syncing device names
   pve-vm--100--disk--5: Processing NODE_DEL

I reached out to Zdenek Kabelac and Ming-Hung Tsai, who are both extremely knowledgeable about LVM thin pools, and they provided much-needed and very useful assistance. Following advice from Ming-Hung, I grabbed the source code of thin-provisioning-tools from GitHub. To compile it properly on ProxMox, I had to install a number of packages first (a build sketch follows the list):

apt-get install git
apt-get install autoconf
apt-get install g++
apt-get install libexpat
apt-get install libexpat1-dev
apt-get install libexpat1
apt-get install libaio-dev libaio1
apt-get install libboost1.55-all-dev
apt-get install make
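With those packages in place, the build itself was the usual autotools routine. A rough sketch of what it looks like (the --enable-dev-tools configure switch is, as far as I remember, what pulls in the low-level tools such as thin_scan, thin_ll_dump and thin_ll_restore; treat the exact flags as something to double-check against the repository's README):

root@pve1:/# git clone https://github.com/jthornber/thin-provisioning-tools.git
root@pve1:/# cd thin-provisioning-tools
root@pve1:/thin-provisioning-tools# autoreconf -i
root@pve1:/thin-provisioning-tools# ./configure --enable-dev-tools
root@pve1:/thin-provisioning-tools# make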

Using this new set of tools, I started poking around with thin_check, thin_scan and thin_ll_dump:

root@pve1:/# ./pdata_tools thin_check /dev/mapper/pve-data_meta2
examining superblock
examining devices tree
examining mapping tree
  missing all mappings for devices: [0, -]
    bad checksum in btree node (block 688)
root@pve1:/# ./pdata_tools thin_scan /dev/mapper/pve-data_meta2 -o /tmp/thin_scan_meta2.xml
root@pve1:/# ./pdata_tools thin_ll_dump /dev/mapper/pve-data_meta2 -o /tmp/thin_ll_dump_meta2.xml

pve-data_meta2 was the oldest backup of the metadata created by lvconvert --repair and was the most likely to contain my metadata, but thin_check showed that all mappings were missing because the root was missing.

To fix this with thin_ll_restore, I needed to find the correct root nodes. My thin devices had IDs 5 through 8, so both the data-mapping-root and the device-details-root should be nodes whose keys cover exactly those device IDs. Grepping the thin_ll_dump output created above for key_begin="5" and key_end="8" turned up two candidates; the one with value_size="8" (block pointers into the mapping subtrees) is the data-mapping-root:

root@pve1:/# grep "key_begin=\"5\" key_end=\"8\"" /tmp/thin_ll_dump_meta2.xml
  <node blocknr="6235" flags="2" key_begin="5" key_end="8" nr_entries="4" value_size="8"/>
  <node blocknr="20478" flags="2" key_begin="5" key_end="8" nr_entries="4" value_size="24"/>

In the thin_scan XML file created above, I was able to find the device-details-root; device details entries are 24 bytes, so value_size="24" is what to look for:

root@pve1:/# grep value_size=\"24\" /tmp/thin_scan_meta2.xml
<single_block type="btree_leaf" location="20477" blocknr="20477" ref_count="0" is_valid="1" value_size="24"/>
<single_block type="btree_leaf" location="20478" blocknr="20478" ref_count="1" is_valid="1" value_size="24"/>

I used the 6235 and 20477 pair to start, which produced good metadata and far fewer orphans than before:

root@pve1:/# ./pdata_tools thin_ll_dump /dev/mapper/pve-data_meta2 --device-details-root=20477 --data-mapping-root=6235 -o /tmp/thin_ll_dump2.xml

Running the same dump against /tmp/tmeta.bin with the other device-details candidate (20478) produced a complete superblock listing all four thin devices:

root@pve1:/# ./pdata_tools thin_ll_dump /tmp/tmeta.bin --device-details-root=20478 --data-mapping-root=6235
<superblock blocknr="0" data_mapping_root="6235" device_details_root="20478">
  <device dev_id="5">
    <node blocknr="7563" flags="1" key_begin="0" key_end="708527" nr_entries="6" value_size="8"/>
  </device>
  <device dev_id="6">
    <node blocknr="171" flags="1" key_begin="0" key_end="799665" nr_entries="51" value_size="8"/>
  </device>
  <device dev_id="7">
    <node blocknr="20413" flags="1" key_begin="0" key_end="1064487" nr_entries="68" value_size="8"/>
  </device>
  <device dev_id="8">
    <node blocknr="19658" flags="1" key_begin="0" key_end="920291" nr_entries="17" value_size="8"/>
  </device>
</superblock>
<orphans>
  <node blocknr="564" flags="2" key_begin="0" key_end="0" nr_entries="0" value_size="8"/>
  <node blocknr="677" flags="1" key_begin="0" key_end="1848" nr_entries="23" value_size="8"/>
  <node blocknr="2607" flags="1" key_begin="0" key_end="708527" nr_entries="6" value_size="8"/>
  <node blocknr="20477" flags="2" key_begin="5" key_end="8" nr_entries="4" value_size="24"/>
  <node blocknr="3020" flags="1" key_begin="370869" key_end="600885" nr_entries="161" value_size="8"/>
  <node blocknr="20472" flags="2" key_begin="379123" key_end="379268" nr_entries="126" value_size="8"/>
  <node blocknr="20476" flags="2" key_begin="379269" key_end="401330" nr_entries="127" value_size="8"/>
</orphans>

Armed with this modified XML file, and after making sure nothing was active or using the thin pool metadata, I was able to attempt a restore:

root@pve1:/# dmsetup remove pve-data-tpool
root@pve1:/# dmsetup remove pve-data_tdata
root@pve1:/# ./pdata_tools thin_ll_restore -i /tmp/thin_ll_dump_meta2_root_6235.xml -E /tmp/tmeta.bin -o /dev/mapper/pve-data_tmeta

Following the restore, ALL of my thin volumes came back and I was able to activate every single one.
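If you end up in a similar situation, it is worth running thin_check against the restored metadata before reactivating anything, and then bringing the pool and the thin volumes back up; roughly:

root@pve1:/# ./pdata_tools thin_check /dev/mapper/pve-data_tmeta
root@pve1:/# lvchange -ay pve/data
root@pve1:/# lvchange -ay pve/vm-100-disk-5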

I learned a lot about LVM thin pools in the process AND learned to be more careful with metadata space. ProxMox creates a very small metadata space by default, so when deploying a new server the pool metadata size should always be increased (or, at the very least, checked and monitored).
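Concretely, the two things I would do on a fresh install are to bump the pool metadata size right away and to let LVM monitor and auto-extend the pool via /etc/lvm/lvm.conf. A sketch of both, with threshold and percentage values that are just examples to adjust to taste:

root@pve1:/# lvresize --poolmetadatasize +1G pve/data

# and in /etc/lvm/lvm.conf, under the activation section:
thin_pool_autoextend_threshold = 80
thin_pool_autoextend_percent = 20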
