Migrating a Linux Server with 200+ Million Files


 

This article describes the steps to resolve issues encountered when migrating an Ubuntu 16 server with 200+ million files.

 

Note – This issue was seen by a customer in OCI; however, the issue is related to the ORIGIN/TARGET configuration rather than to any specific cloud provider.


Background:


When migrating a Linux server (Ubuntu 16 in this case) with 200+ million files, we ran into the issues below.


Issue #1

++++++


rsync fails with an out-of-memory error. This happens because rsync attempts to buffer the entire list of 200+ million files before copying; the scan takes a long time and eventually exhausts memory. To resolve this issue, apply the RMM patch (the patch and detailed instructions on how to apply it are in the Before you Begin section) to instruct rsync to perform an incremental copy instead of a bulk copy.


May 12 01:35:10 m360-transfer-london rmm:00000000000159ca:00007f4999de7700:NOTICE:host_sync.cpp  : 3351: |    opt string   details                = sync failed: rsync error: error allocating core memory buffers (code 22) at util2.c(106) [sender=3.1.2] (22)
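As a rough illustration only (the RMM patch applies this behavior internally; the paths and target address below are placeholders), rsync 3.x builds and transfers the file list incrementally by default, and it is a full up-front scan of 200+ million files that exhausts memory:

# forcing a full scan of the whole tree before any transfer starts is memory-hungry
rsync -a --no-inc-recursive /mnt/data1/ root@<target ip>:/mnt/data1/

# default incremental recursion builds and transfers the file list in chunks as it copies
rsync -a /mnt/data1/ root@<target ip>:/mnt/data1/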


Issue #2

+++++++


After applying the patch to the RMM to make rsync perform an incremental copy, we still ran into the error below. After investigating, we found that the root cause (RC) of the issue is a limitation in the ext4 file system. Please find more details in the RC of the Issue section.

symlink "/mnt/rackware/tmp.pQP7aQ/journey_files/216468/proxy_symlinks/2020-04-12/ada858350c0945945655b7b30e9c6b03.css" -> "/mnt/data1/journey_files/216468/proxy_assets/2020-04-12/e98a43397ee3816953d36a209045045f.css" failed: No space left on device (28)




RC of the Issue:


The issue here is that the target is hitting internal limits within the ext4 file system, and its behavior is somewhat unpredictable in these extreme cases: specific file names randomly become impossible to create when this happens. XFS is a better file system type for this kind of application that creates a large number of files, especially when individual directories contain a large number of files (which appears to be the case here).

The only solution we can offer here is to make the target file system XFS instead of ext4. The sync they are running is bound to fail with the same error due to this ext4 limitation. This is not a bug in the RMM software or in the kernel, but a limitation of the ext4 file system itself.
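As a rough diagnostic sketch (the mount point /mnt/data1 and device /dev/sdb1 are placeholders for the target volume), the following commands can help confirm that the "No space left on device" error comes from ext4 internal limits rather than from actual disk or inode exhaustion:

# compare block and inode usage; ENOSPC can appear even when both show free capacity
df -h /mnt/data1
df -i /mnt/data1

# inspect the ext4 superblock (inode counts and dir_index/large_dir features)
dumpe2fs -h /dev/sdb1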


Also, please find the attached document with a detailed explanation of the exact issue and why we need to switch to the XFS file system.




Before you Begin:


Apply the attached patch (rt-21522-v7.4.0.583.patch) to the RMM; this will resolve Issue #1 mentioned above.


Please find the instructions below and the patch attached.

Steps to apply patch 
+++++++++++++++++

1) Copy rt-21522-v7.4.0.583.patch to /opt/rackware on the RMM

2) cd /opt/rackware
3) patch -p1 < rt-21522-v7.4.0.583.patch
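
Optionally, as a quick sanity check (a sketch, assuming the patch paths are relative to /opt/rackware), you can verify that the patch applies cleanly before making any changes:

cd /opt/rackware
# --dry-run reports what would be patched without modifying any files
patch -p1 --dry-run < rt-21522-v7.4.0.583.patch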



Steps to make the patch apply only to servers with less memory
+++++++++++++++++++++++++++++++++++++++++++++++++++++

1) Create /opt/rackware/utils/common/incscan.txt
2) Put the target IP address in the above file
3) Multiple server syncs can be selected by putting one target IP address per line in incscan.txt (see the example after this list)
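
For example (the IP addresses below are placeholders for your target servers):

# one target IP per line; only syncs to these targets use the incremental scan
cat > /opt/rackware/utils/common/incscan.txt <<'EOF'
10.0.0.21
10.0.0.22
EOF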

Note:
The sync progress reported for the targeted servers will be inaccurate.


 

Use case/Applicable To:  



Migrating a Linux server with 200+ million files on an ext4 file system


Preparation/Pre-Reqs:


1) Apply the patch (rt-21522-v7.4.0.583.patch) to the RMM
2) Perform the AP and create the target with the no-transfer flag

3) After the AP completes, configure the /opt/rackware/utils/common/incscan.txt file with the target IP

4) Run the sync without the no-transfer flag

5) Let the sync create the ext4 file systems for all the volumes on the target, and wait until the sync starts the data transfer for the file system with millions of files
6) As soon as it is in the data transfer stage, stop the wave
7) Reconfigure the target server's file system that holds the huge number of files from ext4 to XFS (see Steps to convert from ext4 to xfs below)



Steps to convert from ext4 to xfs 



1. ssh root@<target ip>

2. lsblk   --> use this command to find the device holding the large file system with 200+ million files

3. blkid   --> use this command to find the UUID of that device

4. wipefs -a /dev/sdb1   --> here /dev/sdb1 is the device holding the large file system with 200+ million files

5. mkfs.xfs -b size=4096 -i maxpct=3 /dev/sdb1   --> create the XFS file system with a 4 KiB block size; maxpct=3 caps the space XFS may allocate to inodes at 3% of the volume

6. xfs_admin -U <UUID of drive> /dev/sdb1   --> restore the original UUID recorded in step 3 so anything referencing the volume by UUID (such as /etc/fstab) still matches (see the verification check after this list)

7. Start the wave
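
After the conversion and before restarting the wave, a quick check (a sketch; /dev/sdb1 is a placeholder) can confirm that the device now reports TYPE="xfs" and carries the same UUID recorded in step 3:

# should report TYPE="xfs" and the original UUID
blkid /dev/sdb1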

 




Contact:


For any issues, or if you need assistance, please contact Support@RackwareInc.com.