How do you copy 60m files?

Recently I copied 60 million files from one Windows file server to another. Tools used to move files from one system to another are integrated into every operating system, and there are third-party options too. The tasks they perform are so common that we tend to ignore their limits. Many systems administrators are guilty of not …

COMMENTS

  1. Anonymous Coward
    Anonymous Coward

    cygwin + bash + rsync

    You could have set up cygwin with bash and rsync on the windows machines as well. Rsync is better than cp when it comes to moving files between systems.

    1. Anonymous Coward
      Pint

      Right ...

      So you've tried this and it works with 60 million files have you?

      1. Anonymous Coward
        Troll

        Re: Right

        "So you've tried this and it works with 60 million files have you?"

        So you've tried this with 60 million files and it hasn't? Have you?

        1. Anonymous Coward
          Pint

          Nope

          Nope, not tried it, that's why I'm not suggesting it as a solution ... What's your reason?

      2. Joe Montana
        WTF?

        So you've tried this and it works with 60 million files have you?

        I have used rsync extensively, one of the things i commonly use it for is duplicating entire system installs from one system to another... I can do an initial copy while the source system is live to get 99% of the data, and then only need to copy the differences once i have shut down the source system. I have a busy mailserver which uses the maildir format (one file per message) and that had more than 60 million small files on it when i migrated it to new hardware.

        1. Anonymous Coward
          Pint

          You're missing the important point

          You've tried this in cygwin on a windows box have you? And it works?

          No one is disputing rsync works on linux. The article is about how to copy that many files from a windows box. If someone suggests doing it in cygwin it's a more useful suggestion if they actually know whether it works, rather than leaving someone else to do their work for them.

          In theory any of the other tools discussed in the article should also work. In practice they didn't.

  2. Trygve Henriksen

    Robocopy...

    It has options for 'synchronising' two folders, including file permissions.

    Dump the text output into a file then search for 'ERROR', or better, pass it through FINDSTR to filter out anything but lines beginning with 'ERROR'

    Sure, it bombs out on folder/filenames longer than 256 characters, but those are an abomination and the users who made them (usually by using filenames suggested by MS Office apps) really need to be punished anyway.

    Been pushing around a few million files this spring and summer...
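
    A minimal Python sketch of that log-filtering step, for anyone who would rather not do it with FINDSTR. It assumes the copy tool's text log marks each failure with the word ERROR somewhere on the line; the function names are made up for illustration and are not part of any tool.

```python
import re
from typing import Iterable, List

# Assumption: failed operations are logged on lines containing the word "ERROR".
ERROR_PATTERN = re.compile(r"\bERROR\b")


def error_lines(lines: Iterable[str]) -> List[str]:
    """Return only the log lines that mention ERROR, without trailing newlines."""
    return [line.rstrip("\r\n") for line in lines if ERROR_PATTERN.search(line)]


def error_lines_from_file(path: str, encoding: str = "utf-8") -> List[str]:
    """Read a log file and return its ERROR lines.

    Undecodable bytes are replaced rather than raising, since file names in a
    copy log can contain characters outside the chosen encoding.
    """
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        return error_lines(handle)
```

    Calling error_lines_from_file() on the saved log gives a short list to work through before the next sync pass.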

    1. Steven Hunter
      FAIL

      Wrong...

      Robocopy DOES support paths longer than 256 characters... In fact there's a flag ( /256 ) that *disables* "very long path (> 256 characters) support".

    2. Velv Silver badge

      robocopy /create

      robocopy also has the /create option which copies the files with 0 data. OK, a total operation takes 2+ passes, but it has several advantages :

      1. 60million files WILL cause the MFT to expand. If you are filling the disk with data as well as MFT then you can end up fragmenting the MFT which can lead to performance problems. If the disk is only writing MFT (such as during a create), then the MFT can expand to adjacent space.

      2. Since there is no data, the operation completes in a fraction of the time, so if you log the operation, you can see where any failures will occur, and fix them for the final copy.

      For planned migrations, you can run the same job several times (only run the /create the first time), and therefore the final sync should take a lot less time :)

  3. The BigYin

    Or...

    ...use Cygwin? Gives a lot of the *nix style commands and abilities right within Windows.

    If one is going to go to the bother of having a Linux box sitting about just to show the Windows servers how it's done, one has to question why one even has the Windows servers in the first place. :-)

    1. Anonymous Coward
      Pint

      Hmmm ...

      Perhaps it's because the windows servers aren't used simply to copy files?

      And you've tried copying 60 million files in cygwin have you? How did that go?

      1. The BigYin
        Flame

        Get a grip

        I asked if "cygwin" had been considered, I did not say "Use cygwin! It's the wins! L0LZ!!11!" The author had looked at various tools and not mentioned "cygwin", so my question seems perfectly reasonable to me (others have asked the same question).

        You seem to have taken umbrage at a few "cygwin" related posts. Do you have an issue with this tool? (I used it for some light-weight ssh and rsync work and really like it.) If you do, have you filed bugs, got involved?

        Or do you know something about "cygwin" and large jobs? "Oh, you can't use cygwin for that because its job index will overflow, see bug-1234".

        Or are you just some reactionary pillock who can't see through the Windows? "!Microsoft==Bad"

        I know which conclusion I am drawing at the moment...

        1. Anonymous Coward
          Pint

          Right

          You're jumping to the wrong conclusions. The point is simply that before trying you'd expect half the tools tried in the article to work on windows. However, they didn't. So it's helpful to make clear whether you know the solution you're suggesting works or not. It sounds like you don't. Also, you didn't ask whether cygwin had been considered. Re-read what you posted. It's nothing to do with windows / linux / os of choice. I work mostly on bsd and linux derivatives for a living. Oh, and i like cygwin.

    2. Daniel B.
      Boffin

      SFU

      Easier: Use Services For Unix, that one uses an Interix subsystem to run all your UNIXy stuff.

  4. Russell Howe

    Fastest way?

    Pull out hard drive.

    Move hard drive to other server.

    Put in hard drive.

    Or backup to tape and then restore.

    Using something like tar or rsync may well have been better than cp

    1. Anonymous Coward
      Anonymous Coward

      The title is required, and must contain letters and/or digits.

      All well and good until you have a RAID array

      1. Anonymous Coward
        Thumb Up

        Agreed.

        And you do know the price of a tape drive, right? Company I worked for bought a pair of Ultrium-4 drives from IBM. The bloody thing itself costs a little over US$3000 a pop, and you need two. If you work in a company where accounts is a /b/tard, you'll know how painful it is to get them to approve the upgrade for a drive, let alone two drives.

    2. Pete 2

      and some added bonuses ...

      +1 This has the most desirable side effect of stopping any bugger from trying to update the "wrong" file, or creating new ones while the copy is in progress.

      p.s. Don't forget the second part of any professional data copying activity is to VERIFY that what you copied actually did turn out to be the same as what you copied from. Many a backup has turned out to be just a blank tape and an error message without this stage.
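
      A rough Python sketch of that VERIFY step, assuming both trees are reachable as ordinary paths from one machine. It hashes every file with SHA-256 and reports what is missing or different on the target. The function names are invented for the example, and holding every digest in memory is only sensible for modest trees; for tens of millions of files you would want to stream and compare per directory instead.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def file_digest(path: str, chunk_size: int = 1024 * 1024) -> str:
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: str) -> Dict[str, str]:
    """Map each file's path relative to root onto its SHA-256 digest."""
    result = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            result[os.path.relpath(full, root)] = file_digest(full)
    return result


def compare_trees(source: str, target: str) -> Tuple[List[str], List[str]]:
    """Return (missing, different): relative paths absent from target, and
    relative paths present in both whose contents do not match."""
    src = tree_digests(source)
    dst = tree_digests(target)
    missing = sorted(p for p in src if p not in dst)
    different = sorted(p for p in src if p in dst and src[p] != dst[p])
    return missing, different
```

      Two empty lists back from compare_trees() means every source file exists on the target with identical contents. It says nothing about permissions or timestamps.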

    3. Trevor_Pott Gold badge

      @Russell Howe

      Did I mention that the systems were both live and in use during the copy?

      Also, RAID card on the destination server was full.

    4. Velv Silver badge
      FAIL

      Err, how do you do that with a SAN or NAS?

      Clearly you've never worked in a decent sized enterprise.

      There are LOTS of scenarios when you can't just move the disks or use backup hardware. Maybe not all the data on one disk is being moved. Maybe the data is on different SANs or NAS and can't be swapped or zoned. Maybe you can't attach the same type of tape to each server.

  5. Anonymous Coward
    Happy

    Most obvious conclusion ever

    "So the best way to move 60 million files from one Windows server to another turns out to be: use Linux."

    D'oh? On a serious note, you should also give rsync a go. And try also _storing_ the data on an OS that is not windows, you might find that you are then able to do things you need to do, like say copy 60m files, without resorting to booting virtual machines of another OS just to use that OS's utilities to manage your main OS. Just saying. Please don't flame me.

  6. zaax
    Thumb Up

    xcopy

    how about a straight cmd line like xcopy?

  7. Anomalous Cowturd
    Linux

    You're learning, Trevor!

    > So the best way to move 60 million files from one Windows server to another turns out to be: use Linux.

    It's only a matter of time before you see the light!

    Tux. Obviously.

  8. ZapB
    Pint

    Easy

    screen -R big-copy

    rsync -avzPe ssh /src-path user2@box2:/dst-path

    <Ctrl>-<a>-<d>

    Go for a beer or 500 ;-)

    1. Ammaross Danan
      FAIL

      Wrong

      Not easy once you try doing a file transfer via rsync through an ssh tunnel, like you're suggesting, but the destination server isn't running an ssh server....let alone use / as a path convention.

      1. Nigel 11
        Go

        Wrong ... you mean, it's running Windoze.

        "Not easy once you try doing a file transfer via rsync through an ssh tunnel, like you're suggesting, but the destination server isn't running an ssh server....let alone use / as a path convention."

        Well, if the target system is running Linux, then turn on its sshd service!

        If it's running Windoze ... well, borrow another PC, boot an appropriate Linux live CD, mount the MS Shared folder as CIFS, start the ssh daemon, and rsync through the temporary Linux system to the MS system.

        Linux: the system that provides answers and encourages creativity.

        Windoze: the system that erects obstacles and encourages stupidity.

  9. Psymon

    large file transfers are always a challenge

    Personally, I swear by Directory Opus, by GPsoftware.

    I can attest to its incredibly reliable performance, error handling and insanely flexible advanced features.

    Aside from being able to copy vast quantities of data, handle errors, log all actions, migrate NTFS properties, automatically unprotect restricted files and re-copy files if the source is modified, it also has built-in FTP, an advanced synchronisation feature (useful for mopping up failed files after you've fixed the problem that stopped them being copied), and a truly unparalleled batch renaming system which among other things, can use Regular Expressions.

    It also has tabbed browsing (you can save groups of tabs), duplicate file finding, built in Zip management, custom toolbar command creation, file/folder listing and printing....

    Strangely, not a lot of sysadmins know about DOpus. I learnt of it during my Amiga days, in what seems like a lifetime ago. I always have a copy installed on my workstation, and at least 1 of my servers.

    1. Alan Bourke
      Unhappy

      Directory Opus is great ...

      ... but I doubt it would copy 60 million files.

  10. blah 5

    I second (third? fourth?) the vote for cygwin.

    Rsync is your friend, especially when you've got the ssh tools loaded so you can use scp. :)

    Come to the light, Trevor! :)

  11. Gary F
    Thumb Up

    Good story - quite an eye opener

    I like it! I'm often frustrated dealing with 10,000 small files in Windows, never mind 60m! On a desktop Windows 7 is painfully slow displaying 2000 photos in a single folder. It shows them straight away (detailed view, not thumbs) but then takes 20 secs to sort them by date modified! Aah!

    But why do you have 60m files? Could you store that data in a better way? Could it be put into a database for example?

    1. Anonymous Coward
      Anonymous Coward

      why have 2000 in a single folder

      surely you could come up with some kind of organisation, holidays, family etc. The file system is hierarchical for a reason!

      are you one of those people who has hundreds of files dropped directly into the root of c: too?

      1. Anonymous Coward
        Anonymous Coward

        desktop

        Or cluttered the desktop with icons and icons and more icons?

      2. Anonymous Coward
        Anonymous Coward

        Why? Because.

        Are you one of those people who wastes your life endlessly taxonomising?

        And hierarchical taxonomies are largely at odds with reality. Here is a photo of an interesting building I saw on holiday. Does it go in the "building" folder, or the "holiday" folder?

        1. Ezekiel Hendrickson
          Boffin

          Reply to post: Why? Because.

          > Here is a photo of an interesting building I saw on holiday.

          > Does it go in the "building" folder, or the "holiday" folder?

          Nah. It goes in the 2010-08-24 folder, tagged with 'holiday' and 'architecture'

      3. Annihilator
        Alert

        2000 in one folder

        Because 2000 can quite easily fit on one memory card these days, and a long enough holiday can also generate that many.

    2. Ammaross Danan
      FAIL

      Database?

      Store 60m files in a database? SharePoint perhaps? Not quite as easy to access/control/backup as an NTFS storage tree. Sorry.

  12. Anonymous Coward
    Boffin

    Hmm

    From Windows to Windows I'd use robocopy.

    For Unix to Unix use rsync.

    Both of these are very fast and support all sorts of failures where a copy is restarted (the existing files are skipped).

    1. Trevor_Pott Gold badge

      @AC

      From the article:

      I wanted to give several command-line tools a go as well. XCopy and Robocopy most likely would have been able to handle the file volume but - like Windows Explorer - they are bound by the fact that NTFS can store files with longer names and greater path depth than CMD can handle. I tried ever more complicated batch files, with various loops in them, in an attempt to deal with the path depth issues. I failed.

      1. Anonymous Coward
        Anonymous Coward

        I've used Robocopy for 60m files

        Many times, it is fast and the /MIR option is awesome since a failure for any bizarre under the hood reason can be left and corrected at the end. Also if users are updating files whilst you copy, the final run picks up all those changes.

      2. frymaster

        robocopy can handle large filenames

        ....which makes me very interested in what he was doing, and why robocopy wasn't an option

        I've managed to use robocopy to create files I couldn't delete from windows before (because I'd gone from c:\ to d:\somefolder\someotherfolder and that pushed the bottom of the folders past the filepath limit)
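
        For anyone wanting to know in advance which files will trip the path limit, here is a small Python sketch that walks a tree and lists every path at or over a given length. The 260-character default is the classic Windows MAX_PATH value (which counts the terminating NUL); the function name is invented for the example. On Windows the walk itself may need long-path support enabled, or a \\?\ prefixed root, to reach the deepest folders.

```python
import os
from typing import Iterator

# Classic Windows API limit; it includes the terminating NUL, so a path of
# 260 visible characters is already too long.
MAX_PATH = 260


def long_paths(root: str, limit: int = MAX_PATH) -> Iterator[str]:
    """Yield every file or folder path under root whose length is >= limit."""
    for folder, dirs, files in os.walk(root):
        for name in dirs + files:
            full = os.path.join(folder, name)
            if len(full) >= limit:
                yield full
```

        Remember that the length that matters is the one at the destination: moving a tree under a longer parent folder pushes paths over the limit even if the source was fine, so pass a correspondingly smaller limit when scanning the source.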

  13. Colin Miller

    rsync

    I'd be tempted to use rsync - its fallover behaviour is more reliable than cp's.

    It can copy over ssh (or rsh), its own network protocol, as well as between local directories (or NFS / Samba mounted shares).

  14. Tone
    FAIL

    Robocopy

    Ability to copy file and folder names exceeding 256 characters — up to a theoretical limit of 32,000 characters — without errors

  15. Anonymous Coward
    Anonymous Coward

    It's all about preferences?

    These things tend to end up as personal preferences so whether Linux or Windows my favourite tool is rsync and if I'm copying more than a single file within a Linux box I choose it over using cp any day.

  16. mego

    Richcopy

    I use a tool called Richcopy. It can do multiple thread copying (copying a couple of files at a time), supports x number of retries, and gives you a handy readout at the end on what files failed/had an issue/etc. It also has a compare feature to compare the two locations once you're done.

    1. Daniel B.
      Happy

      Yes!

      You did read the article, did you? Richcopy is not only mentioned, it was the only tool that worked under Windows for Trevor.

  17. slack

    LOL

    "Richcopy can handle large quantities of files, but can multi-thread the copy, and so is several hours faster than using a Linux server as an intermediary"

    "Richcopy is not so fast that there is time to defragment an NTFS partition with 60 million files on it before CP would have finished."

    Wut?

    1. Trevor_Pott Gold badge

      @Slack

      Richcopy copies more than one file a time.

      If your destination is a Windows server, then doing this causes massive fragmentation. So the total time to finish the copy is "time to copy" + "time to defrag". The goal is not just to get files from server A to server B, but to get them there in a fashion that ensures that server B is ready for prime time.

    2. Adam Williamson 1

      pretty simple

      pretty simple. richcopy's multi-threading approach is faster but results in a massively fragmented filesystem which you then have to defrag. So in total, the richcopy route is slower, because richcopy+defrag takes longer than cp.

  18. Paul 25

    One word...

    rsync

    I don't know what the state of rsync servers on windows is, but on unix systems it's the one true way for copying files over a network.

    It's fast, handles failure well, doesn't get its knickers in a twist when doing large recursive copies, and will run over ssl with a bit of work. It can also give you plenty of feedback about progress if you need it.

    The idea of copying that many files using something like a file manager or cp, no matter how good, just fills me with horror.

  19. kevin biswas
    Thumb Up

    Roadkil's Unstoppable Copier

    Is great too.

    http://www.roadkil.net/program.php?ProgramID=29

    1. Simon2
      Thumb Up

      i use that.

      i've used that too and it's good but i wish there was an option to exclude certain files from being copied. e.g. all .tmp, .bak and .chk files and also thumbs.db and desktop.ini.

  20. John G Imrie
    Linux

    No title

    No text.

    <-- Just a Logo

  21. Mike Kamermans

    cygwin?

    Curious - what about using cp from within windows context by using the version that comes with cygwin, rather than using a virtual machine?

  22. Popup
    Boffin

    rsync

    http://en.wikipedia.org/wiki/Rsync

    It's the most versatile copy/mirror/update tool available.

  23. Anonymous Coward
    Anonymous Coward

    Other tools

    I've had some success with synctoy for such tasks, but not with quite that number of files! Would be interested to see how it fares out actually.

    Another way to use linux cp would be via Cygwin, which allows you to access main system drives within a unix-like shell.. also might be interesting to try..

  24. carlos_c

    Backup and restore

    I just use ntbackup to backup to a file and restore to a new destination. It's quick, and it should handle that number of files.

    1. Trevor_Pott Gold badge

      @Carlos_c

      Yarp: it would handle that.

      Where would I put the .bkf though? The originating server doesn't have a spare 10tb, the destination doesn't have 10tb worth of buffer space, and writing your *.bkf to a network share is madness past about 2tb worth of bkf. (A network hiccough WILL occur, and you WILL lose that backup.)

      That said, for most tasks, Windows Backup Services serve me just fine.

    2. Michael C
      FAIL

      nope.

      ntbackup is a horrible application with many major shortfalls. It relies on the same interpreter as the cmd shell, and thus has path size limits.

      1. Trevor_Pott Gold badge

        Windows backup services != ntbackup

        It's a little more advanced. Had I been running Server 2008 R2...

  25. adaytay

    What about ycopy?

    I was surprised to see no mention of yCopy at all - this is an absolute gem!

    http://www.ruahine.com/download.html

    Those it can't copy it tells you, but it keeps on working through the list till it's all done.

    1. Martin 71 Silver badge
      Pint

      Looks good but...

      the download .exe file is not found.

      Beer, because I need one now

    2. Trevor_Pott Gold badge

      @adaytay

      If it can't copy a file...then I move on to the next tool. "Those it can't copy" can end up being MILLIONS of files in this scenario. Path depth is a b***h...

  26. BBuilder
    Troll

    uhm...

    tar?

  27. Allan George Dyer Silver badge
    Joke

    And if all else fails...

    you can use dd.

    There is one major problem with this approach - you don't get the copying animation.

  28. 4.1.3_U1

    rsync

    Did you try rsync?

    I haven't tried copying umpteen million files on windows with it, but it's part of the default cygwin installation on windows.

    If you must use windows, and you're not using cygwin, you're missing something.

  29. Vladimir Plouzhnikov

    Could be a simpler solution

    Have you ever tried FAR File Manager?

    http://www.farmanager.com/download.php?l=en

    I am still fascinated by people's apparent acceptance of Microsoft's inability to understand that to copy or move files you need TWO panels, not one.

  30. Anonymous Coward
    Anonymous Coward

    That is assuming....

    Assuming you have a Linux host or a copy of Linux to hand, and an installation of VMware, Hyper V or whatever and are allowed to just install another host to your environment without going through lots of hoops.... otherwise it's Richcopy.

    1. purplefloyd
      Pint

      Downvoted...

      Downvoted because free software is never hard to get your hands on.

      <-- Beer, since RichCopy is only free as in...

      I'd also suggest using VirtualBox OSE instead of VMware since it's free as in speech/libre.

  31. Arkasha

    Doh

    I would have just used rsync without even thinking about any of the others. It wouldn't even have wibbled one iota at copying 60m files.

  32. Anonymous Coward
    Alert

    Restore from tape ?

    Beyond a certain point it surely has to be easier to do a restore from tape. Not only do you have a backup for future reference, but the sequential nature of the process means no fragmenting and, without the comparatively slow network in the way, a good deal faster.

    1. Trevor_Pott Gold badge

      But...

      ...I has no tape.

      "Backups" occur over our WAN. (100Mbit pipe is perfectly fine for incrementals.) We are fortunate to have an uncapped plan from our provider...

    2. Michael C

      which one?

      Many many backup utils I've tried have failed over 4 million files. Worse still, depending on their implementation, I've seen backup systems bring servers to their knees, or even crash them, trying to catalog a large flat folder before starting moving files.

      1. Trevor_Pott Gold badge

        @Michael C

        We use retrospect for backups (7.611). So far, it has not failed me, no matter how many files I throw at it. Rustling up enough media to do the backups on the other hand...

  33. mikeyboosh
    Thumb Up

    hmmmm

    We just backup and restore to the new location....this keeps permissions and is easy and quick.

  34. Robert Carnegie Silver badge

    @slack it isn't just about copying.

    Well, maybe... Our Hero reckons that Richcopy will leave the destination disk with loads of fragmented files, which he'd then want to undo. So doing the job in Richcopy and then defragmenting the disk would take longer.

    I am not sure that (1) there would be great fragmentation and (2) it would be a pressing issue. You could let users use the new volume in the fragmented state, it'll work, and do your defragmenting later.

    Then again, if you back up filewise, you can simply restore from your backup to the new copy location... unless the process we're discussing IS your backup?

    1. Trevor_Pott Gold badge

      @Robert Carnegie

      Richcopy resulted in 43% volume fragmentation.

      Backups occur using retrospect and file servers located across the WAN. (<3 uncapped links.)

  35. Dave Cable

    Alternative Linux method

    One that's always worked well for me in the past. Preserves permissions and deals gracefully with sparse files, if you have any:

    cd <source>

    find . -depth -print | cpio -pmud <target>

  36. Steven Jones

    Image Copies

    Far and away the fastest way to move 60 million files is to use an image copy. Of course you can't restructure the underlying disk partitions, but it avoids all those tens of millions of file directory operations. You can then use a file sync program, like rsync, to get the new and old back in full alignment (assuming that you aren't able to freeze the source file system in the meantime).

    Depending on the total size, network bandwidth between the two servers and physical distance you can transfer the partition image using anything from a USB external drive to a network mount.

  37. Khaptain Silver badge
    Coat

    SAN or NAS anyone

    Wouldn't the obvious solution have been to use a NAS or SAN in the first place? 60 Million files on one server seems to be a little "light" in terms of security/backups/recovery etc.....

    His the one with "I Luv TPB" embroidered on the lapel..

  38. Robert Hill
    FAIL

    Nonsensical conclusion...

    SO...RichCopy will do it just as fast if not faster than cp, and if restricted to one thread will not fragment...but your conclusion is that it is still better to install another OS and use cp?

    <boggles mind>

    1. Trevor_Pott Gold badge

      @Robert Hill

      When restricted to one thread, it's actually quite a bit slower than CP

  39. The Unexpected Bill
    Pint

    Well done, Trevor!

    I want to thank you for taking some time to investigate this problem and propose a few solutions! I don't do this often, but I think that filing away a copy of Richcopy would be a good idea.

    I have found it absolutely unbelievable that for all of these years (since Windows 95) Microsoft hasn't had their door beaten down with requests to make Explorer a lot more robust when copying or moving files. Well, maybe they have...who could know?

    I would love to see the ability to bypass an error and move on added to Explorer at some point. Since we're living in a post-Vista era now and the Windows GUI shell is now Really Broken to my view, I'm not sure I care.

    Now, if anyone knows what to do about the files created by Adobe Flush and even Internet Explorer in temporary areas that violate the long file name conventions and cannot therefore be whacked in any way I've tried thus far...I'm all ears!

    1. Sooty

      err

      "I would love to see the ability to bypass an error and move on added to Explorer at some point."

      it was added wasn't it, 4-5 years ago? Vista explorer introduced that very feature, it continues on in 7. It would be really nice if it were retrofitted into xp i admit!

      1. Daniel Evans

        Yes, but

        It always wants to show at least one error popup first (e.g. "Are you sure you want to overwrite this file?" followed by a separate "Are you sure you want to overwrite this folder?"), plus grinds to a halt whilst doing so.

        Unless there's an "ignore all errors" option I've yet to find.

  40. John Ridley 1

    robocopy ftw

    File name length is actually not an issue for robocopy. It's a 32 bit cmd program so it can handle filenames as long as NTFS can handle. I use it for syncing large numbers of files often (though more like a couple million, this is only my personal fileserver and backup).

    1. Ammaross Danan
      FAIL

      Title

      Does no one actually read the full article? It clearly stated robocopy hosed around 4m files.

  41. Anonymous Coward
    Anonymous Coward

    Interesting Article

    Our problems have been moving files from a remote hosted server with provider A to a remote hosted server with provider B, with internet connections between the two, moving approx. 20GB of data.

    I find that most of the tools don't seem to deal with connection drops that well, or are just generally so slow as to make it impossible.

    I think last time I ended up going with RARing the lot but splitting the RARs into <200mb chunks. This made transferring them simpler.

    Never thought of using *nix tools.

    1. Ammaross Danan
      Go

      RSync

      Your problem is solved with RSync (as has been pointed out by many others). RSync is a delta-copying program, which makes successive copies faster/less bandwidth because it only copies changes in files. Great for WAN connections. Not only that, but it has a retry in event of connection loss. If all else fails, you can always restart the transfer and it will make sure all is in sync (in-line verification!).

      Linux has its place in the world. It comes into play when you need to do something that your ACTUAL (usually Windows) servers can't.

  42. Mattyod

    erm...

    ctrl + c then ctrl + v?

    1. Anonymous Coward
      Anonymous Coward

      re: erm...

      Isn't that ctrl + A then ctrl + c then ctrl +v .......

    2. Colin Miller
      Grenade

      Explorer? You've got to be kidding (or trolling)

      > ctrl + c then ctrl + v?

      In Explorer?

      If you try it, with 60 Million files (probably around 6TB of data), explorer will sit there for ages "Preparing to copy" as it calculates the total data size and how long it will take.

      You wander away for lunch (it's going to take several hours to do the copy), come back to see an error "Could not copy 'report.doc' ", helpfully NOT printing the source or destination directory. The copy has stopped, and can not be continued.

      Which report.doc file is it talking about? There are a few thousand files of the same name in assorted directories.

      If you figure that out, there is no way to restart the operation without overwriting any existing files - there is no "Skip all" option on the overwrite confirm dialog box.

      With most of the other copy tools, they will keep going with the remaining files when an error occurs. You can then review the log and copy the last files by hand. Some tools can check when copying that the destination is the same as the source and not copy - so you can just rerun the copy command for the uncopied files.
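
      The behaviour described above (keep going on errors, skip what is already there, rerun to mop up) can be sketched in a few lines of Python. This is an illustration under assumptions, not how any of the tools mentioned actually work: "already copied" here means same size and a modification time within two seconds, the function names are invented, and NTFS permissions are not carried over.

```python
import os
import shutil
from typing import List, Tuple


def already_copied(src: str, dst: str, tolerance: float = 2.0) -> bool:
    """Quick check: dst exists with the same size and an mtime within
    `tolerance` seconds of src. Contents are not compared."""
    if not os.path.isfile(dst):
        return False
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and abs(s.st_mtime - d.st_mtime) <= tolerance


def resumable_copy(source: str, target: str) -> Tuple[int, int, List[Tuple[str, str]]]:
    """Copy a tree, skipping files that look already copied and recording
    failures instead of stopping. Returns (copied, skipped, failures)."""
    copied = skipped = 0
    failures = []
    for folder, _dirs, files in os.walk(source):
        rel = os.path.relpath(folder, source)
        dest_folder = os.path.normpath(os.path.join(target, rel))
        try:
            os.makedirs(dest_folder, exist_ok=True)
        except OSError as exc:
            # Could not create the folder: log it and move on to the next one.
            failures.append((folder, str(exc)))
            continue
        for name in files:
            src = os.path.join(folder, name)
            dst = os.path.join(dest_folder, name)
            try:
                if already_copied(src, dst):
                    skipped += 1
                else:
                    shutil.copy2(src, dst)  # copy2 also copies timestamps
                    copied += 1
            except OSError as exc:
                failures.append((src, str(exc)))
    return copied, skipped, failures
```

      Because the full source path is recorded with each failure, you never end up guessing which of a few thousand report.doc files was the problem, and running the same call again only touches what is still missing or changed.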

  43. Psymon

    hahahaha!

    I love some of the suggestions that appeared between composing and posting my last comment!

    @Russell Howe

    Please tell me you're joking!

    Pull out the hard drive? What, one hard drive containing 60m files? Or even a server, and it has ONE hard drive?!? This isn't the 70s, Rus.

    1) it's at minimum a RAID 5 array, quite possibly utilising the raid controller built into the server's motherboard. The raid controller is integral to keeping the data readable

    2) You don't just 'pull out hard drive' on a server. In all likelihood, that machine is still live, and hosting a myriad of roles and services for the network.

    @zaax

    I'll give you 10/10 for optimism there. Xcopy would have failed just like the other commandline tools Trevor had tried. I've seen many a solution using insanely complex batch and kix scripts fail time and time again. The simple fact of the matter is that in a complex environment such as this, scripted systems invariably fail due to the unexpected and unforeseeable.

    @Zaf

    We have actually had to shift FROM a unix file server, TO a windows system. Over 2 terabytes, we encountered severe limitations with the filesystem. That being we were regularly exceeding the inode limitations. After several weeks of research, we discovered that this was a fundamental design flaw of the filesystem, which assumed over that size the partition was going to be filled with files greater than 1Gb, not tens of millions of 1Kb files.

    On top of that, Unix has a less advanced Kerberos implementation, meaning computer account permissions could not be applied, and the time saving benefits of giving users access to Volume Shadow Copy dynamic restores meant we weren't forever rooting through tape backups for every user that accidentally overwrote their word document.

    1. Nexox Enigma

      Weeks, eh?

      """We have actually had to shift FROM a unix file server, TO a windows system. Over 2 terabytes, we encountered severe limitations with the filesystem. That being we were regularly exceeding the inode limitations. After several weeks of research, we discovered that this was a fundamental design flaw of the filesystem, which assumed that over that size the partition was going to be filled with files greater than 1GB, not tens of millions of 1KB files."""

      Good thing there's just one filesystem for all *nix systems, and none of them let you tune them for target file size. Oh wait, there are about 6 mature filesystems, and most of them can be tuned to easily cope with the load you describe. Reiser was always pretty good with huge numbers of small files, and I don't believe it even uses inodes.

      And NTFS is nearly the worst 'modern' filesystem that there is, narrowly edging out HFS+.

      1. foxyshadis

        NTFS certainly has some issues, but worst modern fs is pretty harsh.

        When my #1 qualification is compression, I have one choice: NTFS. For some reason *nix folks not only refused to add compression to their filesystems and largely ignored pleas and projects to include it, they ignored and outright handwaved away both theoretical arguments and hard benchmarks proving that compression was almost always a net win on modern systems ten years ago. I know because I made and was involved in some of them. A decade later, with ZFS showing how dedup and compression trounces older fs, we finally get some pushes to include it in the mainstream. That kind of blindness in the pursuit of narrow perfection is what separates Linux development from those that need sales to survive - and this from a regular user of Linux.

  44. Jolyon

    |-o-| <-O-> ( °) 8===8O8===8 |-o-|

    Which of these methods preserves NTFS permissions?

    1. Trevor_Pott Gold badge

      @Jolyon

      Richcopy

  45. Gordan
    WTF?

    Oh FFS!

    1) Learn to use proper tools

    2) Use a functional OS

    tar -cf - * | ssh user@target-host "tar -C /target/path -xvf -"

    If you want it compressed (if your interconnect is slower than your disk arrays), you can instead do:

    tar -zcf - * | ssh user@target-host "tar -C /target/path -zxvf -"

    The limits of this approach are only those of your disk space and file system capabilities. It will also preserve file ownership and other attributes. And it's a one-liner. The fact that you wrote a whole article ranting about a problem that takes a one-liner to solve is quite fascinating.

    Worse, you don't even need *NIX to do this, you could have done it with cygwin!

    1. jco
      FAIL

      * with 60 million files

      @Gordan

      What shell are you using? What shell is able to do globbing on millions of files?

      Since you, with some arrogance, tell people to learn to use proper tools, at least do mention them...

      Several times I had to convert scripts doing "scp *" to sftp due to the command-line argument limits of shells (and I don't mean just bash).

      JMTC, jco

      1. Nexox Enigma

        ...duh

        """What shell are you using? What shell is able to do globbing on millions of files?"""

        Well there are 2 situations:

        a) You have millions of files in one directory.

        b) They're in many directories.

        If you run into a) just go up one level higher and tar that directory, no globbing required.

        If you hit b) then you'll only glob a few directories, and it'll just work.

        * globs don't recurse, tar takes care of that.

        I've done similar: took ~400GB of 1-10KB files in a complex sea of subdirectories, packaged them into a tar file, sent that over nc, and wrote it directly to an LVM lv, then later reversed the process to copy them back.

      2. Gordan
        WTF?

        Since you asked...

        First of all, if you have 60M files in one directory, you arguably have bigger problems, but this will do just fine as an alternative:

        tar -zcf - . | ssh user@target-host "tar -C /target/path -zxvf -"

      3. Gordan
        FAIL

        Oh, and...

        You do realize that instead of using * shell expansion, you can do more sensible things like:

        #!/bin/bash

        for file in `ls`; do

        scp $file user@host:/path/$file

        done

        No shell filename expansion.

  46. Anonymous Coward
    Anonymous Coward

    rsync is the answer

    but, not over SSH if you want to copy more than 2TB, and yes, we have a lot more than 20million files

  47. Lars Silver badge
    Happy

    Surprised?

    Or are you just polite.

  48. Anonymous Coward
    Boffin

    What about ZFS?

    Could ZFS have handled or helped with such a task? Does anybody use that, and could they answer that please? Would it be available in the first place?

    Inquiring minds want to know. This is an interesting question.

    If such a filesystem is named Zettabyte File System, and it is entirely the responsibility of the FS to handle that kind of action, would it have helped if the entire set of files was stored in a ZFS environment? Or is it designed only to handle LARGE files, not necessarily MANY files?

    I am googling Richcopy now, though.

    If I had to move 60m files, I would have just unplugged the HDD with them and plugged them on the destination too, I admit my ignorance.

    I would have tried backup tapes too.

    I would have 7-zipped 1m of files at a time.

    I would have set up an HTTP server on the host machine, an HTML or FTP page with the list of files, and used Getright's copy-all-links feature to copy them on the other.

    I would have all of those combined, if it helped.

    1. dannypoo
      Go

      good suggestion

      zfs would have made this too easy and would have allowed the copy to run at almost network wire-speed (all block-level, you see). you could take a filesystem snapshot and send it over the wire (rsh,ssh) and import it at the other end. I have done this more times than I care to count with filesystems containing up to 30 million files. it also actually defragments the files in the process (currently pretty much the only way you can do that with ZFS)

      of course you would have had to copy all the files from the NTFS filesystem to ZFS first though. :facepalm:

      I have set up a system using Solaris 10, samba+winbind to store in excess of 100 million files in a multi-user Active Directory environment on ZFS. All kerberized and shared over the network. It even supports NFSv4 ACLs (requires patching samba a bit though).

      I will be moving this from one server to another next week with a single command. Nice.

  49. Ben Liddicott
    Grenade

    robocopy or NTbackup

    Robocopy to do it over the network - can copy or move.

    NTBACKUP if you need to roll it all up then unroll it at the other end.

  50. DaveDaveDave

    This is a ludicrously inept way to do it

    I can't believe that, having failed to make batch files work, the sysadmin who wrote this article didn't use a Windows scripting language instead of an old DOS one. The task is trivial in any version of VB or even VBA:

    [Pseudocode]

    For each [directory] in [file structure]

    For each [file] in [directory]

    Copy file to destination

    Next

    Next

    Another few lines to record and/or handle exceptions, and job done.
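
    A minimal sketch of the loop described above, in Python rather than VB (the language choice, the function name and both arguments are illustrative assumptions, not anything from the article). It walks the tree, copies file by file, and records failures instead of stopping:

```python
import os
import shutil


def copy_tree_logging_errors(src_root, dst_root):
    """Copy every file under src_root into dst_root.

    Returns a list of (source_path, error_message) for files that failed,
    so one bad file does not abort the whole run.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Recreate the same relative directory on the destination side.
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                # copy2 also copies timestamps; it does not copy NTFS ACLs.
                shutil.copy2(src, os.path.join(target_dir, name))
            except OSError as exc:
                failures.append((src, str(exc)))
    return failures
```

    Whether something this simple survives 60 million files and over-long paths is exactly what the rest of the thread argues about; the sketch only shows the shape of the loop.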

  51. DaveDaveDave

    Come to think of it...

    Come to think of it, there shouldn't be any reason for an exception during file copying, unless the sysadmin has munged something up bigtime. Explorer FTW.

  52. Jerome 0
    Thumb Up

    Odd conclusion?

    Great article, but surely an odd conclusion. Why would I go to the trouble of setting up a Linux virtual machine, when you already said I can run Richcopy in single-threaded mode and get exactly the same result?

    Of course, if my servers were running Linux in the first place, that would be a different story. Then I'd have to set up a Windows VM to run Richcopy (er, maybe not!)

    1. Trevor_Pott Gold badge

      @Jerome 0

      Because the Linux VM was significantly faster than Richcopy in single-threaded mode.

      Richcopy is faster in multi-threaded mode, but leaves the system fragmented. In single-threaded mode, using the Linux VM was about 20% faster. As to *why* the Linux VM would be that much faster...you got me there. (Maybe because it's not preserving permissions?) In my case I wasn't worried about permissions. I just needed the files moved. (We were resetting permissions on entire trees upon arrival anyways. Domain change and all that...)

      I only tried the Linux VM on a lark: I happened to have a web server which had CIFS mounts pointing to both machines. I figured “what the heck, let’s see what it does.”

      Imagine my surprise ½ hour later when I clocked how far it had gotten down the miserable “many small files” directory. Farther than Rich copy in the same time, I assure you.

      Also: no, I wasn’t running the tests concurrently.

      1. Jerome 0

        Fair enough

        I raised the point only because your article implied (though admittedly did not explicitly state) that cp and Richcopy took the same amount of time:-

        "You could restrict Richcopy to a single thread, but then it is no faster than cp."

        If that were the case, I'm certain the majority of Windows users would rather install a single app to get the job done, rather than set up a Linux VM. Indeed, you'd have to have an awful lot of files to copy for it to be worth saving 20% of the time involved.

        1. Trevor_Pott Gold badge

          @Jerome 0

          ...about 60M or so?

          1. Jerome 0

            Time?

            How long did Linux take to copy them?

            1. Trevor_Pott Gold badge

              Time

              In nice round numbers, Richcopy would have taken about 38 hours. Linux took just under 31.

              BTW, duping a Linux VM, booting it and mounting the drives took less than 2.5 minutes. The difference is tangible.

              1. Jerome 0

                Thanks

                Thanks for the info, that puts things into perspective. I agree that I'd use a Linux VM for this task - certainly if it was something I'd be doing on a regular basis.

                Your 2.5 minutes does assume some familiarity with Linux, VMs and the process in question, however. Your average Windows user (or indeed your average Windows sysadmin) would probably stick to Richcopy.

                1. Trevor_Pott Gold badge

                  @Jerome 0

                  Unless, of course, they had taken the time to learn to use Linux + Webmin. Webmin = Training wheels for Linux. :D

                  Check my previous articles...there were a bunch on Webmin, Usermin, Virtualmin, and Cloudmin. If you are remotely afraid of Linux...you need not be!

  53. Marc 25

    XCopy FTW

    I've done circa 10m files with xcopy on Server 2003...no bother at all.

    I went off and had lunch...

  54. DJB

    RichCopy - Thread number

    Rich copy works really well.

    I have set the file copy count to 1 (no multi threading) in the options.

    The copy still runs really quickly and you get no fragmentation.

  55. bofh80
    Dead Vulture

    Total Commander

    Total Commander

    I would have liked to have known how this held up.

    With the synchronization feature it would have taken a while, but i'm sure you'd end up with a perfect working sync, and really good reports for failures etc.

    How can 50 comments miss the one tool that's capable of this that's been running for years ? (1993 - same time as winrar.)

    Do IT people even read this site anymore. Let alone write for it.

  56. Anonymous Coward
    Coat

    I'd use Carbon Copy Cloner

    .... but I'm not using Windows.

    Mine's the one with an iPad in the oversized pocket.

  57. David Pickering
    Thumb Up

    i had a similar problem

    had to extract millions of files from millions of zips over a network - ended up writing a lil .net app to do it - worked beautifully and could produce a report on the problem files too

  58. bluesxman
    Pint

    :-l

    Having piqued my interest, I wrote a quick script to create 60,000,000 zero byte files to play with. Presently numbering 450,000 those pesky little critters are already occupying 1.7GB. I fear I have not the raw capacity (let alone the free space) on my laptop to complete this logical exercise. Now I need another script to delete them. Oh well it killed the last 15 mins before 18:00.

    Hark, is that a pint calling?

  59. CalmHandOnTheTiller
    Pirate

    ARCHIWARE PresSTORE Synchronize

    Wunderbar.

  60. howzer22

    Richcopy download blocked by IE 9 - Virus!!!!

    Did a search for Richcopy and got taken to the following Technet article.

    http://technet.microsoft.com/en-us/magazine/2009.04.utilityspotlight.aspx

    Tried downloading HoffmanUtilitySpotlight2009_04.exe but IE 9 beta blocked it because a virus was detected!

    by the way I have used EMCopy from EMC for my file migrations, similar to RoboCopy but works!

    I found a bug in RoboCopy, was quite a specific issue but after speaking with the developer who agreed it would require a major rewrite I went with EMCopy with which I have never had any issues.

  61. Anonymous Coward
    Anonymous Coward

    I want to see a screengrab

    of Windows Explorer making a stab at how long such a copy operation would take.

    Before it cried enough and hung irrevocably, of course.

  62. Anonymous Coward
    Megaphone

    Next time you should put them on ZFS instead...

    Currently we store about 10 billion ~8k files on NFS.

    Moving data is as simple as sending ZFS snapshots.

    The only real limitation is disk/network bandwidth.

  63. Anne-Lise Pasch

    ViceVersa Pro

    We copied more files even than this using ViceVersa. Slow enumerating the files, but the copying worked flawlessly.

  64. Anonymous Coward
    Anonymous Coward

    60M files - 1 directory?

    directory by directory

    If you have 60m files under a single directory then you have management or application problem as this will surely be causing file system performance issues.

    i'd do it via hardware raid doing the grunt work, might need to mod the partition table though.

    my guess it will take a long time to check - like generating and comparing file hashes.

    why do I immediately think this is a government department.....

  65. Michael C

    linux has its limits too

    We've got a server here we're trying to migrate to a SAN. Linux file system. Works fine so long as you query a file directly (by database reference) but if you ls a directory NFS throws all manners of errors.

    I'm new here, trying to help out with a lot of things, and one of them is developing a script to parse the db and move the files and update the db records of their locations on the fly. file system was built when they expected to have a few hundred thousand files. ...there's millions, in flat folders. Huge nightmare. Can't be backed up.

  66. Notas Badoff

    rsync sunk?

    People keep mentioning tar/rsync/etc. without thought, perhaps, that copying files requires preserving the metadata? If you copy all the data and filenames and 'everything', but the ownerships and permissions are wrong at the end, you haven't finished but simply wasted your time.

    Using Samba at least preserves the user-visible metadata.

    1. purplefloyd
      FAIL

      rsync -a

      like cp -a. Preserves metadata. The a is for 'archive'...

      tar preserves metadata by default...

      Whether the Samba client can do a good enough job of preserving permissions when talking to a Windows host is perhaps another matter...

  67. BlueGreen

    Thinks..

    I've no experience with copying this amount of stuff, so just throw in a couple of points: while everyone's nominated rsync, there's a tool called unity which does similar but more flexibly (never used either).

    I do use something called cfv which produces md5 hashes of a directory. I run it on a dir I'm about to back up then again when the copy's finished and diff them. Seems like good practice for you too. cfv's documentation is poor though.

    To the main point, are you sure you've got 6x10^7 files? Are you sure windows is reporting that correctly? As it seems unable to copy them, why trust it to report the count accurately? WTF have you got that many for? If they're not indexed in some way, how do you ever expect their contents to be discoverable? Is the directory structure also the index?

    Re. your assumptions about fragmentation, IIRC (and I do mean IIRC) NTFS packs smaller files into a disk block to prevent too much wasted space so fragmentation there isn't so much a problem. You assume that a tool copying several files at a time must cause fragmentation; that's an assumption you can't make. IIRC you can reserve disk space up front in windows which you can then fill. Doing so - if that's done by any tools - should prevent this (and would increase write speed significantly of course).

    Also, size. If you are concerned about fragmentation then you must also be assuming that a considerable fraction of these files must be greater than a disk block, ie 4K default on ntfs. If each file was 4k then it's about 1/4 terabyte - ignoring internal file structures! Let's have it, what total size of files are you moving? What's the background to this event anyway?

    1. BlueGreen

      clarification

      cfv as I've set it up produces a hash per file while traversing the directory recursively. You then diff the (large) log file. Easy. I understand it has another mode but couldn't get it to work. Again, the docs are sucky.

      cfv is here <http://cfv.sourceforge.net/>
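
      For anyone without cfv to hand, the hash-then-diff check described above can be sketched in a few lines of Python. This is not cfv and does not use its file format; the function names are made up for illustration, and MD5 is used only because that is the hash mentioned above.

```python
import hashlib
import os


def hash_manifest(root, chunk_size=1024 * 1024):
    """Return {relative_path: md5_hex} for every file under root."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(full, "rb") as fh:
                # Read in chunks so large files are never loaded whole.
                for chunk in iter(lambda: fh.read(chunk_size), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def diff_manifests(source, copy):
    """Return relative paths that are missing from the copy or differ in hash."""
    return sorted(p for p, h in source.items() if copy.get(p) != h)
```

      Run hash_manifest on the source before the copy and on the destination afterwards, then pass both to diff_manifests; an empty list means every source file arrived intact. With tens of millions of files the manifest itself gets large, so writing it to disk rather than holding it in a dict may be the more practical variant.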

      Tool I called unity is actually unison: <http://en.wikipedia.org/wiki/Unison_%28file_synchronizer%29> and <http://www.cis.upenn.edu/~bcpierce/unison/>

      btw, rsync on Windows+cygwin: could never get that to work.

      I'd really like to know how & why you managed to produce 6E7 files & what their distribution of sizes are. Really, really.

      ta

  68. Cat Sitting

    or maybe...

    disc imaging software?

  69. Trevor_Pott Gold badge

    Re: xcopy, robocopy and rsync+cygwin

    Xcopy won’t see anything with a path depth larger than 255 characters. Otherwise, it would trundle through 60M files just fine, I expect.

    Robocopy theoretically /should/ see files with a path depth longer than 255 characters, but in my experience simply doesn’t. It will see a file with a /name/ larger than 255 characters (or at least that’s what the length of the file name looks like at first glance,) however when I feed it /path depths/ larger than 255 it continually refuses to copy the file. I banged away at it for about half an hour before giving up and moving on to the next tool on my list. It should be noted that I tried only the command line version of robocopy. I did not give the GUI loader much of a go. (I figured if I was going to faff about with GUI tools, Richcopy > Robocopy anyways, so….)

    Cygwin + Rsync was actually the very first thing I tried. <3 rsync. Sadly, rsync didn’t seem to play nice with long path depths either. Wholly apart from that, it blew up somewhere around 6M files for reasons I can’t discern. The *nix version doesn’t seem to have that problem; only when running rsync + cygwin under Windows did I encounter it. The system in question was a Server 2003 R2 instance, fully patched as of August 20, 2010.

    Also of random note: VMWare Server 2.0 has served me well in the past for many things. Need to toss a Linux VM on a system for a few days to perform a task just like this? Works a treat. There is a caveat to that plan however: VMWare Server 2.0 absolutely /abhors/ TCP offloading. They don’t play well. Additionally, it can be the DOS settings:

    http://support.microsoft.com/kb/898468

    If you decided to load it up in order to move files around, you might consider this first. Otherwise your VM won’t talk to the host quite as well as would be required to pull this off.

    1. The BigYin

      Cool

      Shame about the rsync/cygwin glitch. Might be worth punting a note to Cygnus.

      Still, at least you have a solution.

  70. Anonymous Coward
    Boffin

    "There be dragons there" - links can be fatal to dumb file copiers.

    A lot of file copy tools will probably fail because they don't understand NTFS links (e.g. soft links, hard links and Junctions), so blindly follow them, rather than ignoring them or replicating them on the target disk. If RichCopy or Linux don't provide configurable support for NTFS links, then your target will be a mess of redundant folders and files or missing vital links!

    Beware, Vista and Windows 7 make extensive use of Junctions, in user profiles and common folders; so naive tools like xxcopy, and your favourite two pane file tool will fail!

    Write a file copier in Java 7 (Beta) using the new java.nio.file functionality; it provides quite advanced filesystem specific support for file attributes, timestamps, links, and ACLs, and supports directory cursors; it even provides a FileVisitor interface to make recursive folder traversal easier, with support for error trapping.

    I also like DirectoryOpus, and note that it can create the various link types, and can see them, however it doesn't appear to provide options not to follow them or to replicate them on the target drive.

  71. Anonymous Coward
    Anonymous Coward

    Fanboy with beard and sandals here

    Well, the beard went a few decades ago, but...

    Not in the least surprised that a simple, straightforward -nix tool turned out to do the job. Do we have to be reminded that Unix was a stable, multi-tasking, multi-user OS before Windows was born?

    Trevor, I never did anything of this magnitude, and for all my big words, I have no idea if my instinctive turning to Linux (well, actually, I would have been working with Unix machines anyway, so the question wouldn't have arisen) would have been an instant solution, or taken just as long as several experiments with Windows software.

    It's been far too long, so I can't tell you how I did copy relatively large numbers of files from one machine to another, but I seem to remember some strange combinations of dd, cpio, tar, maybe rcp, and it would be strange not to have a find command in a regular unix maintenance script! Combine with some sprinkling of `command substitution` and | pipelines to taste and enjoy!

    Ahhh... Happy days :)

    1. Trevor_Pott Gold badge
      Happy

      @thad

      I swear, after all the fiddling, cp really required no special treatment. I used a webserver that had CIFS shares from the windows servers mounted locally. I sshed into the Linux VM and typed the following:

      > cp /fs1-root /fs1-new-root/from-fs1-root -R

      All files I needed to move were DFS “subfolders” of what to the webserver appeared to be fs1-root.

      I then merely needed to cut/paste whole folders into their correct “new” positions on the destination server when I was done. Oh, and apply permissions: part of this move was to reset permissions on all files thanks to a domain migration. (That and some fairly lousy permissions management that had crept in over the years preceding the move.)

      No need for dd, cpio, tar or anything else. cp just went hard and finished fine.

      1. Anonymous Coward
        Anonymous Coward

        rsync would probably have been significantly faster

        and you could have run the job multiple times: an initial run to get the data at that point over, and then secondary ones that would go through the file system and only copy new files, or updated blocks from updated files.

      2. Anonymous Coward
        Linux

        cp for windows

        At the risk of sounding like yet another 'you should have tried X' commentard have you come across the GNU coreutils at all?

        http://en.wikipedia.org/wiki/GNU_Core_Utilities

        http://gnuwin32.sourceforge.net/packages/coreutils.htm

        This gives you a bunch of useful 'nix functions including cp. It'd be interesting to see if the win32 binaries perform as well as the real deal in a vm. They have done so for me so far but my uses are pretty noddy in comparison to yours.

        1. Trevor_Pott Gold badge

          @AC

          I will try them!

  72. Anonymous Coward
    Anonymous Coward

    I've had to do similar...

    I've had to migrate a datastore of some number of tens of millions of 2KBish tga files, what I did was took a snapshot (EMC DMX timefinder mirror) and mounted it on the new server. Now you do need a shedload of money for this, but it is quick!

    I've also done it between DMX arrays using SRDF, even over long distance IP links, although this is slower...

  73. Terry Walker
    Boffin

    Another Tool ...

    This is an older tool but still a goodie ..

    Karen's Replicator

    Btw, there is a way you can bypass the 256 character limit when transferring files:

    \\?\<driveletter>:\<path> for local files or network drives

    \\?\UNC\<server>\<share> for UNC paths (though i've never gotten this to work)

    Here's an article in MSDN about this: http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx
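
    As a rough illustration of the two prefixes above, here is a small Python helper that rewrites an ordinary absolute Windows path into the \\?\ form. The function name is invented for this sketch, and it assumes the input is already absolute and uses backslashes, since Windows does not normalise paths that carry this prefix (no "." or "..", no forward slashes).

```python
def to_extended_length(path):
    r"""Return path with the \\?\ prefix (or \\?\UNC\ for UNC paths).

    Assumes an absolute, backslash-separated Windows path.
    """
    if path.startswith("\\\\?\\"):
        # Already in extended-length form; leave it alone.
        return path
    if path.startswith("\\\\"):
        # \\server\share\dir  ->  \\?\UNC\server\share\dir
        return "\\\\?\\UNC\\" + path[2:]
    # C:\dir\file  ->  \\?\C:\dir\file
    return "\\\\?\\" + path
```

    The helper is pure string handling, so it says nothing about whether a given copy tool will actually accept the prefixed path; that still has to be tried tool by tool.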

  74. SilverWave
    Go

    rsync: Andrew Tridgell, Paul Mackerras. Yeah pick that one.

    rsync also does a check that the copied file is identical, nice :-)

    hmm would be interested in seeing how the windows ports are rated... but using a Linux server at least you know you are getting the real deal, heavily tested product.

  75. Daniel B.
    Boffin

    SFU tools?

    Most people over here were recommending cp over Cygwin; while Cygwin is a nice tool for home PCs, I have never understood why everyone flocks to Cygwin when MS has actually implemented a POSIX-compliant subsystem on Windows with better performance than Cygwin. That said:

    I would recommend doing the cp trick with Services For Unix installed. If both servers have SFU, you can actually enable NFS shares on the destination, nfsmount on the source, and copy all the 60m files using cp, all the stuff will go through NFS instead of CIFS. I haven't checked the permissions, but this method will probably preserve the file ACLs as well!

  76. powers1

    HOW DO YOU COPY 60M FILES? USE A SINGLE-THREADED LINEAR COPY

    > You could restrict Richcopy to a single thread,

    > but then it is no faster than cp.

    The added benefit of Richcopy is not having to install a whole new operating system just to run a single program.

  77. drfreak

    Interesting

    I wonder if part of the issue is that most apps will need to get the full directory listing before copying. With millions of files, that can eat up some RAM before copying even starts.

    This makes me want to experiment with .NET 4's new System.IO.Directory.EnumerateFiles() method. The new function does not get a full directory listing, rather it gets the *next* listing as you iterate over the collection it returns.

    Using Parallel.ForEach to iterate over the collection makes for an easily coded multi-threaded file management app I would venture to guess...
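
    The same lazy-enumeration idea can be sketched outside .NET. The Python generator below (an analogue for illustration, not the .NET API; the function name is made up) hands back one path at a time via os.scandir, so memory holds only the directories still waiting to be visited rather than the full file list:

```python
import os


def iter_files(root):
    """Yield file paths under root one at a time, never building a full list."""
    pending = [root]  # directories still to be visited
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # Links are not followed, so a link cycle cannot trap the walk.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    yield entry.path
```

    A caller can start copying as soon as the first path arrives. The multi-threaded part is left out on purpose: the article's fragmentation complaint was about parallel copies.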

  78. Galidron

    PowerShell

    I haven't looked at it closely, but I wonder if PowerShell might have had some utilities that could do it.

  79. Lusty
    Gates Halo

    FSMT

    No mention of the Microsoft File Server Migration Toolkit which is designed specifically to do this kind of migration. The tool also handles a lot of the other problems of changing file servers too so I would suggest giving it a go next time, or contact a consultant with some experience of MS toolsets :)

  80. Anonymous Coward
    Linux

    cp? dear lord no!

    cp works, yes, but it's not the best way of dealing with the problem. the old tried and trusted method is to use tar

    eg basic example

    tar cf - * | ( cd /target; tar xfp -)

    you could use eg find or a 'for i in' bash script to ensure you only dealt with certain file types or names too. I really like this as it means basic/duff/temp files can all be omitted from backups etc

    1. Trevor_Pott Gold badge

      FSMT

      FSMT is fantastic...if the layout of your server on the new domain will be identical to how it was on the old one. In this case it wasn't. Users were completely different in naming scheme, groups were radically different, and the organisational hierarchy of the files was changed. What needed to happen was to get the files from A to B. After that, security would be changed, and files reorganised. So for this particular case, FSMT wasn't any more useful, (and was a bit slower than) other options.

  81. two00lbwaster
    WTF?

    No hierarchical folder structure?

    Surely you would do this in parts using something like the native zip functionality or a third party program like WinRAR to turn a large number of these files into a single archive.

    The thought of transferring 60m files across a network connection makes me quail. Even the web servers that I look after top out at 7.5m files.

  82. Werner Donné

    Replace NTFS with ZFS

    With ZFS you would create a snapshot on server1 and then write:

    > zfs send snapshot_name | ssh -l username server2 zfs receive filesystem_name

    You would have an exact copy of the complete file system without needing temporary storage.

  83. Guus Leeuw
    FAIL

    Shall I move or copy the title

    It's all very great, Trevor, that you're so proud of your little findings, but can you make an article that's clear about the very goal you wanted to achieve?

    Is it copy or move or did you use cp to move or mv to copy?

    The icon says it all.

  84. Danny 14 Silver badge
    FAIL

    server copy fail

    Wow, I can't believe you were migrating a live server and you were "copying" files?

    DFS the drive, add the second server as a replication destination, then sync between the two servers. 1 - its multithreaded, 2 - it syncs therefore changes are mapped right until you kill the server. 3 -its free.

    Even good old robocopy will need to be run multiple times to ensure you havent killed anything. Fook knows what you would have done if you would have needed a restore inbetween.

    1. Trevor_Pott Gold badge

      @Danny 14

      File servers in different domains. On different forests. DFSR no worky in that situation.

  85. Anonymous Coward
    Gates Halo

    bill gates is a genius

    Bill Gates is the richest man in the world. And best of all, he achieved this producing an operating system which people inexplicably buy even though it cannot do the everyday task of simply copying 60 million files from one place to another.

    And everyone who chooses his software must be an idiot because if they ever need to copy 60 million files, they're screwed.

  86. Anonymous Coward
    FAIL

    Windows and Linux - compare and contrast

    The finest brains on the planet wrote the Windows Copy Engine (the slothful malevolence behind CopyFileEx) and its brilliance was explained at some length here - http://blogs.technet.com/b/markrussinovich/archive/2008/02/04/2826167.aspx

    But unfortunately, a few hundred commentators didn't seem to agree. Over the course of a year or so, they hatefully ignored the genius theory and concentrated on the woeful real world performance instead.

    The GNU/Linux coders - clearly neophytes - didn't bother with a Copy Engine, and just coded cp as an open file, followed by a chunked read/write loop, followed by a close file, letting the kernel make sense of it all.

    And that's what worked for 60 million files. Priceless.
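
    For the curious, the open / chunked read-write / close pattern described above looks roughly like this in Python. It is a sketch of the pattern only, not GNU cp's actual source; the function name and the 64 KB chunk size are arbitrary assumptions.

```python
def chunked_copy(src, dst, chunk_size=64 * 1024):
    """Copy src to dst with a plain read/write loop; return bytes copied."""
    copied = 0
    # Open both files, then shuttle fixed-size chunks until the source is drained.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            fout.write(chunk)
            copied += len(chunk)
    # Leaving the with-block closes both files; caching is left to the OS.
    return copied
```

    Nothing here tries to be clever about buffering or scheduling, which is the point being made above.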

  87. Henry Wertz 1 Gold badge

    rsync FTW

    @Notas Badoff, rsync does preserve permissions.

    @bluesxman, can't you just "rm -R (whatever directory the 60,000,000 files are in)"?

    So, older rsync versions traversed the entire tree first and only then started copying, but newer ones are incremental, which I'm sure would be required for this many files. I love rsync, although the tar solutions would certainly work as well; that is what I used to use.
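
    To illustrate the difference, here is a rough Python sketch of the incremental idea: walk the tree one directory at a time and copy as you go, skipping files whose size and modification time already match on the target. This is not rsync and not its algorithm in any detail; sync_tree, needs_copy and the one-second mtime tolerance are assumptions made for this example.

```python
import os
import shutil


def needs_copy(src_path, dst_path):
    """Quick check: copy when the target is missing or size/mtime differ."""
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return True
    src = os.stat(src_path)
    # One-second tolerance on mtime is an assumption for this sketch.
    return src.st_size != dst.st_size or abs(src.st_mtime - dst.st_mtime) >= 1


def sync_tree(src_root, dst_root):
    """Walk src_root lazily and copy changed files as they are found."""
    copied = 0
    # os.walk yields one directory at a time, so no full file list is built.
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src_path = os.path.join(dirpath, name)
            dst_path = os.path.join(target_dir, name)
            if needs_copy(src_path, dst_path):
                shutil.copy2(src_path, dst_path)  # copy2 also carries over mtime
                copied += 1
    return copied
```

    Run it once while the source is live and again after shutting the source down, and the second pass only moves what changed in between.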

  88. TkH11

    Windows Copy

    They could at least fix the Windows explorer so it actually didn't stop Windows from multi-tasking when it copies. That'd be a start.

    Try copying a 30GB file in Windows and then try to continue working...

    Not a chance. Always struck me as odd, that in 20 years Microsoft never got this to work.

    I don't use Win 7 so I don't know if this pain-in-the-arse has been fixed.

  89. ididnttry

    I did not try 60m files copy on a wins server....

    but I would think if you set up Samba in a rush, you are probably ignoring file permissions and ownership, and copying everything under "administrator" rights.

    Traversing the 60m with cp -pR is still a hard task. If I had the time, I'd probably reorganize the files into directories first and do the copy in several (or tens of) runs, because I don't think all 60m files are in use; they are surely static (that's why you can start copying!).
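
    A rough Python sketch of the "several runs" idea: list the top-level directories and deal them out into batches, one batch per copy run. The name plan_runs and the round-robin split are assumptions for illustration; files sitting directly in the root are not covered and would need a run of their own.

```python
import os


def plan_runs(root, runs):
    """Spread the top-level subdirectories of root across `runs` batches."""
    with os.scandir(root) as entries:
        subdirs = sorted(e.name for e in entries if e.is_dir())
    batches = [[] for _ in range(runs)]
    for i, name in enumerate(subdirs):
        batches[i % runs].append(name)  # round-robin assignment
    return [b for b in batches if b]  # drop empty batches
```

    Each batch can then be handed to whatever copy tool you trust, and a failed run only has to be repeated for its own directories.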

  90. This post has been deleted by its author

  91. Simon2
    Thumb Up

    What about Norton Ghost?

    i've used Norton Ghost before to perform disk backups/transfers, not with 60m files but i can't see why it shouldn't work.

    Regularly use it to make my own automated recovery disks by saving the partition* to an image file, which i then burn to DVD/CD(s) along with a batch script to boot from the DVD/CD.

    *i use nLite/vLite to remove crap/add drivers/customise the installation before installing the OS.

    Then, once it's booted up, i add essential software (codecs, AV, WinRar/WinZip, Firefox, etc...) before saving everything to the image file.

  92. 88mm a.k.a. Minister for Misbehaviour
    Grenade

    One letter... twice

    dd

  93. foulmouth

    netcat, does everything!

    I don't know if anyone's mentioned it but...

    pipe tar -c into nc then have nc listening on the other end, and piping into tar -x.

    You can also add things if you like such as piping through a compression program, or encryption program. I've personally found this to be the fastest way to get things transferred.
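
    For those who would rather see the shape of it in code, the same pipeline can be approximated with Python's standard tarfile and socket modules. This is only a sketch under stated assumptions: send_tree and receive_tree are made-up names, the connected socket is assumed to exist already (the job nc does), and there is no encryption or authentication, same as plain netcat.

```python
import os
import socket
import tarfile


def send_tree(sock, src_dir, compress=False):
    """Write src_dir to the socket as a tar stream (the `tar -c | nc` side)."""
    mode = "w|gz" if compress else "w|"  # "|" means non-seekable stream
    with sock.makefile("wb") as stream:
        with tarfile.open(fileobj=stream, mode=mode) as tar:
            for name in sorted(os.listdir(src_dir)):
                tar.add(os.path.join(src_dir, name), arcname=name)
    sock.shutdown(socket.SHUT_WR)  # tell the receiver the stream is done


def receive_tree(sock, dst_dir):
    """Unpack a tar stream from the socket (the `nc -l | tar -x` side)."""
    # Use the safer "data" extraction filter where this Python provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with sock.makefile("rb") as stream:
        # "r|*" reads a stream and detects compression by itself.
        with tarfile.open(fileobj=stream, mode="r|*") as tar:
            tar.extractall(dst_dir, **kwargs)
```

    Nothing is staged on disk at either end: files are archived, sent and unpacked as one continuous stream, which is where the speed comes from.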

    That's not going to suit everyone's needs, it has its downfalls, but is useful to know.

    What you need to keep in mind, no matter what you use, 60M files are probably going to take a long time to transfer, assuming they are large.

    In such a case, you may want to think about something such as using an external drive, ensuring you have gigabit+ LAN, etc.

  94. mainframe
    Happy

    only 60m?

    on the mainframe this is not an issue at all.... Only 60m? Peanuts!

  95. Paul 191
    Thumb Up

    Rsync

    Rsync!

  96. David McMahon
    Pint

    Backup over WAN?

    That worries me!

    60M files in the cloud!

    Do any of these tools check (verify) the files afterwards?
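
    Whatever the tools themselves do, an independent check after the copy is easy to sketch: hash every file on the source and compare it with the file at the same relative path on the target. The Python below is an illustration only; verify_copy, file_digest and the choice of SHA-256 are assumptions, and it will be slow over 60M files because it reads everything twice.

```python
import hashlib
import os


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copy(src_root, dst_root):
    """Return relative paths that are missing or differ under dst_root."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src_path = os.path.join(dirpath, name)
            rel = os.path.relpath(src_path, src_root)
            dst_path = os.path.join(dst_root, rel)
            if not os.path.isfile(dst_path):
                problems.append(rel)  # never arrived
            elif file_digest(src_path) != file_digest(dst_path):
                problems.append(rel)  # arrived, but content differs
    return problems
```

    An empty list means every source file has a byte-identical twin on the target; it says nothing about permissions or about extra files on the target.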

    I urge a local backup copy, perhaps a PCI-X / PCI-e RAID card and some HDDs??

    Very interesting article, looking forward to testing those tools, I copy files from dead Windows installations (Windows installs are always slowly dying!) but yeah not on that scale!

    I will need to buy some more HDDs this/early next year to back up customers' data. WinXP can only use a single volume of 2TB, so a RAID 1 array of 2x2TB drives will suit :)

    I might buy Windows 8 lol

    and relax!

    1. Trevor_Pott Gold badge

      @David McMahon

      Backup over WAN shouldn't scare you at all! Let me give you a brief rundown:

      At our central (physical) location we have two (logical) sites. (In total there are four physical and five logical sites.) One is the production/manufacturing setup for this physical location. The other is "head office." Each logical site has a pair of DFSR-twinned (http://www.theregister.co.uk/2010/09/27/sysadmin_dfsr_clustering/) file servers. This physical location also has a dedicated "backup server" that stands apart from the rest.

      We don't bother backing up the manufacturing site-specific files, as they are only really relevant for a brief period (measured in days.) In addition, if the physical location were to burn down, those files cannot be relevant to us as the manufacturing apparatus to use them no longer exists.

      Now, the entirety of the “head office” logical site’s files are backed up to the backup server every night. Selected files from the manufacturing sites’ networks (such as databases and configuration information) are also trucked over to the backup server. It munches and crunches and vomits out some highly-compressed fileage. These files are then plopped onto a share on the Head Office logical site’s file servers. This share is DFSR-replicated to the twinned file servers in all the other logical sites.

      This ensures that the backup files containing all the critical company data are replicated “offsite” over the WAN. (Site-to-site VPN links and uncapped internets are <3.) It’s simple, doesn’t involve the “cloud” and works like a hot damn.

  97. pan2008
    WTF?

    what?

    Journalism better than the Daily Star. So why don't you use richcopy single threaded so it doesn't fragment, and as you said that's about the same performance as with linux? Problem solved and you don't need linux.

    1. Trevor_Pott Gold badge

      @pan2008

      Actually, I didn't say it's about the same performance as Linux. It's slower in single threaded mode than the Linux copy. Unless you consider 20%ish "about".

  98. vMAN
    Go

    another option

    Missing details on connectivity between the two Windows servers, and on whether there is shared or replicated storage (vs. DAS in each system).

    On several different occasions, for several customers, I've successfully moved 60+ million files using DoubleTake, including one system with 5000 roaming Windows profiles (Citrix environment) and a change count of 100,000 files per day. (Competing products have choked doing this.) This product uses a Windows filter driver which is certified by Microsoft and is able to mirror and replicate simultaneously, locally or across slower links (e.g. WAN), optionally with compression. It does byte-level replication (not block- or file-level as many other solutions do). It is also a very fast solution. My only recommendation, for ease of use, speed and verification, is to break the files down into smaller quantities and replicate those.

Biting the hand that feeds IT © 1998–2019