Head Guru Posted December 8, 2005 Operation Speed Up! I wanted to take a moment to inform our clientele about a large upcoming project. As some of you have noted, the local server backups have been placing higher than normal loads on our server farms. This has a number of causes: growing user accounts, larger statistics files and other small increases in file storage have all led to higher loads during backups. You may ask yourself why backups place our servers under higher than normal loads. The answer is really quite simple: the largest contributing factor is the compression of files. A simple characterization of data compression is that it involves transforming a string of characters in some representation (such as ASCII) into a new string (of bits, for example) which contains the same information but whose length is as small as possible. When the amount of data to be transmitted is reduced, the effect is that of increasing the capacity of the communication channel. Similarly, compressing a file to half of its original size is equivalent to doubling the capacity of the storage medium. It may then become feasible to store the data at a higher, and thus faster, level of the storage hierarchy and reduce the load on the input/output channels of the computer system. Sounds good, right? In theory it is. The biggest issue we have found is actually getting that data into a compressed format. It consumes a large amount of CPU time and uses a lot of memory, and heavy use of both CPU cycles and memory typically equates to higher loads on the servers. TCH has been utilizing the standard cPanel backup utility for our local backups. This is a well-written script that has, for the most part, done a good job for us. However, it does compress files, and this gzip compression of the user accounts is what is causing the loads on the servers.
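To make the trade-off concrete, here is a minimal Python sketch (an illustration only, not the cPanel backup script) that gzip-compresses a block of repetitive, log-like text with the standard gzip module and reports both the size reduction and the CPU time spent. The sample log line and payload size are made up for the example; real account data will compress differently.

```python
import gzip
import time


def compress_and_measure(data: bytes, level: int = 9) -> tuple[bytes, float]:
    """Gzip-compress data and return (compressed bytes, CPU seconds spent)."""
    start = time.process_time()  # CPU time used by this process, not wall-clock time
    compressed = gzip.compress(data, compresslevel=level)
    elapsed = time.process_time() - start
    return compressed, elapsed


if __name__ == "__main__":
    # Hypothetical stand-in for a web statistics log: repetitive text compresses well.
    log_line = b'127.0.0.1 - - [08/Dec/2005:10:00:00] "GET /index.html HTTP/1.1" 200 5120\n'
    payload = log_line * 100_000  # about 7 MB of sample data
    compressed, cpu_seconds = compress_and_measure(payload)
    ratio = len(payload) / len(compressed)
    print(f"original:   {len(payload):,} bytes")
    print(f"compressed: {len(compressed):,} bytes (ratio {ratio:.1f}:1)")
    print(f"CPU time spent compressing: {cpu_seconds:.3f} s")
    # Compression is lossless: decompressing returns exactly the original bytes.
    assert gzip.decompress(compressed) == payload
```

The smaller file is paid for in CPU time on every run, which is the load the post is describing.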
We have heard the complaints of our users and have been busy behind the scenes working on plans to eliminate the loads caused by local backups. I am pleased to announce that over the next 45 days we will start to transition all our servers to a different form of backup. We will start using incremental backups on our local side. This form of backup uses NO compression and is much less likely to cause the loads we have been seeing on the servers. Incremental backup provides a much faster method of backing up data than repeatedly running full backups. During an incremental backup, only the files that have changed since the most recent backup are included. That is where it gets its name: each backup is an increment since the most recent backup. We have been hard at work preparing for this, and we will actually begin the transition with 10 servers this weekend. The following servers will be transitioned to the new form of backups this weekend: server100 server114 server11 server20 server63 server75 server84 server89 server92 uk1. The first run may be a tad load intensive while we get the full backup done, but it should be smooth sailing from there on out. Also, some servers will require larger backup drives, as their current backup drives are not large enough for this form of backup. I am willing to answer any questions you may have; simply ask in this thread. Thank you.
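The selection rule described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the script TCH uses: it assumes a file "has changed" when its modification time is newer than the previous backup's timestamp, and the function name incremental_backup is made up for the example.

```python
import os
import shutil


def incremental_backup(source: str, dest: str, last_backup_time: float) -> list[str]:
    """Copy only files modified after last_backup_time; return their relative paths."""
    copied = []
    for root, _dirs, files in os.walk(source):
        for name in files:
            src_path = os.path.join(root, name)
            # A file belongs to this increment only if it changed since the previous run.
            if os.path.getmtime(src_path) > last_backup_time:
                rel_path = os.path.relpath(src_path, source)
                dst_path = os.path.join(dest, rel_path)
                os.makedirs(os.path.dirname(dst_path), exist_ok=True)
                shutil.copy2(src_path, dst_path)  # copy2 also preserves timestamps
                copied.append(rel_path)
    return sorted(copied)
```

Real tools often combine timestamps with size or checksum comparisons, since a modification time on its own can be misleading.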
borfast Posted December 8, 2005 Once again, a nice move to keep TCH's quality at the top. Thank you, Bill. As for questions, I do have one, more out of technical curiosity than anything else, though it may also answer a question others have: with incremental backup, if I change a file and all other files remain unchanged, only that file will be added to the next backup run. But what if I delete a file: does it get deleted from the backup, too? I'm asking because my notion of incremental backup is that there is one single copy of the backed-up files, and that copy gets changed to reflect changes in the original set of files, but I'm not sure how it works if I want to get something from the backup that has been deleted from the original set of files. I think the greatest backup system would be something like CVS or SVN, in which every backup run adds a "version" to the backup set. There is still one single copy of the backed-up files, which corresponds to the latest backup, but for every version there is a change log (a file recording every change made in that version), and if you want to retrieve files from a previous backup run (a previous version), the system can reverse all the changes back to the version you want, based on that change log. It would be pretty cool, but I don't even know if something like this exists. By the way (really just out of curiosity), what are you using for the backups, rsync? Edited December 8, 2005 by borfast
Madmanmcp Posted December 8, 2005 Hi Raul, my understanding is that backups are treated as separate packages and stored separately no matter what media is used for storage. So when you did an incremental backup yesterday, it was stored away whole; when you do another today, it is stored away whole with whatever has changed since the previous backups. The file that was deleted isn't backed up again because it didn't change...it remains in the last incremental package. Now, if you made a change to that file AFTER it was saved and deleted the file before it was saved again in another incremental backup, you will not get those changes if you restore it...they were not saved. Hope this answers your question...and I'm sure someone will come along and correct me if I'm wrong.
GroovyFish Posted December 8, 2005 Excellent news, Bill! This, combined with the announcement that you will be upgrading our server, just blows me away! (I'm on server 63.) I am curious about when you will be running the first backup. You mention that the first run will be load intensive. Will this slow our sites down significantly? How long is the first backup expected to take? The reason I ask is that we have an online store, and this is our busy season, of course. Since we serve US customers only, our site is pretty quiet overnight, so I was hoping that would be when you plan on running the first backup. Thanks again! Edited December 8, 2005 by GroovyFish
TCH-Vivek Posted December 8, 2005 Just a follow-up to the wonderful description of the new backup system from our Head Guru. The weekly backup system will be switched from the conventional compressed backup to an uncompressed, incremental one. As Head Guru has already explained, a major share of the load during backup runs is caused by the compression of data. Of course, a compressed backup takes only about half the disk space an incremental backup would take. But the high load during data compression often causes the backup process to fail, and we struggle to keep backups up to date. Incremental backup will need more disk space, almost double the space needed by a compressed backup. The first time it runs, high loads can be expected since it takes a full backup, but not as high as before since there's no compression. From the very next run, only new accounts and changed data in existing accounts will be transferred. That means that if you have modified a file after the last incremental backup, only the changes you made will be synced the next time it runs. Any files that were deleted after the last backup will be removed from the backup folders. The basic tool used for the transfer is rsync, which is well known for its efficiency and flexibility. We will introduce it on the 10 servers listed above. Depending on the feedback, we will go ahead and implement it on other servers. P.S. A CVS-style backup is the perfect backup system, but it is not feasible at this stage on our shared servers. But once we have made our backups more reliable and up to date than they have ever been, I guess Bill will think about what's next...
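The behaviour Vivek describes (new files added, changed files updated, deleted files removed from the backup) is what rsync does when run with its --delete option. The following Python sketch imitates only that mirroring logic, with two stated simplifications: it uses an rsync-like "quick check" (size and modification time) to decide what changed, and it copies whole files, whereas rsync's delta-transfer algorithm sends only the changed parts of a file. The function names are hypothetical.

```python
import os
import shutil


def _needs_update(src: str, dst: str) -> bool:
    """rsync-like quick check: treat a file as changed if its size or mtime differ."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def mirror(source: str, dest: str) -> dict[str, list[str]]:
    """Make dest a copy of source: add or update changed files, delete removed ones."""
    report = {"copied": [], "deleted": []}
    wanted = set()
    for root, _dirs, files in os.walk(source):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), source)
            wanted.add(rel)
            src_path, dst_path = os.path.join(source, rel), os.path.join(dest, rel)
            if _needs_update(src_path, dst_path):
                os.makedirs(os.path.dirname(dst_path), exist_ok=True)
                shutil.copy2(src_path, dst_path)  # keeps mtime so the next quick check matches
                report["copied"].append(rel)
    # Anything in the mirror that no longer exists in the source is removed,
    # which is why a deleted file also disappears from this kind of backup.
    # (Empty directories are left in place in this simplified sketch.)
    for root, _dirs, files in os.walk(dest):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), dest)
            if rel not in wanted:
                os.remove(os.path.join(dest, rel))
                report["deleted"].append(rel)
    return report
```

Running mirror twice in a row copies nothing the second time, and deleting a file from the source removes it from the mirror on the next run, matching the behaviour discussed later in the thread.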
borfast Posted December 9, 2005 Thanks for the update, Vivek. That's what I thought. It's my basic concept of "incremental backup": using rsync and keeping a single copy that becomes exactly the same as the original set of files every time the backup process is run, meaning that if you add a file, it gets added in the next run, and if you delete a file, it will also be deleted in the next run. As for the CVS-style backup, as I said, I don't even know if something like that exists. Does it? *Goes to the Google search box and types 'CVS style backup'* Well, it found a few programs that do something like that, but they're all for Windows... let me try again... Bingo! http://www.nongnu.org/rdiff-backup/ rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. *Goes away to play with this and try to set up a backup system that uses it*
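The idea of keeping a current mirror plus reverse diffs can be illustrated with Python's difflib. This is a simplified, line-based sketch with hypothetical function names, not rdiff-backup's actual storage format (rdiff-backup builds its diffs with librsync): each reverse diff stores just enough to turn the newer version back into the older one, and stepping back several versions applies the diffs one after another.

```python
import difflib


def make_reverse_diff(new_lines: list[str], old_lines: list[str]) -> list[tuple]:
    """Build a compact delta that turns the NEW version back into the OLD one.

    Only index ranges reused from the new version and text that exists
    solely in the old version are stored.
    """
    delta = []
    matcher = difflib.SequenceMatcher(None, new_lines, old_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))  # reuse these lines from the current mirror
        elif tag in ("replace", "insert"):
            delta.append(("add", old_lines[j1:j2]))  # lines only the old version had
        # "delete": lines that exist only in the new version, so nothing to store
    return delta


def apply_reverse_diff(new_lines: list[str], delta: list[tuple]) -> list[str]:
    """Rebuild the older version from the current mirror plus one reverse diff."""
    old = []
    for op in delta:
        if op[0] == "copy":
            old.extend(new_lines[op[1]:op[2]])
        else:
            old.extend(op[1])
    return old


def restore_version(current: list[str], reverse_diffs: list[list[tuple]], steps_back: int) -> list[str]:
    """Walk back steps_back versions; reverse_diffs[0] undoes the most recent run."""
    version = current
    for delta in reverse_diffs[:steps_back]:
        version = apply_reverse_diff(version, delta)
    return version
```

Only the latest copy is kept in full; older versions cost only the lines that differ, which is what makes this style of backup cheaper than keeping several complete copies.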
TCH-Vivek Posted December 10, 2005 Yes borfast, it is possible in Linux also... and we wouldn't need any 3rd party software if we were to switch to it. With roughly double the space used by a single incremental backup, 3 or 4 different snapshots can be kept. Who knows, there may be another great announcement from Bill.
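Vivek doesn't say exactly how the snapshots would be kept. One common Linux technique that needs no extra software is hard-linked snapshots (the approach behind rsync's --link-dest option): each snapshot is a full directory tree, but files that did not change since the previous snapshot are hard links to the same data on disk, so only changed files take up new space. The Python sketch below assumes that approach; the function name and the size-plus-mtime change test are illustrative choices.

```python
import os
import shutil
from typing import Optional


def snapshot(source: str, prev_snap: Optional[str], new_snap: str) -> tuple[int, int]:
    """Create new_snap from source, hard-linking files unchanged since prev_snap.

    Returns (files_linked, files_copied).
    """
    linked = copied = 0
    for root, _dirs, files in os.walk(source):
        for name in files:
            src_path = os.path.join(root, name)
            rel = os.path.relpath(src_path, source)
            dst_path = os.path.join(new_snap, rel)
            os.makedirs(os.path.dirname(dst_path), exist_ok=True)
            prev_path = os.path.join(prev_snap, rel) if prev_snap else None
            if prev_path and os.path.exists(prev_path):
                s, p = os.stat(src_path), os.stat(prev_path)
                if s.st_size == p.st_size and int(s.st_mtime) == int(p.st_mtime):
                    # Unchanged: point at the same data blocks, costing almost no extra space.
                    os.link(prev_path, dst_path)
                    linked += 1
                    continue
            shutil.copy2(src_path, dst_path)  # new or changed file: store a real copy
            copied += 1
    return linked, copied
```

How many snapshots fit into a given amount of space depends on how much data changes between runs; if only a small fraction of files change each week, each additional snapshot costs little more than those changed files.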
borfast Posted December 10, 2005 I have set up rdiff-backup on my home desktop and it's working great! Thanks, Vivek!
Deverill Posted December 10, 2005 Sorry to seem dense, but I want to be sure I understand it. Considering the discussion of deleted files, this form of backup will allow you to restore our entire site if it blows up, but if we delete a file and want it back 7, 14 or 50 days later, we are out of luck? Just asking because it may affect how we do backups... and make us a little more careful with that delete button. In all seriousness, if this is how it works, then we'll have to keep our own backups if we want historical copies... not a bad thing, but just something to know. By the way, you guys rock!
TCH-Don Posted December 11, 2005 You mean not everybody keeps archived local backups? It's so easy to zip everything up before big changes.
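For anyone who wants to follow Don's advice, a timestamped zip of a local copy of the site can be made with Python's standard shutil.make_archive. A minimal sketch; the directory arguments and the site-backup- file name prefix are hypothetical.

```python
import os
import shutil
from datetime import datetime


def archive_site(site_dir: str, archive_dir: str) -> str:
    """Zip site_dir into archive_dir under a timestamped name; return the archive path."""
    os.makedirs(archive_dir, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = os.path.join(archive_dir, f"site-backup-{stamp}")
    # make_archive appends ".zip" itself and returns the full path of the file it wrote.
    return shutil.make_archive(base_name, "zip", root_dir=site_dir)
```

Because every archive gets its own timestamp, older copies are never overwritten, which gives exactly the historical record a mirror-style backup does not keep.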
borfast Posted December 11, 2005 Jim, yes, that's what it means. This type of backup basically tries to keep up with every change you make in your website, it tries to be a constant mirror of your website, meaning that if you delete a file from your website, you should not expect to have it on backup several weeks later.
TCH-Vivek Posted December 16, 2005 An update (sorry for the delay): incremental backup will run on the following servers this weekend.
# Stage I (10 servers)
server100 server114 server11 server20 server63 server75 server84 server89 server92 uk1
# Stage II (16 servers)
server6 server12 server19 server21 server32 server39 server45 server46 server50 server59 server101 server104 server109 server110 server111 server112
This will be the second run on the servers in Stage I. We are all excited about this, and we will be watching the success of this first stage of servers very closely. Will update you all after the first day of weekly backups (SATURDAY).
TCH-Vivek Posted December 24, 2005 Incremental backup has successfully finished on all the servers in Stage I and Stage II, so we have added more servers to the list.
# Stage III (37 servers)
server102 server103 server105 server106 server107 server108 server14 server15 server16 server44 server49 server51 server54 server56 server60 server65 server66 server67 server69 server70 server73 server76 server77 server78 server8 server80 server81 server382 server83 server385 server386 server387 server90 server94 server97 server98 server99
More updates forthcoming.
TCH-Vivek Posted January 6, 2006 Added more servers for this week.
# Stage IV (10 servers)
server115 server116 server35 server38 server47 server5 server52 server71 server74 server93
TCH-Bruce Posted April 16, 2006 This thread is over 4 months old and all the servers have had their backup routines changed to incremental backups. Closing thread.