
Operation Speed Up!


Head Guru


I wanted to take a moment to inform our clientele about a large upcoming project. As some of you have noted, the local server backups have been placing higher than normal loads on our server farms. A number of factors have contributed to this: growing user accounts, larger statistical files, and other small increases in file storage have all led to higher loads during backups.

 

You may ask yourself why backups place our servers under higher than normal loads. The answer is really quite simple. The largest contributing factor to higher loads is the compression of files. A simple characterization of data compression is that it involves transforming a string of characters in some representation (such as ASCII) into a new string (of bits, for example) which contains the same information but whose length is as small as possible. When the amount of data to be transmitted is reduced, the effect is that of increasing the capacity of the communication channel. Similarly, compressing a file to half of its original size is equivalent to doubling the capacity of the storage medium. It may then become feasible to store the data at a higher, thus faster, level of the storage hierarchy and reduce the load on the input/output channels of the computer system. Sounds good, right? In theory it is. The biggest issue we have found is actually getting that data into a compressed format: it consumes a large amount of CPU cycle time and uses a lot of memory, and together those equate to higher loads on the servers.
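The trade-off described above can be seen with Python's standard gzip module, which is a reasonable stand-in for the gzip compression a backup script might use: the data shrinks dramatically, but only by spending CPU time and memory on every byte. (This is just an illustration; it is not the actual cPanel backup code.)

```python
import gzip

# A stand-in for a user account's text files: repetitive data compresses well.
data = b"user account statistics, log lines, mail headers\n" * 10_000

# Level 9 is the slowest, strongest setting -- this is where the CPU time goes.
compressed = gzip.compress(data, compresslevel=9)

# The compressed copy is a small fraction of the original size,
# which is exactly why compressed backups save so much disk space.
print(f"original: {len(data)} bytes, compressed: {len(compressed)} bytes")
```

Multiply that CPU cost across every file in every account on a shared server and the backup-time load spikes follow.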

 

TCH has been using the standard cPanel backup utility for our local backups. This is a well-written script that has, for the most part, done a good job for us. However, it compresses files, and this gzipping of the user accounts is what is causing the loads on the servers. We have heard the complaints of our users and have been busy behind the scenes working on plans to eliminate the loads caused by local backups.

 

I am pleased to announce that over the next 45 days we will start to transition all our servers to a different form of backups. We will start using incremental backups on our local side. This form of backup uses NO compression and is much less likely to cause the loads we have been seeing on the servers. Incremental backup provides a much faster method of backing up data than repeatedly running full backups. During an incremental backup, only the files that have changed since the most recent backup are included. That is where it gets its name: each backup is an increment since the most recent backup.
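As a rough sketch of the idea (a toy illustration, not TCH's actual backup script), an incremental run only touches files whose timestamp or size differs from the copy taken last time; everything else costs nothing beyond a quick stat check:

```python
import os
import shutil

def incremental_backup(src, dst):
    """Copy only files that are new or have changed since the last run.

    A toy sketch of incremental backup: no compression, and unchanged
    files are skipped entirely. Returns the names of files copied.
    """
    os.makedirs(dst, exist_ok=True)
    copied = []
    for name in os.listdir(src):
        s = os.path.join(src, name)
        d = os.path.join(dst, name)
        if not os.path.isfile(s):
            continue  # a real tool would recurse into subdirectories
        changed = (not os.path.exists(d)
                   or os.path.getsize(s) != os.path.getsize(d)
                   or os.path.getmtime(s) > os.path.getmtime(d))
        if changed:
            shutil.copy2(s, d)  # copy2 preserves the mtime, so the next run skips it
            copied.append(name)
    return copied
```

The first run copies everything, which is why that first pass is the load-intensive one; every run after that only moves the increment.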

 

We have been hard at work making preparations for this event, and we will actually begin transitioning 10 servers starting this weekend.

 

The following servers will be transitioned to the new form of backups this weekend.

 

server100

server114

server11

server20

server63

server75

server84

server89

server92

uk1

 

The first run may be a tad load intensive as we get the total backup done, but it should be smooth sailing from here on out.

 

Also, there are some servers that will require larger backup drives to be installed, as the current backup drives are not large enough for this form of backups.

 

I am willing to answer any questions you may have, simply ask in this post.

 

Thank you.


Once again, a nice move to keep TCH's quality on the top. Thank you, Bill <_<

 

As for questions, I do have one, more out of technical curiosity than anything else, but at the same time the answer may help others wondering the same thing:

 

In the incremental backup, if I change a file and all other files remain unchanged, only that file will be added to the next backup run - what about if I delete a file, does it get deleted from the backup, too?

 

I'm asking this because my notion of incremental backup is that there is one single copy of the backed-up files, and that copy gets changed to reflect changes in the original set of files, but I'm not sure how it works if I want to retrieve something from the backup that has been deleted from the original set of files.

 

I think the greatest backup system would be to have something like CVS or SVN, in which every backup run adds a "version" to the backup set. There is still one single copy of the backed up files, which corresponds to the latest backed up files but for every version, there is a change log (a file that has every change that has been made on that version) and if you want to retrieve files from a previous backup run (from a previous version), the system is capable of reversing all the changes back to the version you want, based on that change log. It would be pretty cool but I don't even know if something like this exists.

 

By the way (really just out of curiosity), what are you using for the backups, rsync?

Edited by borfast

Hi Raul, my understanding is that backups are treated as separate packages and stored separately no matter the media used for storage. So when you do an incremental backup yesterday, it is stored away whole; when you do another today, it is stored away whole with whatever has changed from the previous backups. The file that was deleted isn't backed up because it didn't change... it remains in the last incremental package.

 

Now, if you made a change to that file AFTER it was backed up, and deleted the file before it could be saved again in another incremental backup, you will not get those changes if you restore it... they were never saved.

 

Hope this answers your question...and I'm sure someone will come along and correct me if I'm wrong :)


Excellent news, Bill! This, combined with the announcement that you will be upgrading our server, just blows me away! (I'm on server 63)

 

I am curious about when you will be running the first backup. You mention that the first run will be load intensive. Will this slow our sites down significantly? How long is the first backup expected to take?

 

The reason I ask is because we have an online store, and this is our busy season, of course. Since we serve US customers only, our site is pretty quiet overnight, so I was hoping that would be the time you plan on running the first backup :dance:..

 

Thanks again! :)

Edited by GroovyFish

Just a follow up to the wonderful description of the new Backup system from our Head Guru.

 

The weekly backup system will be switched from the conventional compressed backup to an uncompressed, incremental one. As Head Guru has already explained, a major share of the load during backup runs is caused by the compression of data. Of course, a compressed backup takes only about half the disk space an incremental backup would. But the high load during data compression often causes the backup process to fail, and we struggle to keep it up to date. Incremental backup will need more disk space, almost double the space needed by compressed backup. The first time it runs, high loads can be expected since it takes a full backup, but not as high as in the previous case since there's no compression. From the very next run, only new accounts and the additional bytes in existing accounts will be transferred. That means, if you have modified a file after the last incremental backup, it will sync only the changes you made the next time it runs. Any files that were deleted after the last backup will be removed from the backup folders. The basic tool used for the transfer is rsync, which is well known for its efficiency and flexibility. We will introduce it on the 10 servers given above. Depending on the feedback, we will go ahead and implement it on the other servers.
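The deletion behaviour described above is what rsync calls a mirror with `--delete`: files removed from the source are also pruned from the copy, so the backup stays an exact mirror. A minimal sketch of one such mirroring pass (a toy illustration of the concept, not rsync's actual delta-transfer implementation):

```python
import os
import shutil

def mirror(src, dst):
    """One rsync-style mirror pass: sync new/changed files, then prune deletions."""
    os.makedirs(dst, exist_ok=True)
    src_names = {n for n in os.listdir(src) if os.path.isfile(os.path.join(src, n))}
    for name in src_names:
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if (not os.path.exists(d)
                or os.path.getsize(s) != os.path.getsize(d)
                or os.path.getmtime(s) > os.path.getmtime(d)):
            shutil.copy2(s, d)
    # The --delete part: anything in the backup that no longer exists
    # in the source is removed, keeping the mirror exact.
    for name in set(os.listdir(dst)) - src_names:
        os.remove(os.path.join(dst, name))
```

This is why a deleted file eventually disappears from the backup as well: the mirror reflects the current state of the account, not its history.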

 

P.S.: a CVS-style backup would be the perfect backup system, but it is not feasible at this stage on our shared servers. Once we make our backups more reliable and up to date than they have ever been, I guess Bill will think about what's next.. :dance:

 

:)


Thanks for the update, Vivek. :wallbash:

 

That's what I thought. It matches my basic concept of "incremental backup": using rsync and keeping a single copy that becomes exactly the same as the original set of files every time the backup process is run, meaning that if you add a file, it gets added in the next run, and if you delete a file, it will also be deleted in the next run.

 

As for the CVS style backup, as I said, I don't even know if something like that exists. Does it?

 

*Goes to the Google search text box and types 'CVS style backup'...

 

Well, it found a few programs that do something like that but they're all for Windows... let me try again...

 

Bingo! :whip:

http://www.nongnu.org/rdiff-backup/

rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago.
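The reverse-diff idea can be sketched in a few lines: keep a current mirror, but before overwriting or deleting anything, stash the displaced version in a per-run increments folder so older states remain recoverable. (This is only the concept; rdiff-backup itself stores space-efficient binary diffs, not full copies, and the directory layout here is made up for illustration.)

```python
import os
import shutil
import time

def backup_with_history(src, dst):
    """Mirror src into dst, saving displaced file versions under dst/increments/<run>."""
    os.makedirs(dst, exist_ok=True)
    run_dir = os.path.join(dst, "increments", time.strftime("%Y%m%dT%H%M%S"))
    src_names = {n for n in os.listdir(src) if os.path.isfile(os.path.join(src, n))}
    dst_names = {n for n in os.listdir(dst) if os.path.isfile(os.path.join(dst, n))}
    for name in dst_names - src_names:  # deleted from source: stash, don't discard
        os.makedirs(run_dir, exist_ok=True)
        shutil.move(os.path.join(dst, name), os.path.join(run_dir, name))
    for name in src_names:
        s, d = os.path.join(src, name), os.path.join(dst, name)
        unchanged = (os.path.exists(d)
                     and os.path.getsize(s) == os.path.getsize(d)
                     and os.path.getmtime(s) <= os.path.getmtime(d))
        if unchanged:
            continue
        if os.path.exists(d):  # changed: stash the old version before overwriting
            os.makedirs(run_dir, exist_ok=True)
            shutil.move(d, os.path.join(run_dir, name))
        shutil.copy2(s, d)
```

So the top level of the backup is always the latest mirror, while the increments folders hold whatever each run replaced, which is exactly the property that makes "recover a file I deleted weeks ago" possible.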

 

*Goes away to play with this and try to set up a backup system that uses it :D


Yes borfast, it is possible on Linux too... and we wouldn't need any 3rd-party software if we were to switch to it. :) With almost double the space used by an incremental backup, 3 or 4 different snapshots can be kept. Who knows, there may well be another great announcement from Bill. ;)


Sorry to seem dense, but want to be sure I understand it.

 

Considering the discussion of deleted files, this form of backup will allow you to restore our entire site if it blows up, but if we delete a file and want it back 7, 14, or 50 days later, then we are out of luck? Just asking because it may affect how we do backups... being a little more careful with that delete button. ;)

 

In all seriousness, if this is how it works then we'll have to actually keep our own backups if we want historical information... not a bad thing but just something to know.

 

By the way, you guys rock!


Jim, yes, that's what it means. This type of backup basically tries to keep up with every change you make to your website; it tries to be a constant mirror of it, meaning that if you delete a file from your website, you should not expect to find it in the backup several weeks later.


An update: (sorry for the delay)

 

Incremental backup will run on the following servers this weekend.

 

# Stage I (10 servers)

#

server100

server114

server11

server20

server63

server75

server84

server89

server92

uk1

#

# Stage II (16 servers)

#

server6

server12

server19

server21

server32

server39

server45

server46

server50

server59

server101

server104

server109

server110

server111

server112

 

This will be the second run on the servers in Stage I. We are all excited about this and will be watching the success of this first stage of servers very closely.

 

Will update you all after the first day of weekly backups (SATURDAY).


Incremental backup has successfully finished on all the servers in Stage I and Stage II, so we have added more servers to the list.

 

#

# Stage III (37 servers)

#

server102

server103

server105

server106

server107

server108

server14

server15

server16

server44

server49

server51

server54

server56

server60

server65

server66

server67

server69

server70

server73

server76

server77

server78

server8

server80

server81

server382

server83

server385

server386

server387

server90

server94

server97

server98

server99

 

More updates forthcoming.

