Here is a memo on backing up a MediaWiki instance, say one deployed as part of a Web site mywebsite.com.

Here is a listing of concrete steps:

Change into the backup root directory on the local file system:

cd /Volumes/BACKUP/mywebsite.com

Back up using the backup_mediawiki.sh script

  1. Log in to the web server.
  2. Update the clone of the VCS repository https://github.com/lumeng/MediaWiki_Backup on the server.
  3. Back up using MediaWiki_Backup/backup_mediawiki.sh:
     # assuming the wiki's web directory is ~/mywebsite.com/wiki
     WIKI_PATH="mywebsite.com/wiki"
     # assuming the backup_YYYYMMDD subdirectory created by the backup should go under path/to/backup/mywebsite.com/wiki
     WIKI_BACKUP_PATH="path/to/backup/mywebsite.com/wiki"
     # go to the home directory before starting
     cd
     # Start the backup. This creates path/to/backup/mywebsite.com/wiki/backup_YYYYMMDD.
     path/to/backup_mediawiki.sh -d "$WIKI_BACKUP_PATH" -w "$WIKI_PATH"
    
  4. Rsync the backup to a local hard drive. Back up the whole web site user's home directory, which includes the backup files created above, using rsync:
     cd /Volumes/BACKUP/mywebsite.com
     rsync --exclude-from rsync_backup_exclusion.txt -thrivpbl user@webhost.com:/home/websiteuser rsync_backup/
  5. Ideally, upload the backup to cloud storage such as Dropbox. (A driver script consolidating steps 1-4 is sketched after this list.)
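
For convenience, here is a minimal sketch that consolidates steps 1-4 into a single driver script run from the local machine. It is an illustration only: the login websiteuser@webhost.com and all paths are placeholders, and it assumes the MediaWiki_Backup clone sits in the server user's home directory and is invoked with the same -d/-w options as above.

#!/usr/bin/env bash
# Illustrative driver for steps 1-4; adjust hosts, users, and paths to your setup.
set -euo pipefail

REMOTE="websiteuser@webhost.com"                      # placeholder web server login
WIKI_PATH="mywebsite.com/wiki"                        # wiki web directory, relative to the remote home
WIKI_BACKUP_PATH="path/to/backup/mywebsite.com/wiki"  # where backup_YYYYMMDD is created on the server
LOCAL_BACKUP_ROOT="/Volumes/BACKUP/mywebsite.com"     # local backup drive

# Steps 1-3: update the backup script on the server and run it there.
ssh "$REMOTE" "cd ~/MediaWiki_Backup && git pull && ./backup_mediawiki.sh -d '$WIKI_BACKUP_PATH' -w '$WIKI_PATH'"

# Step 4: pull the web site user's home directory, including the new backup, down to the local drive.
cd "$LOCAL_BACKUP_ROOT"
rsync --exclude-from rsync_backup_exclusion.txt -thrivpbl "$REMOTE":/home/websiteuser rsync_backup/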

HTML backup using wget for immediate reading

Optionally, one can also keep a crawled copy of a MediaWiki instance. It can be useful to have the HTML files at hand for immediate offline reading.

cd /Volumes/BACKUP/mywebsite.com/wget_backup
mkdir mywebsite.com-wiki__wget_backup_YYYYMMDD
cd mywebsite.com-wiki__wget_backup_YYYYMMDD
# crawl the whole Web site
# wget -k -p -r -E http://www.mywebsite.com/
# crawl the pages of the MediaWiki instance excluding the Help and Special pages
wget -k -p -r --user-agent='Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36' -R '*Special*,*Help*' -E http://www.mywebsite.com/wiki/
cd ..
7z a -mx=9 mywebsite.com-wiki__wget_backup_YYYYMMDD.7z mywebsite.com-wiki__wget_backup_YYYYMMDD
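
The YYYYMMDD placeholder above can be filled in automatically with date instead of being typed by hand; a minimal sketch, using the same directory and archive names as above:

# generate the date stamp once and reuse it for the crawl directory and the archive
DATE_STAMP=$(date +%Y%m%d)
BACKUP_NAME="mywebsite.com-wiki__wget_backup_${DATE_STAMP}"
mkdir "$BACKUP_NAME"
# ... run the wget crawl inside "$BACKUP_NAME" as above, then compress it ...
7z a -mx=9 "${BACKUP_NAME}.7z" "$BACKUP_NAME"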

Remarks:

  • -k: convert links to suit local viewing
  • -p: download page requisites/dependencies
  • -r: download recursively
  • -E: adjust file name extensions so downloaded pages are saved with an .html suffix
  • -R: reject files whose names match the given patterns (here, the Special and Help pages)
  • --user-agent: set a "fake" user agent to emulate a regular browser, since some sites check the user agent. Check user agent strings at useragentstring.com.

For reference, in one experiment it took about 30 minutes for wget to crawl a small MediaWiki installation with hundreds of user-created pages.

If only a small set of pages needs to be backed up, curl can be used instead, for example:

# download multiple pages
curl -O 'http://mywebsite.com/wiki/Foo_Bar[01-10]'
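
If the page titles do not follow a numeric pattern like the above, the URLs can instead be kept in a plain text file, one per line, and fed to curl; a small sketch, with wiki_pages.txt as a hypothetical file name:

# download every URL listed (one per line) in wiki_pages.txt
xargs -n 1 curl -O < wiki_pages.txt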

References

  • https://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki
  • https://www.mediawiki.org/wiki/Fullsitebackup
  • https://www.mediawiki.org/wiki/Manual:DumpBackup.php
  • https://wikitech.wikimedia.org/wiki/Category:Dumps