Copying or merging wiki user accounts
There are several use cases where user accounts need to be copied from one wiki to another, e.g.
- Merging two wikifarms
  - Wikifarms store user accounts in the metawiki database, which is shared by all wikis in the wikifarm. If the wikis of a wikifarm (or a selection of them) need to be moved to another wikifarm, the user accounts of both wikifarms have to be merged.
- Moving a wikifarm to another server
  - Strictly speaking, moving a wikifarm to another server can be done by copying the whole database. In practice, however, the new wikifarm will often be a separate branch of the original wiki, so a merge becomes necessary.
The process consists of a sequence of activities, described in more detail below:
- Back up the user database
- Resolve conflicts
- Merge the user accounts
- Anonymize user accounts which should be discarded
In the following, the source database server is referred to as "FROM" and the sink (target) database server as "TO".
1. Back up the user database
Make a backup of the tables in TO that will be modified:
- user table in metawiki database
- user-related tables user_groups, user_newtalk, user_openid, user_properties, recentchanges, revision, watchlist in all individual wiki databases.
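The backup step can be sketched as follows. This is a minimal illustration rather than the script described later on this page: it uses Python's built-in sqlite3 module as a stand-in for the real database server, and the helper name `backup_table` and the `_backup` suffix are assumptions made for the example.

```python
import sqlite3


def backup_table(conn: sqlite3.Connection, table: str, suffix: str = "_backup") -> int:
    """Copy every row of `table` into `<table><suffix>` and return the row count."""
    backup = table + suffix
    # Table names cannot be bound as SQL parameters, so validate before interpolating.
    if not backup.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    conn.execute(f'DROP TABLE IF EXISTS "{backup}"')
    conn.execute(f'CREATE TABLE "{backup}" AS SELECT * FROM "{table}"')
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{backup}"').fetchone()[0]
```

Calling `backup_table(conn, "user")` on the metawiki connection, and then once per listed table on each individual wiki connection, would cover the tables named above. Note that `CREATE TABLE ... AS SELECT` copies the data but not indexes or constraints, so a full database dump is still the safer option for a real restore.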
2. Resolve conflicts
User accounts conflict if FROM and TO contain accounts with the same ID but different data. This can happen if:
- accounts in FROM have been updated after TO was copied.
- accounts have been created in TO, but not in FROM.
Resolving conflicts involves the following actions:
- Search for incomplete accounts in FROM (e.g. without an e-mail address). If incomplete accounts are found, stop.
- List all conflicting accounts in TO, i.e. accounts which have no equivalent in FROM because they were created after TO was copied.
- Add a buffer of e.g. 2000 accounts in TO, to accommodate future account merges.
- Move conflicting accounts after the buffer.
- Update user ids in the individual wiki databases, in the tables user_groups, user_newtalk, user_openid, user_properties, recentchanges, revision, watchlist.
- Check that all conflicts are resolved; otherwise stop.
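The conflict resolution above can be sketched as follows. This is an illustration under several assumptions: sqlite3 stands in for the real database server, the user table and the per-wiki tables are reachable through one connection (in a wikifarm they live in different databases), and the column names in `USER_ID_COLUMNS` follow the classic MediaWiki schema and should be checked against the installed version. The function names are hypothetical.

```python
import sqlite3

# Table -> column holding the user ID (assumed from the classic MediaWiki schema).
USER_ID_COLUMNS = {
    "user_groups": "ug_user",
    "user_newtalk": "user_id",
    "user_openid": "uoi_user",
    "user_properties": "up_user",
    "recentchanges": "rc_user",
    "revision": "rev_user",
    "watchlist": "wl_user",
}


def plan_id_moves(from_users: dict, to_users: dict, buffer_size: int = 2000) -> dict:
    """Return {old_id: new_id} for TO accounts whose ID belongs to a different account in FROM.

    Both arguments map user ID -> user name.
    """
    conflicts = sorted(
        uid for uid, name in to_users.items()
        if uid in from_users and from_users[uid] != name
    )
    # New IDs start after the highest known ID plus the buffer.
    next_id = max(list(from_users) + list(to_users), default=0) + buffer_size + 1
    return {old: next_id + i for i, old in enumerate(conflicts)}


def apply_id_moves(conn: sqlite3.Connection, moves: dict, id_columns: dict) -> None:
    """Rewrite user IDs in the user table and in every referencing table."""
    with conn:  # commit on success, roll back on error
        for old, new in moves.items():
            conn.execute('UPDATE "user" SET user_id = ? WHERE user_id = ?', (new, old))
            for table, column in id_columns.items():
                conn.execute(
                    f'UPDATE "{table}" SET "{column}" = ? WHERE "{column}" = ?',
                    (new, old),
                )
```

Because every new ID lies above all existing IDs, the updates cannot collide with each other. Running `plan_id_moves` again after `apply_id_moves` and checking that it returns an empty mapping is one way to implement the final "check, else stop" step.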
3. Merge the user accounts
Merge the user accounts in TO with accounts in FROM.
- Copy the table user_groups from all individual wiki databases that will be merged.
- Update user data in the tables user_newtalk, user_openid, user_properties, recentchanges, revision, watchlist from all individual wiki databases that will be merged.
- Copy accounts which exist in FROM but not in TO, or which were updated in FROM after TO was copied.
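The last merge step can be sketched as follows. This is an illustration, not the actual merge script: sqlite3 stands in for the real database server, only a reduced set of columns of the user table is copied, and "updated after TO was copied" is approximated by comparing the `user_touched` timestamp (a 14-digit string in MediaWiki) with an assumed copy time. The function name and the `copied_at` parameter are hypothetical.

```python
import sqlite3

# Reduced column set for the example; the real user table has more columns.
USER_COLUMNS = ("user_id", "user_name", "user_email", "user_touched")


def copy_new_or_updated_users(from_conn, to_conn, copied_at: str) -> list:
    """Copy accounts missing in TO, or touched in FROM after `copied_at`, and return their IDs."""
    cols = ", ".join(USER_COLUMNS)
    marks = ", ".join("?" for _ in USER_COLUMNS)
    existing = {row[0] for row in to_conn.execute('SELECT user_id FROM "user"')}
    rows = [
        row
        for row in from_conn.execute(f'SELECT {cols} FROM "user" ORDER BY user_id')
        # Timestamps of the form YYYYMMDDHHMMSS compare correctly as strings.
        if row[0] not in existing or (row[-1] or "") > copied_at
    ]
    with to_conn:  # commit on success, roll back on error
        # INSERT OR REPLACE is SQLite syntax; MySQL offers REPLACE INTO instead.
        to_conn.executemany(
            f'INSERT OR REPLACE INTO "user" ({cols}) VALUES ({marks})', rows
        )
    return [row[0] for row in rows]
```

The sketch relies on `user_id` being the primary key of the user table, so that a replaced row overwrites the stale copy in TO instead of creating a duplicate. It should only be run once the conflicts from the previous step have been resolved, since it overwrites any TO account that shares an ID with an updated FROM account.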
4. Anonymize user accounts which should be discarded
In some cases, not all accounts need to be merged. However, user accounts cannot simply be deleted, because the page revisions, discussions etc. attached to them must be kept. Such accounts have to be anonymized instead:
- Make a list of users which will be kept, e.g.
  - Users which belong to a specific user group
  - Users which are registered with a specific wiki in TO
  - Users with certain e-mail addresses
  - Administrators, bureaucrats
- Anonymize all users which are not in the list obtained in the previous step:
  - Erase e-mail address and related columns
  - Replace login with some string, e.g. id
  - Replace password and related columns by empty string or null
  - Replace real name by id
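The anonymization can be sketched as follows. Again this is only an illustration: sqlite3 stands in for the real database server, the user and user_groups tables are assumed to be reachable through one connection, and only the main columns of the user table are handled (the "related columns" such as e-mail or password tokens would need the same treatment). The function names are hypothetical.

```python
import sqlite3


def users_in_groups(conn: sqlite3.Connection, groups) -> set:
    """Return the IDs of all users belonging to at least one of `groups`."""
    groups = list(groups)
    if not groups:
        return set()
    marks = ", ".join("?" for _ in groups)
    rows = conn.execute(
        f"SELECT DISTINCT ug_user FROM user_groups WHERE ug_group IN ({marks})",
        groups,
    )
    return {row[0] for row in rows}


def anonymize_users(conn: sqlite3.Connection, keep_ids) -> list:
    """Anonymize every account whose ID is not in `keep_ids`; return the anonymized IDs."""
    keep = set(keep_ids)
    ids = [row[0] for row in conn.execute('SELECT user_id FROM "user" ORDER BY user_id')]
    targets = [uid for uid in ids if uid not in keep]
    with conn:  # commit on success, roll back on error
        for uid in targets:
            # Login and real name become the ID; e-mail and password are blanked.
            conn.execute(
                'UPDATE "user" SET user_name = ?, user_real_name = ?, '
                "user_email = '', user_password = '' WHERE user_id = ?",
                (str(uid), str(uid), uid),
            )
    return targets
```

For example, `anonymize_users(conn, users_in_groups(conn, ["sysop", "bureaucrat"]))` would keep administrators and bureaucrats and anonymize everyone else. Using the bare ID as the new login can clash with an existing account that happens to be named like a number, so a prefixed string may be the safer choice in practice.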
Merging wikifarms can be complicated, so it will generally be an iterative process. A script can automate the most tedious tasks and makes the merge safer, because it can be (unit) tested on a test server first.
There are several ways of implementing this:
- Using the MediaWiki HTTP API. With PHP, the Snoopy library, which simulates a browser, can come in handy.
- Directly modifying the wiki database, e.g. using the PDO data abstraction layer for PHP. This practice is not recommended by MediaWiki, but given the potentially large number of modifications necessary in this case, it can be the most efficient solution.
- Using the database abstraction layer provided by MediaWiki.
A script was written for automatically merging the wikis of Museum für Naturkunde Berlin (MfN). The script implements the steps listed above in 4 classes:
- Create a backup of the metawiki.user table on the TO DB.
- Detect incomplete user accounts on the FROM database, i.e. accounts without login, password, name or e-mail; check for conflicting user accounts on the TO database; add a number of dummy accounts; try to resolve any conflicting accounts by moving them to the end of the account list.
- Copy the user_groups tables from the FROM databases to the TO databases; copy the tables user_newtalk, user_openid, user_properties, recentchanges, revision, watchlist from FROM to TO; find users which were created in the FROM database after the TO database was copied; copy an array of users from FROM to TO.
- List all accounts on the TO database which will NOT be anonymized; anonymize the data of all users which are not to be kept.
The interaction between these classes is modelled in the following UML activity diagram: