Friday, April 10, 2009

Migrating Access database to SQL Server 2005

Problem Statement:

Migrating data from one database system to another is a complex task that involves many intermediate database clean-up steps. This document covers migrating from Microsoft Access to SQL Server and gives a checklist of steps in the order they should be performed. Microsoft provides an Access upsizing wizard for migrating data to SQL Server, but it is error prone, sometimes crashes, and does not do a good job. Still, try the wizard before going through this migration effort: it will help identify issues, so it is a good tool for preliminary analysis, but the real migration process can be much more complex and involved depending on the database.

The lessons in this post can be applied to general database clean-up efforts as well, so if you are looking to fine-tune your database and make it more efficient, this checklist will prove very helpful. The checklist is quite generic and applies to many databases out there, such as Sybase, Oracle, SQL Server, and Access.

Follow the steps below to document your existing system and to prepare a checklist of things that need improvement.

1) Create a data dictionary: One of the first things I recommend is setting up a data dictionary for your database so you can identify which sections of your database need improvement. One of the easiest ways to create a data dictionary is to write queries to retrieve information from the system tables of the database itself.

Both Access and SQL Server have system tables that you can use to extract valuable information about your database objects; save this information into an Excel spreadsheet to create a data dictionary. I am going to revisit this topic in detail in another post. In Access you can create a data dictionary for your forms, reports, tables, views, modules, etc., since all this information is available in the Access system tables. You can do the same for SQL Server by querying the INFORMATION_SCHEMA views.

Recommendations:

  • Query system tables to retrieve information for database objects.
  • Merge all queried information into one table and export it to Excel to create a data dictionary.
  • Clean up the extracted information so the dictionary reads cleanly.
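As a sketch of the kind of query involved, the following pulls basic table and column information from SQL Server's INFORMATION_SCHEMA views; the results paste straight into Excel as the start of a data dictionary. The Access side reads the hidden MSysObjects system table instead.

```sql
-- List every table and column with its type, size and nullability
-- (SQL Server; paste the result grid into Excel as a data dictionary).
SELECT  TABLE_SCHEMA,
        TABLE_NAME,
        COLUMN_NAME,
        DATA_TYPE,
        CHARACTER_MAXIMUM_LENGTH,
        IS_NULLABLE
FROM    INFORMATION_SCHEMA.COLUMNS
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;

-- Rough Access equivalent (run inside Access; MSysObjects is hidden):
-- SELECT Name, Type FROM MSysObjects WHERE Flags = 0 AND Type = 1;
```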

2) Gather database information: Analyze the database to gather general information about it.

Recommendations:

  • Identify database size, name and other general properties.
  • Where is the database located? File system, server, etc.
  • Where are the backups kept?
  • Is there a backup and restore plan in place before migration efforts begin?
  • Create a database of appropriate size and name in SQL Server 2005 to receive the migrated data.
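A minimal sketch of the last two recommendations: `sp_helpdb` reports the size, owner, and file layout of every database on the server, and a `CREATE DATABASE` statement sets up the destination. The database name, sizes, and file paths below are placeholders; adjust them to your environment.

```sql
-- Inspect existing databases (size, owner, file locations).
EXEC sp_helpdb;

-- Create the destination database (hypothetical name, sizes and paths).
CREATE DATABASE AccessMigration
ON PRIMARY
(
    NAME = 'AccessMigration_Data',
    FILENAME = 'D:\Data\AccessMigration.mdf',
    SIZE = 500MB,
    FILEGROWTH = 100MB
)
LOG ON
(
    NAME = 'AccessMigration_Log',
    FILENAME = 'E:\Logs\AccessMigration.ldf',
    SIZE = 100MB
);
```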

3) Define database functionality: Identify all functions that the database is supposed to perform. For example, is the database being used for data entry, for reporting, or for ad hoc querying? Plotting the functions and conducting a usage analysis of the database helps ensure that no users are left behind in the migration and that everyone is up and running on day one.

Besides this, certain functions of the database might not be that important and may not need to be migrated at all. For example, if the new system will use SQL Server Reporting Services after the migration, then we do not have to worry about migrating all the Access reports. This is just one example, but at a high level we should be aware of all the functions this system is used for before we start migrating it. Perhaps there is a GIS portion of the database that needs to be split off and migrated to another database because it takes up too much space and slows down daily data entry tasks. Only by identifying the database functions and creating a use-case analysis document can we plot the next steps successfully.

Recommendations:

  • Create database usage analysis document to plot all the functions performed by the existing database.
  • See if there is a database startup menu or switchboard interface. Plot all visual interfaces.
  • See if there is a web front end or some other third-party application that is consuming or changing the database.
  • Plot all internal and external interfaces of the data and how it is being used.
  • Identify all database specific functions like printing, reporting etc.
  • Identify how users will move between forms or any GUI, i.e., perform a screen analysis of the database.
  • Identify how users will generate reports, i.e., whether there are existing reports that need to be migrated.

4) Define database security: Choose between Windows authentication, forms-based authentication, database authentication, or mixed-mode security. Security will need to be analyzed at the application level as well as the database level.

Recommendations:

  • Identify current database security.
  • Identify any problems / risks / issues with database security.
  • Compile list of users for database.
  • Divide users into specific roles for their tasks. How does each user use the database?
  • Develop new authentication mechanism and test its feasibility. Migration is a good time to fix security problems.
  • Define security administration tasks. Who are the admins for the application or the database?
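To make the role idea concrete, here is a mixed-mode sketch: a Windows group login for data entry staff and a SQL login for report viewers, each mapped to a role with only the permissions it needs. All logins, roles, and table names are hypothetical.

```sql
-- One Windows group login and one SQL login (names hypothetical).
CREATE LOGIN [DOMAIN\DataEntryUsers] FROM WINDOWS;
CREATE LOGIN ReportViewer WITH PASSWORD = 'Str0ng!Passw0rd';

USE AccessMigration;
CREATE USER [DOMAIN\DataEntryUsers] FOR LOGIN [DOMAIN\DataEntryUsers];
CREATE USER ReportViewer FOR LOGIN ReportViewer;

-- Roles mirror how each group of users works.
CREATE ROLE DataEntry;
CREATE ROLE Reporting;
EXEC sp_addrolemember 'DataEntry', 'DOMAIN\DataEntryUsers';
EXEC sp_addrolemember 'Reporting', 'ReportViewer';

GRANT SELECT, INSERT, UPDATE ON dbo.tblCustomers TO DataEntry;
GRANT SELECT ON dbo.tblCustomers TO Reporting;
```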

5) Define database audit: In this section you try to answer the question: what needs to be logged and audited so we can track what was changed and by whom? Usually you will want to capture all the inserts and updates performed by a user at a particular time, but you may not want to do this for all your tables since that will take up too much space. As a first step, you can add columns such as LastUpdateUser and LastUpdateDate to log row changes; this will at least tell you who last changed a record and when. You can take this one step further and implement audit tables that store the old version of a changed record, but then someone will need to monitor the audit tables and clean them up on a regular basis. Audit tables may be a good idea if you are dealing with medical records, so it all depends on how the database is being used. I will revisit setting up auditing and user tracking in a later post.

Recommendations:

  • Identify all database modifications and activities that need to be logged.
  • Define logging strategy.
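A minimal version of the "who last changed this row" approach looks like the sketch below: add the two columns, then keep them current with an update trigger. The table and key column are hypothetical.

```sql
-- Minimal row-change logging (tblCustomers and CustomerID are examples).
ALTER TABLE tblCustomers
    ADD LastUpdateUser sysname  NOT NULL DEFAULT SUSER_SNAME(),
        LastUpdateDate datetime NOT NULL DEFAULT GETDATE();
GO
-- Refresh the columns on every update.
CREATE TRIGGER trg_tblCustomers_Audit ON tblCustomers
AFTER UPDATE
AS
    UPDATE  c
    SET     LastUpdateUser = SUSER_SNAME(),
            LastUpdateDate = GETDATE()
    FROM    tblCustomers c
    JOIN    inserted i ON i.CustomerID = c.CustomerID;
```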

6) Define naming convention: What type of naming convention will be used for the database? Will you be using the Microsoft naming convention or Hungarian notation? For example, table names could be prefixed with "tbl" and reports with "rpt". Also try not to use spaces in column names; they make queries more complex and error prone. Again, this topic is vast enough that it is beyond the scope of this post, and I will revisit it in more detail in another post.

Recommendations:

  • Define naming convention for database objects and create naming convention strategy document.
  • Create documentation to keep track of database changes. Keep a log of database schema changes.
  • Use scripts to make database schema changes. Save scripts and database changes in source control.
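One scriptable piece of this: find the column names with spaces left over from Access, then rename them. The table and column names in the rename are hypothetical.

```sql
-- Find column names containing spaces (a common Access habit).
SELECT TABLE_NAME, COLUMN_NAME
FROM   INFORMATION_SCHEMA.COLUMNS
WHERE  COLUMN_NAME LIKE '% %';

-- Rename one such column (hypothetical names); save this script
-- in source control as part of the schema change log.
EXEC sp_rename 'tblCustomers.[Customer Name]', 'CustomerName', 'COLUMN';
```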

7) Identification of static code tables: Catalogue all lookup tables and static tables that contain code and description values. Static tables sometimes contain duplicates or bad values due to user input errors, or because the application lacks proper data validation to prevent corrupt data from being entered. Static tables are a good starting point for the clean-up effort.

Please note that you cannot just fix or remove data in the static table; you also have to fix the data in the related tables, or you will wind up corrupting your database. The problem runs a little deeper than it appears. Fixing syntax errors is far easier than identifying rounding and numerical errors. Again, this is a vast topic, and I will revisit it in another post when I discuss how to create lookup tables and the generic format to use with them.

Recommendations:

  • Identify static code tables.
  • Identify any data corruptions like duplicate records, spelling mistakes, spaces in names etc.
  • Write SQL scripts to clean up the data.
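Here is a sketch of the "fix the related tables too" point: repoint child rows at a surviving copy before deleting duplicates from the static table. tblState, tblCustomers, and their columns are hypothetical.

```sql
-- Suppose tblState has duplicate rows per StateCode and
-- tblCustomers references it via StateID (names hypothetical).
-- 1) Repoint child rows at the surviving (lowest-ID) copy.
UPDATE  c
SET     StateID = keep.KeepID
FROM    tblCustomers c
JOIN    tblState dup ON dup.StateID = c.StateID
JOIN    (SELECT StateCode, MIN(StateID) AS KeepID
         FROM tblState GROUP BY StateCode) keep
        ON keep.StateCode = dup.StateCode
WHERE   dup.StateID <> keep.KeepID;

-- 2) Only now are the duplicates safe to delete.
DELETE FROM tblState
WHERE  StateID NOT IN (SELECT MIN(StateID) FROM tblState GROUP BY StateCode);
```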

8) Identification of value lists: These are lists or enumerated types that are hard coded in Access. Convert these to a static lookup table or a user-defined type. A web application may also contain such values, and it is important to collect them and put them into a dedicated table so that they can be used in the new system. User-defined types can lead to problems later with the application if not set up correctly.

Recommendations:

  • Identify all Value Lists.
  • Convert to appropriate SQL Server data type depending on your needs. Be careful with numerical rounding issues.
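A sketch of the lookup-table conversion, assuming a hard-coded Access value list such as Status = Open/Closed/Pending (all table and column names hypothetical):

```sql
-- Promote the value list to a static lookup table.
CREATE TABLE tblStatus
(
    StatusID   int         NOT NULL PRIMARY KEY,
    StatusName varchar(50) NOT NULL UNIQUE
);

INSERT INTO tblStatus (StatusID, StatusName) VALUES (1, 'Open');
INSERT INTO tblStatus (StatusID, StatusName) VALUES (2, 'Closed');
INSERT INTO tblStatus (StatusID, StatusName) VALUES (3, 'Pending');

-- Enforce the list on the referencing table.
ALTER TABLE tblOrders
    ADD CONSTRAINT FK_Orders_Status
    FOREIGN KEY (StatusID) REFERENCES tblStatus (StatusID);
```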

9) Identification of junk or user tables: When a database has been used for a long time, junk tables or user tables start showing up which are snapshots of data or data fixes in progress, but they are not really part of the actual database. Junk tables are a byproduct of day-to-day user operations. They might be part of calculations or contain a subset of data; they are just temporary tables needed for the time being. It is a good idea to clean up these objects now and then, or to have a naming convention so you can identify who a junk table belongs to and when it was created, making it easier to remove.

Recommendations:

  • Identify junk tables.
  • Set up a naming convention for junk tables; consider using temp tables instead.
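Once a convention is in place, candidates are easy to list. This sketch assumes throwaway tables are prefixed "tmp_" or "bak_" (an assumption; use whatever prefix you adopt):

```sql
-- List candidate junk tables by naming prefix and creation date.
SELECT  name, create_date, modify_date
FROM    sys.tables
WHERE   name LIKE 'tmp[_]%' OR name LIKE 'bak[_]%'
ORDER BY create_date;
```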

10) Identification of complex types: These are special user-defined types or application-specific types that are used in calculations. Your new system or application may need them, so it is better to identify them early in the migration process to save downtime later.

Recommendations:

  • Define how you will store complex types.
  • Define formatting for complex types.
  • Define validation and default values.

11) Identify tables with duplicate and redundant data: Duplicate and redundant data cause normalization problems. Good housekeeping rules should be followed to make sure there are no duplicates or junk data. It is a good idea to clean up the extra data during the migration process.

Recommendations:

  • Identify duplicate columns in tables. This is a severe violation and can occur across multiple tables; for example, the customer name may be repeated in many tables. Such columns need to be removed and turned into static tables.
  • Merge duplicate data. When the same data is stored in different tables with the same schema, merge it into one table to clean things up.
  • Run a find-duplicates query on the database. Use it to identify bad records and come up with a data clean-up strategy. Such queries should be run on a monthly basis to monitor database health and fix issues.
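The classic find-duplicates query is a GROUP BY with a HAVING clause; the table and column here are hypothetical:

```sql
-- Flag values that appear in more than one row.
SELECT  CustomerName, COUNT(*) AS Copies
FROM    tblCustomers
GROUP BY CustomerName
HAVING  COUNT(*) > 1
ORDER BY COUNT(*) DESC;
```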

12) Identify table relationships and improve normalization: This section deals with identifying data integrity issues. Identify the core data tables that make up the database in this step. To enforce data integrity you will need to use constraints, triggers, relationships, etc. Again, this is such a vast topic that I will revisit it in another post about normalization.

Recommendations:

  • Identify core tables that need to be migrated. These are data tables that make up your database.
  • Create database diagram showing primary keys and foreign keys along with all database relationships.
  • Define relationships between tables. Fix existing relationships if they need improvement.
  • Define mapping between old tables and new ones. Create migration spreadsheet to map between source and destination tables.
  • Define column mappings to avoid duplicate data. If a column is repeated, consider creating a central static table.
  • Write insert queries for data migration. Write scripts to cast / convert data during migration.
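The last three recommendations come together in a sketch like this: an INSERT ... SELECT that maps and casts columns from a staging copy of the Access table, followed by a foreign key that enforces the relationship. All names (AccessStaging, [Customer Orders], tblOrders, and their columns) are hypothetical.

```sql
-- Migrate from the staged Access table into the new schema,
-- casting data types along the way.
INSERT INTO dbo.tblOrders (OrderID, CustomerID, OrderDate, Amount)
SELECT  [Order Number],
        [Customer ID],
        CAST([Order Date] AS datetime),
        CAST([Amount] AS decimal(10, 2))
FROM    AccessStaging.dbo.[Customer Orders];

-- Enforce the relationship in the new database.
ALTER TABLE dbo.tblOrders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID) REFERENCES dbo.tblCustomers (CustomerID);
```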

13) Define column (field) attributes in core tables: This section is again related to data integrity and data validation.

Recommendations:

  • Identify old table fields that accept NULL values. NULL values can cause a lot of programming issues, so it is good to identify them and set up defaults.
  • Define new fields which can contain NULL values.
  • Identify field sizes for old database fields. Make sure there is no data truncation happening which is another common type of data corruption that can occur.
  • Define field size for new database fields. Make sure truncation is avoided by increasing field sizes.
  • Identify field type for old database fields. The data type of a field must be correct to avoid any data type mismatch error. Convert or cast values from one data type to another as a last resort to fixing the data.
  • Define Converted type for new database.
  • Write INSERT queries to upload data, along with data migration and clean-up scripts. You can either clean up the data at the source and then migrate it, or migrate the data and then clean it. I recommend cleaning the data at the source as much as possible before migrating, or doing the clean-up in phases; trying to clean up everything at once can create training and adoption issues because users may lose familiarity with the data.
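Two small checks from the list above, sketched with hypothetical tables and columns: measure the longest existing value before sizing the new field, then backfill NULLs so the new column can forbid them.

```sql
-- Size check: will any existing value be truncated by the new field size?
SELECT MAX(LEN([Customer Name])) AS LongestValue
FROM   AccessStaging.dbo.tblCustomers;

-- Backfill NULLs, then tighten the new column.
UPDATE dbo.tblCustomers SET CustomerName = '' WHERE CustomerName IS NULL;
ALTER TABLE dbo.tblCustomers
    ALTER COLUMN CustomerName varchar(100) NOT NULL;
```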

14) Define database rules: Identify the data interaction that happens between tables. You need to identify all possible values and scenarios and come up with a test matrix to make sure no data corruption happens during user interaction and data entry. Failure to do this could result in database errors and maintenance headaches later.

Recommendations:

  • Identify field constraints for new database. Add field validation and business rules.
  • Map out all possible values and scenarios. Setup test data.
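Field-level rules map naturally onto CHECK constraints. The table, columns, and the rules themselves are hypothetical examples:

```sql
-- Business rules as CHECK constraints: amounts cannot be negative,
-- and an order cannot ship before it is placed.
ALTER TABLE dbo.tblOrders
    ADD CONSTRAINT CK_Orders_Amount CHECK (Amount >= 0);

ALTER TABLE dbo.tblOrders
    ADD CONSTRAINT CK_Orders_Dates
    CHECK (ShipDate IS NULL OR ShipDate >= OrderDate);
```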

15) Set up data integrity scope: There are many different types of data integrity issues. Data may be valid in one scope but cause problems when used in a different scope. For example, it may be okay to have duplicate customers in one table, but when that table is linked to a master table that does calculations, it could lead to wrong numbers in your reports. The data must be used intelligently to avoid such traps. It is a good idea to focus on individual tables first and solve all their data integrity issues before tackling the more complex integrity issues that arise from table relationships.

Recommendations:

  • Identify key columns that could cause data integrity problems for the current table.
  • Define constraints, rules, and triggers on columns.

16) Define external data integrity issues: These are more complex data integrity issues whose scope is related to the object interaction that goes on between different tables, forms, reports, etc. This is related to data integrity scope. There are two types of scope, internal and external. Internal scope relates to the current database only, and internal scope at the table level relates only to the current table whose data integrity we are trying to fix. External data integrity comes into play when corruption happens due to interaction between database objects inside the current database or with database objects from other sources. Again, this is a vast topic, and I will revisit it in detail in another post on data scope and how to avoid internal and external data integrity issues.

Recommendations:

  • Identify all object relationships.
  • Define Rules and Triggers on tables to uphold business rules.
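A cross-table rule usually needs a trigger rather than a column constraint. This sketch assumes a hypothetical rule (orders cannot be entered for inactive customers) and hypothetical tables:

```sql
-- Uphold a business rule that spans two tables.
CREATE TRIGGER trg_Orders_ActiveCustomer ON dbo.tblOrders
AFTER INSERT, UPDATE
AS
    IF EXISTS (SELECT 1
               FROM   inserted i
               JOIN   dbo.tblCustomers c ON c.CustomerID = i.CustomerID
               WHERE  c.IsActive = 0)
    BEGIN
        RAISERROR ('Orders cannot be placed for inactive customers.', 16, 1);
        ROLLBACK TRANSACTION;
    END;
```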

17) Identify all database views: Views are queries that are stored inside the database. It is important to define database views so that you avoid duplicating and storing redundant data in new tables. It is important to follow a good naming convention when creating views. Old and unused views should also be removed to keep the database clean.

Recommendations:

  • Identify queries in database by saving them to a text file. Identify commonly used queries and create stored procedures or views.
  • Convert active queries to views to promote code reuse and knowledge transfer.
  • Set up views for different departments, roles, and users. Define a view usage and display strategy. Set up view access security.
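A sketch of a role-scoped view: the Reporting role gets order summaries without seeing the base tables directly. The view, tables, and role are hypothetical.

```sql
-- Commonly used query promoted to a view, then granted to a role.
CREATE VIEW dbo.vwOrderSummary
AS
    SELECT  o.OrderID, c.CustomerName, o.OrderDate, o.Amount
    FROM    dbo.tblOrders o
    JOIN    dbo.tblCustomers c ON c.CustomerID = o.CustomerID;
GO
GRANT SELECT ON dbo.vwOrderSummary TO Reporting;
```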

18) Identify all active reports: Reports are the output the user is accustomed to seeing, and they play an important role in the user experience. No migration can be successful without migrating the associated database reports.

Recommendations:

  • Identify all active reports in old database. Log this into the data dictionary. Find out who is using which report.
  • Write Stored Procedures to generate reports.
  • Write Crystal Reports or SQL Server 2005 reports.
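The middle recommendation might look like this sketch: a stored procedure that the report designer (Crystal or SQL Server Reporting Services) binds to as its data source. The procedure name, tables, and parameters are hypothetical.

```sql
-- Report data source as a stored procedure.
CREATE PROCEDURE dbo.rptMonthlySales
    @Year  int,
    @Month int
AS
    SELECT  c.CustomerName,
            SUM(o.Amount) AS TotalSales
    FROM    dbo.tblOrders o
    JOIN    dbo.tblCustomers c ON c.CustomerID = o.CustomerID
    WHERE   YEAR(o.OrderDate) = @Year
      AND   MONTH(o.OrderDate) = @Month
    GROUP BY c.CustomerName
    ORDER BY TotalSales DESC;
```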

19) Identify conversion risks: After creating the data dictionary and the documentation from the previous steps, it is now time to put together a risk analysis document based on all the data we have collected.

Recommendations:

  • Collect all migration documentation.
  • Analyze the documentation to build a risk analysis document.

20) Perform user testing: Test with live transactions and updates. Pilot-test the new system. Conduct user training. Stress-test the system as well.

Recommendations:

  • Identify testing scenarios.
  • Run the tests.
