
Heads Up
So, what advice can I give you when dealing with a lot of data that needs to be modified?

1) Prepare all your work locally, either on your desktop or on a dedicated machine you use for development and for mirroring your site. If you don't mirror or develop locally, start now: get XAMPP or another distribution of Apache, PHP, MySQL and Perl, and learn how to adjust its settings.

2) Plan your attack. If you are dealing with someone else's raw database, there is probably a lot of data you will need to sort by keys and much you won't even use. The first thing to do is get a layout of the database. If you can import it locally, use an app to review the tables. If the database is so large that importing it isn't practical, strip off the header of a CSV file to find the layout, or find an app that can give you an .xsd file from an XML export.

3) Reduce and prepare your files. Once you know the layout of the raw databases you are working with, you can select which data you want to use. If you can import them into MySQL, you can drop and edit tables and delete columns you don't need. Export your working files, then re-import them based on keys to merge the many databases into tables you can use for your project. Once you have excluded the garbage and arranged the data you want, you have a place to begin building your final database.

4) Test your import scripts. Before you go through the process of pushing hundreds of thousands of fields around, set up a sample database based on real data. Export the first few rows, maybe 1,000, of each table you plan to work with and pump them through your PHP script. If you have 1 million rows and you are testing with 1,000 or 50,000, you can extrapolate the execution time of your final import.
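Reading a CSV header (tip 2) and cutting a small test sample (tip 4) take only a few lines in any scripting language. Here is a minimal sketch in Python; the file names, columns, and row counts are invented for illustration:

```python
import csv
import itertools

# Hypothetical raw export; in practice this is the huge file you received.
with open("raw_export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name", "email"])  # header row describes the layout
    for i in range(5000):
        writer.writerow([i, f"user{i}", f"user{i}@example.com"])

# Tip 2: read just the header line to discover the column layout.
with open("raw_export.csv", newline="") as f:
    header = next(csv.reader(f))
print(header)  # prints ['id', 'name', 'email']

# Tip 4: copy the header plus the first 1,000 data rows into a test sample.
with open("raw_export.csv", newline="") as src, \
     open("sample.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    csv.writer(dst).writerows(itertools.islice(reader, 1001))
```

The point is that you never need to open the multi-gigabyte file in an editor; the header alone tells you the layout, and the sample is small enough to import anywhere.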
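Tip 3's drop-and-merge step is plain SQL. The sketch below uses Python's built-in sqlite3 module so it runs anywhere without a server, but the same statements (CREATE TABLE ... AS SELECT, a JOIN on the key, DROP TABLE) apply in MySQL; the table and column names are invented for illustration:

```python
import sqlite3

# In-memory SQLite stands in for your local MySQL server; in practice you
# would issue the same SQL through phpMyAdmin or the mysql client.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_customers (id INTEGER, name TEXT, fax TEXT)")
con.execute("CREATE TABLE raw_orders (customer_id INTEGER, total REAL)")
con.executemany("INSERT INTO raw_customers VALUES (?, ?, ?)",
                [(1, "Ann", "555-1"), (2, "Bob", "555-2")])
con.executemany("INSERT INTO raw_orders VALUES (?, ?)",
                [(1, 10.0), (1, 5.0), (2, 7.5)])

# Merge the two raw tables on their key into one working table, keeping only
# the columns you want (the unused fax column is simply left out).
con.execute("""
    CREATE TABLE working AS
    SELECT c.id, c.name, SUM(o.total) AS spent
    FROM raw_customers c JOIN raw_orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name
""")
con.execute("DROP TABLE raw_customers")  # the raw copies are no longer needed
con.execute("DROP TABLE raw_orders")
print(con.execute("SELECT * FROM working ORDER BY id").fetchall())
# prints [(1, 'Ann', 15.0), (2, 'Bob', 7.5)]
```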
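For tip 4's estimate, a simple proportional scale-up is the usual first pass. The numbers and the stand-in work function below are made up; as the warning that follows notes, long runs often slow down, so treat the result as a lower bound:

```python
import time

# Hypothetical stand-in for "pump the sample through your PHP script":
# time a 1,000-row test run, then scale linearly to the full row count.
def import_rows(n):
    total = 0
    for i in range(n):
        total += i          # placeholder for real per-row work
    return total

sample_rows = 1_000
start = time.perf_counter()
import_rows(sample_rows)
sample_seconds = time.perf_counter() - start

full_rows = 1_000_000
estimate = sample_seconds * (full_rows / sample_rows)
print(f"estimated full import: {estimate:.2f} seconds")
```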
IMPORTANT: scripts often slow down over long execution times, probably because of how memory is handled and never released, or because of CPU load. This means a run you thought you had timed exactly may end up taking far longer than your estimate.

5) Learn how to recover. Unless you work with large databases continuously over a long period, something is eventually going to go wrong. It may be as simple as a corrupted file or mismatched character sets that won't show up until you are 70% of the way through your run, or you may find that your layout is not correct and you need to backtrack. Make backups often. You can copy tables or even whole databases within MySQL, but also make exports and label them well, or keep an index.txt so you know exactly where you were when you made each export.

What about Imports and Exports?
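A hedged sketch of the "export only percentages" idea this section describes: split one table into several small INSERT scripts using LIMIT/OFFSET. It uses Python's built-in sqlite3 so it runs anywhere, but the same query works in MySQL; the table, row count, and chunk size are invented for illustration:

```python
import sqlite3

# Toy "large" table; imagine millions of rows and a host upload limit.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, val TEXT)")
con.executemany("INSERT INTO big VALUES (?, ?)",
                [(i, f"v{i}") for i in range(10)])

chunk = 4          # rows per export file, chosen to stay under the size limit
offset = 0
files = []
while True:
    rows = con.execute("SELECT id, val FROM big ORDER BY id LIMIT ? OFFSET ?",
                       (chunk, offset)).fetchall()
    if not rows:
        break
    # Each chunk becomes one small INSERT script you can import separately.
    stmts = "".join(f"INSERT INTO big VALUES ({i}, '{v}');\n" for i, v in rows)
    files.append(stmts)
    offset += chunk
print(len(files))  # prints 3 (10 rows in chunks of 4)
```

In practice you would write each chunk to its own .sql file, label it, and import the files one at a time on the host.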
Once you have completed all of your work, you will need to get it to your web host. Because most hosting plans won't let you burst for more than a few seconds or minutes, database sizes have to stay within constraints. The first thing to do is make sure everything works locally; then you can examine your tables and make incremental exports and imports to your host. If you are dealing with many tables, or even columns within tables, you can export just that data in manageable sizes and import each piece separately. If you are working with a single table, you can also export only a percentage of it at a time to keep file sizes within your host's limits. It will take a little trial and error to get your data prepared for upload.

What about backups? Once your site is live, most hosts will have an option to back up your database on their server and then download it as a file. cPanel will do this live from the backup widget, or you can do it with mysqldump on the command line; you will have to learn what works based on your host's features. It is important to keep your local mirror fresh, so when you download your database, import it into your local MySQL server.

Final Note
As you can see, we didn't cover all of the different applications available for database manipulation, or which settings to use to achieve different results. Because of the many formats you are likely to run into, manipulating databases requires that you collect tools and test them for your own results. If you are on Windows, I suggest you first install XAMPP or another package that gives you access to all the tools you use on your host. You will probably need to modify the package right out of the box, and you will also want to find your memory and timeout settings. After you have established your database server, you can use tools such as phpMyAdmin, MySQL Administrator and MySQL Workbench for Windows (available at the MySQL site), Microsoft XML Notepad, and Komodo Edit for raw file editing. Many people also like Navicat, which comes in different versions. I even use OpenOffice Calc quite a lot, not only to lay out tables but to manage data; its row limits make it restrictive on large databases, but it is a good tool to have.

Anyway, the basic idea is to know what is in the databases you have, know how you want to use that data and plan its layout, understand your limitations and work around them, and make lots of incremental backups as you work. I hope this is a helpful overview. Good luck; you will need it.