News/Project Deluge: PlayStation 2
Introducing one of the biggest endeavors we have taken on yet.
It’s been a while since our last release back on New Year’s Day. In the months before and since that release, we have been working on something truly special: a project so massive it practically eclipses everything we’ve tackled up until this point. The only comparable release was on February 23rd, 2008 (over ten years ago), when we released over one thousand prototypes for various Sega consoles from old QA department backups. Since then, many releases have come close to matching that amount of content in a single release, but until now, those releases, including our own, have focused only on specific periods of gaming history or on specific products, companies, etc. Until today.
Today we are introducing the first part of Project Deluge, an ongoing project to assess a gigantic lot of video game prototypes, pre-production assets, and archival material spanning multiple console generations. These aging items were miraculously rescued from being destroyed, thrown away, or sold through the herculean efforts of one person. This person not only took on the task of backing up everything in his possession single-handedly, but was also kind enough to let us look at and preserve every item in his collection with no strings attached. Yes, that’s right, all of it. For nothing in return.
In the middle of last year, our friend Jason Scott at the Internet Archive asked us to take a crack at looking into what the owner had archived up to that point. Little did we know that it represented only a small fraction of what would eventually be discovered in the months that followed. The owner had been working with two longtime friends of the site, Iniche and Zoda-Y13, who helped gather the resources he needed to back up everything he had and did preliminary work on assessing the initial dumps.
When the project began, we were given access to the PlayStation 2 builds that had been backed up to that point. The circumstances surrounding COVID-19 at the time, which continue for some as of this writing, gave many owners of game development material an opportunity to look through their things and share what they discovered. At the time, only several hundred dumps had been made of what was initially assumed to be a lot of strictly PlayStation 2 discs. We had no idea that the number would grow well into the thousands for PlayStation 2 alone, and it may still be growing every day.
Knowing what the lot could contain, we knew that doing this properly would require more work than we had ever put into a project before.
Last summer, we devised a plan to streamline the process of evaluating every item. drx, Sazpaimon, and ehw began designing and developing a custom set of scripts and tools to trim the fat from the lot as much as possible. The goal was to eliminate all the final retail builds so that the only builds remaining were pre-production (or localization) prototypes and unreleased revisions. Because not all optical media are created equal, the scripts had to be robust enough to extract sufficient metadata from each dump to help determine what the build actually was and to auto-populate information wherever possible. To identify the final builds in the lot, we knew we had to create a database of metadata gathered from every known final (and released non-final) build of every game on every system. When our tools were mature enough, we gathered every Redump set we could find and ran our tools against those sets to create an easily queryable database of main executable timestamps, last modified timestamps, volume information, checksums, and other system-specific parameters that could help differentiate each build.
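The post doesn’t describe the database layout. As a rough illustration only, a reference table of this kind might look like the following SQLite sketch. The table name `reference_builds`, its columns, and the helper functions are hypothetical, chosen to mirror the metadata listed above (executable timestamps, last modified timestamps, volume information, and checksums).

```python
import sqlite3
from typing import Any, Dict, List, Tuple

# Hypothetical schema; the project's real layout is not described in the post.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reference_builds (
    id                 INTEGER PRIMARY KEY,
    system             TEXT NOT NULL,    -- e.g. 'PS2'
    title              TEXT NOT NULL,
    serial             TEXT,             -- main executable name
    exe_timestamp      TEXT,             -- main executable timestamp (ISO 8601)
    last_modified      TEXT,             -- newest file timestamp on the disc (ISO 8601)
    volume_id          TEXT,             -- volume identifier from the file system
    composite_checksum INTEGER NOT NULL  -- sum of per-file checksums
);
CREATE INDEX IF NOT EXISTS idx_composite ON reference_builds (composite_checksum);
CREATE INDEX IF NOT EXISTS idx_serial ON reference_builds (serial);
"""


def open_reference_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the reference database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_reference(conn: sqlite3.Connection, row: Dict[str, Any]) -> None:
    """Insert one known final build; the keys of `row` must match the named placeholders."""
    conn.execute(
        "INSERT INTO reference_builds "
        "(system, title, serial, exe_timestamp, last_modified, volume_id, composite_checksum) "
        "VALUES (:system, :title, :serial, :exe_timestamp, :last_modified, "
        ":volume_id, :composite_checksum)",
        row,
    )
    conn.commit()


def find_by_composite(conn: sqlite3.Connection, checksum: int) -> List[Tuple[str, str]]:
    """Return (title, serial) for every reference build with this composite checksum."""
    cur = conn.execute(
        "SELECT title, serial FROM reference_builds WHERE composite_checksum = ?",
        (checksum,),
    )
    return cur.fetchall()
```

Indexing `composite_checksum` lets the lookup for each incoming dump avoid scanning the whole table, which matters once an entire Redump set has been loaded.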
Traditionally, one may generate a checksum (a value derived by running an algorithm over a set of data) to create unique identifiers for specific files. However, we went one step further and calculated what we call ‘composite’ checksums of every disc in the various Redump sets. Unlike a traditional checksum, where a value is derived from one continuous block of data, a composite checksum is the sum of multiple checksums calculated over a group of data. In this case, instead of generating a checksum from an entire disc image file, we wrote a script that calculated the checksum of every readable file contained in various disc (image) formats and added the values together to create a unique identifier representing the whole disc. The advantage of this approach over a single checksum of the whole image is that it looks at the files stored within the image rather than at the image itself. A disc image carries many attributes beyond the files in its file system, such as volume information, that are irrelevant when the goal is to compare the data itself. With many checksum algorithms, changing a single byte anywhere in a file changes the entire checksum, and because volume information can differ from one copy of a disc to another, whole-image checksums can make two copies of the same data look unique when they aren’t. This approach helped us eliminate many final builds from the lot very quickly, and it also helped us spot potential prototypes.
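A minimal sketch of the composite checksum idea in Python, assuming the files have already been extracted from the disc image into a directory and assuming CRC-32 as the per-file algorithm (the post doesn’t say which algorithm the team used). The function names are hypothetical.

```python
import zlib
from pathlib import Path
from typing import Iterable


def crc32_of_file(path: Path, chunk_size: int = 1 << 20) -> int:
    """CRC-32 of one file, read in chunks so large files need not fit in memory."""
    crc = 0
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def composite_checksum(per_file_checksums: Iterable[int]) -> int:
    """Sum of per-file checksums; the order of the files does not affect the result."""
    return sum(per_file_checksums)


def composite_checksum_of_tree(root: str) -> int:
    """Composite checksum over every regular file under `root`.

    Assumes `root` holds the files extracted from a disc image; only file
    contents are hashed, so volume-level metadata never enters the value.
    """
    return composite_checksum(
        crc32_of_file(p) for p in Path(root).rglob("*") if p.is_file()
    )
```

Because addition is order-independent, the result doesn’t depend on directory traversal order or on where each file sits within the image. The trade-off is that a mismatch doesn’t say which file differs, and two different sets of files can in principle sum to the same value, which is one reason further checks are still needed.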
However, composite checksums alone aren’t enough to properly identify a final build. It took a lot of experimentation and trial and error before we were comfortable with the results. Comparing other aspects of each dump, such as the latest modified timestamp and the main executable checksum, against the expected values for the final equivalent was often necessary to rule out final builds. Where the composite checksum matched nothing, we rigorously compared each dump’s other attributes individually until the result left no room for doubt.
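The kind of layered comparison described above could be sketched as a small rule set. The rules, field names, and labels below are hypothetical illustrations of that process, not the team’s actual criteria: an exact composite match marks a final build, a matching main executable with different data is flagged for manual review, and a dump whose newest file predates every known release with the same serial is treated as a prototype candidate.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Sequence


@dataclass(frozen=True)
class DiscMetadata:
    serial: str               # main executable name (hypothetical key)
    composite_checksum: int   # sum of per-file checksums
    exe_checksum: int         # checksum of the main executable alone
    last_modified: datetime   # newest file timestamp on the disc


def classify(dump: DiscMetadata, finals: Sequence[DiscMetadata]) -> str:
    """Return 'final', 'review', or 'prototype' using illustrative rules."""
    candidates = [f for f in finals if f.serial == dump.serial]
    if not candidates:
        # No known release to compare against (e.g. an unreleased game): a human decides.
        return "review"
    if any(f.composite_checksum == dump.composite_checksum for f in candidates):
        return "final"
    if any(f.exe_checksum == dump.exe_checksum for f in candidates):
        # Same executable but different data files: possibly a data-only revision.
        return "review"
    if all(dump.last_modified < f.last_modified for f in candidates):
        # Older than every known release with this serial.
        return "prototype"
    return "review"
```

Anything that lands in "review" corresponds to the manual, attribute-by-attribute comparison the team describes.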
The number of prototypes that remained, however, was still overwhelming. Ultimately, we knew that to really know what the lot contained, everything had to be played. In the fall of last year, we gathered a team of dedicated individuals to analyze what remained and determine whether each item was a prototype, what unique features it had, and whether or not it could run in an emulator or on actual hardware. Not only did we go through each item in the lot, but we also looked at each problematic prototype carefully to determine what the issues were and how to fix them. We even went as far as creating separate patches for games in the lot that used lock-out measures to prevent them from running on certain hardware configurations. Builds stored in a format not native to the target hardware were converted to a more playable format. And when we were fans of an item in the lot, we investigated it further, driven by passion alone.
Unfortunately, we hit some major setbacks in archiving and assessing parts of the lot. The biggest involved data integrity. The initial batches of dumps had issues caused by the software used, and the process had to be iterated on over time. Since we weren’t involved until the majority of the PlayStation 2 dumps had already been made, it wasn’t reasonable or fair to ask that the hundreds of dumps made up to that point be scrapped and redone with different software and hardware. The hardware alone would have meant tracking down very specific drives, which would not have been easy at the time. Unless you’ve done it yourself, it is hard to convey the effort and energy it takes to dump thousands of discs. Because we had little insight into the archiving process early on, we can’t guarantee that every dump in the lot is free from errors introduced during dumping.
More issues arise from the age and variable quality of the recordable media used at the time. Most of the discs haven’t been used since they were mastered, but storage conditions may have allowed some of them to rot over time. What some people might not realize is that not every prototype in the wild is wholly unique: copies of some press review and preview builds were produced using off-the-shelf duplicators. If the “master” disc was damaged or overused before copies were made from it, its defects would be baked into every copy. Also, while PlayStation 2 prototypes were often mastered using CDVDGEN, a proprietary Sony tool for creating disc images for authoring, some prototypes were authored from images that contained intentional or unintentional mastering mistakes. These mistakes wouldn’t survive into the mass-produced copies that went to stores, but they were quite common in prototypes. Not all recordable discs are created equal, either; some brands fared far better than others (woe to those who backed up anything on a Princo CD-R). Poor media was more common in the early days of optical storage and improved over time, so it was less prevalent during the PlayStation 2 era, but it still occurred. The inconsistencies these issues introduce can be misinterpreted by the disc drive hardware and by the software that creates the disc image, producing bad dumps. Some developers used these imperfections to their advantage, while others simply weren’t thinking of long-term storage when they burned these discs years ago.
At the end of the day, we know that the dumps currently available could be much better; they aren’t perfect and may contain errors. We tried to catch errors by scanning every disc image that contained EDC/ECC data for uncorrectable errors, and by loading each build in an emulator (if playable) or on real hardware to judge any issues on a case-by-case basis. While it would have been better to dump everything from the get-go using the best methods and highest standards available, in the grand scheme of archival, any dump is better than no dump at all. This could complicate things later, but there is also a chance to revisit these discs with the proper tools in the future. For now, the best we can do is correct the mistakes and move on.
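For raw CD images (2,352-byte sectors), the EDC field in each data sector is a 32-bit CRC that can be recomputed and compared against the stored value. The sketch below is an illustration only, not edccchk: it checks the EDC of Mode 1 and Mode 2 sectors but doesn’t attempt correction with the ECC parity bytes, and it assumes the sectors come from a data track. Plain 2,048-byte-per-sector images, such as typical DVD ISOs, don’t carry these fields at all. The function names are hypothetical.

```python
from typing import BinaryIO, Iterable, Iterator, List, Optional

SECTOR_SIZE = 2352  # raw CD sector: sync, header, user data, EDC/ECC


def _make_edc_table() -> List[int]:
    # Reflected CRC table for the CD-ROM EDC polynomial
    # x^32 + x^31 + x^16 + x^15 + x^4 + x^3 + x + 1 (0xD8018001 reflected).
    table = []
    for i in range(256):
        value = i
        for _ in range(8):
            value = (value >> 1) ^ (0xD8018001 if value & 1 else 0)
        table.append(value)
    return table


EDC_TABLE = _make_edc_table()


def edc(data: bytes) -> int:
    """Compute the 32-bit EDC over `data` (initial value 0, no final XOR)."""
    value = 0
    for byte in data:
        value = (value >> 8) ^ EDC_TABLE[(value ^ byte) & 0xFF]
    return value


def sector_edc_ok(sector: bytes) -> Optional[bool]:
    """True/False for an EDC match; None if the sector mode has no EDC to check."""
    mode = sector[15]
    if mode == 1:
        # Mode 1: EDC covers bytes 0x000-0x80F, stored little-endian at 0x810.
        return edc(sector[:0x810]) == int.from_bytes(sector[0x810:0x814], "little")
    if mode == 2:
        if sector[18] & 0x20:
            # Mode 2 Form 2: EDC covers 0x010-0x92B, stored at 0x92C; zero means unused.
            stored = int.from_bytes(sector[0x92C:0x930], "little")
            return stored == 0 or edc(sector[0x10:0x92C]) == stored
        # Mode 2 Form 1: EDC covers subheader + data (0x010-0x817), stored at 0x818.
        return edc(sector[0x10:0x818]) == int.from_bytes(sector[0x818:0x81C], "little")
    return None


def iter_sectors(f: BinaryIO) -> Iterator[bytes]:
    """Yield raw sectors from an open image file, one at a time."""
    while len(sector := f.read(SECTOR_SIZE)) == SECTOR_SIZE:
        yield sector


def bad_sectors(sectors: Iterable[bytes]) -> List[int]:
    """Indices of sectors whose stored EDC does not match the recomputed value."""
    return [i for i, s in enumerate(sectors) if sector_edc_ok(s) is False]
```

A pure-Python byte loop like this is slow on a full disc image; it is meant to show what the check does, not to replace a dedicated tool.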
Now that the team is more involved, we’ll take more care to ensure that every dump is error-free, and we’ll keep notes on problematic discs so that they can be revisited.
So for the first part of the lot, we would like to present to you over 700 unique prototype builds for the PlayStation 2! Since the list of builds is so massive, we moved it to its own page, which you can access here! Compressed, this comes to a total of about 860 gigabytes of data. For obvious reasons, we chose not to host the matched (final) builds, but we are keeping a public record of each one, along with any interesting properties found in them, in a table that you can check out here. Eventually, we will be populating the site with various other assets from the lot as well, so keep an eye on this page in the coming months. There were also more PlayStation 2 discs that contained obvious uncorrectable errors (or simply couldn’t be read by the owner’s drive at the time). We’ll revisit these soon with different drives to get another chance at an error-free dump, and we’ll later publish a list of the undumpable or uncorrectable discs, providing the imperfect dumps where possible.
As you can imagine, there are a ton of highlights from this part of the lot, far too many to count, let alone mention! From press previews, reviews, localization prototypes, tech demos, debug builds, and more, there’s something for everyone in this part of the lot!
Just to name a few, we have an E3 prototype of Crash Bandicoot: The Wrath of Cortex as well as an E3 prototype of Shadow of the Colossus, both based on really early versions of the games. We have prototypes from before the Japanese releases of God Hand and Dino Stalker, both of which include really cool debugging tools! We also have many other builds that contain active debuggers, such as Dragon’s Lair 3D, Final Fantasy X-2, Legacy of Kain: Soul Reaver 2, and many, many more. This barely scratches the surface of what’s here; there’s enough to keep anyone busy for years!
Interestingly, the vast majority of the PlayStation 2 builds in the lot were mastered on recordable media (CD-Rs/DVD-Rs) rather than on pressed discs produced en masse. With some exceptions, each disc was dumped using either PowerISO or CloneCD, and edccchk was used to check every disc image that contained EDC/ECC data for errors. As stated before, it’s likely that some builds in this part of the lot contain errors that might cause unintended side effects when played. If you encounter an issue while playing a build that you suspect is caused by a disc error, please let us know and note it on the build’s wiki entry, and we will revisit it in the future.
Lastly, we would like to thank all the members of the Project Deluge team for helping us with this project so far. Without your help, this would have taken eons. We’d like to thank Jason Scott from the Internet Archive for giving us the opportunity to go on this journey and for providing hosting for this ongoing project, and Iniche for working with the owner of all of these wonderful builds to make it all happen. Special thanks to Master Emerald for once again creating beautiful art that makes each of our releases something special (especially on short notice!). We’d like to thank drx and ehw for writing the scripts and helping get the project off the ground, and Sazpaimon for taking everything much further by expanding the capabilities of the scripts, running some of the builds on hardware, handling the streams, and so much more. And last but certainly not least, we’d like to thank all of our researchers (Zoda-Y13, GopherGirl, Xkeeper (TCRF), Rusty (TCRF), Shoemanbundy, Hwd45, SolidSnake, DigitalWarrior, Nex, and Drac) for taking the time to help us go through every single build in this lot so far (and all those delectable sports prototypes).
Well, that’s it for now, folks! As this is an ongoing project, we don’t have a release schedule yet. However, be prepared for the next part of the lot very soon. More is on the way, so hang on tight! Also, be sure to keep an eye on The Cutting Room Floor (TCRF) for more detailed research on items from the lot in the coming months.
(NOTE: We are in the middle of migrating our server to take advantage of more storage. We have added the ability to add external links to the Prototype form. PLEASE DO NOT REUPLOAD ANY PROTOTYPE FROM THE LOT ONTO THE MAIN SITE FOR NOW. While the links on the wiki side don’t work at the moment, you should be able to reach the externally hosted Archive.org copy by clicking the external link featured in every article. If you encounter an error with an external link, please let us know, or make the correction if you know what’s causing the issue. This was a big lot, so mistakes may have slipped through!)