Jim McCarthy
Robotics 91.548-201
04-07-03
Presentation Write-up: Compact Flash

This presentation is on data storage on embedded devices. Investigating this topic has led me to research Compact Flash memory. First we will compare and contrast Compact Flash memory with other, more conventional, data storage options (hard disk drives). Then, the results of that discussion will support further discussion of Compact Flash as a viable data storage option for embedded/robotic devices. Disk drives are commonly used and readily available, but perhaps aren't the ideal storage media for a small embedded, robotic device. Compact Flash memory is a solid state data storage device, which is to say that it has no moving mechanical parts. A hard drive has spinning platters within it that need to be spun to read and write data. It is not unusual for a working hard drive to spin at 5,000-7,000 RPM, and with the platters spinning at these speeds, a significant force is generated that could have an effect on whatever it's attached to. Compact Flash uses less power than a typical hard drive and is more resistant to shock (by more than 2x). Not only is Compact Flash smaller than a hard drive, but environmental factors such as humidity, altitude, and temperature that affect hard drive performance don't negatively affect Compact Flash. The comparisons that favor the hard drive are cost per byte and longevity. The hard drive's mean time between failures is around 1,000,000 hours, whereas Compact Flash is rated closer to 1,000,000 write/erase cycles (which will certainly take less than 1,000,000 hours to use up). In an overall capacity comparison it's not unusual to find a hard disk with 181GB of capacity, whereas the biggest Compact Flash card available now is about 2GB. While the combination of low power and durability (to shock) makes Compact Flash attractive to embedded/robotic systems makers, cost, overall capacity, and Mean Time Between Failures (MTBF) make the disk look like a better option.
Understandably, every application is different and I am not suggesting that Compact Flash is the only appropriate method of data storage for embedded/robotic devices. I'm sure that many cases exist where hard drives are the best solution. It's up to each system designer to weigh the factors discussed here when choosing a data storage medium for his embedded/robotic device. Should you choose Compact Flash memory, the decision-making is not over: there are different types of Compact Flash to choose from, and we will discuss them here. The two major types are called NAND and NOR. AND is a third type of Compact Flash, but it is only made by one manufacturer (Hitachi) and not nearly in the volume of NAND and NOR memory. The names NAND and NOR refer to the gates used in the low-level construction of each cell (bit) of memory on the chip. NOR flash memory was developed first, in the mid-eighties, and was followed a few years later by NAND. In NOR memory the individual cells are arranged in a parallel fashion, which makes them easy to read quickly (only about 10 times slower than SDRAM). An unfortunate side effect is that the parallel arrangement of the individual cells means they cannot be very densely packed, and this results in lower total capacities for NOR memory chips. NOR memory is randomly accessible, like RAM, so it supports a feature called XIP (Execute In Place). All this means is that code can be accessed a word at a time and that the data arrives fast enough that some systems can treat NOR memory the same way as RAM. This is nice because if you are using NOR memory your boot code can reside right on it, and the system can be told to access that memory at boot time. Though NOR memory can be read quickly, it is very slow to write to.
The way flash memory is written is this: when you write to flash memory, even just one byte, the block of memory that contains the byte to be changed must first be read out to some other area (a buffer in RAM, perhaps), the block is then erased so that it is all 1's (every byte 0xFF), and finally the old data is written back, with the byte changed, by flipping some of the 1's to 0's. This is intended to be transparent to the programmer; I believe it happens in hardware, on the flash chip, and the reason I'm explaining it is to show why writes happen so slowly. So you can see that not only is this a slow process but, the larger your block size, the slower the process becomes, and NOR memory is one big block. NAND memory is a little bit different. The individual cells in NAND memory are arranged in a serial manner, which causes reads to happen a bit slower than in NOR memory. NAND memory is block addressable, like a hard drive, so if you want a word from memory you need software drivers that will figure out which block of memory the word is located in, get that block into some kind of local buffer, find the word that you want, and return just that word. Writing to NAND memory happens in the same manner as writing to NOR memory, but because the block sizes are much smaller, the writes happen faster. A drawback to NAND memory being block addressable is that since you can only read a block at a time, you cannot boot directly from it. If you want your boot code to reside on NAND memory, you will need some other kind of memory (EEPROM, NOR memory) to get the ball rolling, so to speak, by getting the first block off the NAND memory and starting to execute commands from it. Because the NAND memory is arranged serially at the cell level, the cells can be packed more densely, resulting in higher storage capacities. The biggest NOR chip I could find was 64MB, compared with 2GB for NAND memory.
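Both the block-at-a-time read that a NAND driver has to do and the erase-then-program write cycle can be sketched as a small Python simulation. This is a simulation of the logic only: FlashBlock, the 512-byte block size, and the function names are my own illustrative choices, not any real chip's interface.

```python
# Toy simulation of block-addressable flash, mirroring the description
# above. Real chips do the erase/program steps in hardware.

BLOCK_SIZE = 512  # bytes per erase block; illustrative, real sizes vary

class FlashBlock:
    """One erase block of simulated flash."""
    def __init__(self):
        self.data = bytearray([0xFF] * BLOCK_SIZE)  # erased flash reads as all 1s

    def erase(self):
        # Erasing sets every bit in the block back to 1 (each byte 0xFF).
        self.data = bytearray([0xFF] * BLOCK_SIZE)

    def program(self, buffer):
        # Programming can only flip 1 bits to 0, so AND the new data in.
        for i in range(BLOCK_SIZE):
            self.data[i] &= buffer[i]

def read_byte(block, offset):
    # Block-addressable read: pull the WHOLE block into a RAM buffer,
    # then return just the byte the caller asked for.
    buffer = bytearray(block.data)
    return buffer[offset]

def write_byte(block, offset, value):
    buffer = bytearray(block.data)  # 1. copy the block out to a RAM buffer
    buffer[offset] = value          # 2. change the one byte we care about
    block.erase()                   # 3. erase the block (all bytes 0xFF)
    block.program(buffer)           # 4. program the modified buffer back in
```

Notice that changing a single byte still costs a whole-block copy, erase, and reprogram: the larger the block, the slower the write, which is why big-block NOR writes so slowly.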
NAND memory is also cheaper (on a per-byte basis), costing about half what comparably sized NOR memory costs. NAND is also easier to find, since it is the only kind of flash memory most of the major manufacturers (Toshiba, SanDisk, etc.) make now. No matter what kind of memory is best for each application, a filesystem will probably be needed to keep track of all the data stored on it. The DOS (FAT) filesystem is the one chosen most often for embedded/robotic applications. Though a Unix type of filesystem is much more robust, typically embedded/robotic projects can't make use of all the functionality that a Unix filesystem provides and can get by just fine with a DOS (FAT) filesystem. Understanding the File Allocation Table, or FAT, is the key to understanding how a DOS filesystem works. Sometimes when discussing a DOS filesystem it's more descriptive to refer to it by the type of FAT it uses: FAT16 or FAT32. The 16 (or the 32) stands for how many bits can be used to address blocks on the storage media. So with a FAT16 system, 16 bits are used to address the data blocks. With 16 bits used for addressing there are potentially 64k blocks that can be addressed. Notice that 2 raised to the 16th power is 64k! So a FAT16 file allocation table has 64k entries. Notice that a FAT32 file allocation table would have 2 raised to the 32nd power, or about 4 billion, entries. So we take the number of entries in our FAT and that will be the number of blocks in our system. To figure out our block size, we divide the size of our memory by the number of blocks we can address. So if we had 256MB of memory in a FAT16 system, our block size would be 256MB/64k, resulting in a block size of 4k! So now our FAT is essentially an array of 64k integer entries. Memory on our storage media is arranged this way: the first block will contain boot information (if this is where it's being stored).
The second block will contain the directory for our files, and our FAT will take up the next 128k of memory (64k entries * 2 bytes per entry). Since we're addressing memory in 4k blocks, the next 32 blocks of memory will be taken up by the FAT. Just to recap: block 0 = boot info, block 1 = directory info, blocks 2-33 = FAT, and blocks 34-64k = empty. Now that the memory layout has been shown, it's time to describe how the FAT is arranged. The FAT is an array with 64k 16-bit entries. Entry 0 corresponds to block 0 in memory (addresses 0 to 4k-1 on our storage media), entry 1 corresponds to block 1 (addresses 4k to 8k-1), and so on. The number at each location in the FAT tells us something about that particular place in memory. First we need to define a few constants. Let's assume that -1 is defined to be the indicator of an unreadable (damaged) block, 0 represents a free block, and 1 marks the last block in a file entry. In the FAT described above, at initialization time (when no files are present) every array entry after #33 will be marked free (with a 0), unless a block is unreadable, in which case it will be marked with a -1. So just after initialization, if you look up block 800 in the FAT you should find a 0, unless block 800 in memory is damaged, in which case you will find a -1. Now it's time to start saving files. When we want to write to memory we'll have to use some software to find an available block of memory (it will simply search the FAT entries for a 0!). The software will return to us an index into the FAT (one that has a 0 in it). So if we are returned the number 500, we will know that our file is at location 500 * 4k (the block size) in memory. If what we have saved to that location is less than 4k in size, the entry at location 500 in the FAT will be changed from 0 (free) to 1 (EOF). If we needed to save more than 4k to memory, the FAT would have to be searched again to return to us another free block of memory. Let's say we were returned index 2000.
Now index 500 in the FAT will not have a 1 in it, it will have a 2000, so that we know where to find the rest of the file! Index 2000 will have a 1, as long as our file is between 4k and 8k. If our file is bigger than 8k, we start this process again and add another link to the list of memory locations our file lives at. Remember that the index in the FAT times the block size yields the location of the desired block in memory, and the entry at that index in the FAT either tells us that we've reached the EOF or is a link to another index in the FAT where the rest of the file lives. What this should show us is that when a file is written to memory, it's important that its directory entry contain its index into the FAT. So when accessing the file, the directory entry will be read, and the FAT field in the directory structure will be multiplied by 4k to find the file in memory. The FAT will also be referenced at the location specified. If the file is less than 4k, the entry in the FAT will be the code for EOF. If the file is bigger than 4k, the entry in the FAT will give the block number for the next part of the file. This continues in linked-list fashion until the EOF code is encountered. So you can see that the FAT is a bunch of linked lists that keep track of which blocks belong to which files.
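The block-size arithmetic and the linked-list behavior described above can be made concrete with a toy Python simulation. The constants follow this write-up's simplified convention (0 = free, 1 = EOF, -1 = damaged); real FAT16 uses different reserved values, and the function names here are my own.

```python
# Toy FAT16-style table following the scheme in the text above.

KB = 1024
MB = 1024 * KB

FAT_ENTRIES = 2 ** 16                    # 64k entries in a FAT16 table
BLOCK_SIZE = (256 * MB) // FAT_ENTRIES   # the 256MB example -> 4k blocks

FREE, EOF_MARK, BAD = 0, 1, -1

# An empty FAT. Entries 0-33 (boot block, directory, and the FAT itself)
# are marked in use so they are never handed out as data blocks; this
# also keeps the special values 0 and 1 from colliding with real links.
fat = [EOF_MARK] * 34 + [FREE] * (FAT_ENTRIES - 34)

def find_free_block(fat):
    """The "software" from the text: scan the FAT for a 0 entry."""
    for i, entry in enumerate(fat):
        if entry == FREE:
            return i
    raise RuntimeError("storage full")

def append_block(fat, last_index=None):
    """Claim one free block; if the file already has blocks, link it on."""
    new = find_free_block(fat)
    fat[new] = EOF_MARK            # the new block is now the end of the file
    if last_index is not None:
        fat[last_index] = new      # the old tail now points at the new block
    return new

def read_chain(fat, first):
    """Follow the linked list from the directory's FAT index to EOF."""
    blocks = [first]
    while fat[blocks[-1]] != EOF_MARK:
        blocks.append(fat[blocks[-1]])
    return blocks
```

In this toy the first data block handed out is index 34; in the text's example the search happened to return 500 and then 2000, giving the chain fat[500] = 2000, fat[2000] = 1. Either way, the byte offset of any block on the media is just its FAT index times BLOCK_SIZE.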