Facebook has prototyped new hardware that can dramatically slash the power required to run web applications like Memcached that support some of the Internet’s largest sites, the company said today. The tests found that low-power processors from Tilera could get three times the performance-per-watt of x86 servers when running key-value store applications.
The research, which is being published today at the International Green Computing Conference, is significant on several fronts. It illustrates how Facebook and other huge data center operators are test-driving new hardware to address their toughest workloads and slash their power bills, and are looking beyond the x86 architecture that has traditionally dominated data center computing.
Boost for Tilera Architecture
The study also provides a major boost for Tilera, which uses a low-power, many-core approach. The testing was conducted using the Tilera-based S2Q server built by Quanta Computer, which is also the manufacturer of Facebook’s Open Compute platform. The server incorporates eight of Tilera’s second-generation TILEPro64 processors into a 2U form factor.
“Our experiments show that a tuned version of Memcached on the 64-core Tilera TILEPro64 can yield at least 67% higher throughput than low-power x86 servers at comparable latency,” says the paper, authored by Mateusz Berezecki, Eitan Frachtenberg and Mike Paleczny of Facebook and Tilera’s Kenneth Steele. “When taking power and node integration into account as well, a TILEPro64-based S2Q server with 8 processors handles at least three times as many transactions per second per Watt as the x86-based servers with the same memory footprint.”
Key-Value Stores Key to Big Infrastructure
Key-value stores play an important role in many large websites. The most prominent example is Memcached, which is used by Facebook, Zynga, Twitter, Wikipedia, Flickr, YouTube, Digg, WordPress and Craigslist. Memcached is important enough to Facebook’s operations that CEO Mark Zuckerberg gave a Tech Talk on it in late 2008.
The Tilera-based server and the competing x86-based servers both met Facebook’s latency requirement of processing the transactions in less than one millisecond. Both platforms had comparable latencies, but Tilera’s processors performed more transactions per second at the required latency.
Memcached provides fast response times to users by keeping data in memory rather than on disk. The authors noted that Tilera’s TILEPro64 architecture is ideal for Memcached and other key-value store workloads because it combines the low power consumption of slower clock speeds with the increased throughput of many independent cores.
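To make the in-memory caching idea concrete, here is a minimal sketch of the common "cache-aside" read pattern that Memcached deployments typically use. It is an illustration only: `InMemoryCache` is a hypothetical stand-in built on a Python dict, not a real Memcached client API, and `cache_aside_get` and the `load` callback are names invented for this example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for a Memcached-style client (not a real client API).

    Values live in a Python dict in RAM, each paired with an expiry time.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, float]] = {}

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: drop the entry and report a miss
            return None
        return value

    def set(self, key: str, value: bytes, ttl: float = 60.0) -> None:
        self._data[key] = (value, time.monotonic() + ttl)


def cache_aside_get(cache: InMemoryCache, key: str,
                    load: Callable[[str], bytes]) -> bytes:
    """Serve from memory when possible; fall back to the slow store on a miss."""
    value = cache.get(key)
    if value is not None:
        return value       # hit: answered straight from RAM
    value = load(key)      # miss: e.g. a database query or disk read
    cache.set(key, value)  # populate so the next read of this key is a hit
    return value
```

Calling `cache_aside_get` twice for the same key invokes `load` only once; the second request never leaves memory, which is the property that makes throughput per core, rather than disk speed, the limiting factor for this kind of workload.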
Some Customization Required
However, those performance gains required some customization and tweaking of Memcached, including splitting data tables into separate shards to allow more effective parallel processing, a step required to adapt to the TILEPro64’s 32-bit architecture. While this is within the capabilities of many large companies using Memcached, it is not necessary on x86-based architectures.
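The sharding step can be sketched as follows. This is not the code Facebook and Tilera used; it is a hypothetical illustration, assuming keys are assigned to shards by a stable hash (`zlib.crc32` here) and that each shard has its own table and lock, so requests for keys in different shards never contend. On a 32-bit system each shard might instead be a separate process with its own address space; threads and locks stand in for that here. `ShardedStore`, `shard_for` and `shard_sizes` are invented names.

```python
import threading
import zlib
from typing import Dict, List, Optional


class ShardedStore:
    """Hypothetical sketch of a key-value table split into independent shards."""

    def __init__(self, num_shards: int) -> None:
        if num_shards <= 0:
            raise ValueError("num_shards must be positive")
        self._tables: List[Dict[str, bytes]] = [{} for _ in range(num_shards)]
        self._locks: List[threading.Lock] = [threading.Lock() for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # crc32 is stable across runs (unlike Python's salted hash()),
        # so a given key always lands in the same shard.
        return zlib.crc32(key.encode("utf-8")) % len(self._tables)

    def set(self, key: str, value: bytes) -> None:
        i = self.shard_for(key)
        with self._locks[i]:  # only this one shard is locked
            self._tables[i][key] = value

    def get(self, key: str) -> Optional[bytes]:
        i = self.shard_for(key)
        with self._locks[i]:
            return self._tables[i].get(key)

    def shard_sizes(self) -> List[int]:
        """Number of keys held by each shard (useful for checking balance)."""
        return [len(t) for t in self._tables]
```

For example, `ShardedStore(num_shards=8)` lets eight writer threads insert keys concurrently while contending only when two keys happen to hash to the same shard. The design trade-off is that the application, rather than a single shared table, now has to agree on the key-to-shard mapping.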
Tilera uses an architecture that eliminates the on-chip bus interconnect, a centralized intersection where information flows between processor cores or between cores and the memory and I/O. Instead, Tilera employs an on-chip mesh network to interconnect cores. Tilera says its architecture provides similar capabilities for its caching system, evenly distributing the cache load for better scalability.
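The difference between a shared bus and a mesh can be illustrated with a simplified model. The sketch below assumes the 64 cores sit on an 8x8 grid, that messages use dimension-ordered (first along the row, then along the column) routing, and that cache lines are assigned a "home" tile by a simple rotating hash with an assumed 64-byte line size. These are illustrative assumptions, not a description of Tilera's actual interconnect or cache hardware, and every function name is hypothetical.

```python
from itertools import product
from typing import List, Tuple

MESH_DIM = 8                    # assumption: 64 tiles laid out as an 8x8 grid
NUM_TILES = MESH_DIM * MESH_DIM
LINE_SIZE = 64                  # assumed cache-line size in bytes, for illustration


def coords(tile: int) -> Tuple[int, int]:
    """Map a tile id (0..63) to its (row, col) position on the grid."""
    return divmod(tile, MESH_DIM)


def xy_route(src: int, dst: int) -> List[int]:
    """Dimension-ordered route: move along the row first, then along the column.

    Each step crosses one short link between neighbouring tiles, so unrelated
    transfers can use different links at the same time, unlike a shared bus.
    """
    row, col = coords(src)
    dst_row, dst_col = coords(dst)
    path = [src]
    while col != dst_col:
        col += 1 if dst_col > col else -1
        path.append(row * MESH_DIM + col)
    while row != dst_row:
        row += 1 if dst_row > row else -1
        path.append(row * MESH_DIM + col)
    return path


def home_tile(address: int) -> int:
    """Hypothetical home-tile mapping: consecutive cache lines rotate across
    all tiles, spreading cache traffic evenly instead of concentrating it."""
    return (address // LINE_SIZE) % NUM_TILES


def average_hops() -> float:
    """Mean hop count over all (src, dst) tile pairs, including src == dst."""
    total = sum(len(xy_route(s, d)) - 1
                for s, d in product(range(NUM_TILES), repeat=2))
    return total / (NUM_TILES * NUM_TILES)
```

In this model a message from one corner tile to the opposite corner crosses 14 links, but most transfers are much shorter, and because cache lines are spread across every tile's home slot, no single link or tile becomes the central choke point that a bus represents.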
Facebook plans to run the same study on Tilera’s new 64-bit Gx3000 series, which was announced in June and will begin sampling later this month.
Facebook’s hardware research is a reminder of the growing role of original design manufacturer (ODM) companies like Quanta, which is an investor in Tilera and is emerging as a key player in the market for custom cloud servers. Quanta is also rumored to be a source for custom servers at Google, prompting recent reports that Google is test-driving servers using Tilera chips.