Despite the hype, only a few large enterprises or governmental organizations outsource their services to public clouds. On the one hand, cloud computing has been identified as one of the top ten business strategies as it offers many advantages such as greater flexibility and reduced costs. However, on the other hand, outsourcing and, therewith, relinquishing control of services implies many security and privacy problems. Cloud providers often place their data centers in foreign countries where security policies are difficult to enforce, cloud infrastructures are threatened by hackers, insiders, and even malicious customers trying to peek into or tamper with outsourced data. Consequently, the cloud cannot be trusted. As of today, lack of security and privacy guarantees are major adoption obstacles for both, large enterprises and governmental organizations.
Our research, PASMAC, targets the design and evaluation of protocols for secure and privacy- preserving “data analysis” in an untrusted cloud. With PASMAC, the user can store and query data in the cloud, preserving privacy and integrity of outsourced data and queries. PASMAC specifically addresses a real-world cloud framework: Google’s prominent MapReduce paradigm. PASMAC will design and prototype new protocols based on highly parallelizable, efficient privacy-preserving techniques, such as efficient private information retrieval, encrypted Bloom filters, and additive homomorphic encryption.
Research in PASMAC is supported by NSF grant 1218197.
Efficient Private Information Retrieval for MapReduce
One often overlooked problem with outsourcing data to the cloud is privacy of access patterns. There are many exisiting solutions for encrypting outsourced data, but it is usually accepted that the cloud will be able to see when, where and how often you access it. It is often difficult to quantify the information this access pattern leaks to an interested party. For instance, a cloud storing medical data may not be able to read individual patient records, but it could infer that a person has cancer by the fact that an oncologist is accessing his/her records.
Existing techniques that mitigate this problem (oblivious RAM and private information retrieval protocols) currently require too much computation or bandwidth to be usable in cloud situations. PIR techniques can have low bandwidth requirements, but must inherently perform some computation over the entire database for each query. ORAM, in contrast, has low computational cost but must occasionally transfer large portions of the database back to the client to be processed. Since outsourcing data to the cloud only makes sense if you have a lot of it, these techniques are not currently practical for the large database sizes that would occur in such a situation.
Our approach to this problem is to take advantage of the large computational resources that are available in cloud settings to make fast PIR queries possible for large sets of data. We have designed a version of PIR that can run efficiently on MapReduce and scale to a large number of compute nodes. This allows the client to decide how fast they would like the query to return and assign resources accordingly.See also: Travis Mayberry, Erik-Oliver Blass, Agnes Hui Chan, “PIRMAP: Efficient Private Information Retrieval for MapReduce”, Proceedings of Financial Cryptography and Data Security (FC’13), pp. 1—15, Okinawa, USA, 2013 (to appear) Paper
Privacy-Preserving Search in MapReduce
We present PRISM, a privacy-preserving scheme for word search in cloud computing. In the face of a curious cloud provider, the main challenge is to design a scheme that achieves privacy while preserving the efﬁciency of cloud computing. Solutions from related research, like encrypted keyword search or Private Information Retrieval (PIR), fall short of meeting real-world cloud requirements and are impractical.
PRISM’s idea is to transform the problem of word search into a set of parallel instances of PIR on small datasets. Each PIR instance on a small dataset is efﬁciently solved by a node in the cloud during the “Map” phase of MapReduce. Outcomes of map computations are then aggregated during the “Reduce” phase. Due to the linearity of PRISM, the simple aggregation of map results yields the ﬁnal output of the word search operation. We have implemented PRISM on Hadoop MapReduce and evaluated its efﬁciency using real-world DNS logs.
PRISM’s overhead over non-private search is only 11%. Thus, PRISM offers privacy-preserving search that meets cloud computing efficiency requirements. Moreover, PRISM is compatible with standard MapReduce, not requiring any change to the interface or infrastructure.See also: Erik-Oliver Blass, Roberto Di Pietro, Refik Molva, Melek Onen, “PRISM – Privacy-Preserving Search in MapReduce”, Proceedings of Privacy Enhancing Technologies Symposium (PETS’12), pp. 180—200, Vigo, Spain, 2012, ISBN 978-3-642-31679-1 Paper MapReduce Implementation
Efficient Privacy-Preserving Counting for MapReduceIn the face of an untrusted cloud infrastructure, outsourced data needs to be protected. Fully homomorphic encryption is one solution that also allows performing operations on outsourced data. However, the involved high overhead of today's fully homomorphic encryption techniques outweigh cloud cost saving advantages, rendering it impractical. We present EPiC, a practical, efficient protocol for the privacy-preserving evaluation of a fundamental operation on data sets: frequency counting. In an IND-CPA encrypted outsourced data set, a cloud user can specify a pattern, and the cloud will count the number of occurrences of this pattern in a completely oblivious manner. A pattern is expressed as a boolean formula on the fields of the records and can specify values counting, range counting, and conjunctions/disjunctions of field values.
EPiC's main idea is,
first, to reduce the problem of counting to a summation of polynomial evaluations. Second, to efficiently evaluate
the summation of polynomial evaluations in a privacy-preserving manner, we extend previous work
on the Hidden Modular Group Order assumption and design a new
somewhat homomorphic encryption scheme. We show how a
general pattern, defined by a boolean formula, is arithmetized
into a multivariate polynomial over
GF(2) and used in EPiC. This scheme is highly
efficient in our particular counting scenario.
Besides a formal analysis where we prove EPiC's privacy, we also present implementation and evaluation results. We specifically target Google's prominent MapReduce paradigm as offered by major cloud providers. Our evaluation performed both locally and in Amazon's public cloud with data sets sizes of up to 1 TByte shows only modest overhead compared to non-private counting, attesting to EPiC's efficiency.
Efficient Private File Retrieval by Combining ORAM and PIRRecent research results on tree-based Oblivious RAM by Shi et al.  obtain communication complexity of O(l · log3(N)) in the worst-case for an N-capacity storage with blocks size l. The individual nodes in the tree, however, are constructed using traditional ORAMs which have worst-case communication complexity linear in their capacity and block size. PIR protocols are able to provide better worst-case bounds (decoupling capacity from block size), but have traditionally been less practical than ORAM due to the fact that they require O(N) computational complexity on the server. This paper presents Path-PIR, a hybrid ORAM construction, using techniques from PIR, that overcomes the individual weaknesses of each. Path-PIR significantly reduces communication complexity when the block size of the ORAM is large. Compared to existing work, this leads to smaller data transfer costs by orders of magnitude for practical sized databases and achieves worst-case communication complexity of O(l · log2 (N)) for large block sizes. Additionally, the typically high computational cost of PIR is negated by the tree structure of the ORAM, which requires only a small fraction of the database to be operated on for each query. We also investigate the concept of an ORAM’s latency, which is the amount of communication required before users receive the result of their query. We show that Path-PIR achieves lower latency than any existing scheme, only about four times the block size. Using Amazon EC2 as an example, we demonstrate that even with the additional cost of PIR computation, Path-PIR provides a significant monetary saving compared to related work.
See also: Travis Mayberry, Erik-Oliver Blass, Agnes Chan, “Efficient Private File Retrieval by Combining ORAM and PIR”, Proceedings of 20th Annual Network & Distributed System Security Symposium (NDSS ‘13), pp. 1–11, San Diego, USA, 2013
Practical Privacy-Preserving Range and Sort Queries with Update-Oblivious Linked ListsWe present RASP, a new protocol for privacy-preserving range search and sort queries on encrypted data in the face of an untrusted data store. The contribution of RASP over related work is twofold: first, RASP improves privacy guarantees by ensuring that after a query for range [a,b] any new record added to the data store is indistinguishable from random, even if the new record falls within range [a,b]. Second, RASP is highly practical, abstaining from expensive asymmetric cryptography and bilinear pairings. Instead, RASP only relies on hash and block cipher operations. The main idea of RASP is to build upon a new update-oblivious bucket-based data structure. We allow for data to be added to buckets without leaking into which bucket it has been added. As long as a bucket is not explicitly queried, the data store does not learn anything about bucket contents. Furthermore, no information is leaked about data additions following a query. Besides formally proving RASP's privacy, we also present a practical evaluation of RASP using Amazon Dynamo.
See also: Erik-Oliver Blass, Travis Mayberry, Guevara Noubir, “Practical Privacy-Preserving Range and Sort Queries with Update-Oblivious Linked Lists”, Cryptology ePrint Archive: Report 2013/715, 2013