Despite the hype, only a few large enterprises or governmental organizations outsource their services to public clouds. On the one hand, cloud computing has been identified as one of the top ten business strategies as it offers many advantages such as greater flexibility and reduced costs. However, on the other hand, outsourcing and, therewith, relinquishing control of services implies many security and privacy problems. Cloud providers often place their data centers in foreign countries where security policies are difficult to enforce, cloud infrastructures are threatened by hackers, insiders, and even malicious customers trying to peek into or tamper with outsourced data. Consequently, the cloud cannot be trusted. As of today, lack of security and privacy guarantees are major adoption obstacles for both, large enterprises and governmental organizations.

Our research, PASMAC, targets the design and evaluation of protocols for secure and privacy- preserving “data analysis” in an untrusted cloud. With PASMAC, the user can store and query data in the cloud, preserving privacy and integrity of outsourced data and queries. PASMAC specifically addresses a real-world cloud framework: Google’s prominent MapReduce paradigm. PASMAC will design and prototype new protocols based on highly parallelizable, efficient privacy-preserving techniques, such as efficient private information retrieval, encrypted Bloom filters, and additive homomorphic encryption.

Projects

Research in PASMAC is supported by NSF grant 1218197.


PIRMAP

Efficient Private Information Retrieval for MapReduce

One often overlooked problem with outsourcing data to the cloud is privacy of access patterns. There are many exisiting solutions for encrypting outsourced data, but it is usually accepted that the cloud will be able to see when, where and how often you access it. It is often difficult to quantify the information this access pattern leaks to an interested party. For instance, a cloud storing medical data may not be able to read individual patient records, but it could infer that a person has cancer by the fact that an oncologist is accessing his/her records.

Existing techniques that mitigate this problem (oblivious RAM and private information retrieval protocols) currently require too much computation or bandwidth to be usable in cloud situations. PIR techniques can have low bandwidth requirements, but must inherently perform some computation over the entire database for each query. ORAM, in contrast, has low computational cost but must occasionally transfer large portions of the database back to the client to be processed. Since outsourcing data to the cloud only makes sense if you have a lot of it, these techniques are not currently practical for the large database sizes that would occur in such a situation.

Our approach to this problem is to take advantage of the large computational resources that are available in cloud settings to make fast PIR queries possible for large sets of data. We have designed a version of PIR that can run efficiently on MapReduce and scale to a large number of compute nodes. This allows the client to decide how fast they would like the query to return and assign resources accordingly.

See also: Travis Mayberry, Erik-Oliver Blass, Agnes Hui Chan, “PIRMAP: Efficient Private Information Retrieval for MapReduce”, Proceedings of Financial Cryptography and Data Security (FC’13), pp. 1—15, Okinawa, USA, 2013 (to appear) Paper
MapReduce Implementation

PRISM

Privacy-Preserving Search in MapReduce

We present PRISM, a privacy-preserving scheme for word search in cloud computing. In the face of a curious cloud provider, the main challenge is to design a scheme that achieves privacy while preserving the efficiency of cloud computing. Solutions from related research, like encrypted keyword search or Private Information Retrieval (PIR), fall short of meeting real-world cloud requirements and are impractical.

PRISM’s idea is to transform the problem of word search into a set of parallel instances of PIR on small datasets. Each PIR instance on a small dataset is efficiently solved by a node in the cloud during the “Map” phase of MapReduce. Outcomes of map computations are then aggregated during the “Reduce” phase. Due to the linearity of PRISM, the simple aggregation of map results yields the final output of the word search operation. We have implemented PRISM on Hadoop MapReduce and evaluated its efficiency using real-world DNS logs.

PRISM’s overhead over non-private search is only 11%. Thus, PRISM offers privacy-preserving search that meets cloud computing efficiency requirements. Moreover, PRISM is compatible with standard MapReduce, not requiring any change to the interface or infrastructure.

See also: Erik-Oliver Blass, Roberto Di Pietro, Refik Molva, Melek Onen, “PRISM – Privacy-Preserving Search in MapReduce”, Proceedings of Privacy Enhancing Technologies Symposium (PETS’12), pp. 180—200, Vigo, Spain, 2012, ISBN 978-3-642-31679-1 Paper
MapReduce Implementation

EPiC

Efficient Privacy-Preserving Counting for MapReduce

In the face of an untrusted cloud infrastructure, outsourced data needs to be protected. Fully homomorphic encryption is one solution that also allows performing operations on outsourced data. However, the involved high overhead of today's fully homomorphic encryption techniques outweigh cloud cost saving advantages, rendering it impractical. We present EPiC, a practical, efficient protocol for the privacy-preserving evaluation of a fundamental operation on data sets: frequency counting. In an IND-CPA encrypted outsourced data set, a cloud user can specify a pattern, and the cloud will count the number of occurrences of this pattern in a completely oblivious manner. A pattern is expressed as a boolean formula on the fields of the records and can specify values counting, range counting, and conjunctions/disjunctions of field values.

EPiC's main idea is, first, to reduce the problem of counting to a summation of polynomial evaluations. Second, to efficiently evaluate the summation of polynomial evaluations in a privacy-preserving manner, we extend previous work on the Hidden Modular Group Order assumption and design a new somewhat homomorphic encryption scheme. We show how a general pattern, defined by a boolean formula, is arithmetized into a multivariate polynomial over GF(2) and used in EPiC. This scheme is highly efficient in our particular counting scenario.

Besides a formal analysis where we prove EPiC's privacy, we also present implementation and evaluation results. We specifically target Google's prominent MapReduce paradigm as offered by major cloud providers. Our evaluation performed both locally and in Amazon's public cloud with data sets sizes of up to 1 TByte shows only modest overhead compared to non-private counting, attesting to EPiC's efficiency.

See also: Erik-Oliver Blass, Guevara Noubir, Triet Vo Huu, “EPiC: Efficient Privacy-Preserving Counting for MapReduce”, Cryptology ePrint Archive: Report 2012/452, 2012 Paper
MapReduce Implementation

Path-PIR

Efficient Private File Retrieval by Combining ORAM and PIR

Recent research results on tree-based Oblivious RAM by Shi et al. [14] obtain communication complexity of O(l · log3(N)) in the worst-case for an N-capacity storage with blocks size l. The individual nodes in the tree, however, are constructed using traditional ORAMs which have worst-case communication complexity linear in their capacity and block size. PIR protocols are able to provide better worst-case bounds (decoupling capacity from block size), but have traditionally been less practical than ORAM due to the fact that they require O(N) computational complexity on the server. This paper presents Path-PIR, a hybrid ORAM construction, using techniques from PIR, that overcomes the individual weaknesses of each. Path-PIR significantly reduces communication complexity when the block size of the ORAM is large. Compared to existing work, this leads to smaller data transfer costs by orders of magnitude for practical sized databases and achieves worst-case communication complexity of O(l · log2 (N)) for large block sizes. Additionally, the typically high computational cost of PIR is negated by the tree structure of the ORAM, which requires only a small fraction of the database to be operated on for each query. We also investigate the concept of an ORAM’s latency, which is the amount of communication required before users receive the result of their query. We show that Path-PIR achieves lower latency than any existing scheme, only about four times the block size. Using Amazon EC2 as an example, we demonstrate that even with the additional cost of PIR computation, Path-PIR provides a significant monetary saving compared to related work.

See also: Travis Mayberry, Erik-Oliver Blass, Agnes Chan, “Efficient Private File Retrieval by Combining ORAM and PIR”, Proceedings of 20th Annual Network & Distributed System Security Symposium (NDSS ‘14), pp. 1–11, San Diego, USA, 2014 Paper
Amazon Implementation

RASP

Practical Forward-Secure Range and Sort Queries with Update-Oblivious Linked Lists

We present RASP, a new protocol for privacy-preserving range search and sort queries on encrypted data in the face of an untrusted data store. The contribution of RASP over related work is twofold: first, RASP improves privacy guarantees by ensuring that after a query for range [a,b] any new record added to the data store is indistinguishable from random, even if the new record falls within range [a,b]. Second, RASP is highly practical, abstaining from expensive asymmetric cryptography and bilinear pairings. Instead, RASP only relies on hash and block cipher operations. The main idea of RASP is to build upon a new update-oblivious bucket-based data structure. We allow for data to be added to buckets without leaking into which bucket it has been added. As long as a bucket is not explicitly queried, the data store does not learn anything about bucket contents. Furthermore, no information is leaked about data additions following a query. Besides formally proving RASP's privacy, we also present a practical evaluation of RASP using Amazon Dynamo.

See also: Erik-Oliver Blass, Travis Mayberry, Guevara Noubir, “Practical Privacy-Preserving Range and Sort Queries with Update-Oblivious Linked Lists”, Cryptology ePrint Archive: Report 2013/715, 2013 Paper

This material is based upon work supported by the National Science Foundation under Grant Number (1218197).

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.