Despite the hype, only a few large enterprises or governmental organizations outsource their services to public clouds. On the one hand, cloud computing has been identified as one of the top ten business strategies as it offers many advantages such as greater flexibility and reduced costs. However, on the other hand, outsourcing and, therewith, relinquishing control of services implies many security and privacy problems. Cloud providers often place their data centers in foreign countries where security policies are difficult to enforce, cloud infrastructures are threatened by hackers, insiders, and even malicious customers trying to peek into or tamper with outsourced data. Consequently, the cloud cannot be trusted. As of today, lack of security and privacy guarantees are major adoption obstacles for both, large enterprises and governmental organizations.

Our research, PASMAC, targets the design and evaluation of protocols for secure and privacy- preserving “data analysis” in an untrusted cloud. With PASMAC, the user can store and query data in the cloud, preserving privacy and integrity of outsourced data and queries. PASMAC specifically addresses a real-world cloud framework: Google’s prominent MapReduce paradigm. PASMAC will design and prototype new protocols based on highly parallelizable, efficient privacy-preserving techniques, such as efficient private information retrieval, encrypted Bloom filters, and additive homomorphic encryption.

Projects

Research in PASMAC is supported by NSF grant 1218197.


PIRMAP

Efficient Private Information Retrieval for MapReduce

One often overlooked problem with outsourcing data to the cloud is privacy of access patterns. There are many exisiting solutions for encrypting outsourced data, but it is usually accepted that the cloud will be able to see when, where and how often you access it. It is often difficult to quantify the information this access pattern leaks to an interested party. For instance, a cloud storing medical data may not be able to read individual patient records, but it could infer that a person has cancer by the fact that an oncologist is accessing his/her records.

Existing techniques that mitigate this problem (oblivious RAM and private information retrieval protocols) currently require too much computation or bandwidth to be usable in cloud situations. PIR techniques can have low bandwidth requirements, but must inherently perform some computation over the entire database for each query. ORAM, in contrast, has low computational cost but must occasionally transfer large portions of the database back to the client to be processed. Since outsourcing data to the cloud only makes sense if you have a lot of it, these techniques are not currently practical for the large database sizes that would occur in such a situation.

Our approach to this problem is to take advantage of the large computational resources that are available in cloud settings to make fast PIR queries possible for large sets of data. We have designed a version of PIR that can run efficiently on MapReduce and scale to a large number of compute nodes. This allows the client to decide how fast they would like the query to return and assign resources accordingly.

See also: Travis Mayberry, Erik-Oliver Blass, Agnes Hui Chan, “PIRMAP: Efficient Private Information Retrieval for MapReduce”, Proceedings of Financial Cryptography and Data Security (FC’13), pp. 1—15, Okinawa, USA, 2013 (to appear) Paper
MapReduce Implementation

PRISM

Privacy-Preserving Search in MapReduce

We present PRISM, a privacy-preserving scheme for word search in cloud computing. In the face of a curious cloud provider, the main challenge is to design a scheme that achieves privacy while preserving the efficiency of cloud computing. Solutions from related research, like encrypted keyword search or Private Information Retrieval (PIR), fall short of meeting real-world cloud requirements and are impractical.

PRISM’s idea is to transform the problem of word search into a set of parallel instances of PIR on small datasets. Each PIR instance on a small dataset is efficiently solved by a node in the cloud during the “Map” phase of MapReduce. Outcomes of map computations are then aggregated during the “Reduce” phase. Due to the linearity of PRISM, the simple aggregation of map results yields the final output of the word search operation. We have implemented PRISM on Hadoop MapReduce and evaluated its efficiency using real-world DNS logs.

PRISM’s overhead over non-private search is only 11%. Thus, PRISM offers privacy-preserving search that meets cloud computing efficiency requirements. Moreover, PRISM is compatible with standard MapReduce, not requiring any change to the interface or infrastructure.

See also: Erik-Oliver Blass, Roberto Di Pietro, Refik Molva, Melek Onen, “PRISM – Privacy-Preserving Search in MapReduce”, Proceedings of Privacy Enhancing Technologies Symposium (PETS’12), pp. 180—200, Vigo, Spain, 2012, ISBN 978-3-642-31679-1 Paper
MapReduce Implementation

EPiC

Efficient Privacy-Preserving Counting for MapReduce

In the face of an untrusted cloud infrastructure, outsourced data needs to be protected. Fully homomorphic encryption is one solution that also allows performing operations on outsourced data. However, the involved high overhead of today's fully homomorphic encryption techniques outweigh cloud cost saving advantages, rendering it impractical. We present EPiC, a practical, efficient protocol for the privacy-preserving evaluation of a fundamental operation on data sets: frequency counting. In an IND-CPA encrypted outsourced data set, a cloud user can specify a pattern, and the cloud will count the number of occurrences of this pattern in a completely oblivious manner. A pattern is expressed as a boolean formula on the fields of the records and can specify values counting, range counting, and conjunctions/disjunctions of field values.

EPiC's main idea is, first, to reduce the problem of counting to a summation of polynomial evaluations. Second, to efficiently evaluate the summation of polynomial evaluations in a privacy-preserving manner, we extend previous work on the Hidden Modular Group Order assumption and design a new somewhat homomorphic encryption scheme. We show how a general pattern, defined by a boolean formula, is arithmetized into a multivariate polynomial over GF(2) and used in EPiC. This scheme is highly efficient in our particular counting scenario.

Besides a formal analysis where we prove EPiC's privacy, we also present implementation and evaluation results. We specifically target Google's prominent MapReduce paradigm as offered by major cloud providers. Our evaluation performed both locally and in Amazon's public cloud with data sets sizes of up to 1 TByte shows only modest overhead compared to non-private counting, attesting to EPiC's efficiency.

See also: Erik-Oliver Blass, Guevara Noubir, Triet Vo Huu, “EPiC: Efficient Privacy-Preserving Counting for MapReduce”, Cryptology ePrint Archive: Report 2012/452, 2012 Paper
MapReduce Implementation

Path-PIR

Efficient Private File Retrieval by Combining ORAM and PIR

Recent research results on “bucketed” Oblivious RAM by Shi et al. [12] reduce communication for an N-capacity storage with blocks of size l bits to poly-logarithmic complexity O(l · log^3(N)) in the worst- case. The individual buckets, however, are constructed using traditional ORAMs which have worst-case communication complexity being linear in their size. PIR protocols are able to provide better worst-case bounds, but have traditionally been less practical than ORAM due to the fact that they require O(N) computation complexity on the server. This paper presents Path-PIR, a hybrid ORAM construction, using techniques from PIR, that overcomes the individual weaknesses of each. Path-PIR’s main idea is to replace the individual buckets in the ORAM construction by Shi et al. [12] with buckets backed by PIR. We show that this leads to orders of magnitude smaller data transfer costs for practically sized databases, compared to existing work, and achieves better asymptotic communication O(l · log^2 (N)) for large block sizes. Additionally, the typically high computational cost of PIR is negated by the small size of the individual buckets. We also show that Path-PIR has very low latency, i.e., a low amount of data is required before a user receives the result of his data request (approximately 4 times the block size). Using Amazon EC2, we demonstrate that monetary cost induced by the server’s PIR computation are far outweighed by the savings in data transfer.

See also: Travis Mayberry, Erik-Oliver Blass, Agnes Chan, “Efficient Private File Retrieval by Combining ORAM and PIR”, Cryptology ePrint Archive: Report 2013/086, 2013 Paper