Privacy policies written by different application development companies or individuals can vary in format, and their authors may describe similar data practices in different ways. There is therefore a need for a shared understanding across the privacy policies of mobile and web-based applications. Formal ontologies can provide this shared understanding among policy authors, regulators, application developers, and users. However, there is no standard method for building an ontology. I have developed an empirical method, based on grounded theory, for constructing an ontology manually. The method consists of seven heuristics, discovered through a grounded analysis of five privacy policies, that explain how to infer hypernymy, meronymy, and synonymy relationships from information type phrases. The method was evaluated on 355 phrases from 50 privacy policies.
Based on the heuristics, I also developed 27 semantic rules that automatically construct semantic relationships between phrases using a shallow typing system. The semantic rules exploit the different ways of describing data, called "variants." For example, "device name" and "mobile device id" are both variants of "device." The semantic rules can therefore be used to infer new variants and the relationships between them. Finally, I use description logic and the OWL API to construct an ontology from the inferred variants and relations.
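To make the idea of rule-based inference over shallowly typed phrases concrete, here is a minimal Python sketch. It is not one of the 27 semantic rules: the type names ("modifier", "thing", "property"), the lexicon, the two rules, and the relation labels `kind_of` and `part_of` are all assumptions chosen for illustration.

```python
# Hypothetical shallow type lexicon; the actual types used by the
# semantic rules are not listed in this summary.
LEXICON = {
    "mobile": "modifier",
    "device": "thing",
    "name": "property",
    "id": "property",
}


def infer_relations(phrase):
    """Return a set of (child, relation, parent) triples for one phrase."""
    words = phrase.split()
    types = [LEXICON.get(word) for word in words]
    relations = set()
    # Skip single words and phrases containing an untyped word.
    if len(words) < 2 or None in types:
        return relations
    # Illustrative rule A: a leading modifier narrows the rest of the
    # phrase, so the phrase is a kind of (hyponym of) the remainder.
    if types[0] == "modifier":
        rest = " ".join(words[1:])
        relations.add((phrase, "kind_of", rest))
        relations |= infer_relations(rest)
    # Illustrative rule B: a trailing property belongs to the phrase
    # that precedes it, so the phrase is a part of that head phrase.
    if types[-1] == "property":
        head = " ".join(words[:-1])
        relations.add((phrase, "part_of", head))
        relations |= infer_relations(head)
    return relations


if __name__ == "__main__":
    for triple in sorted(infer_relations("mobile device id")):
        print(triple)
```

Under these assumed rules, "mobile device id" also yields the intermediate variants "mobile device" and "device id", each linked back to "device", which is the sense in which new variants and relations are inferred rather than annotated by hand.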
Mobile applications collect and share users' personal information to fulfil their functionality and provide services. Because of the importance of privacy, regulators often require applications to disclose their data practices in a legal document called a "privacy policy." The authors of a privacy policy may not be the developers of the app, so the two groups may use different words to describe the same practice. Our research group provides a framework that helps developers bridge the gap between privacy policies and application code and reduce inconsistencies. In this project, I have developed a domain ontology of the information types collected by applications, to address the problem of abstraction in privacy policies. Privacy policies tend to use abstract terms, and they rely on hypernymy, referring to specific information types by way of a more general information type. I also developed a natural language processing tool that identifies the information a privacy policy states is collected and compares it with the ontology to determine the policy's level of abstraction.
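One simple way to think about comparing a policy term with the ontology is to measure how far the term sits from the most general concept in the hypernymy hierarchy. The Python sketch below assumes this depth-based measure; the toy hierarchy, the function name `depth`, and the measure itself are illustrative assumptions, not the tool's actual design.

```python
# Hypothetical fragment of a hypernymy hierarchy, stored as
# child -> parent (hypernym). The real ontology is not shown here.
HYPERNYM = {
    "device information": "information",
    "device identifier": "device information",
    "advertising id": "device identifier",
}


def depth(term):
    """Return the number of hypernym steps from term up to the root.

    A smaller depth means a more abstract term. Returns None when the
    term does not appear in the hierarchy at all.
    """
    known = set(HYPERNYM) | set(HYPERNYM.values())
    if term not in known:
        return None
    steps = 0
    while term in HYPERNYM:
        term = HYPERNYM[term]
        steps += 1
    return steps


if __name__ == "__main__":
    for term in ["information", "device information", "advertising id"]:
        print(term, depth(term))
```

With this measure, a policy that says only "device information" is flagged as more abstract than one that names "advertising id", because the former sits closer to the root of the hierarchy.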