--- title: "Usage of the sdcHierarchies-Package" author: "Bernhard Meindl" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 5 number_sections: false vignette: > %\VignetteIndexEntry{Usage of the sdcHierarchies-Package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo=FALSE, include=FALSE} library(rmarkdown) library(sdcHierarchies) ``` # Introduction The `sdcHierarchies` packages allows to create, modify and export nested hierarchies that are used for example to define tables in statistical disclosure control software such as in [**sdcTable**](https://cran.r-project.org/package=sdcTable) # Usage Before using, the package needs to be loaded: ```{r, eval = FALSE} library(sdcHierarchies) ``` ## Create and modify a hierarchy from scratch `hier_create()` allows to create a hierarchy. Argument `root` specifies the name of the root node. Optionally, it is possible to add some nodes to the top-level by listing their names in argument `node_labs`. Also, `hier_display()` shows the hierarchical structure of the current tree as shown below: ```{r} h <- hier_create(root = "Total", nodes = LETTERS[1:5]) hier_display(h) ``` Once such an object is created, it can be modified by the following functions: - `hier_add()`: allows to add nodes to the hierarchy - `hier_delete()`: allows to delete nodes from the tree - `hier_rename()`: allows to rename nodes These functions can be applied as shown below: ```{r} ## adding nodes below the node specified in argument `node` h <- hier_add(h, root = "A", nodes = c("a1", "a2")) h <- hier_add(h, root = "B", nodes = c("b1", "b2")) h <- hier_add(h, root = "b1", nodes = c("b1_a", "b1_b")) # deleting one or more nodes from the hierarchy h <- hier_delete(h, nodes = c("a1", "b2")) h <- hier_delete(h, nodes = c("a2")) # rename nodes h <- hier_rename(h, nodes = c("C" = "X", "D" = "Y")) hier_display(h) ``` We note that the underlying [**data.tree**](https://cran.r-project.org/package=data.tree) package allows to modify the objects on reference so no explicit assignment of the form is required. ## Information about nodes Function `hier_info()` returns information about the nodes that are specified in argument `leaves`. ```{r} # about a specific node info <- hier_info(h, nodes = c("b1", "E")) ``` `info` is a named list where each list element refers to a queried node. The results for level `b1` could be extracted as shown below: ```{r} info$b1 ``` Information about all nodes can be extracted by not specifying argument `leaves`. ## Convert to other formats Function `hier_convert()` takes a hierarchy and allows to convert the network based structure to different formats while `hier_export()` does the conversion and writes the results to a file on the disk. The following formats are currently supported: - `df`: a "@;label"-based format that can be used in [**sdcTable**](https://cran.r-project.org/package=sdcTable) - `dt`: the same as `df`, but the result is returned as a \code{data.table} - `argus`: also a "@;label"-based format that used to create hrc-files suitable for [$\tau$-argus](https://github.com/sdcTools/tauargus/) - `json`: a json-encoded string - `code`: the required code to re-build the current hierarchy - `sdc`: a `list` which is a suitable input for [**sdcTable**](https://cran.r-project.org/package=sdcTable) ```{r, eval=TRUE} # conversion to a "@;label"-based format res_df <- hier_convert(h, as = "df") print(res_df) ``` The required code to create this hierarchy could be computed using: ```{r} code <- hier_convert(h, as = "code"); cat(code, sep = "\n") ``` Using `hier_export()` one can write the results to a file. This is for example useful if one wants to create `hrc`-files that could be used as input for [$\tau$-argus](https://github.com/sdcTools/tauargus/) which can be achieved as follows: ```{r, eval=FALSE} hier_export(h, as = "argus", path = file.path(tempfile(), "hierarchy.hrc")) ``` ## Create a hierarchy from data.frames, code or json `hier_import()` returns a network-based hierarchy given either a data.frame (in `@;labs`-format), json format, code or from a tau-argus compatible `hrc-file`. For example if we want to create a hierarchy based of `res_df` which was previously created using `hier_convert()`, the code is as simple as: ```{r, eval=TRUE} n_df <- hier_import(inp = res_df, from = "df") hier_display(n_df) ``` Using `hier_import(inp = "hierarchy.hrc", from = "argus")` one could create a sdc hierarchy object directly from a `hrc`-file. ## Create/Compute hierarchies from a string Often it is the case, the the nested hierarchy information in encoded in a string. Function `hier_compute()` allows to transform such strings into hierarchy objects. One can distinguish two cases: The first case is where all input codes have the same length while in the latter case the length of the codes differs. Let's assume we have a geographic code given in `geo_m` where digits 1-2 refer to the first level, digit 3 to the second and digits 4-5 to the third level of the hierarchy. ```{r} geo_m <- c( "01051", "01053", "01054", "01055", "01056", "01057", "01058", "01059", "01060", "01061", "01062", "02000", "03151", "03152", "03153", "03154", "03155", "03156", "03157", "03158", "03251", "03252", "03254", "03255", "03256", "03257", "03351", "03352", "03353", "03354", "03355", "03356", "03357", "03358", "03359", "03360", "03361", "03451", "03452", "03453", "03454", "03455", "03456", "10155") ``` Function `hier_compute()` takes a character vector and creates a hierarchy from it. In argument `method`, two ways of specifying the encoded levels can be chosen. - `endpos`: an integerish-vector must be specified in argument `dim_spec` holding the end-position at each level - `len`: an integerish-vector must be specified in argument `dim_spec` containing for each level how many digits are required In case the overal total is not encoded in the input, specifying argument `root` allows to give a name to the overall total. Additionally, it is possible to set the desired output format in parameter `as`. In the example below setting `as = "df"` returns the result as a `data.frame` in `@; key`-format. The two methods on how to define the positions of the levels are interchangable and lead to the same hierarchy as shown below: ```{r} v1 <- hier_compute( inp = geo_m, dim_spec = c(2, 3, 5), root = "Tot", method = "endpos", as = "df" ) v2 <- hier_compute( inp = geo_m, dim_spec = c(2, 1, 2), root = "Tot", method = "len", as = "df" ) identical(v1, v2) hier_display(v1) ``` If the total is contained in the string, let's say in the first 3 positions of the input values, the hierarchy can be computed as follows: ```{r} geo_m_with_tot <- paste0("Tot", geo_m) head(geo_m_with_tot) v3 <- hier_compute( inp = geo_m_with_tot, dim_spec = c(3, 2, 1, 2), method = "len" ); hier_display(v3) ``` The result is the same as `v1` and `v2` previously generated. `hier_compute()` can also deal with inputs that are of different length as shown in the next example. ```{r} ## second example, unequal strings; overall total not included in input yae_h <- c( "1.1.1.", "1.1.2.", "1.2.1.", "1.2.2.", "1.2.3.", "1.2.4.", "1.2.5.", "1.3.1.", "1.3.2.", "1.3.3.", "1.3.4.", "1.3.5.", "1.4.1.", "1.4.2.", "1.4.3.", "1.4.4.", "1.4.5.", "1.5.", "1.6.", "1.7.", "1.8.", "1.9.", "2.", "3.") v1 <- hier_compute( inp = yae_h, dim_spec = c(2,2,2), root = "Tot", method = "len" ); hier_display(v1) ``` We also note that there is another way to specify the inputs in `hier_compute()`. Setting argument `method = "list"` allows to create a hierarchy from a given named list. In such a list, the name of a list element is interpreted as the name of the parent node of all codes of the specific list element. An example is shown below: ```{r} yae_ll <- list() yae_ll[["Total"]] <- c("1.", "2.", "3.") yae_ll[["1."]] <- paste0("1.", 1:9, ".") yae_ll[["1.1."]] <- paste0("1.1.", 1:2, ".") yae_ll[["1.2."]] <- paste0("1.2.", 1:5, ".") yae_ll[["1.3."]] <- paste0("1.3.", 1:5, ".") yae_ll[["1.4."]] <- paste0("1.4.", 1:6, ".") d <- hier_compute(inp = yae_ll, root = "Total", method = "list") hier_display(d) ``` ## Grids Using `hier_grid()` it is possible to compute all combinations of codes given several hierarchies. This is useful to build a complete table (e.g for merging purposes). The functionality of `hier_grid` is shown below. First, we need to specify some hierarchies. ```{r} h1 <- hier_create("Total", nodes = LETTERS[1:3]) h1 <- hier_add(h1, root = "A", node = "a1") h1 <- hier_add(h1, root = "a1", node = "aa1") h2 <- hier_create("Total", letters[1:5]) h2 <- hier_add(h2, root = "b", node = "b1") h2 <- hier_add(h2, root = "d", node = "d1") ``` Note that we - on purpose - added some *"bogus"* codes to each `h1` and `h2` as codes `a1` and `aa1` in `h1` and `b1` and `d1` in `h2` are just identical to their respective parent categories. Applying `hier_grid` is as simple as ```{r} hier_grid(h1, h2) ``` separating all target hierarchies with a `,`. `hier_grid` then computes all combinations of codes from hierarchies `h1` and `h2`. Using the default options, these bogus codes are included in the output `data.table`. Setting argument `add_dups = FALSE` removes all rows containing such bogus codes. Setting option `add_levs = TRUE` adds some columns labeled `levs_v{n}` to the output data set. Each of this colum contains values which define the hierarchy level of the corresponding code given in variable v{n} in the same row in the table as shown below. ```{r} hier_grid(h1, h2, add_dups = FALSE, add_levs = TRUE) ``` ## Interactively create or modify hierarchies The package also contains a shiny-based interactive app that can be started using `hier_app()`. The app allows to pass as input either a character vector (that should be converted into a hierarchy) or an existing hierarchy and can be started as follows given the hierarchy previously generated using `hier_compute()`: ```{r, eval=FALSE} d <- hier_app(d) ``` If a character vector is passed to `hier_app()`, the interface allows to specify the arguments for `hier_compute()`. Once a hierarchy is created, the interface changes and the tree can be dynamically changed by dragging nodes around. Futhermore, it is possible to add, remove or rename nodes. The required code to construct the current hierarchy is displayed and can be saved to disk. Furthermore, there is functionality to *undo* the last step as well as to export results to either the **R**-session or write results to a file. This is especially helpful if one wants to create for example an `hrc`-file as input for $\tau$-argus. Please note that `hier_app()` is able to return the modified hierarchy and not only save results to disk. In order to continue working, one may assign the result to a new object as shown in the code above. ## Summary In case you have any suggestions or improvements, please feel free to file an issue at [**our issue tracker**](https://github.com/bernhard-da/sdcHierarchies/issues) or contribute to the package by filing a [**pull request**](https://github.com/bernhard-da/sdcHierarchies/pulls) against the master branch.