`gen_data.Rd`

Generates data of mixed types from the latent Gaussian copula model.

```
gen_data(
n = 100,
types = c("ter", "con"),
rhos = 0.5,
copulas = "no",
XP = NULL,
showplot = FALSE
)
```

- n
A positive integer indicating the sample size. The default value is 100.

- types
A vector indicating the type of each variable: `"con"` (continuous), `"bin"` (binary), `"tru"` (truncated) or `"ter"` (ternary). The number of variables is determined by the length of `types`, that is, `p = length(types)`. The default value is `c("ter", "con")`, which creates two variables: the first is ternary and the second is continuous.

- rhos
A vector with the lower-triangular elements of the desired correlation matrix, e.g. `rhos = c(.3, .5, .7)` means the correlation matrix is `matrix(c(1, .3, .5, .3, 1, .7, .5, .7, 1), 3, 3)`. If only a scalar is supplied (`length(rhos) = 1`), then an equi-correlation matrix is assumed, with all pairwise correlations equal to `rhos`. The default value is 0.5, which means the correlation between any two variables is 0.5.

- copulas
A vector indicating the copula transformation f for each of the p variables, i.e. U = f(Z). Each element can take the value `"no"` (f is the identity), `"expo"` (exponential transformation) or `"cube"` (cubic transformation). If the vector has length 1, the same transformation is applied to all p variables. The default value is `"no"`: no copula transformation for any of the variables.

- XP
A list of length p indicating the proportion of zeros (for binary and truncated variables), or the proportions of zeros and ones (for ternary variables), for each variable. For a continuous variable, NA should be supplied. If `NULL`, the following values are automatically generated as elements of the `XP` list for the corresponding data types: for a continuous variable, the value is NA; for a binary or truncated variable, the value is a number between 0 and 1 representing the proportion of zeros, with default 0.5; for a ternary variable, the value is a pair of numbers between 0 and 1, where the first is the proportion of zeros and the second is the proportion of ones. The sum of the pair should be between 0 and 1; the default value is `c(0.3, 0.5)`.

- showplot
Logical indicator. If TRUE, generates a plot of the data when the number of variables p is no more than 3. The default value is FALSE.
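To make the roles of `rhos`, `copulas` and `XP` concrete, the following base R sketch mimics the generating mechanism by hand. It is an illustration under stated assumptions, not the internal code of `gen_data`: it assumes that `rhos` fills the lower triangle column by column (the three-variable example above cannot distinguish this from row-wise filling), that cutoffs are taken at empirical quantiles, and that a truncated variable is shifted so that values below the cutoff become zero. The helper names `Sigma`, `Z`, `U` and `cut_*` are hypothetical.

```r
# Illustrative sketch in base R; not the internal code of gen_data.
set.seed(1)
n = 100
types = c("bin", "ter", "tru")
rhos = c(.3, .4, .5)
XP = list(.3, c(.2, .4), .5)
p = length(types)

# Correlation matrix from rhos: fill the lower triangle column by column
# (assumed ordering), then mirror it into the upper triangle.
Sigma = diag(p)
Sigma[lower.tri(Sigma)] = rhos
Sigma[upper.tri(Sigma)] = t(Sigma)[upper.tri(Sigma)]

# Latent Gaussian sample Z (n by p) whose columns have correlation Sigma.
Z = matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Copula transformation U = f(Z); here f is cubic for every variable.
U = Z^3

# Cut each column at empirical quantiles so the zero/one proportions match XP.
X = matrix(NA_real_, n, p)
cut_bin = quantile(U[, 1], XP[[1]])
X[, 1] = as.numeric(U[, 1] > cut_bin)                    # binary: 0 or 1
cut_ter = quantile(U[, 2], cumsum(XP[[2]]))              # cutoffs at 20% and 60%
X[, 2] = (U[, 2] > cut_ter[1]) + (U[, 2] > cut_ter[2])   # ternary: 0, 1 or 2
cut_tru = quantile(U[, 3], XP[[3]])
X[, 3] = pmax(U[, 3] - cut_tru, 0)                       # truncated: 0 or positive

# Empirical proportion of zeros in each column.
colMeans(X == 0)
```

Because the cubic transformation is increasing, cutting `U` at its quantiles yields the same binary and ternary values as cutting `Z`; the transformation only changes the shape of the continuous and truncated parts.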

`gen_data` returns a list containing

X: Generated data matrix (n by p) of observed variables.

plotX: Visualization of the data matrix X. Histogram if `p=1`. 2D scatter plot if `p=2`. 3D scatter plot if `p=3`. Returns NULL if `showplot = FALSE`.

Fan J., Liu H., Ning Y. and Zou H. (2017) "High dimensional semiparametric latent graphical model for mixed data" doi:10.1111/rssb.12168 .

Yoon G., Carroll R.J. and Gaynanova I. (2020) "Sparse semiparametric canonical correlation analysis for data of mixed types" doi:10.1093/biomet/asaa007 .

```
# Generate a single continuous variable with exponential transformation (always greater than 0).
# With showplot = FALSE no histogram is drawn and plotX is NULL; set showplot = TRUE to show it.
simdata = gen_data(n = 100, copulas = "expo", types = "con", showplot = FALSE)
X = simdata$X; plotX = simdata$plotX
# Generate a pair of variables (ternary and continuous) with default proportions
# and without copula transformation.
simdata = gen_data()
X = simdata$X
# Generate 3 variables (binary, ternary and truncated).
# The corresponding copulas for the variables are "no" (no transformation),
# "cube" (cube transformation) and "cube" (cube transformation).
# The binary variable has 30% zeros, the ternary variable has 20% zeros
# and 40% ones, and the truncated variable has 50% zeros.
# Then show the 3D scatter plot (data points project onto either 0 or 1 on axis X1;
# onto 0, 1 or 2 on axis X2; onto the positive domain on axis X3).
simdata = gen_data(n = 100, rhos = c(.3, .4, .5), copulas = c("no", "cube", "cube"),
                   types = c("bin", "ter", "tru"), XP = list(.3, c(.2, .4), .5), showplot = TRUE)
X = simdata$X; plotX = simdata$plotX
# Check the number of zeros for the binary variable (n = 100, so counts equal percentages).
sum(simdata$X[ , 1] == 0)
#> [1] 30
# Check the numbers of zeros and ones for the ternary variable.
sum(simdata$X[ , 2] == 0); sum(simdata$X[ , 2] == 1)
#> [1] 20
#> [1] 40
# Check the number of zeros for the truncated variable.
sum(simdata$X[ , 3] == 0)
#> [1] 50
```