We construct a Matrix with housing data. The first column has the number of bedrooms, the second has the area in square feet, and the third has the price.
We construct a DataFrame with housing data. The first column has the number of bedrooms, the second has the area in square feet, and the third has the price.
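As a minimal sketch of such a data set: the numbers below are hypothetical stand-ins, not the values of the original example, and the column names Bedrooms, Area, and Price are chosen only for illustration.

```maple
# Hypothetical housing data (not the original example's values):
# column 1 = bedrooms, column 2 = area in square feet, column 3 = price.
data := Matrix([[2,  850,  90000],
                [3, 1100, 125000],
                [3, 1250, 139000],
                [2,  900,  98000],
                [4, 1600, 210000],
                [3,  980, 118000]]):

# The same data as a DataFrame; the column names are made up for this sketch.
df := DataFrame(data, columns = [Bedrooms, Area, Price]);
```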
We can create box plots of the price for subgroups of sales defined by number of bedrooms.
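One way this could look is sketched below. The data are hypothetical, and the call assumes that SplitByColumn(data, 1) returns a list with one submatrix per distinct value in column 1.

```maple
with(Statistics):

# Hypothetical data: bedrooms, area in square feet, price.
data := Matrix([[2,  850,  90000],
                [3, 1100, 125000],
                [3, 1250, 139000],
                [2,  900,  98000],
                [4, 1600, 210000],
                [3,  980, 118000]]):

# Split on column 1 (bedrooms); assumed to give one submatrix per bedroom count.
groups := SplitByColumn(data, 1):

# Take the price column (column 3) of each submatrix and draw one box per group.
BoxPlot(map(g -> g[.., 3], groups));
```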
Looking at this plot, we see that there is one house that is especially small and cheap, and two that are especially large and expensive, and the rest are clustered fairly closely together. Let's split that cluster into two subgroups according to their area. There is a small gap around 950 square feet.
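A split at a fixed boundary might be written as follows. This is a sketch on hypothetical data. It assumes that the third argument is a list of boundaries, and that a single boundary yields two groups, one on each side of 950.

```maple
with(Statistics):

# Hypothetical cluster of similar houses: bedrooms, area, price.
cluster := Matrix([[2, 850,  90000],
                   [2, 900,  98000],
                   [3, 980, 118000],
                   [3, 1010, 121000]]):

# Split on column 2 (area) using the single boundary 950 square feet.
halves := SplitByColumn(cluster, 2, [950]):

# Number of groups returned; two are expected under the assumption above.
nops(halves);
```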
We can compute the means of the columns in an individual data set by using the Mean command. To apply it to the list of DataFrame objects returned by SplitByColumn, we can use the elementwise version of Mean, obtained by appending a tilde.
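The difference between the two calls can be sketched as follows. The data are hypothetical, and the sketch uses a Matrix rather than a DataFrame, so the groups here are submatrices.

```maple
with(Statistics):

# Hypothetical data: bedrooms, area in square feet, price.
data := Matrix([[2,  850,  90000],
                [3, 1100, 125000],
                [3, 1250, 139000],
                [2,  900,  98000]]):

groups := SplitByColumn(data, 1):

# Column means of a single group.
Mean(groups[1]);

# Elementwise version: column means of every group in the list.
Mean~(groups);
```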
The same thing can be done in one step with the Aggregate method of the DataFrame object. Since this command is part of the DataFrame object, it does not accept Matrices.
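A sketch of the one-step version follows. The data and column names are hypothetical, and the call assumes that Aggregate takes the grouping column as its second argument and applies the mean by default.

```maple
# Hypothetical data: bedrooms, area in square feet, price.
data := Matrix([[2,  850,  90000],
                [3, 1100, 125000],
                [3, 1250, 139000],
                [2,  900,  98000]]):

# Column names are made up for this sketch.
df := DataFrame(data, columns = [Bedrooms, Area, Price]):

# Group the rows by Bedrooms and aggregate the remaining columns
# (assumed default aggregation: the mean).
Aggregate(df, Bedrooms);
```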
We can also split the data into four roughly equally large groups by price. The output is lists of vectors this time.
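One possible way to obtain such groups is to use the price quartiles as boundaries, as sketched below on hypothetical data. Using Quantile for the boundaries is an assumption of this sketch, not necessarily the original approach. The sketch also keeps the Matrix input, so it does not reproduce the list-of-vectors output mentioned above.

```maple
with(Statistics):

# Hypothetical data: bedrooms, area in square feet, price.
data := Matrix([[2,  850,  90000],
                [3, 1100, 125000],
                [3, 1250, 139000],
                [2,  900,  98000],
                [4, 1600, 210000],
                [3,  980, 118000],
                [3, 1050, 120000],
                [4, 1500, 185000]]):

# Quartiles of the price column serve as the three boundaries.
prices := data[.., 3]:
bounds := [seq(Quantile(prices, q), q in [0.25, 0.5, 0.75])]:

# Split on column 3 (price) at those boundaries.
quarters := SplitByColumn(data, 3, bounds):

# Number of rows in each group, to check that the sizes are roughly equal.
map(g -> upperbound(g, 1), quarters);
```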
Let's examine the areas of these houses.
There are undefined values in the following Matrix. If we do not supply a bounds argument, then the undefined values in the third column are grouped into a separate bin.
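The default behavior can be tried on a small hypothetical Matrix such as the one below, which has undefined entries in columns 2 and 3. The values are illustrative, not those of the original example.

```maple
with(Statistics):

# Hypothetical Matrix with undefined entries in columns 2 and 3.
M := Matrix([[1, 2,         3],
             [4, undefined, 3],
             [5, 6,         undefined],
             [7, 8,         9]]):

# No bounds argument: rows are grouped by the value in column 3, and the row
# whose key is undefined is expected to land in a group of its own.
SplitByColumn(M, 3);
```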
If we do supply a bounds argument, then the result is not well-defined. In this case, the row with value 9 in the key column disappears, and the undefined values are placed in the first half of the data.
With the ignore option, the undefined values are removed from the key column. Note that the undefined value in the second column remains untouched.
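A sketch of the ignore option on a small hypothetical Matrix follows. The spelling ignore = true is assumed here, following the usual convention for options in the Statistics package.

```maple
with(Statistics):

# Hypothetical Matrix with undefined entries in columns 2 and 3.
M := Matrix([[1, 2,         3],
             [4, undefined, 3],
             [5, 6,         undefined],
             [7, 8,         9]]):

# Rows whose key (column 3) is undefined are dropped before splitting.
# The undefined entry in column 2 is not part of the key, so that row stays.
SplitByColumn(M, 3, ignore = true);
```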