Calling Sequence

FillMissing(df, fill, missopt, methopt)
DropMissing(df, missopt)

Parameters

df - a DataFrame object
fill - (optional) a value to replace missing values
missopt - (optional) equation of the form missing = m1, where m1 can be any expression
methopt - (optional) equation of the form method = m2, where m2 can be forward or "forward" or backward or "backward"

Description

• The FillMissing command creates a copy of a DataFrame where missing values are replaced with a value of your choice.
• The DropMissing command creates a copy of a DataFrame where columns that contain missing values are removed.
• For both commands, by default, a missing value is determined by the data type of the column:
  – For floating point data types, the missing value is the appropriate version of undefined. For example, columns with data type float[8] use Float(undefined).
  – For hardware integer data types integer[n], where n is 1, 2, 4, or 8, the missing value is 0. (Such columns cannot store non-integer values, so one cannot use a version of undefined here.)
  – For string columns, the empty string is the missing value.
  – For columns of type truefalseFAIL and boolean_constant, the missing value is FAIL.
  – For all other data types, the missing value is undefined.
• In order to use a different value as the missing value, you can use the option missing = m1. If you supply this option, then any occurrence of m1 will be considered missing.
• For the FillMissing command, by default, the value used to replace a missing value depends on the data type of the column it occurs in:
  – For all numeric data types, including floating point and integer, the default value is 0.
  – For the data type string, the default value is the empty string.
  – For the data types truefalse, truefalseFAIL, boolean, and boolean_constant, the default value is false.
• The fill argument, if it is specified, overrides the value used to replace a missing value.
• If the method option is used, each missing value is replaced either by the last non-missing value before it (with method = forward or method = "forward") or by the first non-missing value after it (with method = backward or method = "backward"). If no such value is available (for example, if the first value is missing and method = forward is specified), the replacement value is determined as if the method option had not been specified.
• For columns of hardware integer type and string columns, the default missing value and the default fill value are the same. Using the FillMissing command on such columns has no effect, unless one or more of the fill, missopt, and methopt arguments are specified.

Examples
Column has declared type anything, and has type integer[4]. This means that, for a value in column to be considered missing, it would have to be undefined; in , the value considered missing is 0. Consequently, the DropMissing command will remove column , but not column . Column also contains the default missing value for its data type, float[8], and consequently it is also removed.
With FillMissing, one can only see a change in column by default.
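
The default behavior can be tried on a small DataFrame. The following is a minimal sketch with made-up data, not the original example from this page: the column names a, b, and c are hypothetical, and it assumes that the datatypes option of the DataFrame constructor sets the per-column data types.

```maple
# Hypothetical data: column a has datatype anything, b is integer[4],
# and c is float[8] (the datatypes option is an assumption of this sketch).
df := DataFrame(Matrix([[1, 4, 7.0], [2, 0, 8.0], [8, 8, undefined]]),
                columns = [a, b, c],
                datatypes = [anything, integer[4], float[8]]);

# Column b contains 0 and column c contains Float(undefined), the default
# missing values for their data types, so both should be removed.
# Column a contains no undefined entry and should be kept.
DropMissing(df);

# Only the undefined entry in column c should visibly change;
# the 0 in column b is replaced with the default fill value 0.
FillMissing(df);
```
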
If we specify a missing value manually, for example, 8, then DropMissing removes columns containing that exact value. This applies to columns and , but not to , which contains 8. but not 8.
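
As a sketch of the missing option, again with hypothetical data and column names (a, b, c) and assuming the datatypes option of the DataFrame constructor, the call below treats the exact value 8 as the missing value.

```maple
# Hypothetical data: columns a (anything) and b (integer[4]) contain the
# integer 8; column c (float[8]) contains the float 8. instead.
df := DataFrame(Matrix([[1, 4, 7.0], [2, 0, 8.0], [8, 8, undefined]]),
                columns = [a, b, c],
                datatypes = [anything, integer[4], float[8]]);

# With missing = 8, only occurrences of 8 count as missing, so columns
# a and b should be removed. Column c holds 8. rather than 8, so it
# should be kept according to the description above.
DropMissing(df, missing = 8);
```
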
For FillMissing, we can specify the value to be used for replacing missing values. This is the fill argument. In the following example, we specify the value 6. This is stored in the last entries of columns and ; in column , it is automatically changed to the floating point value 6., because of the data type of that column.
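
A minimal sketch of the fill argument follows. The data and the column names a, b, and c are hypothetical (not the original example from this page), and the datatypes option of the DataFrame constructor is assumed to set the per-column data types.

```maple
# Hypothetical data: the last row holds the default missing value of each
# column (undefined for anything, 0 for integer[4], Float(undefined) for float[8]).
df := DataFrame(Matrix([[1, 4, 7.0], [2, 5, 8.0], [undefined, 0, undefined]]),
                columns = [a, b, c],
                datatypes = [anything, integer[4], float[8]]);

# Replace missing values with 6 instead of the per-type default fill values.
# In the float[8] column c, the replacement should be stored as 6.
FillMissing(df, 6);
```
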
If we specify the option method = backward, then missing values are replaced with later values.
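
The method option can be sketched as follows, using hypothetical data and column names (a, b, c) and assuming the datatypes option of the DataFrame constructor. The missing entries are placed in the first row so that the two directions behave differently.

```maple
# Hypothetical data: the first row holds the default missing value of each column.
df := DataFrame(Matrix([[undefined, 0, undefined], [2, 5, 8.0], [3, 6, 9.0]]),
                columns = [a, b, c],
                datatypes = [anything, integer[4], float[8]]);

# Each missing entry in the first row should take the first non-missing
# value after it, that is, the entry in the second row.
FillMissing(df, method = backward);

# With method = forward there is no earlier value for the first row, so,
# per the description, the default fill values should be used instead.
FillMissing(df, method = forward);
```
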
Compatibility

• The DataFrame/FillMissing and DataFrame/DropMissing commands were introduced in Maple 2016.