Name | Description | Type | Package | Framework |
Binarizer | Binarize a column of continuous features given a threshold. | Class | org.apache.spark.ml.feature | Apache Spark |
Bucketizer | Bucketizer maps a column of continuous features to a column of feature buckets. | Class | org.apache.spark.ml.feature | Apache Spark |
ChiSqSelector | Chi-Squared feature selection, which selects categorical features to use for predicting a categorical label. | Class | org.apache.spark.ml.feature | Apache Spark |
ChiSqSelectorModel | Model fitted by ChiSqSelector. | Class | org.apache.spark.ml.feature | Apache Spark |
ColumnPruner | Utility transformer for removing temporary columns from a DataFrame. | Class | org.apache.spark.ml.feature | Apache Spark |
CountVectorizer | Extracts a vocabulary from document collections and generates a CountVectorizerModel. | Class | org.apache.spark.ml.feature | Apache Spark |
CountVectorizerModel | Converts a text document to a sparse vector of token counts. | Class | org.apache.spark.ml.feature | Apache Spark |
DCT | A feature transformer that takes the 1D discrete cosine transform of a real vector. | Class | org.apache.spark.ml.feature | Apache Spark |
ElementwiseProduct | Outputs the Hadamard product (i.e., the element-wise product) of each input vector with a provided "weight" vector. | Class | org.apache.spark.ml.feature | Apache Spark |
HashingTF | Maps a sequence of terms to their term frequencies using the hashing trick. | Class | org.apache.spark.ml.feature | Apache Spark |
IDF | Compute the Inverse Document Frequency (IDF) given a collection of documents. | Class | org.apache.spark.ml.feature | Apache Spark |
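IDFModel | Model fitted by IDF. | Class | org.apache.spark.ml.feature | Apache Spark |
IndexToString | A Transformer that maps a column of indices back to a new column of corresponding string values. | Class | org.apache.spark.ml.feature | Apache Spark |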
Interaction | Implements the feature interaction transform. | Class | org.apache.spark.ml.feature | Apache Spark |
MinMaxScaler | Rescale each feature individually to a common range [min, max] linearly using column summary statistics; this is also known as min-max normalization or rescaling. | Class | org.apache.spark.ml.feature | Apache Spark |
MinMaxScalerModel | Model fitted by MinMaxScaler. | Class | org.apache.spark.ml.feature | Apache Spark |
NGram | A feature transformer that converts the input array of strings into an array of n-grams. | Class | org.apache.spark.ml.feature | Apache Spark |
Normalizer | Normalize a vector to have unit norm using the given p-norm. | Class | org.apache.spark.ml.feature | Apache Spark |
OneHotEncoder | A one-hot encoder that maps a column of category indices to a column of binary vectors, with at most a single one-value per row that indicates the input category index. | Class | org.apache.spark.ml.feature | Apache Spark |
PCA | PCA trains a model to project vectors to a low-dimensional space using PCA. | Class | org.apache.spark.ml.feature | Apache Spark |
PCAModel | Model fitted by PCA. | Class | org.apache.spark.ml.feature | Apache Spark |
PolynomialExpansion | Perform feature expansion in a polynomial space. | Class | org.apache.spark.ml.feature | Apache Spark |
QuantileDiscretizer | QuantileDiscretizer takes a column with continuous features and outputs a column with binned categorical features. | Class | org.apache.spark.ml.feature | Apache Spark |
RegexTokenizer | A regex-based tokenizer that extracts tokens either by using the provided regex pattern to split the text (default) or by repeatedly matching the regex (if gaps is false). | Class | org.apache.spark.ml.feature | Apache Spark |
RFormula | Implements the transforms required for fitting a dataset against an R model formula. | Class | org.apache.spark.ml.feature | Apache Spark |
RFormulaModel | A fitted RFormula. | Class | org.apache.spark.ml.feature | Apache Spark |
SQLTransformer | Implements the transformations defined by a SQL statement. | Class | org.apache.spark.ml.feature | Apache Spark |
StandardScaler | Standardizes features by removing the mean and scaling to unit variance using column summary statistics on the samples in the training set. | Class | org.apache.spark.ml.feature | Apache Spark |
StandardScalerModel | Model fitted by StandardScaler. | Class | org.apache.spark.ml.feature | Apache Spark |
StopWordsRemover | A feature transformer that filters out stop words from input. | Class | org.apache.spark.ml.feature | Apache Spark |
StringIndexer | A label indexer that maps a string column of labels to an ML column of label indices. | Class | org.apache.spark.ml.feature | Apache Spark |
StringIndexerModel | Model fitted by StringIndexer. | Class | org.apache.spark.ml.feature | Apache Spark |
Tokenizer | A tokenizer that converts the input string to lowercase and then splits it on whitespace. | Class | org.apache.spark.ml.feature | Apache Spark |
VectorAssembler | A feature transformer that merges multiple columns into a vector column. | Class | org.apache.spark.ml.feature | Apache Spark |
VectorAttributeRewriter | Utility transformer that rewrites Vector attribute names via prefix replacement. | Class | org.apache.spark.ml.feature | Apache Spark |
VectorIndexer | Class for indexing categorical feature columns in a dataset of Vector. | Class | org.apache.spark.ml.feature | Apache Spark |
VectorIndexerModel | Transform categorical features to use 0-based indices instead of their original values. | Class | org.apache.spark.ml.feature | Apache Spark |
VectorSlicer | This class takes a feature vector and outputs a new feature vector with a subarray of the original features. The subset of features can be specified with either indices (setIndices()) or names (setNames()). | Class | org.apache.spark.ml.feature | Apache Spark |
Word2Vec | Word2Vec trains a model of Map(String, Vector), i.e. transforms a word into a code for further natural language processing or machine learning. | Class | org.apache.spark.ml.feature | Apache Spark |
Word2VecModel | Model fitted by Word2Vec. | Class | org.apache.spark.ml.feature | Apache Spark |
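
To make a few of the one-line descriptions above concrete, here is a minimal sketch in plain Python of the arithmetic behind Binarizer, MinMaxScaler and Normalizer. It does not use the Spark API: the function names `binarize`, `min_max_scale` and `normalize` are hypothetical, and the handling of values equal to the threshold and of constant columns are assumptions made for illustration rather than details stated in the table.

```python
def binarize(values, threshold=0.0):
    """Map each value to 1.0 if it is strictly greater than threshold, else 0.0.

    Assumption: values equal to the threshold map to 0.0.
    """
    return [1.0 if v > threshold else 0.0 for v in values]


def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly rescale a column so its minimum maps to lo and its maximum to hi."""
    col_min, col_max = min(values), max(values)
    if col_max == col_min:
        # Assumption: a constant column maps to the midpoint of [lo, hi],
        # which avoids dividing by zero.
        return [0.5 * (lo + hi) for _ in values]
    scale = (hi - lo) / (col_max - col_min)
    return [(v - col_min) * scale + lo for v in values]


def normalize(vector, p=2.0):
    """Divide a vector by its p-norm so that the result has unit p-norm."""
    norm = sum(abs(x) ** p for x in vector) ** (1.0 / p)
    if norm == 0.0:
        # A zero vector has no direction; return a copy unchanged.
        return list(vector)
    return [x / norm for x in vector]


if __name__ == "__main__":
    print(binarize([0.1, 0.5, 0.9], threshold=0.5))
    print(min_max_scale([1.0, 2.0, 3.0]))
    print(normalize([3.0, 4.0]))
```

In Spark itself these operations are applied to DataFrame columns, and MinMaxScaler is an estimator whose fitted MinMaxScalerModel holds the column statistics; the sketch only shows the per-value computation.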