
Adding a new PDI Step

Diogo Silva edited this page Sep 14, 2020 · 8 revisions

Adding and registering a new Pentaho Data Integration (PDI) Step is a straightforward process, streamlined by the design of the BICenter project. A list of all available PDI Steps can be found at this link.

Once you have selected the Step to add, follow this process to add it to the system.

  1. Register the new step in the public > editor > diagrameditor.xml file.
  2. Add a new object to conf > configuration.json, under the subsection for the new component's type, with the desired component properties.
  3. If necessary, add extra functionality to the submitClick method in app > assets > javascripts > step > stepController.js or to the applyChanges method in app > assets > javascripts > services > step.js.
  4. Create a new class in app > diSdk > step > parser with the appropriate name.
  5. If the component requires pre-processing or extra logic, alter the decodeStep method in app > diSdk > step > AbstractStep.java.

Example

The following example shows how to add the CSVFileInput component. For more information about how this component works, see this link.

  1. First, add the following line to public > editor > diagrameditor.xml:
<add as="CSV File Input" template="CSVInput" icon="/assets/images/editor/rectangle.gif"/>
  2. Next, register the component's properties in the conf > configuration.json file. Because CSV File Input is an Input-type component, we register it under the Input Components. We give this object the same name we used in the template attribute in the previous step. Note that the shortName field should follow the naming used by the Pentaho Kettle library (for our specific case, see this link). The componentProperties array specifies the component's inputs and fields.
{
          "name": "CSVInput",
          "description": "CSV File Input",
          "shortName": "csvInput",
          "componentProperties": [
            {
              "name": "Step Name",
              "shortName": "stepName",
              "type": "input"
            },
            {
              "name": "File Name",
              "shortName": "fileName",
              "type": "fileinput"
            },
            {
              "name": "Delimiter",
              "shortName": "delimiter",
              "type": "input"
            },
            {
              "name": "Enclosure",
              "shortName": "enclosure",
              "type": "input"
            },
            {
              "name": "NIO Buffer Size",
              "shortName": "bufferSize",
              "type": "number"
            },
            {
              "name": "File Encoding",
              "shortName": "encoding",
              "type": "select",
              "componentMetadatas": [
                {
                  "value": "UTF-8",
                  "name": "UTF-8"
                },
                {
                  "value": "ANSI",
                  "name": "ANSI"
                }
              ]
            },
            {
              "name": "Lazy Conversion?",
              "shortName": "lazyConversionActive",
              "type": "checkbox"
            }
          ]
}
  3. BICenter is already prepared to receive files and all of our CSVInput's parameters, so we can skip this step.

  4. Now create a new class in the app > diSdk > step > parser directory. The simplest way is to copy one of the classes already present in that folder. Don't worry about the lack of logic in this class: it is a mere extension of the AbstractStep class, which itself contains all the logic needed to communicate with PDI, and it exists so that our component can be detected and processed.

package diSdk.step.parser;

import diSdk.step.AbstractStep;
import models.Step;
import org.pentaho.di.trans.step.StepMetaInterface;
import org.w3c.dom.Element;

public class CSVInput extends AbstractStep {
    @Override
    public void decode(StepMetaInterface stepMetaInterface, Step step) throws Exception {

    }

    @Override
    public Element encode(StepMetaInterface stepMetaInterface) throws Exception {
        return null;
    }
}
  5. Because we want the system to detect the CSV's fields automatically, without users having to enter them manually, we add some logic to the decodeStep method in app > diSdk > step > AbstractStep.java that does an initial read of the file to extract its column names. This can be done either by creating a new method in this class and calling it from decodeStep, or by placing the logic directly in decodeStep.
// If dealing with CSVFileInput, get the input fields and define them
if (shortName.equals("InputFields")) {
    if (fileName == null) {
        Optional<StepProperty> fileNameStepProperty = stepProperties.stream()
                .filter(stepProperty -> stepProperty.getComponentProperty().getShortName().equalsIgnoreCase("Filename"))
                .findFirst();

        if (!fileNameStepProperty.isPresent())
            continue;
        fileName = fileNameStepProperty.get().getValue();
    }

    if (delimiter == null) {
        Optional<StepProperty> delimiterStepProperty = stepProperties.stream()
                .filter(stepProperty -> stepProperty.getComponentProperty().getShortName().equalsIgnoreCase("Delimiter"))
                .findFirst();

        if (!delimiterStepProperty.isPresent())
            continue;
        delimiter = delimiterStepProperty.get().getValue();
    }

    // try-with-resources closes the reader once the header has been read
    try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
        String header = br.readLine();

        String[] fields = new String[0];
        if (header != null) {
            // Quote the delimiter: String.split treats its argument as a regular expression
            fields = header.split(java.util.regex.Pattern.quote(delimiter));
        }

        // Build one input field per column name found in the header
        TextFileInputField[] value = new TextFileInputField[fields.length];
        for (int i = 0; i < fields.length; i++) {
            value[i] = new TextFileInputField();
            value[i].setName(fields[i]);
        }

        // Invoke the current method with the StepProperty value.
        invokeMethod(stepMetaInterface, method, value, databases);

    } catch (FileNotFoundException e) {
        // The file could not be opened, so no input fields are defined
    }
}
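The header-detection logic in step 5 can also be tried in isolation, outside BICenter. The following is a minimal sketch of it, not BICenter code: the class name CsvHeaderSketch and the method readHeaderFields are hypothetical, and the sketch assumes a UTF-8 file whose first line holds the column names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of BICenter): reads the column names from a CSV header. */
final class CsvHeaderSketch {

    private CsvHeaderSketch() {
    }

    /**
     * Returns the fields found on the first line of the file,
     * or an empty array if the file has no lines at all.
     */
    static String[] readHeaderFields(Path file, String delimiter) throws IOException {
        // try-with-resources closes the reader even if readLine() throws.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String header = reader.readLine();
            if (header == null) {
                return new String[0];
            }
            // String.split takes a regular expression, so the delimiter is quoted
            // to match characters such as '|' or '.' literally. The limit of -1
            // keeps trailing empty columns.
            return header.split(Pattern.quote(delimiter), -1);
        }
    }
}
```

For example, calling CsvHeaderSketch.readHeaderFields on a file whose first line is id|name|price, with "|" as the delimiter, yields the three names id, name and price. Quoting the delimiter matters here because an unquoted "|" is a regular-expression alternation and would split the header between every character.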