HadoopFileLinesReader

HadoopFileLinesReader is a Scala Iterator of Apache Hadoop’s org.apache.hadoop.io.Text.

HadoopFileLinesReader is created to access datasets in the following data sources:

  • SimpleTextSource

  • LibSVMFileFormat

  • TextInputCSVDataSource

  • TextInputJsonDataSource

  • TextFileFormat

HadoopFileLinesReader delegates to an internal iterator that reads files using Hadoop’s FileSystem API.

Creating HadoopFileLinesReader Instance

HadoopFileLinesReader takes the following when created:

  • PartitionedFile

  • Hadoop’s Configuration

iterator Internal Property

iterator: RecordReaderIterator[Text]

When created, HadoopFileLinesReader creates an internal iterator. The iterator first builds Hadoop’s org.apache.hadoop.mapreduce.lib.input.FileSplit from Hadoop’s org.apache.hadoop.fs.Path for the given PartitionedFile, together with the file’s start offset and length.

iterator creates Hadoop’s TaskAttemptID, TaskAttemptContextImpl and LineRecordReader.

iterator initializes LineRecordReader (with the FileSplit and the task attempt context) and wraps it in a RecordReaderIterator.
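The steps above can be sketched with Hadoop's own classes alone. This is a minimal illustration rather than Spark's actual code: the LineReaderSketch object and its readLines helper are hypothetical names, the split is assumed to cover a whole local file, and the Hadoop client libraries are assumed to be on the classpath.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}
import org.apache.hadoop.mapreduce.lib.input.{FileSplit, LineRecordReader}
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

// Hypothetical helper object; not part of Spark or Hadoop.
object LineReaderSketch {
  def readLines(localFile: java.io.File, conf: Configuration): Seq[String] = {
    // A split that covers the whole file (assumption) and carries no locality hints.
    val split = new FileSplit(
      new Path(localFile.toURI), 0L, localFile.length(), Array.empty[String])

    // A placeholder task attempt, needed only to build the attempt context.
    val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0)
    val context = new TaskAttemptContextImpl(conf, attemptId)

    val reader = new LineRecordReader()
    reader.initialize(split, context)
    try {
      val lines = Seq.newBuilder[String]
      // getCurrentValue reuses a single Text instance, so copy each line out as a String.
      while (reader.nextKeyValue()) lines += reader.getCurrentValue.toString
      lines.result()
    } finally reader.close()
  }
}
```

Calling LineReaderSketch.readLines(file, new Configuration()) on a small local text file should return its lines in order. The sketch drains the reader eagerly for brevity, whereas an iterator-based wrapper would hand lines out lazily.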

Note
iterator is used for Iterator-specific methods, i.e. hasNext, next and close.
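The delegation described in the note can be illustrated without any Hadoop dependency. In the sketch below, ReaderIterator and LinesReader are hypothetical stand-ins (for RecordReaderIterator and HadoopFileLinesReader respectively), and a java.io.BufferedReader plays the role of LineRecordReader; only the forwarding pattern is the point.

```scala
import java.io.{BufferedReader, Closeable, StringReader}

// Hypothetical stand-in for RecordReaderIterator: adapts a reader to Iterator.
final class ReaderIterator(reader: BufferedReader) extends Iterator[String] with Closeable {
  private var nextLine: Option[String] = None
  private var finished = false

  override def hasNext: Boolean = {
    if (!finished && nextLine.isEmpty) {
      nextLine = Option(reader.readLine()) // readLine returns null at end of input
      if (nextLine.isEmpty) close()        // release the reader once exhausted
    }
    nextLine.isDefined
  }

  override def next(): String = {
    if (!hasNext) throw new NoSuchElementException("End of stream")
    val line = nextLine.get
    nextLine = None
    line
  }

  override def close(): Unit = {
    finished = true
    reader.close()
  }
}

// Hypothetical stand-in for HadoopFileLinesReader: every call is forwarded.
final class LinesReader(text: String) extends Iterator[String] with Closeable {
  private val iterator = new ReaderIterator(new BufferedReader(new StringReader(text)))
  override def hasNext: Boolean = iterator.hasNext
  override def next(): String = iterator.next()
  override def close(): Unit = iterator.close()
}
```

Because the outer class only forwards calls, it can be consumed like any other Scala Iterator (for example with toList or a for loop), while close remains available to release the underlying reader early.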
