こすたろーん Engineer's Trial-and-Error Room

Notes on the things I build.

【MLOps】Trying out Kedro

I have tried a few MLOps-related tools so far.
This time, I will install Kedro and try it out.

Abstract

This post shows how to install Kedro and run a minimal "Hello Kedro!" pipeline.

1. Requirements

Jetson Xavier NX
Ubuntu 18.04
Python 3.6.9
Docker

2. Installation

Kedro can be installed with the following command.

pip install kedro

The installation can be verified with the following command.

kedro info

 _            _
| | _____  __| |_ __ ___
| |/ / _ \/ _` | '__/ _ \
|   <  __/ (_| | | | (_) |
|_|\_\___|\__,_|_|  \___/
v0.18.14
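If you prefer to check the installation from Python instead of the `kedro info` CLI, the standard library can report the installed package version. This is a minimal sketch that assumes Python 3.8 or newer, where `importlib.metadata` is available. The helper name `installed_version` is hypothetical and not part of Kedro.

```python
# Minimal sketch: report the installed version of a package, or None if it is missing.
# Assumes Python 3.8+ (importlib.metadata); installed_version is a hypothetical helper.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    kedro_version = installed_version("kedro")
    if kedro_version is None:
        print("kedro is not installed; run `pip install kedro`")
    else:
        print(f"kedro {kedro_version} is installed")
```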

3. Coding

3.1 Creating nodes

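Write each processing step as an ordinary Python function, then wrap it in a node that registers the function (func) together with the names of its inputs and outputs.
This sample defines two nodes:
func: return_greeting
|- input: none
|- output: my_salutation

func: join_statements
|- input: my_salutation (received as the argument greeting)
|- output: my_message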

from kedro.pipeline import node

# Prepare first node
def return_greeting():
    return "Hello"

# Prepare second node
def join_statements(greeting):
    return f"{greeting} Kedro!"

return_greeting_node = node(func=return_greeting, inputs=None, outputs="my_salutation")

join_statements_node = node(
    join_statements, inputs="my_salutation", outputs="my_message"
)
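To make the node idea concrete, here is a toy model in plain Python. It is NOT Kedro's implementation, and `ToyNode` and `run_node` are hypothetical names. A node only pairs a function with the names of the data it reads and writes. As in Kedro, when the inputs are given as a string or a list, the values are passed positionally. That is why the parameter name `greeting` does not have to match the dataset name `my_salutation`. The sketch assumes Python 3.7+ for `dataclasses`.

```python
# Toy illustration, NOT Kedro's implementation: a node pairs a function with the
# *names* of its inputs and output. ToyNode and run_node are hypothetical names.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ToyNode:
    func: Callable[..., Any]
    inputs: Optional[List[str]]  # dataset names, passed to func positionally
    outputs: str                 # dataset name the return value is stored under


def run_node(n: ToyNode, data: Dict[str, Any]) -> Dict[str, Any]:
    """Look up the node's inputs by name, call the function, and name the result."""
    args = [data[name] for name in (n.inputs or [])]
    return {n.outputs: n.func(*args)}


def return_greeting():
    return "Hello"


def join_statements(greeting):
    # The value stored under "my_salutation" arrives here as `greeting`
    return f"{greeting} Kedro!"


first = ToyNode(return_greeting, inputs=None, outputs="my_salutation")
second = ToyNode(join_statements, inputs=["my_salutation"], outputs="my_message")

data: Dict[str, Any] = {}
data.update(run_node(first, data))
data.update(run_node(second, data))
print(data)  # {'my_salutation': 'Hello', 'my_message': 'Hello Kedro!'}
```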

3.2 Creating a pipeline

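A pipeline groups nodes together. Kedro determines the execution order from the nodes' inputs and outputs (their dependencies), not from the order in which they are listed.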

from kedro.pipeline import pipeline

# Assemble nodes into a pipeline
greeting_pipeline = pipeline([return_greeting_node, join_statements_node])
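The dependency-based ordering can be illustrated with a simple topological sort. This is a toy sketch, NOT Kedro's code, and `execution_order` is a hypothetical helper. Each node is described only by the dataset names it consumes and produces. A node becomes runnable once every input is either external data or has already been produced.

```python
# Toy illustration, NOT Kedro's code: derive an execution order from the dataset
# names each node consumes and produces (a simple topological sort).
from typing import Dict, List, Set, Tuple

# node name -> (input dataset names, output dataset names)
Nodes = Dict[str, Tuple[List[str], List[str]]]


def execution_order(nodes: Nodes) -> List[str]:
    """Return node names so that every node runs after the producers of its inputs."""
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    order: List[str] = []
    done: Set[str] = set()
    remaining = dict(nodes)
    while remaining:
        # An input is satisfied if no node produces it (external data)
        # or if its producer has already run.
        ready = [
            name
            for name, (ins, _) in remaining.items()
            if all(producer.get(i) is None or producer[i] in done for i in ins)
        ]
        if not ready:
            raise ValueError("circular dependency between nodes")
        for name in sorted(ready):  # sorted for a deterministic order
            order.append(name)
            done.add(name)
            del remaining[name]
    return order


# Deliberately listed in the "wrong" order: the dependency still decides.
greeting_nodes: Nodes = {
    "join_statements": (["my_salutation"], ["my_message"]),
    "return_greeting": ([], ["my_salutation"]),
}
print(execution_order(greeting_nodes))  # ['return_greeting', 'join_statements']
```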

3.3 DataCatalog

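The DataCatalog manages the datasets that nodes read from and write to, and it supports a variety of data formats.
Here we simply register an in-memory dataset (MemoryDataSet) to hold my_salutation. Outputs that are not registered, such as my_message, are kept in memory automatically during the run.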

from kedro.io import DataCatalog, MemoryDataSet

# Prepare a data catalog
data_catalog = DataCatalog({"my_salutation": MemoryDataSet()})
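The catalog's role can be sketched in a few lines. This is a toy illustration, NOT Kedro's classes, and `ToyMemoryDataset` and `ToyCatalog` are hypothetical names. A catalog maps dataset names to objects that know how to save and load data. The toy in-memory dataset deep-copies values so that the stored data stays isolated from later changes.

```python
# Toy illustration, NOT Kedro's classes: a catalog maps dataset names to objects
# that can save and load data. ToyMemoryDataset and ToyCatalog are hypothetical.
import copy
from typing import Any, Dict


class ToyMemoryDataset:
    """Holds one value in memory and hands out copies of it."""

    def __init__(self) -> None:
        self._data: Any = None
        self._has_data = False

    def save(self, data: Any) -> None:
        # Store a copy so later changes to `data` do not leak into the dataset
        self._data = copy.deepcopy(data)
        self._has_data = True

    def load(self) -> Any:
        if not self._has_data:
            raise ValueError("nothing has been saved to this dataset yet")
        return copy.deepcopy(self._data)


class ToyCatalog:
    """Looks up datasets by name and delegates save/load to them."""

    def __init__(self, datasets: Dict[str, ToyMemoryDataset]) -> None:
        self._datasets = dict(datasets)

    def __contains__(self, name: str) -> bool:
        return name in self._datasets

    def save(self, name: str, data: Any) -> None:
        self._datasets[name].save(data)

    def load(self, name: str) -> Any:
        return self._datasets[name].load()


# Loosely mirrors DataCatalog({"my_salutation": MemoryDataSet()})
catalog = ToyCatalog({"my_salutation": ToyMemoryDataset()})
catalog.save("my_salutation", "Hello")
print(catalog.load("my_salutation"))  # Hello
print("my_message" in catalog)        # False: only my_salutation is registered
```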

3.4 Runner

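Finally, create a runner to execute the pipeline. runner.run() returns the pipeline outputs that are not registered in the DataCatalog, so the result contains my_message.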

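from kedro.runner import SequentialRunner

# Create a runner to run the pipeline
runner = SequentialRunner()

# Run the pipeline; the result is a dict of outputs not registered in the catalog
print(runner.run(greeting_pipeline, data_catalog))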
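Why the result contains only my_message can be shown with a toy sequential runner. This is NOT Kedro's SequentialRunner, and `run_sequentially` is a hypothetical helper that assumes the nodes are already in dependency order. The runner executes each node, keeps every output in memory, and returns the "free" outputs. Free outputs are those that no other node consumes and that the catalog does not register. Registering my_message in the catalog as well would make the returned dict empty.

```python
# Toy illustration, NOT Kedro's SequentialRunner: run nodes one at a time, keep
# every output in memory, and return the "free" outputs (not consumed by any node
# and not registered in the catalog). run_sequentially is a hypothetical helper.
from typing import Any, Callable, Dict, List, Set, Tuple

# (function, input names, output name); assumed to be in dependency order already
ToyNode = Tuple[Callable[..., Any], List[str], str]


def run_sequentially(nodes: List[ToyNode], registered: Set[str]) -> Dict[str, Any]:
    store: Dict[str, Any] = {}  # stands in for catalog and temporary memory datasets
    for func, inputs, output in nodes:
        store[output] = func(*(store[name] for name in inputs))
    consumed = {name for _, inputs, _ in nodes for name in inputs}
    return {
        name: value
        for name, value in store.items()
        if name not in consumed and name not in registered
    }


def return_greeting():
    return "Hello"


def join_statements(greeting):
    return f"{greeting} Kedro!"


greeting_nodes: List[ToyNode] = [
    (return_greeting, [], "my_salutation"),
    (join_statements, ["my_salutation"], "my_message"),
]

# Only my_salutation is "registered", as in the article's DataCatalog
print(run_sequentially(greeting_nodes, {"my_salutation"}))  # {'my_message': 'Hello Kedro!'}
```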

4. Run

4.1 Complete code

"""Contents of hello_kedro.py"""
from kedro.io import DataCatalog, MemoryDataSet
from kedro.pipeline import node, pipeline
from kedro.runner import SequentialRunner

# Prepare a data catalog
data_catalog = DataCatalog({"my_salutation": MemoryDataSet()})

# Prepare first node
def return_greeting():
    return "Hello"


return_greeting_node = node(return_greeting, inputs=None, outputs="my_salutation")

# Prepare second node
def join_statements(greeting):
    return f"{greeting} Kedro!"


join_statements_node = node(
    join_statements, inputs="my_salutation", outputs="my_message"
)

# Assemble nodes into a pipeline
greeting_pipeline = pipeline([return_greeting_node, join_statements_node])

# Create a runner to run the pipeline
runner = SequentialRunner()

# Run the pipeline
print(runner.run(greeting_pipeline, data_catalog))

4.2 Run command

python hello_kedro.py

The output is as follows:

INFO     Running node: return_greeting(None) -> [my_salutation] 
INFO     Saving data to 'my_salutation' (MemoryDataset)... 
INFO     Completed 1 out of 2 tasks 
INFO     Loading data from 'my_salutation' (MemoryDataset)... 
INFO     Running node: join_statements([my_salutation]) -> [my_message] 
INFO     Saving data to 'my_message' (MemoryDataset)... 
INFO     Completed 2 out of 2 tasks 
INFO     Pipeline execution completed successfully.  
INFO     Loading data from 'my_message' (MemoryDataset)... 
{'my_message': 'Hello Kedro!'}

5. Reference

docs.kedro.org