Pytorch with examples

※ The following is a translation and summary of pytorch_with_examples from the PyTorch website.

1. Numpy vs Pytorch

We compare the two frameworks by fitting a curve to the sine function (sin), first with NumPy and then with PyTorch.

The first step is to choose a model. In this example, we try to fit the sin function with a third-degree polynomial.

$ y = a + bx + cx^2 + dx^3 $
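The gradient-descent updates in the code that follows come from differentiating a sum-of-squares loss. As a short worked derivation, using the coefficient names from the code (a is the constant term, d the cubic coefficient), and writing $ \hat{y}_i $ for the prediction at sample $ x_i $ and $ \eta $ for the learning rate (both symbols are introduced here only for illustration):

$ \hat{y}_i = a + b x_i + c x_i^2 + d x_i^3, \qquad L = \sum_i (\hat{y}_i - y_i)^2 $

$ \frac{\partial L}{\partial a} = \sum_i 2(\hat{y}_i - y_i), \qquad \frac{\partial L}{\partial b} = \sum_i 2(\hat{y}_i - y_i)\, x_i $

$ \frac{\partial L}{\partial c} = \sum_i 2(\hat{y}_i - y_i)\, x_i^2, \qquad \frac{\partial L}{\partial d} = \sum_i 2(\hat{y}_i - y_i)\, x_i^3 $

Each iteration then moves every coefficient against its gradient, for example $ a \leftarrow a - \eta \, \partial L / \partial a $. These four expressions correspond to grad_a, grad_b, grad_c, and grad_d in the NumPy code.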

2. Numpy

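Below are the code and the result of fitting the sin function with NumPy.

In the resulting plot, because we assumed a third-degree polynomial, the estimate deviates somewhat from the true sin function. Still, we can see the coefficients being estimated step by step by propagating gradients backward through the (hand-derived) computational graph.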

# -*- coding: utf-8 -*-
from matplotlib import pyplot as plt
import numpy as np
import math

X = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(X)

# Randomly initialize weights
a = np.random.randn()
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()

learning_rate = 1e-6

for t in range(10000):
    # Forward pass: compute predicted y
    # y = a + b*x + c*x^2 + d*x^3
    y_pred = a + b * X + c * X**2 + d * X**3  # third-degree polynomial fitted to sin

    # Compute and print loss
    loss = np.square(y_pred - y).sum()  # goal: minimize the squared Euclidean distance
    
    if t % 100 == 99:
        print(t, loss)

    # ----Backprop to compute gradients of a, b, c, d with respect to loss----#
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * X).sum()
    grad_c = (grad_y_pred * X ** 2).sum()
    grad_d = (grad_y_pred * X ** 3).sum()
    #-------------------------------------------------------------------------#

    # Update weights
    '''
    a, b, c, and d start from random values and are optimized step by step.
    '''
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a} + {b} x + {c} x^2 + {d} x^3')

xVal = np.linspace(-10, 10, 20)
yEst = np.polyval([d, c, b, a], xVal)  # d * x^3 + c * x^2 + b * x + a
print(xVal)
print(yEst)
plt.plot(xVal, yEst, 'bo')
plt.show()

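2-1 Drawbacks of NumPy

However, NumPy has the following drawbacks.

- The backward pass must be derived by hand (manually) in order to update the coefficients.

- It cannot take advantage of GPU acceleration.

So while NumPy is useful for linear algebra operations, it falls somewhat short for neural network computation.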
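Because the gradients are written by hand, a mistake in them is easy to make and hard to notice. One common safeguard is to compare the analytic gradient with a finite-difference estimate. The sketch below is only an illustration: the helper names loss_and_grads and numerical_grads, the fixed coefficient values, and the step size eps are assumptions made here and do not come from the original tutorial.

```python
import numpy as np


def loss_and_grads(coeffs, x, y):
    """Sum-of-squares loss of a cubic and its hand-derived gradient."""
    a, b, c, d = coeffs
    y_pred = a + b * x + c * x**2 + d * x**3
    err = y_pred - y
    loss = np.square(err).sum()
    grad_y_pred = 2.0 * err
    grads = np.array([
        grad_y_pred.sum(),
        (grad_y_pred * x).sum(),
        (grad_y_pred * x**2).sum(),
        (grad_y_pred * x**3).sum(),
    ])
    return loss, grads


def numerical_grads(coeffs, x, y, eps=1e-6):
    """Central finite differences, perturbing one coefficient at a time."""
    grads = np.zeros_like(coeffs)
    for i in range(len(coeffs)):
        step = np.zeros_like(coeffs)
        step[i] = eps
        loss_plus, _ = loss_and_grads(coeffs + step, x, y)
        loss_minus, _ = loss_and_grads(coeffs - step, x, y)
        grads[i] = (loss_plus - loss_minus) / (2 * eps)
    return grads


x = np.linspace(-np.pi, np.pi, 200)
y = np.sin(x)
coeffs = np.array([0.1, -0.2, 0.3, -0.4])  # arbitrary test point

_, analytic = loss_and_grads(coeffs, x, y)
numeric = numerical_grads(coeffs, x, y)
print(np.allclose(analytic, numeric, rtol=1e-4))
```

If the two gradients disagree, the hand-written backward pass is the first place to look. Autograd, introduced in the next section, removes this source of error.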

3. Pytorch

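Like NumPy, PyTorch supports operations on multidimensional arrays. The difference is that it introduces the Tensor, which plays the same role as a NumPy array but can also keep track of a computational graph and gradients.

PyTorch's autograd package provides automatic differentiation for computing the backward pass.

With autograd, the forward pass defines a computational graph, so the gradients that had to be derived by hand in NumPy are obtained automatically.

Specifically, the graph is defined as follows.

Nodes (or vertices) --> Tensors.

Edges --> functions that produce output Tensors from input Tensors.

If x.requires_grad=True is set, then after the backward pass x.grad is another Tensor that holds the gradient of some scalar value with respect to x.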
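To make the node/edge picture concrete, here is a toy reverse-mode differentiation sketch in plain Python. It is not how PyTorch is implemented; the Value class and its attribute names are hypothetical and exist only for this illustration. Each node stores its value, the input nodes it was computed from, and the local derivative along each edge. Calling backward() walks the graph from the output back to the inputs and applies the chain rule.

```python
class Value:
    """A scalar graph node that remembers how it was produced."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # input nodes of the producing function
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Order the nodes so that each one is processed only after every
        # node that depends on it has been processed.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad  # chain rule, accumulated


x = Value(2.0)
w = Value(3.0)
b = Value(1.0)
y = w * x * x + b  # y = w * x^2 + b
y.backward()
print(x.grad, w.grad, b.grad)  # 12.0 4.0 1.0
```

The printed values match the derivatives taken by hand: dy/dx = 2wx = 12, dy/dw = x^2 = 4, and dy/db = 1.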

# -*- coding: utf-8 -*-
from matplotlib import pyplot as plt
import numpy as np
import torch
import math

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
X = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(X)

# Create random Tensors for weights. For a third order polynomial, we need
# 4 weights: y = a + b x + c x^2 + d x^3
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6

for t in range(10000):
    # Forward pass: compute predicted y
    # y = a + b*x + c*x^2 + d*x^3
    y_pred = a + b * X + c * X ** 2 + d * X ** 3  # third-degree polynomial fitted to sin

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()  # goal: minimize the squared Euclidean distance

    if t % 100 == 99:
        print(t, loss.item())

    # ----Backprop to compute gradients of a, b, c, d with respect to loss----#
    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
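    # After this call a.grad, b.grad, c.grad and d.grad will be Tensors holding
    # the gradient of the loss with respect to a, b, c, d respectively.
    '''
    Calling loss.backward() computes gradients only for Tensors created with requires_grad=True.
    That is, a.grad, b.grad, c.grad, and d.grad are separate Tensors holding the gradients
    of the loss with respect to a, b, c, and d.
    A Tensor's gradient is accumulated into its .grad attribute.
    Put differently, the backward pass receives the gradient of the output Tensors with respect
    to some scalar value and computes the gradient of the input Tensors with respect to that same scalar value.
    '''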
    loss.backward()
    # -------------------------------------------------------------------------#

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
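    '''
    a, b, c, and d start from random values and are optimized step by step.
    They were created with requires_grad=True as learnable parameters,
    but the update step itself does not need gradient tracking.
    '''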
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a} + {b} x + {c} x^2 + {d} x^3')

'''
To stop a Tensor from tracking history, call .detach() to separate it from the computation history.
The Tensor then needs to be converted to a NumPy array.
'''
a = a.detach().numpy()
b = b.detach().numpy()
c = c.detach().numpy()
d = d.detach().numpy()

xVal = np.linspace(-10, 10, 2000)
yEst = np.polyval([d, c, b, a], xVal)  # d * x^3 + c * x^2 + b * x + a
print(xVal)
print(yEst)
plt.plot(xVal, yEst, 'bo')
plt.show()

Looking at the plot, the result is essentially the same as with NumPy above.
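One detail of the training loop above deserves a closer look: the gradients are reset after every update because autograd adds each new gradient onto whatever is already stored in .grad. The following minimal sketch shows this accumulation on a single scalar Tensor. The variable names are chosen here for illustration only.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# First backward pass: d(3 * w)/dw = 3
(3.0 * w).backward()
first = w.grad.item()

# Second backward pass without a reset: the new gradient is added on top.
(3.0 * w).backward()
accumulated = w.grad.item()

# Reset, then run once more: the gradient starts from scratch.
w.grad = None
(3.0 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 3.0 6.0 3.0
```

Without the reset, each update in the loop would use the sum of all gradients seen so far rather than the gradient of the current iteration.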

3-1 PyTorch: nn & optim

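▶Computational graphs and autograd make it very convenient to take derivatives automatically, but for large neural networks raw autograd alone is too low-level.

One can think of arranging the computation into layers, some of which have learnable parameters that are optimized during training.

(From this sentence alone, the exact difference from the approach above is hard to see.)

A neural network can be built with PyTorch's nn package. A Module receives input Tensors and computes output Tensors.

A Module may also hold internal state, such as Tensors containing learnable parameters. The nn package also defines the loss functions needed for training.

▶So far, we have updated the weights manually, wrapping the updates of the learnable Tensors in torch.no_grad(). This is not much of a burden for a simple optimization algorithm such as SGD, but in practice more sophisticated optimizers such as AdaGrad, RMSProp, and Adam are often used, and the optim package provides them.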
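As a small illustration of what "internal state" means here, the sketch below creates a single torch.nn.Linear layer, lists its learnable parameters, and reproduces its output by hand from the stored weight and bias. The batch size of 5 and the all-zero target are arbitrary choices made for this example.

```python
import torch

torch.manual_seed(0)  # only to make the sketch repeatable

layer = torch.nn.Linear(3, 1)   # maps 3 input features to 1 output
inputs = torch.randn(5, 3)      # a batch of 5 samples
outputs = layer(inputs)         # calling the module runs its forward pass

# The learnable state lives inside the module as parameters.
for name, param in layer.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

# The same result computed by hand from the stored weight and bias.
manual = inputs @ layer.weight.T + layer.bias
print(tuple(outputs.shape), torch.allclose(outputs, manual, atol=1e-6))

# Loss functions are modules too; reduction='sum' adds up the squared errors.
loss_fn = torch.nn.MSELoss(reduction='sum')
loss = loss_fn(outputs, torch.zeros(5, 1))
print(loss.item())
```

The layer reports a weight of shape (1, 3) and a bias of shape (1,), both with requires_grad=True, so a backward pass fills in their .grad attributes without any extra bookkeeping.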

# -*- coding: utf-8 -*-
from matplotlib import pyplot as plt
import numpy as np
import torch
import math
OPTIMIZER = 1

# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# In the above code, x.unsqueeze(-1) has shape (2000, 1), and p has shape
# (3,), for this case, broadcasting semantics will apply to obtain a tensor
# of shape (2000, 3)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. The Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
# The Flatten layer flattens the output of the linear layer to a 1D tensor,
# to match the shape of `y`.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
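'''
With reduction='sum', MSELoss returns the sum of squared errors, the same loss as in the
earlier examples; the default reduction='mean' would average them instead.
'''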
loss_fn = torch.nn.MSELoss(reduction='sum')

if OPTIMIZER:
    # Use the optim package to define an Optimizer that will update the weights of
    # the model for us. Here we will use RMSprop; the optim package contains many other
    # optimization algorithms. The first argument to the RMSprop constructor tells the
    # optimizer which Tensors it should update.
    learning_rate = 1e-3
    optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
else:
    learning_rate = 1e-6

for t in range(2000):

    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(xx)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    if OPTIMIZER:
        optimizer.zero_grad()
    else:
        model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    '''
    model.parameters() yields the parameters to be trained (as an iterator). Highly abstracted.
    '''
    if OPTIMIZER:
        optimizer.step()
    else:
        with torch.no_grad():
            for param in model.parameters():  # each param is a Tensor
                param -= learning_rate * param.grad

# You can access the first layer of `model` like accessing the first item of a list
linear_layer = model[0]

'''
torch.nn.Linear(3, 1) contains
linear_layer.bias
linear_layer.weight[:, 0]
linear_layer.weight[:, 1]
linear_layer.weight[:, 2]
'''
# For linear layer, its parameters are stored as `weight` and `bias`.
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x +'
      f' {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

a = linear_layer.bias.item()
b = linear_layer.weight[:, 0].item()
c = linear_layer.weight[:, 1].item()
d = linear_layer.weight[:, 2].item()

xVal = np.linspace(-10, 10, 2000)
yEst = np.polyval([d, c, b, a], xVal)  # d * x^3 + c * x^2 + b * x + a
print(xVal)
print(yEst)
plt.plot(xVal, yEst, 'bo')
plt.show()
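The example above hands the weight update to torch.optim.RMSprop. To show roughly what such an optimizer does per step, here is a NumPy sketch of the basic RMSprop rule (no momentum, not centered): it keeps a running average of squared gradients and divides each gradient by the square root of that average. The helper rmsprop_step, the zero initialization, and the feature matrix are assumptions made for this illustration. The alpha=0.99 and eps=1e-8 values are intended to mirror the PyTorch defaults.

```python
import numpy as np


def rmsprop_step(params, grads, square_avg, lr=1e-3, alpha=0.99, eps=1e-8):
    """One basic RMSprop update; square_avg is modified in place."""
    square_avg *= alpha
    square_avg += (1.0 - alpha) * grads**2
    return params - lr * grads / (np.sqrt(square_avg) + eps)


x = np.linspace(-np.pi, np.pi, 2000)
y = np.sin(x)
# Columns are 1, x, x^2, x^3, so params holds (a, b, c, d).
features = np.stack([np.ones_like(x), x, x**2, x**3], axis=1)

params = np.zeros(4)
square_avg = np.zeros(4)  # running average of squared gradients
initial_loss = np.square(features @ params - y).sum()

for _ in range(2000):
    err = features @ params - y
    grads = 2.0 * features.T @ err  # gradient of the sum-of-squares loss
    params = rmsprop_step(params, grads, square_avg)

final_loss = np.square(features @ params - y).sum()
print(initial_loss, final_loss)
print(params)
```

Because each gradient is divided by its own running magnitude, all four coefficients move at a comparable pace even though their raw gradients differ by orders of magnitude. This is why the optimizer version can use learning_rate = 1e-3 while the plain gradient-descent branch uses 1e-6.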

3-2. Pytorch: nn_custom

The most common approach is the one below: the model is split out into its own class, and the training code works with a model object.

The code becomes much more concise.

# -*- coding: utf-8 -*-
import torch
import math

class Polynomial3(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

    def string(self):
        """
        Just like any class in Python, you can also define custom method on PyTorch modules
        """
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3'

# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

model = Polynomial3()

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters (defined
# with torch.nn.Parameter) which are members of the model.
loss_fn = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)

for t in range(2000):
    y_pred = model(x)

    # Compute and print loss
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
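The class-based model above works with the optimizer because attributes wrapped in torch.nn.Parameter are registered on the Module automatically. The sketch below illustrates this with a deliberately tiny module. The class name TinyPoly and the unregistered scale attribute are hypothetical and are used only to show the contrast.

```python
import torch


class TinyPoly(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # Registered: wrapped in nn.Parameter and assigned as attributes.
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        # Not registered: a plain Tensor attribute is invisible to optimizers.
        self.scale = torch.tensor(2.0)

    def forward(self, x):
        return self.scale * (self.a + self.b * x)


model = TinyPoly()
names = [name for name, _ in model.named_parameters()]
print(names)  # ['a', 'b']

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.tensor(1.0)).pow(2)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(model.a.grad is not None, model.scale.requires_grad)  # True False
```

Only a and b appear in model.parameters(), so only they are updated by optimizer.step(). The plain Tensor scale takes part in the forward pass but is never trained.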

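Personal opinion:

- From 3-1 onward, much of the code seems rather dependent on PyTorch. Because it relies on the framework, the concepts (network, loss function, optimizer) are abstracted away. Anyone who can use a low-level language such as CUDA and can work out the computational graph could write something like example 3 without depending on a framework.

- There is a dilemma between 'pushing performance to the optimum' and 'abstracting the concepts to keep the code concise'.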
