Archives | LOVIT x DATA SCIENCE

Good! 84 posts in total. Keep on posting.

2019

Self Organizing Map. Part 1. Implementing SOM from scratch

12-02

Seaborn vs Bokeh. Part 2. Bokeh tutorial

11-22

Seaborn vs Bokeh. Part 1. Seaborn tutorial

11-22

Visualizing document vectors and word vectors together (understanding the Doc2vec space)

06-18

Topic modeling with NMF and k-means, and visualizing NMF and k-means results with PyLDAvis

06-10

Key sentence extraction with KR-WordRank and evaluating summary quality with ROUGE

05-01

Keyword extraction and key sentence extraction with TextRank (implementation and experiments)

04-30

Reviews of sequential labeling algorithms (Sparse representation model)

04-07

Attention mechanism in NLP. From seq2seq + attention to BERT

03-17

(Semi-supervised) Named Entity Recognition with Word2Vec and Logistic Regression

02-16

Datasets and practice code for Korean text mining exercises (lovit textmining dataset & python ml4nlp)

02-16

Implementing a k-means ensemble and points to watch out for during training

02-11

Building an AWS S3 access log monitoring app with Bokeh and Flask

02-08

Working with S3 buckets using the AWS CLI (Command Line Interface) (file upload, folder sync) and registering AWS IAM

01-30

(Spark) 0. Installing Spark on Ubuntu, configuring remote access to IPython Notebook, and connecting PySpark with Notebook

01-29

Sharing datasets on AWS S3 (creating a bucket and uploading files)

01-25

A corpus-based analyzer for Korean predicates (Korean Lemmatizer)

01-22

Reddit scraping with praw and retrieving archived older Reddit posts

01-16

Saving objects to binary, including their class definitions, with Python dill

01-15

Building a data sharing API with GitHub

01-12

2018

Applying L1 regularization in PyTorch

12-05

(Gensim) Setting the minimum frequency for Word2Vec

12-05

The decode function of a Hidden Markov Model based part-of-speech tagger

10-23

n-gram extraction

10-23

FastText, Word representation using subword

10-22

Spherical k-means for document clustering

10-16

Embedding for Word Visualization (LLE, ISOMAP, MDS, t-SNE)

09-28

t-distributed Stochastic Neighbor Embedding (t-SNE) and perplexity

09-28

Visualizing k-means training results with pyLDAvis

09-27

Visualizing Latent Dirichlet Allocation with pyLDAvis

09-27

(Review) Incorporating Global Information into Supervised Learning for Chinese Word Segmentation

09-25

How a Conditional Random Field (CRF) based part-of-speech tagger works and how it differs from an HMM based tagger

09-13

How a Hidden Markov Model (HMM) based part-of-speech tagger works and its problems

09-11

Network based Nearest Neighbor Indexer

09-10

GloVe, word representation

09-05

Fast Levenshtein (edit) distance search with an inverted index

09-04

Morphological similarity of Korean words using Levenshtein (edit) distance

08-28

Part-of-speech tagging with the Ford algorithm, and its relation to the Hidden Markov Model (HMM)

08-21

Shortest path search with the Ford algorithm

08-21

Version control of text documents with GitHub

08-17

Java in Python, packaging Komoran 3 as a Python package

07-06

Conditional Random Field based Named Entity Recognition

06-22

A conjugation function for Korean predicates (Korean conjugation)

06-11

Recovering the base form of Korean predicates (Korean lemmatization)

06-07

Cherry picking distorts the distribution.

05-26

Unsupervised noun extraction (3). Usage of extractor and tokenizer

05-09

Unsupervised noun extraction (2). Improving accuracy and recall

05-08

Unsupervised noun extraction. L-R structure

05-07

Tree traversal of trained decision tree (scikit-learn)

04-30

Decision trees are not appropriate for text classification.

04-30

soydata. Functions for generating complex synthetic data

04-27

3D scatter plots with Plotly

04-26

soyspacing. Heuristic Korean Space Correction, A safer space corrector.

04-25

Conditional Random Field based Korean Space Correction

04-24

From Softmax Regression to Conditional Random Field for Sequential Labeling

04-24

(Review) Neural Word Embedding as Implicit Matrix Factorization (Levy & Goldberg, 2014 NIPS)

04-22

Implementing PMI (Practice handling matrix of numpy & scipy)

04-22

(Review) From frequency to meaning, Vector space models of semantics (Turney & Pantel, 2010)

04-18

Implementing PageRank. Python dict vs numpy

04-17

Word cloud in Python

04-17

KR-WordRank, a Korean keyword extractor that does not use a tokenizer

04-16

Graph ranking algorithm. PageRank and HITS

04-16

Term proportion ratio based keyword extraction

04-12

Unsupervised tokenizers in soynlp project

04-09

Uncertainty of word boundaries; Accessor Variety & Branching Entropy

04-09

Cohesion score + L-Tokenizer. An unsupervised tokenizer for well-spaced Korean documents

04-09

Scipy sparse matrix handling

04-09

Building a tokenizer for Korean text without spacing (implementing the Max Score Tokenizer in Python)

04-09

Does scikit-learn Logistic Regression fail to find the optimum?

04-06

Komoran: how to use the Komoran morphological analyzer and add a user dictionary (Java, Python)

04-06

Word2Vec understanding, Space odyssey of word embedding (1)

04-05

Word Piece Model (a.k.a. sentencepiece)

04-02

Left-side substring tokenizer, the simplest tokenizer.

04-02

Part of speech tagging, Tokenization, and Out of vocabulary problem

04-01

Python plotting kit Bokeh

03-31

Random Projection and Locality Sensitive Hashing

03-28

Word / Document embedding (Word2Vec / Doc2Vec)

03-26

From text to term frequency matrix (KoNLPy)

03-26

Logistic regression with L1, L2 regularization and keyword extraction

03-24

Logistic regression and Softmax regression for document classification

03-22

Cluster labeling for text data

03-21

Carblog. Problem description

03-20

How to choose k-means initial points

03-19

2017

Personalized PageRank and its application, movie recommender

04-17

Hyunjoong Kim (lovit)

© 2025 Hyunjoong Kim (lovit)
