> For the complete documentation index, see [llms.txt](https://blessbingo.gitbook.io/garnet/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://blessbingo.gitbook.io/garnet/tui-jian-suan-fa/matrix-factorization/lightfm/leng-qi-dong-tui-jian.md).

# 冷启动推荐

## 冷启动

冷启动就需要`user`或者`item`的特征来配合了, 仅使用协同过滤的FM模型是无法做到冷启动的.

我们选择[StackExchange dataset](https://archive.org/details/stackexchange)作为数据集, 这个数据集除了行为交互数据, 还有**item feature**, 因此可以做关于`item`的冷启动.

```python
import numpy as np
from lightfm.datasets import fetch_stackexchange

data = fetch_stackexchange('crossvalidated',
                           test_set_fraction=0.1,
                           indicator_features=False,
                           tag_features=True)
data
```

```
{'train': <3221x72360 sparse matrix of type '<class 'numpy.float32'>'
     with 57830 stored elements in COOrdinate format>,
 'test': <3221x72360 sparse matrix of type '<class 'numpy.float32'>'
     with 4307 stored elements in COOrdinate format>,
 'item_features': <72360x1246 sparse matrix of type '<class 'numpy.float32'>'
     with 198963 stored elements in Compressed Sparse Row format>,
 'item_feature_labels': array(['bayesian', 'prior', 'elicitation', ..., 'events', 'mutlivariate',
        'sample-variance'], dtype='<U50')}
```

除了划分好的`train`和`test`, 还有一个shape为`(72360, 1246)`的`item_feature`矩阵, 共72360个item, item对应着1246个特征.

```python
train, test = data['train'], data['test']
item_features = data["item_features"]
tag_labels = data['item_feature_labels']
tag_labels
```

```
array(['bayesian', 'prior', 'elicitation', ..., 'events', 'mutlivariate',
       'sample-variance'], dtype='<U50')
```

```python
NUM_THREADS = 2
NUM_COMPONENTS = 30
NUM_EPOCHS = 3
ITEM_ALPHA = 1e-6

model = LightFM(loss='bpr', item_alpha=ITEM_ALPHA, no_components=NUM_COMPONENTS)
model = model.fit(train, item_features=item_features, epochs=NUM_EPOCHS, num_threads=NUM_THREADS)

train_auc = auc_score(model, train, item_features=item_features, num_threads=NUM_THREADS).mean()
test_auc = auc_score(model, test, item_features=item_features, num_threads=NUM_THREADS).mean()
train_auc, test_auc
```

```
(0.8141281, 0.72254986)
```

```python
user_count = np.array(train.sum(axis=0)).ravel()
np.sum(user_count == 0)
```

```
33827
```

统计得到共有33827个`item`是没有跟用户发生过交互行为的, 可以认为是新的物品上线, 为现有的用户进行推荐.

取其中一个没有发生过交互的物品计算出要被推荐的用户.

```python
t = model.predict(item_ids=np.ones(train.shape[0]) * (train.shape[1] - 1), user_ids=np.arange(train.shape[0]), item_features=item_features)
t.argsort()
```

```
array([ 145,  159,  128, ..., 1981,  375, 2349], dtype=int64)
```

上面就是应该把这个新的`item`推荐给的用户列表.

## 特征Embedding空间

训练之后也得到了每个`item feature`对应的向量, 由于在同一个空间中, 就可以通过向量之间的各种距离, 衡量两个特征的近似性.

```python
def get_similar_tags(model, tag_id):
    # Define similarity as the cosine of the angle
    # between the tag latent vectors

    # Normalize the vectors to unit length
    tag_embeddings = (model.item_embeddings.T
                      / np.linalg.norm(model.item_embeddings, axis=1)).T

    query_embedding = tag_embeddings[tag_id]
    similarity = np.dot(tag_embeddings, query_embedding)
    most_similar = np.argsort(-similarity)[1:4]

    return most_similar


for tag in (u'bayesian', u'regression', u'survival'):
    tag_id = tag_labels.tolist().index(tag)
    print('Most similar tags for %s: %s' % (tag_labels[tag_id],
                                            tag_labels[get_similar_tags(model, tag_id)]))
```

```
Most similar tags for bayesian: ['metropolis-hastings' 'gibbs' 'prior']
Most similar tags for regression: ['multiple-regression' 'goodness-of-fit' 'poisson-regression']
Most similar tags for survival: ['odds-ratio' 'kaplan-meier' 'epidemiology']
```


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://blessbingo.gitbook.io/garnet/tui-jian-suan-fa/matrix-factorization/lightfm/leng-qi-dong-tui-jian.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
