Of course! Using Singular Value Decomposition (SVD) for recommendations is a classic and powerful technique, a form of latent factor collaborative filtering (also called matrix factorization). The MovieLens dataset is the perfect dataset to learn it on.
Here’s a complete, step-by-step guide on how to build a movie recommendation system using SVD in Python.
The Core Idea of SVD for Recommendations
The goal of a recommendation system is to predict a user's rating for a movie they haven't seen yet. We can represent all user ratings as a large matrix, where:
- Rows are users.
- Columns are movies.
- Values are the ratings (e.g., 1-5).
This matrix is very sparse because most users have only rated a tiny fraction of the available movies.
SVD helps by "decomposing" this large, sparse matrix into three smaller, dense matrices:
R ≈ U * Σ * Vᵀ
- R: The original user-movie rating matrix.
- U: The User Features Matrix. It shows how much each user is associated with each latent feature (e.g., "Action Lover," "Drama Enthusiast").
- Σ (Sigma): The Singular Values Matrix. It's a diagonal matrix containing the "strength" or importance of each latent feature. We often use this to reduce the number of features (a process called Truncated SVD).
- Vᵀ: The Movie Features Matrix. It shows how much each movie is associated with each latent feature.
By multiplying the truncated U, Σ, and Vᵀ back together, we get a predicted rating matrix. This new matrix is dense, meaning it has a predicted rating for every user-movie pair, even the ones the user didn't originally rate.
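To make the decomposition concrete, here is a tiny, self-contained sketch on a made-up 4-user × 5-movie matrix (the numbers are invented purely for illustration), using the same SciPy svds routine we will apply to the real data below:
import numpy as np
from scipy.sparse.linalg import svds
# A made-up 4x5 rating matrix; 0 stands for "not rated yet"
R_toy = np.array([
    [5, 4, 0, 1, 0],
    [4, 0, 0, 1, 1],
    [0, 1, 5, 4, 0],
    [1, 0, 4, 5, 4],
], dtype=float)
# Keep only k=2 latent features (truncated SVD)
U, sigma, Vt = svds(R_toy, k=2)
# Multiply the pieces back together to get a dense approximation of R_toy
R_approx = U @ np.diag(sigma) @ Vt
print(R_approx.round(2))
The cells that were 0 (unrated) now contain nonzero scores: those are the model's guesses, and ranking them is exactly how we will generate recommendations in Step 6.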
Step-by-Step Python Implementation
We'll use SciPy's svds function for the truncated SVD, pandas for data manipulation, and scikit-learn for the train/test split and evaluation.
Step 1: Setup and Installation
First, make sure you have the necessary libraries installed.
pip install pandas numpy scipy scikit-learn
Step 2: Load and Prepare the MovieLens Data
For this example, we'll use the small MovieLens 100k dataset (100,000 ratings). Download it from https://grouplens.org/datasets/movielens/100k/ and unzip it; the ratings live in the tab-separated file u.data, with one row per (user, movie, rating, timestamp).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from scipy.sparse.linalg import svds
import numpy as np
# Load the MovieLens 100k ratings file (tab-separated, no header row).
# Adjust the path to wherever you unzipped the ml-100k folder.
df = pd.read_csv(
    'ml-100k/u.data',
    sep='\t',
    names=['user_id', 'item_id', 'rating', 'timestamp']
)
# The columns are 'user_id', 'item_id', 'rating', 'timestamp'
print("Original DataFrame Head:")
print(df.head())
print("\nOriginal DataFrame Shape:", df.shape)
Step 3: Create the User-Movie Rating Matrix
We need to pivot the DataFrame to create the user-movie matrix. The index will be user_id, the columns will be item_id, and the values will be the rating. Missing ratings will be filled with NaN.
# Create the user-movie rating matrix
R_df = df.pivot(index='user_id', columns='item_id', values='rating')
print("\nUser-Movie Rating Matrix (Head):")
print(R_df.head())
# Fill NaN with 0 so SVD can run on a plain numeric matrix
# (we keep R_df with its NaNs so we can still tell rated from unrated later)
R = R_df.fillna(0).values
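As a quick sanity check on the sparsity claim from earlier, you can measure what fraction of the user-movie cells actually contain a rating (for ML-100k it is only around 6%):
# What fraction of the matrix is actually observed?
filled = R_df.notna().sum().sum()
print(f"Fill rate: {filled / R_df.size:.1%} (the rest is missing)")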
Step 4: Perform Truncated SVD
We will use scipy.sparse.linalg.svds for this. We need to choose the number of latent factors (k). This is a hyperparameter that must be smaller than both the number of users and the number of movies; a good starting point is around 50.
# Number of latent factors
k = 50
# Perform SVD
# We only need the first k components (Truncated SVD)
U, sigma, Vt = svds(R, k=k)
# sigma is returned as a 1D array, so we convert it to a diagonal matrix
sigma = np.diag(sigma)
print("\nShape of U:", U.shape) # (num_users, k)
print("Shape of sigma:", sigma.shape) # (k, k)
print("Shape of Vt:", Vt.shape) # (k, num_movies)
Step 5: Generate Predictions and Evaluate
Now, we reconstruct the rating matrix using the decomposed matrices to get our predictions.
# Reconstruct the predicted rating matrix
predicted_ratings = np.dot(np.dot(U, sigma), Vt)
# Convert the predicted ratings back to a DataFrame
predicted_ratings_df = pd.DataFrame(predicted_ratings,
index=R_df.index,
columns=R_df.columns)
print("\nPredicted Ratings Matrix (Head):")
print(predicted_ratings_df.head())
To see how well our model did, let's calculate the Root Mean Squared Error (RMSE) on a test set. We'll split the original data first.
# --- Evaluation ---
# Split the original data into training and testing sets
train_data, test_data = train_test_split(df, test_size=0.25, random_state=42)
# Build the training matrix, reindexed to the full set of users and movies so
# that its rows and columns line up with R_df (users or movies missing from
# the training split simply become all-zero rows/columns)
R_train = (train_data.pivot(index='user_id', columns='item_id', values='rating')
           .reindex(index=R_df.index, columns=R_df.columns)
           .fillna(0)
           .values)
# Perform SVD on the training data
U_train, sigma_train, Vt_train = svds(R_train, k=k)
sigma_train = np.diag(sigma_train)
# Predict ratings for the training data
predicted_ratings_train = np.dot(np.dot(U_train, sigma_train), Vt_train)
# Now, we need to compare these predictions with the *actual* test ratings.
# We can only evaluate on the entries that exist in the test set.
test_user_indices = R_df.index.get_indexer(test_data['user_id'])     # row positions
test_movie_indices = R_df.columns.get_indexer(test_data['item_id'])  # column positions
actual_ratings = test_data['rating'].values
# Get the predicted ratings for the same user-movie pairs in the test set
predicted_ratings_test = predicted_ratings_train[test_user_indices, test_movie_indices]
# Calculate RMSE
rmse = np.sqrt(mean_squared_error(actual_ratings, predicted_ratings_test))
print(f"\nRMSE on Test Set: {rmse:.4f}")
Step 6: Make Recommendations for a Specific User
The most exciting part! Let's pick a user and recommend movies they haven't seen yet.
def recommend_movies(user_id, num_recommendations=5):
"""
Recommends movies for a given user based on SVD predictions.
"""
    # Look up the user's actual ratings and predicted ratings by user_id label
    user_ratings = R_df.loc[user_id]
    user_predictions = predicted_ratings_df.loc[user_id]
# Create a DataFrame of movies the user hasn't rated yet
unrated_movies = user_ratings[user_ratings.isnull()]
# Get the predicted ratings for these unrated movies
recommendations = pd.DataFrame({
'predicted_rating': user_predictions[unrated_movies.index]
})
# Sort by the highest predicted rating
recommendations = recommendations.sort_values(by='predicted_rating', ascending=False)
# Get the top N recommendations
top_recommendations = recommendations.head(num_recommendations)
# Get movie titles (assuming you have a movies file)
# For simplicity, we'll just return the item_ids.
# In a real project, you'd load 'u.item' to map item_id to title.
return top_recommendations.index.tolist()
# --- Example: Get recommendations for user 42 ---
user_to_recommend = 42
recommended_movie_ids = recommend_movies(user_to_recommend)
print(f"\nTop 5 movie recommendations for User {user_to_recommend}:")
print(f"Recommended Movie IDs: {recommended_movie_ids}")
# To get actual movie titles, you would load the movie titles file
# e.g., movie_titles = pd.read_csv('u.item', sep='|', header=None, usecols=[0, 1], encoding='latin-1')
# and then map the IDs to titles.
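If you want human-readable output, here is a minimal sketch of that last step using the u.item file that ships in the same ml-100k folder (pipe-separated, Latin-1 encoded, movie ID in the first column and title in the second):
# Map item_id -> movie title using the u.item metadata file
movie_titles = pd.read_csv('ml-100k/u.item', sep='|', header=None,
                           encoding='latin-1', usecols=[0, 1])
movie_titles.columns = ['item_id', 'title']
id_to_title = dict(zip(movie_titles['item_id'], movie_titles['title']))
print(f"Top 5 recommendations for User {user_to_recommend}:")
for item_id in recommended_movie_ids:
    print(f"  {item_id}: {id_to_title.get(item_id, 'unknown title')}")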
Full Code (Condensed)
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from scipy.sparse.linalg import svds
# 1. Load Data
print("Loading data...")
df = pd.read_csv('ml-100k/u.data', sep='\t',
                 names=['user_id', 'item_id', 'rating', 'timestamp'])
# 2. Create Rating Matrix
print("Creating rating matrix...")
R_df = df.pivot(index='user_id', columns='item_id', values='rating')