Nutrola's Open Food Nutrition Dataset: 500K+ Foods Available for Download

March 12, 2026

Download Nutrola's open food nutrition dataset with 500K+ verified entries including calories, macros, micronutrients, and serving sizes. Available in CSV and JSON for research, development, and education.

Medically reviewed by Dr. Emily Torres, Registered Dietitian Nutritionist (RDN)

Good nutrition data is hard to find. Researchers waste weeks cleaning government databases. Developers write brittle scrapers that break every month. Students writing thesis papers settle for small, outdated samples because assembling a comprehensive dataset from scratch is not realistic on an academic timeline.

We built Nutrola's food database to power our calorie tracking app, and over the past three years we have invested heavily in making that data accurate, comprehensive, and well-structured. Today we are releasing a curated subset of that database as an open dataset: over 500,000 verified food entries available for free download in CSV and JSON formats.

This post covers everything you need to know about the dataset — what is in it, how to download it, the schema, licensing, quality methodology, and how it compares to other publicly available nutrition data sources.

What Is in the Dataset

The Nutrola Open Food Nutrition Dataset contains 500,000+ food entries spanning raw ingredients, generic foods, branded consumer products, and common restaurant items. Every entry has been verified through our multi-layer quality control pipeline, the same system described in detail in our post on how we built our food database.

Each food entry includes the following data points:

Food name — the common name of the food item in English, with brand names where applicable
Calories — energy content in kilocalories (kcal) per 100 grams; per-serving values can be derived from the serving weights
Macronutrients — protein, total fat, saturated fat, trans fat, total carbohydrates, dietary fiber, total sugars, and added sugars, all in grams
Micronutrients — 21 vitamins and minerals: vitamin A, vitamin C, vitamin D, vitamin E, vitamin K, thiamin, riboflavin, niacin, vitamin B6, folate, vitamin B12, calcium, iron, magnesium, phosphorus, potassium, sodium, zinc, copper, manganese, and selenium, with cholesterol reported alongside them
Serving sizes — a primary serving size description (e.g., "1 medium apple," "1 cup cooked") with its weight in grams, plus up to two alternative serving sizes per food
Food category — hierarchical classification using our internal taxonomy (e.g., Dairy > Cheese > Hard Cheese)
Country — ISO 3166-1 alpha-2 code for the primary market: the country where the food product is sold or where the ingredient is commonly consumed
Barcode (where available) — UPC or EAN codes for branded products
Data source tags — provenance indicators showing whether the entry originated from government databases, manufacturer data, laboratory analysis, or verified community submissions

Sample Data

Here is a selection of entries from the dataset to give you a sense of the structure and detail:

food_id	food_name	category	country	calories_per_100g	protein_g	fat_total_g	carbs_total_g	fiber_g	serving_1_desc	serving_1_g
NF-001247	Chicken Breast, Raw, Skinless	Poultry > Chicken	US	120	22.5	2.6	0.0	0.0	1 breast (174g)	174
NF-008391	Fage Total 0% Greek Yogurt	Dairy > Yogurt > Greek	GR	54	10.3	0.0	3.0	0.0	1 container (150g)	150
NF-014205	Basmati Rice, White, Cooked	Grains > Rice	IN	130	2.7	0.3	28.2	0.4	1 cup (158g)	158
NF-022876	Avocado, Hass, Raw	Fruits > Tropical	MX	160	2.0	14.7	8.5	6.7	1/2 avocado (68g)	68
NF-031560	Barilla Penne Rigate, Dry	Pasta > Dried	IT	359	12.5	2.0	71.2	3.0	2 oz (56g)	56
NF-045892	Kimchi, Traditional Napa Cabbage	Vegetables > Fermented	KR	15	1.1	0.5	2.4	1.6	1/2 cup (75g)	75
NF-053714	Salmon, Atlantic, Raw, Farmed	Fish > Salmon	NO	208	20.4	13.4	0.0	0.0	1 fillet (113g)	113
NF-067283	Chickpeas, Canned, Drained	Legumes > Beans	US	119	6.3	2.0	18.2	5.4	1/2 cup (120g)	120

The full dataset includes many more columns for micronutrients, alternative serving sizes, barcode data, and source tags. The table above shows only the core nutritional fields, with the three category levels (category_l1 to category_l3) joined by ">" for readability.

Data Formats

The dataset is available in two formats:

CSV

The CSV file uses UTF-8 encoding with comma delimiters. The first row contains column headers. Fields that contain commas are enclosed in double quotes. Null values are represented as empty fields.

The CSV format is ideal for spreadsheet tools like Excel and Google Sheets, statistical software like R and SPSS, and quick data exploration with command-line tools like csvkit or xsv.

File: nutrola-open-food-dataset-v3.csv (approximately 210 MB uncompressed, 48 MB gzipped)
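Because nulls are empty fields and quoted fields may contain commas, Python's standard csv module reads this dialect without extra configuration. A minimal sketch, assuming only the dialect described above; the `load_rows` helper and its choice to turn empty fields into `None` are ours, not part of the repository's scripts. Keeping values as strings until you need them also preserves leading zeros in barcodes.

```python
import csv
import io
from typing import Iterator, Optional, TextIO


def load_rows(stream: TextIO) -> Iterator[dict[str, Optional[str]]]:
    """Yield one dict per food entry, mapping empty fields (nulls) to None."""
    # The default csv dialect matches the file: comma delimiter, double-quote quoting.
    reader = csv.DictReader(stream)
    for row in reader:
        yield {key: (value if value != "" else None) for key, value in row.items()}


# Tiny inline sample in the same dialect: a quoted name containing commas and an empty barcode.
sample = (
    "food_id,food_name,barcode,calories_per_100g\n"
    'NF-001247,"Chicken Breast, Raw, Skinless",,120\n'
)
rows = list(load_rows(io.StringIO(sample)))
print(rows[0]["food_name"])  # Chicken Breast, Raw, Skinless
print(rows[0]["barcode"])    # None
```

To read the real file, pass `open("nutrola-open-food-dataset-v3.csv", newline="", encoding="utf-8")` instead of the in-memory sample.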

JSON

The JSON file contains an array of objects, one per food entry. Nested objects are used for structured fields like serving sizes (which contain a description, gram weight, and milliliter equivalent where applicable) and micronutrient profiles.

The JSON format is better suited for application development, database imports, and any workflow where you need to preserve the hierarchical structure of serving sizes and nutrient groups.

File: nutrola-open-food-dataset-v3.json (approximately 340 MB uncompressed, 62 MB gzipped)
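When moving between the two formats, the nested serving objects in the JSON file correspond to the flat serving_1 to serving_3 columns of the CSV. The sketch below shows that mapping for a single entry. The nested key names (`servings`, `description`, `grams`, `ml`) are assumptions for illustration only; check the JSON Schema in the repository's schema/ directory for the real names.

```python
import json
from typing import Any

MAX_SERVINGS = 3  # the CSV schema has serving_1 through serving_3


def flatten_servings(entry: dict[str, Any]) -> dict[str, Any]:
    """Map a nested list of serving objects onto flat, CSV-style columns.

    The keys "servings", "description", "grams" and "ml" are hypothetical.
    """
    flat: dict[str, Any] = {}
    servings = entry.get("servings") or []
    for i in range(MAX_SERVINGS):
        serving = servings[i] if i < len(servings) else {}
        flat[f"serving_{i + 1}_desc"] = serving.get("description")
        flat[f"serving_{i + 1}_g"] = serving.get("grams")
        # The milliliter equivalent exists only in the JSON file, and only where applicable.
        flat[f"serving_{i + 1}_ml"] = serving.get("ml")
    return flat


record = json.loads(
    '{"food_id": "NF-014205", "servings": '
    '[{"description": "1 cup (158g)", "grams": 158}]}'
)
flat = flatten_servings(record)
print(flat["serving_1_g"], flat["serving_2_desc"])  # 158 None
```

For the full 340 MB file, `json.load` holds every entry in memory at once; a streaming parser such as ijson may be a better fit on smaller machines.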

Both files are also available as gzip-compressed archives to reduce download times.

Data Schema

Here is the full schema with descriptions for every field in the dataset:

Field Name	Type	Description
`food_id`	string	Unique Nutrola identifier for the food entry (format: NF-XXXXXX)
`food_name`	string	Common name of the food, including brand where applicable
`category_l1`	string	Top-level food category (e.g., Dairy, Grains, Fruits)
`category_l2`	string	Second-level category (e.g., Cheese, Rice, Tropical)
`category_l3`	string	Third-level category where applicable (e.g., Hard Cheese, Brown Rice)
`country`	string	ISO 3166-1 alpha-2 country code indicating primary market
`brand`	string	Brand name for branded products; null for generic foods
`barcode`	string	UPC/EAN barcode; null if not applicable
`calories_per_100g`	float	Energy in kcal per 100 grams
`protein_g`	float	Protein in grams per 100g
`fat_total_g`	float	Total fat in grams per 100g
`fat_saturated_g`	float	Saturated fat in grams per 100g
`fat_trans_g`	float	Trans fat in grams per 100g
`carbs_total_g`	float	Total carbohydrates in grams per 100g
`fiber_g`	float	Dietary fiber in grams per 100g
`sugars_total_g`	float	Total sugars in grams per 100g
`sugars_added_g`	float	Added sugars in grams per 100g
`sodium_mg`	float	Sodium in milligrams per 100g
`cholesterol_mg`	float	Cholesterol in milligrams per 100g
`vitamin_a_mcg`	float	Vitamin A in micrograms RAE per 100g
`vitamin_c_mg`	float	Vitamin C in milligrams per 100g
`vitamin_d_mcg`	float	Vitamin D in micrograms per 100g
`calcium_mg`	float	Calcium in milligrams per 100g
`iron_mg`	float	Iron in milligrams per 100g
`potassium_mg`	float	Potassium in milligrams per 100g
`magnesium_mg`	float	Magnesium in milligrams per 100g
`zinc_mg`	float	Zinc in milligrams per 100g
`phosphorus_mg`	float	Phosphorus in milligrams per 100g
`selenium_mcg`	float	Selenium in micrograms per 100g
`vitamin_b6_mg`	float	Vitamin B6 in milligrams per 100g
`vitamin_b12_mcg`	float	Vitamin B12 in micrograms per 100g
`folate_mcg`	float	Folate in micrograms DFE per 100g
`vitamin_e_mg`	float	Vitamin E in milligrams per 100g
`vitamin_k_mcg`	float	Vitamin K in micrograms per 100g
`thiamin_mg`	float	Thiamin (B1) in milligrams per 100g
`riboflavin_mg`	float	Riboflavin (B2) in milligrams per 100g
`niacin_mg`	float	Niacin (B3) in milligrams per 100g
`copper_mg`	float	Copper in milligrams per 100g
`manganese_mg`	float	Manganese in milligrams per 100g
`serving_1_desc`	string	Primary serving size description (e.g., "1 cup cooked")
`serving_1_g`	float	Primary serving size weight in grams
`serving_2_desc`	string	Alternative serving size description; null if not available
`serving_2_g`	float	Alternative serving size weight in grams; null if not available
`serving_3_desc`	string	Second alternative serving size description; null if not available
`serving_3_g`	float	Second alternative serving size weight in grams; null if not available
`data_source`	string	Provenance tag: "government", "manufacturer", "laboratory", or "verified_community"
`last_verified`	string	ISO 8601 date when the entry was last verified (YYYY-MM-DD)
`dataset_version`	string	Dataset version identifier (e.g., "v3.0")

All nutrient values are expressed per 100 grams to allow consistent comparisons. To calculate nutrients per serving, multiply the per-100g value by the serving weight in grams and divide by 100.
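As a worked example with the basmati rice row from the sample table (130 kcal and 2.7 g protein per 100 g, 158 g per cup): 130 × 158 / 100 = 205.4 kcal per cup. The same step as a small Python function, whose name is ours:

```python
def per_serving(value_per_100g: float, serving_g: float) -> float:
    """Scale a per-100g nutrient value to a serving weighing serving_g grams."""
    return value_per_100g * serving_g / 100


# Basmati Rice, White, Cooked: 130 kcal and 2.7 g protein per 100 g; 1 cup = 158 g
print(round(per_serving(130, 158), 1))  # 205.4
print(round(per_serving(2.7, 158), 1))  # 4.3
```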

How to Download

The dataset is hosted on our public GitHub repository:

github.com/nutrola/open-food-nutrition-dataset

You can download the files directly from the GitHub Releases page, or clone the repository:

```bash
git clone https://github.com/nutrola/open-food-nutrition-dataset.git
```

For the compressed versions:

```bash
# Download CSV (gzipped)
wget https://github.com/nutrola/open-food-nutrition-dataset/releases/latest/download/nutrola-open-food-dataset-v3.csv.gz

# Download JSON (gzipped)
wget https://github.com/nutrola/open-food-nutrition-dataset/releases/latest/download/nutrola-open-food-dataset-v3.json.gz
```
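If you would rather not unpack the 210 MB CSV to disk, Python's standard gzip module can stream the compressed file directly. A minimal sketch; the per-country tally is only an illustration. Since the csv module never infers missing values, a country code such as "NA" (Namibia) stays a plain string.

```python
import csv
import gzip


def count_entries_by_country(path: str) -> dict[str, int]:
    """Stream a gzipped dataset CSV and count entries per country code."""
    counts: dict[str, int] = {}
    # "rt" opens the compressed file in text mode; newline="" is what the csv module expects.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            counts[row["country"]] = counts.get(row["country"], 0) + 1
    return counts


# Usage, assuming the downloaded file is in the working directory:
# print(count_entries_by_country("nutrola-open-food-dataset-v3.csv.gz"))
```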

The repository also contains:

A detailed README.md with quickstart instructions
A CHANGELOG.md documenting changes between dataset versions
A scripts/ directory with Python and R example scripts for loading, filtering, and analyzing the data
A schema/ directory with JSON Schema and CSV dialect definitions

If you need the full 3 million+ entry database with real-time updates rather than periodic snapshots, see our Nutrition Data API for developer access.

Use Cases

Academic Research

Nutrition researchers can use the dataset for dietary pattern analysis, epidemiological modeling, and nutrient density studies without spending weeks cleaning and merging government data files. The hierarchical category system makes it straightforward to filter by food groups, and the country field enables cross-cultural comparisons.
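For cross-cultural comparisons, a small helper can average one nutrient per country within a top-level category. A minimal sketch over rows as produced by Python's csv.DictReader (values are strings, nulls are empty or None); the helper name and the example sodium values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Optional


def mean_by_country(
    rows: Iterable[dict[str, Optional[str]]], category_l1: str, nutrient: str
) -> dict[str, float]:
    """Average one nutrient per country code within a top-level category, skipping nulls."""
    values: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        if row["category_l1"] == category_l1 and row.get(nutrient) not in (None, ""):
            values[row["country"]].append(float(row[nutrient]))
    return {country: mean(vals) for country, vals in values.items()}


# Made-up rows; the US entry has a null sodium value and is skipped.
example = [
    {"category_l1": "Dairy", "country": "GR", "sodium_mg": "40"},
    {"category_l1": "Dairy", "country": "GR", "sodium_mg": "60"},
    {"category_l1": "Dairy", "country": "US", "sodium_mg": ""},
]
print(mean_by_country(example, "Dairy", "sodium_mg"))  # {'GR': 50.0}
```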

Published research using the dataset should cite it as: Nutrola Open Food Nutrition Dataset, v3.0 (2026). Available at github.com/nutrola/open-food-nutrition-dataset. Licensed under CC BY-SA 4.0.

Application Development

Developers building health, fitness, or food-related applications can use the dataset as a local food database. The consistent schema and serving size data mean you can build a functional food logging feature without relying on a live API connection. This is particularly useful for offline-first mobile apps, prototyping, and hackathon projects.

The CSV format loads directly into SQLite, PostgreSQL, or any relational database. The JSON format maps cleanly to document stores like MongoDB or Firestore.
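A minimal sketch of the SQLite route using only Python's standard library. The `foods` table name and the six-column subset are illustrative choices, not something the repository prescribes; extend COLUMNS with any other fields from the schema. Empty CSV fields are stored as SQL NULL, and SQLite's REAL column affinity converts the numeric strings to floating-point values.

```python
import csv
import sqlite3
from typing import TextIO

# Illustrative subset of the schema's columns; the full file has many more.
COLUMNS = ["food_id", "food_name", "category_l1", "country", "calories_per_100g", "protein_g"]


def import_csv(conn: sqlite3.Connection, stream: TextIO) -> int:
    """Load selected columns from the dataset CSV into a hypothetical `foods` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS foods ("
        "food_id TEXT PRIMARY KEY, food_name TEXT, category_l1 TEXT, "
        "country TEXT, calories_per_100g REAL, protein_g REAL)"
    )
    # Empty fields are nulls in this dataset, so store them as NULL rather than ''.
    rows = (
        tuple(row[col] if row[col] != "" else None for col in COLUMNS)
        for row in csv.DictReader(stream)
    )
    placeholders = ", ".join("?" * len(COLUMNS))
    conn.executemany(f"INSERT INTO foods ({', '.join(COLUMNS)}) VALUES ({placeholders})", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM foods").fetchone()[0]


# Usage, assuming the CSV is in the working directory:
# with open("nutrola-open-food-dataset-v3.csv", newline="", encoding="utf-8") as f:
#     import_csv(sqlite3.connect("foods.db"), f)
```

For PostgreSQL, the equivalent bulk path is COPY with FORMAT csv and HEADER, which by default also treats unquoted empty fields as NULL.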

Data Science and Machine Learning

The dataset is well-suited for training and evaluating machine learning models related to food and nutrition. Common applications include:

Food classification models — use the category hierarchy as training labels to build classifiers that predict food categories from names or nutrition profiles
Nutrition estimation — train regression models that predict calorie or macro content from partial information (e.g., estimating calories from protein, fat, and carb ratios)
Recommendation systems — build food recommendation engines that suggest nutritionally similar alternatives
Anomaly detection — identify unusual nutrition profiles that might indicate data quality issues in other datasets
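As a toy version of the nutrition-estimation task, the sketch below fits per-gram energy factors for protein, fat and carbohydrate by ordinary least squares with no intercept. It uses only the standard library to stay self-contained; in practice you would likely reach for NumPy or scikit-learn. On real rows, expect coefficients near, but not exactly at, the Atwater factors of 4, 9 and 4, because fiber and alcohol are not modeled separately.

```python
from typing import Sequence


def fit_energy_factors(
    macros: Sequence[tuple[float, float, float]], kcal: Sequence[float]
) -> list[float]:
    """Least-squares fit of kcal = a*protein + b*fat + c*carbs (no intercept).

    macros holds (protein_g, fat_total_g, carbs_total_g) per 100 g; kcal holds
    calories_per_100g. Solves the 3x3 normal equations X^T X w = X^T y.
    """
    xtx = [[sum(r[i] * r[j] for r in macros) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(macros, kcal)) for i in range(3)]
    # Augmented matrix [X^T X | X^T y], reduced by Gauss-Jordan elimination with partial pivoting.
    m = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


# Synthetic check: energies generated exactly from 4/9/4 should recover those factors.
sample = [(22.5, 2.6, 0.0), (10.3, 0.0, 3.0), (2.7, 0.3, 28.2), (20.4, 13.4, 0.0)]
exact = [4 * p + 9 * f + 4 * c for p, f, c in sample]
print([round(w, 3) for w in fit_energy_factors(sample, exact)])  # [4.0, 9.0, 4.0]
```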

Education

Nutrition science students and educators can use the dataset for coursework, labs, and assignments. The breadth of the data — covering foods from dozens of countries and spanning every major food group — makes it useful for teaching concepts like macronutrient ratios, micronutrient density, and how nutrition profiles vary across cuisines and food processing levels.

Public Health and Policy

Public health organizations can use the data to analyze the nutritional landscape of specific food categories or markets. The country field allows filtering by region, and the brand field enables analysis of branded vs. generic food nutrition quality.

Data Quality Methodology

Releasing an open dataset means nothing if the data is not trustworthy. Here is how we ensure quality across the 500,000+ entries in this release.

Multi-Source Verification

Every entry in the dataset has been verified against at least two independent sources. Our primary data sources include:

Government nutrition databases — USDA FoodData Central (United States), CoFID (United Kingdom), the Australian Food Composition Database (formerly NUTTAB), CNF (Canada), and equivalent databases from 20+ countries
Manufacturer-provided data — nutrition facts panels submitted directly by food manufacturers through our brand partnership program
Laboratory analysis — independent lab testing conducted by our team for high-volume foods where source data is conflicting or outdated
Verified community submissions — user-submitted entries that have passed our three-step verification process (automated cross-referencing, expert review, and statistical outlier detection)

Automated Quality Checks

Every entry passes through a battery of automated checks before it enters the dataset:

Energy balance validation — the calorie count is cross-checked against the Atwater calculation (4 kcal/g protein + 9 kcal/g fat + 4 kcal/g carbohydrate). Entries where the stated calories deviate from the calculated value by more than 10% are flagged for manual review.
Range checks — every nutrient value is validated against physiologically plausible ranges for the food category. A cheese entry claiming 0 grams of fat or a fruit entry claiming 50 grams of protein gets flagged immediately.
Cross-entry consistency — similar foods are compared statistically. If a new chicken breast entry has significantly different values from the existing cluster of chicken breast entries, it is held for review.
Serving size validation — serving weights are checked against known standard portions. A "1 medium apple" claiming to weigh 500 grams does not pass.
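The energy-balance rule is easy to express in code. A sketch that applies the general Atwater factors and the 10% threshold to one entry, using the schema's field names; it illustrates the rule as stated above and is not Nutrola's pipeline code. Because the general factors count all carbohydrate, fiber included, at 4 kcal/g, low-calorie, high-fiber foods can trip the check: the kimchi row from the sample table comes out about 19% off, which is the kind of entry that goes to manual review rather than being rejected outright.

```python
from typing import Optional

# General Atwater factors in kcal per gram, keyed by schema field name.
ATWATER = {"protein_g": 4.0, "fat_total_g": 9.0, "carbs_total_g": 4.0}
TOLERANCE = 0.10  # flag when stated energy is more than 10% off the calculated value


def atwater_kcal(entry: dict[str, float]) -> float:
    """Energy per 100 g implied by the macros, using the general Atwater factors."""
    return sum(entry[field] * factor for field, factor in ATWATER.items())


def energy_balance_flag(entry: dict[str, float]) -> Optional[str]:
    """Return a reason for manual review, or None if the entry passes."""
    stated = entry["calories_per_100g"]
    calculated = atwater_kcal(entry)
    if calculated == 0:
        # No energy-yielding macros: only a zero-calorie claim is consistent.
        return None if stated == 0 else f"stated {stated} kcal but no protein, fat or carbs"
    deviation = abs(stated - calculated) / calculated
    if deviation > TOLERANCE:
        return f"stated {stated} kcal vs Atwater {calculated:.1f} kcal ({deviation:.0%} off)"
    return None


chicken = {"calories_per_100g": 120, "protein_g": 22.5, "fat_total_g": 2.6, "carbs_total_g": 0.0}
kimchi = {"calories_per_100g": 15, "protein_g": 1.1, "fat_total_g": 0.5, "carbs_total_g": 2.4}
print(energy_balance_flag(chicken))  # None (about 6% off)
print(energy_balance_flag(kimchi))   # stated 15 kcal vs Atwater 18.5 kcal (19% off)
```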

Human Review

Entries flagged by automated checks go through manual review by our data team, which includes credentialed nutritionists and food scientists. Approximately 12% of entries require some form of manual correction before they are approved.

Ongoing Maintenance

The dataset is not a one-time dump. We re-verify entries on a rolling basis, prioritizing high-volume foods (those most frequently logged by Nutrola users) and entries whose source data has been updated. When a food manufacturer reformulates a product, we catch the change through our barcode monitoring system and update the entry accordingly.

Update Frequency

We publish new versions of the open dataset quarterly. Each release includes:

New food entries added since the previous version
Corrections to existing entries identified through our quality monitoring
Updated nutrition data for reformulated products
Expanded micronutrient coverage where new source data becomes available

The current version is v3.0, released in March 2026. Version history and changelogs are available in the GitHub repository.

If you need data that is updated more frequently than quarterly, our Nutrition Data API reflects changes within 48 hours.

License

The Nutrola Open Food Nutrition Dataset is released under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

This means you are free to:

Share — copy and redistribute the dataset in any medium or format
Adapt — remix, transform, and build upon the dataset for any purpose, including commercial use

Under the following terms:

Attribution — you must give appropriate credit to Nutrola, provide a link to the license, and indicate if changes were made
ShareAlike — if you remix, transform, or build upon the dataset, you must distribute your contributions under the same CC BY-SA 4.0 license

We chose CC BY-SA 4.0 because it strikes the right balance between openness and ensuring that improvements flow back to the community. If you build a better version of this data, the license ensures that your improvements remain available to everyone else too.

How It Compares to Other Datasets

There are several publicly available nutrition datasets. Here is how the Nutrola Open Food Nutrition Dataset compares to the two most widely used alternatives.

vs. USDA FoodData Central

USDA FoodData Central is the gold standard for nutrition data in the United States. It is thorough, well-documented, and backed by laboratory analysis. However, it has limitations that the Nutrola dataset addresses:

Dimension	USDA FoodData Central	Nutrola Open Dataset
Total entries	~400,000 (Foundation, SR Legacy, Branded combined)	500,000+
Geographic coverage	Primarily United States	47 countries
Branded products	US brands only, often outdated	International brands, verified quarterly
Data format	Multiple incompatible file formats, complex relational structure	Single CSV or JSON file, flat structure
Serving sizes	Inconsistent across sub-databases	Standardized format with up to 3 servings per food
Ease of use	Requires significant data engineering to merge sub-databases	Download one file and start working
Update frequency	Varies by sub-database (annually for some)	Quarterly

If your work is focused exclusively on US foods and you need the deepest possible nutrient profile (USDA covers 150+ nutrients for Foundation foods), FoodData Central is the better choice. If you need international coverage, consistent formatting, and a dataset that works out of the box, the Nutrola dataset is the stronger option.

The two datasets are complementary. Many researchers use USDA Foundation data for detailed US nutrient analysis and supplement it with Nutrola data for international coverage and branded products.

vs. Open Food Facts

Open Food Facts is a crowdsourced database with over 3 million entries. It has impressive scale and covers products from many countries. However, its crowdsourced nature introduces data quality challenges:

Dimension	Open Food Facts	Nutrola Open Dataset
Total entries	3M+	500,000+
Data quality	Variable — crowdsourced with automated checks	Verified — multi-source, human-reviewed
Completeness	Many entries missing macro/micro data	All entries have complete macro data; 90%+ have full micro profiles
Serving sizes	Inconsistent, often missing	Standardized, always present
Category taxonomy	Crowdsourced tags, inconsistent	Hierarchical, curated taxonomy
Nutrient coverage	Varies widely per entry	Same nutrient fields for every entry (energy plus 30 nutrients)
Data format	MongoDB dump, complex nested JSON	Clean CSV and JSON
License	Open Database License (ODbL)	CC BY-SA 4.0

Open Food Facts excels at breadth — if you need to look up a specific obscure product by barcode, they likely have it. The Nutrola dataset excels at depth and consistency — every entry meets the same quality bar, making it more reliable for quantitative analysis where data gaps or errors can skew results.

If you are building a barcode scanner app and need maximum product coverage, Open Food Facts is a good starting point. If you are training a machine learning model, conducting statistical research, or building an app where nutrition accuracy matters, the Nutrola dataset's verified data will give you a stronger foundation.

Getting Started

Once you have downloaded the dataset, here is a quick example of loading and exploring it in Python:

```python
import pandas as pd

# Load the dataset. pandas can also read the .csv.gz file directly,
# inferring the compression from the file extension.
# Barcodes are read as strings so leading zeros survive, and only empty
# fields count as null, so the ISO country code "NA" (Namibia) is not
# mistaken for a missing value.
df = pd.read_csv(
    "nutrola-open-food-dataset-v3.csv",
    dtype={"barcode": "string"},
    keep_default_na=False,
    na_values=[""],
)

# Basic overview
print(f"Total entries: {len(df):,}")
print(f"Countries covered: {df['country'].nunique()}")
print(f"Food categories (L1): {df['category_l1'].nunique()}")

# Find high-protein, low-calorie foods
high_protein = df[
    (df["protein_g"] > 20) &
    (df["calories_per_100g"] < 150)
].sort_values("protein_g", ascending=False)

print(high_protein[["food_name", "calories_per_100g", "protein_g"]].head(10))

# Analyze average macros by food category
category_macros = df.groupby("category_l1").agg({
    "calories_per_100g": "mean",
    "protein_g": "mean",
    "fat_total_g": "mean",
    "carbs_total_g": "mean"
}).round(1)

print(category_macros.sort_values("calories_per_100g", ascending=False))
```

More examples — including R scripts, SQL import guides, and Jupyter notebooks — are available in the scripts/ directory of the GitHub repository.

Frequently Asked Questions

Is the dataset really free to use?

Yes. The Nutrola Open Food Nutrition Dataset is released under the CC BY-SA 4.0 license, which permits commercial and non-commercial use. The only requirements are that you credit Nutrola as the source and that any derivative datasets you distribute use the same license. There are no API keys, no usage limits, and no registration required to download the files.

How often is the dataset updated?

We publish new versions quarterly. Each release adds new food entries, corrects any errors identified since the previous version, and updates entries for products that have been reformulated. The GitHub repository's Releases page has the full version history, and you can watch the repository to be notified when new versions are published.

Can I use this dataset to build a commercial app?

Yes. The CC BY-SA 4.0 license explicitly allows commercial use. You can use the data in a paid app, a SaaS product, or any other commercial context. You must include attribution to Nutrola in your app or documentation, and if you distribute a modified version of the dataset itself, the modified version must also be licensed under CC BY-SA 4.0. Using the data within your app (without redistributing the raw dataset) does not trigger the ShareAlike requirement.

Why only 500K entries when Nutrola's full database has 3 million+?

The open dataset contains entries that we can release under an open license without restrictions. Our full database includes data from proprietary sources — direct manufacturer partnerships, licensed laboratory data, and other sources with contractual limitations on redistribution. The 500K entries in the open dataset come from government databases, our own laboratory analysis, and community submissions where contributors agreed to open licensing. If you need access to the full database, our Nutrition Data API provides it under separate commercial terms.

What should I do if I find an error in the dataset?

Open an issue on the GitHub repository with the food_id of the affected entry and a description of the error. Include a source link if you have one (e.g., a manufacturer's website showing different nutrition facts). Our data team reviews reported issues weekly, and confirmed corrections are included in the next quarterly release. For urgent corrections, we may push a patch release between quarterly updates.

How does this relate to the Nutrola Nutrition Data API?

The open dataset is a static quarterly snapshot of a curated subset of our database. The API provides real-time access to the full 3 million+ entry database with search, filtering, barcode lookup, and other features. Think of the open dataset as the foundation for offline or batch use cases, and the API as the solution for production applications that need live data. Many developers start with the open dataset for prototyping and migrate to the API when they go to production.
