Nutrola's Open Food Nutrition Dataset: 500K+ Foods Available for Download

Download Nutrola's open food nutrition dataset with 500K+ verified entries including calories, macros, micronutrients, and serving sizes. Available in CSV and JSON for research, development, and education.

Good nutrition data is hard to find. Researchers waste weeks cleaning government databases. Developers write brittle scrapers that break every month. Students writing thesis papers settle for small, outdated samples because assembling a comprehensive dataset from scratch is not realistic on an academic timeline.

We built Nutrola's food database to power our calorie tracking app, and over the past three years we have invested heavily in making that data accurate, comprehensive, and well-structured. Today we are releasing a curated subset of that database as an open dataset: over 500,000 verified food entries available for free download in CSV and JSON formats.

This post covers everything you need to know about the dataset — what is in it, how to download it, the schema, licensing, quality methodology, and how it compares to other publicly available nutrition data sources.

What Is in the Dataset

The Nutrola Open Food Nutrition Dataset contains 500,000+ food entries spanning raw ingredients, generic foods, branded consumer products, and common restaurant items. Every entry has been verified through our multi-layer quality control pipeline, the same system described in detail in our post on how we built our food database.

Each food entry includes the following data points:

  • Food name — the common name of the food item in English, with brand names where applicable
  • Calories — energy content in kilocalories (kcal) per 100 grams and per serving
  • Macronutrients — protein, total fat, saturated fat, trans fat, total carbohydrates, dietary fiber, total sugars, and added sugars, all in grams
  • Micronutrients — 30+ vitamins and minerals including vitamin A, vitamin C, vitamin D, vitamin E, vitamin K, thiamin, riboflavin, niacin, vitamin B6, folate, vitamin B12, calcium, iron, magnesium, phosphorus, potassium, sodium, zinc, copper, manganese, selenium, and more
  • Serving sizes — standard serving size description (e.g., "1 medium apple," "1 cup cooked"), serving weight in grams, and up to three alternative serving sizes per food
  • Food category — hierarchical classification using our internal taxonomy (e.g., Dairy > Cheese > Hard Cheese)
  • Country of origin — the primary country or region where the food product is sold or the ingredient is commonly consumed
  • Barcode (where available) — UPC or EAN codes for branded products
  • Data source tags — provenance indicators showing whether the entry originated from government databases, manufacturer data, laboratory analysis, or our internal verification team

Sample Data

Here is a selection of entries from the dataset to give you a sense of the structure and detail:

food_id | food_name | category | country | calories_per_100g | protein_g | fat_g | carbs_g | fiber_g | serving_desc | serving_g
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
NF-001247 | Chicken Breast, Raw, Skinless | Poultry > Chicken | US | 120 | 22.5 | 2.6 | 0.0 | 0.0 | 1 breast (174g) | 174
NF-008391 | Fage Total 0% Greek Yogurt | Dairy > Yogurt > Greek | GR | 54 | 10.3 | 0.0 | 3.0 | 0.0 | 1 container (150g) | 150
NF-014205 | Basmati Rice, White, Cooked | Grains > Rice | IN | 130 | 2.7 | 0.3 | 28.2 | 0.4 | 1 cup (158g) | 158
NF-022876 | Avocado, Hass, Raw | Fruits > Tropical | MX | 160 | 2.0 | 14.7 | 8.5 | 6.7 | 1/2 avocado (68g) | 68
NF-031560 | Barilla Penne Rigate, Dry | Pasta > Dried | IT | 359 | 12.5 | 2.0 | 71.2 | 3.0 | 2 oz (56g) | 56
NF-045892 | Kimchi, Traditional Napa Cabbage | Vegetables > Fermented | KR | 15 | 1.1 | 0.5 | 2.4 | 1.6 | 1/2 cup (75g) | 75
NF-053714 | Salmon, Atlantic, Raw, Farmed | Fish > Salmon | NO | 208 | 20.4 | 13.4 | 0.0 | 0.0 | 1 fillet (113g) | 113
NF-067283 | Chickpeas, Canned, Drained | Legumes > Beans | US | 119 | 6.3 | 2.0 | 18.2 | 5.4 | 1/2 cup (120g) | 120

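The full dataset includes many more columns for micronutrients, alternative serving sizes, barcode data, and source tags. The table above shows only the core nutritional fields, and several of its column names (category, fat_g, carbs_g, serving_desc, serving_g) are shortened for display; the Data Schema section lists the exact field names.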

Data Formats

The dataset is available in two formats:

CSV

The CSV file uses UTF-8 encoding with comma delimiters. The first row contains column headers. Fields that contain commas are enclosed in double quotes. Null values are represented as empty fields.

The CSV format is ideal for spreadsheet tools like Excel and Google Sheets, statistical software like R and SPSS, and quick data exploration with command-line tools like csvkit or xsv.

File: nutrola-open-food-dataset-v3.csv (approximately 210 MB uncompressed, 48 MB gzipped)
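
The CSV conventions described above interact with pandas defaults in two ways that are easy to miss. The following is a minimal sketch, not official loader code: it parses a few inline rows (the food IDs, names, brand, and barcode are made-up placeholders, not dataset entries) to show why barcodes are best read as text and why only empty fields should count as null. By default pandas would also treat the ISO country code NA (Namibia) as a missing value.

```python
import io

import pandas as pd

# Inline stand-in for the real file. These rows and the barcode are
# placeholders for illustration, not entries from the dataset.
SAMPLE_CSV = '''food_id,food_name,country,brand,barcode
NF-000001,"Example Cereal, Whole Grain",US,ExampleBrand,0012345678905
NF-000002,Example Millet Porridge,NA,,
'''

df = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    dtype={"barcode": str},   # keep leading zeros in UPC/EAN codes
    keep_default_na=False,    # do not read the country code "NA" as null
    na_values=[""],           # the dataset marks nulls with empty fields only
)

print(df[["food_id", "country", "brand", "barcode"]])
```

With the real file, the same keyword arguments can be passed alongside the file path instead of the in-memory buffer.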

JSON

The JSON file contains an array of objects, one per food entry. Nested objects are used for structured fields like serving sizes (which contain a description, gram weight, and milliliter equivalent where applicable) and micronutrient profiles.

The JSON format is better suited for application development, database imports, and any workflow where you need to preserve the hierarchical structure of serving sizes and nutrient groups.

File: nutrola-open-food-dataset-v3.json (approximately 340 MB uncompressed, 62 MB gzipped)
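
As a rough illustration of how the nested JSON relates to the flat CSV columns, the sketch below flattens one record into serving_1_desc / serving_1_g style fields. The nested key names used here ("servings", "description", "grams", "milliliters") are assumptions made for the example; check the JSON Schema in the repository for the actual names before relying on them.

```python
import json

# One illustrative record. The nested key names are assumed for this
# example and may differ from the published JSON schema.
SAMPLE_JSON = '''
[
  {
    "food_id": "NF-014205",
    "food_name": "Basmati Rice, White, Cooked",
    "calories_per_100g": 130,
    "servings": [
      {"description": "1 cup (158g)", "grams": 158, "milliliters": null}
    ]
  }
]
'''


def flatten_entry(entry: dict) -> dict:
    """Copy top-level fields and spread nested servings into CSV-style columns."""
    flat = {key: value for key, value in entry.items() if key != "servings"}
    # The dataset allows a primary serving plus up to two alternatives.
    for index, serving in enumerate(entry.get("servings", [])[:3], start=1):
        flat[f"serving_{index}_desc"] = serving["description"]
        flat[f"serving_{index}_g"] = serving["grams"]
    return flat


entries = json.loads(SAMPLE_JSON)
flat_rows = [flatten_entry(entry) for entry in entries]
print(flat_rows[0])
```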

Both files are also available as gzip-compressed archives to reduce download times.

Data Schema

Here is the full schema with descriptions for every field in the dataset:

Field Name | Type | Description
--- | --- | ---
food_id | string | Unique Nutrola identifier for the food entry (format: NF-XXXXXX)
food_name | string | Common name of the food, including brand where applicable
category_l1 | string | Top-level food category (e.g., Dairy, Grains, Fruits)
category_l2 | string | Second-level category (e.g., Cheese, Rice, Tropical)
category_l3 | string | Third-level category where applicable (e.g., Hard Cheese, Brown Rice)
country | string | ISO 3166-1 alpha-2 country code indicating primary market
brand | string | Brand name for branded products; null for generic foods
barcode | string | UPC/EAN barcode; null if not applicable
calories_per_100g | float | Energy in kcal per 100 grams
protein_g | float | Protein in grams per 100g
fat_total_g | float | Total fat in grams per 100g
fat_saturated_g | float | Saturated fat in grams per 100g
fat_trans_g | float | Trans fat in grams per 100g
carbs_total_g | float | Total carbohydrates in grams per 100g
fiber_g | float | Dietary fiber in grams per 100g
sugars_total_g | float | Total sugars in grams per 100g
sugars_added_g | float | Added sugars in grams per 100g
sodium_mg | float | Sodium in milligrams per 100g
cholesterol_mg | float | Cholesterol in milligrams per 100g
vitamin_a_mcg | float | Vitamin A in micrograms RAE per 100g
vitamin_c_mg | float | Vitamin C in milligrams per 100g
vitamin_d_mcg | float | Vitamin D in micrograms per 100g
calcium_mg | float | Calcium in milligrams per 100g
iron_mg | float | Iron in milligrams per 100g
potassium_mg | float | Potassium in milligrams per 100g
magnesium_mg | float | Magnesium in milligrams per 100g
zinc_mg | float | Zinc in milligrams per 100g
phosphorus_mg | float | Phosphorus in milligrams per 100g
selenium_mcg | float | Selenium in micrograms per 100g
vitamin_b6_mg | float | Vitamin B6 in milligrams per 100g
vitamin_b12_mcg | float | Vitamin B12 in micrograms per 100g
folate_mcg | float | Folate in micrograms DFE per 100g
vitamin_e_mg | float | Vitamin E in milligrams per 100g
vitamin_k_mcg | float | Vitamin K in micrograms per 100g
thiamin_mg | float | Thiamin (B1) in milligrams per 100g
riboflavin_mg | float | Riboflavin (B2) in milligrams per 100g
niacin_mg | float | Niacin (B3) in milligrams per 100g
copper_mg | float | Copper in milligrams per 100g
manganese_mg | float | Manganese in milligrams per 100g
serving_1_desc | string | Primary serving size description (e.g., "1 cup cooked")
serving_1_g | float | Primary serving size weight in grams
serving_2_desc | string | Alternative serving size description; null if not available
serving_2_g | float | Alternative serving size weight in grams
serving_3_desc | string | Second alternative serving size description; null if not available
serving_3_g | float | Second alternative serving size weight in grams
data_source | string | Provenance tag: "government", "manufacturer", "laboratory", or "verified_community"
last_verified | string | ISO 8601 date when the entry was last verified (YYYY-MM-DD)
dataset_version | string | Dataset version identifier (e.g., "v3.0")

All nutrient values are expressed per 100 grams to allow consistent comparisons. To calculate nutrients per serving, multiply the per-100g value by the serving weight in grams and divide by 100.
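
The per-serving conversion can be sketched in a few lines. The helper name per_serving is illustrative, and the numbers come from the chicken breast row in the sample table (120 kcal and 22.5 g protein per 100 g, 174 g serving).

```python
def per_serving(value_per_100g: float, serving_g: float) -> float:
    """Scale a per-100g nutrient value to a serving of the given weight."""
    return value_per_100g * serving_g / 100


# Chicken Breast, Raw, Skinless: 1 breast (174 g)
kcal = per_serving(120, 174)
protein = per_serving(22.5, 174)

print(f"{kcal:.1f} kcal, {protein:.2f} g protein per serving")
# prints: 208.8 kcal, 39.15 g protein per serving
```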

How to Download

The dataset is hosted on our public GitHub repository:

github.com/nutrola/open-food-nutrition-dataset

You can download the files directly from the GitHub Releases page, or clone the repository:

git clone https://github.com/nutrola/open-food-nutrition-dataset.git

For the compressed versions:

# Download CSV (gzipped)
wget https://github.com/nutrola/open-food-nutrition-dataset/releases/latest/download/nutrola-open-food-dataset-v3.csv.gz

# Download JSON (gzipped)
wget https://github.com/nutrola/open-food-nutrition-dataset/releases/latest/download/nutrola-open-food-dataset-v3.json.gz

The repository also contains:

  • A detailed README.md with quickstart instructions
  • A CHANGELOG.md documenting changes between dataset versions
  • A scripts/ directory with Python and R example scripts for loading, filtering, and analyzing the data
  • A schema/ directory with JSON Schema and CSV dialect definitions

If you need the full 3 million+ entry database with real-time updates rather than periodic snapshots, see our Nutrition Data API for developer access.

Use Cases

Academic Research

Nutrition researchers can use the dataset for dietary pattern analysis, epidemiological modeling, and nutrient density studies without spending weeks cleaning and merging government data files. The hierarchical category system makes it straightforward to filter by food groups, and the country field enables cross-cultural comparisons.

Published research using the dataset should cite it as: Nutrola Open Food Nutrition Dataset, v3.0 (2026). Available at github.com/nutrola/open-food-nutrition-dataset. Licensed under CC BY-SA 4.0.

Application Development

Developers building health, fitness, or food-related applications can use the dataset as a local food database. The consistent schema and serving size data mean you can build a functional food logging feature without relying on a live API connection. This is particularly useful for offline-first mobile apps, prototyping, and hackathon projects.

The CSV format loads directly into SQLite, PostgreSQL, or any relational database. The JSON format maps cleanly to document stores like MongoDB or Firestore.
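
As one possible way to do the relational import, the sketch below loads CSV rows into an in-memory SQLite database using only the Python standard library. The table name foods, the to_float helper, and the reduced column set are choices made for this example; the three rows are taken from the sample table, with category_l1 set to the top level of each category path.

```python
import csv
import io
import sqlite3

# A small stand-in for the real file, using a subset of the schema's columns.
SAMPLE_CSV = '''food_id,food_name,category_l1,calories_per_100g,protein_g
NF-001247,"Chicken Breast, Raw, Skinless",Poultry,120,22.5
NF-014205,"Basmati Rice, White, Cooked",Grains,130,2.7
NF-053714,"Salmon, Atlantic, Raw, Farmed",Fish,208,20.4
'''


def to_float(text: str):
    """Empty CSV fields represent nulls, so map them to None (SQL NULL)."""
    return float(text) if text else None


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE foods (
        food_id TEXT PRIMARY KEY,
        food_name TEXT NOT NULL,
        category_l1 TEXT,
        calories_per_100g REAL,
        protein_g REAL
    )
    """
)

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = [
    (
        row["food_id"],
        row["food_name"],
        row["category_l1"],
        to_float(row["calories_per_100g"]),
        to_float(row["protein_g"]),
    )
    for row in reader
]
conn.executemany("INSERT INTO foods VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Query the imported data: foods with more than 20 g protein per 100 g.
high_protein = conn.execute(
    "SELECT food_name, protein_g FROM foods "
    "WHERE protein_g > 20 ORDER BY protein_g DESC"
).fetchall()
print(high_protein)
```

For the full file, replace the in-memory buffer with an open file handle (UTF-8, newline="") and a file-backed database path.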

Data Science and Machine Learning

The dataset is well-suited for training and evaluating machine learning models related to food and nutrition. Common applications include:

  • Food classification models — use the category hierarchy as training labels to build classifiers that predict food categories from names or nutrition profiles
  • Nutrition estimation — train regression models that predict calorie or macro content from partial information (e.g., estimating calories from protein, fat, and carb ratios)
  • Recommendation systems — build food recommendation engines that suggest nutritionally similar alternatives
  • Anomaly detection — identify unusual nutrition profiles that might indicate data quality issues in other datasets
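
To make the recommendation idea concrete, here is a deliberately simple sketch that finds the nutritionally closest food by Euclidean distance over protein, fat, and carbohydrate values from the sample table. It is an illustration only: the function name most_similar is hypothetical, and a real system would likely normalize features and use many more nutrients.

```python
import math

# (protein_g, fat_g, carbs_g) per 100 g, copied from the sample table.
FOODS = {
    "Chicken Breast, Raw, Skinless": (22.5, 2.6, 0.0),
    "Fage Total 0% Greek Yogurt": (10.3, 0.0, 3.0),
    "Basmati Rice, White, Cooked": (2.7, 0.3, 28.2),
    "Salmon, Atlantic, Raw, Farmed": (20.4, 13.4, 0.0),
    "Chickpeas, Canned, Drained": (6.3, 2.0, 18.2),
}


def most_similar(name: str, foods: dict) -> str:
    """Return the other food whose macro profile is closest to the named one."""
    target = foods[name]
    others = (other for other in foods if other != name)
    return min(others, key=lambda other: math.dist(target, foods[other]))


print(most_similar("Chicken Breast, Raw, Skinless", FOODS))
print(most_similar("Basmati Rice, White, Cooked", FOODS))
```

On these five rows the closest match to chicken breast is the salmon entry, and the closest match to basmati rice is the chickpea entry.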

Education

Nutrition science students and educators can use the dataset for coursework, labs, and assignments. The breadth of the data — covering foods from dozens of countries and spanning every major food group — makes it useful for teaching concepts like macronutrient ratios, micronutrient density, and how nutrition profiles vary across cuisines and food processing levels.

Public Health and Policy

Public health organizations can use the data to analyze the nutritional landscape of specific food categories or markets. The country field allows filtering by region, and the brand field enables analysis of branded vs. generic food nutrition quality.

Data Quality Methodology

Releasing an open dataset means nothing if the data is not trustworthy. Here is how we ensure quality across the 500,000+ entries in this release.

Multi-Source Verification

Every entry in the dataset has been verified against at least two independent sources. Our primary data sources include:

  • Government nutrition databases — USDA FoodData Central (United States), CoFID (United Kingdom), NUTTAB (Australia), CNF (Canada), and equivalent databases from 20+ countries
  • Manufacturer-provided data — nutrition facts panels submitted directly by food manufacturers through our brand partnership program
  • Laboratory analysis — independent lab testing conducted by our team for high-volume foods where source data is conflicting or outdated
  • Verified community submissions — user-submitted entries that have passed our three-step verification process (automated cross-referencing, expert review, and statistical outlier detection)

Automated Quality Checks

Every entry passes through a battery of automated checks before it enters the dataset:

  • Energy balance validation — the calorie count is cross-checked against the Atwater calculation (4 kcal/g protein + 9 kcal/g fat + 4 kcal/g carbohydrate). Entries where the stated calories deviate from the calculated value by more than 10% are flagged for manual review.
  • Range checks — every nutrient value is validated against physiologically plausible ranges for the food category. A cheese entry claiming 0 grams of fat or a fruit entry claiming 50 grams of protein gets flagged immediately.
  • Cross-entry consistency — similar foods are compared statistically. If a new chicken breast entry has significantly different values from the existing cluster of chicken breast entries, it is held for review.
  • Serving size validation — serving weights are checked against known standard portions. A "1 medium apple" claiming to weigh 500 grams does not pass.
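
The energy balance check can be sketched as follows. This is not Nutrola's pipeline code: the function names are hypothetical, and measuring the deviation relative to the calculated value is one reading of the 10% rule. The rows come from the sample table.

```python
# (food_name, stated kcal, protein_g, fat_g, carbs_g) per 100 g, from the sample table.
SAMPLE = [
    ("Chicken Breast, Raw, Skinless", 120, 22.5, 2.6, 0.0),
    ("Avocado, Hass, Raw", 160, 2.0, 14.7, 8.5),
    ("Kimchi, Traditional Napa Cabbage", 15, 1.1, 0.5, 2.4),
    ("Salmon, Atlantic, Raw, Farmed", 208, 20.4, 13.4, 0.0),
]


def atwater_kcal(protein_g: float, fat_g: float, carbs_g: float) -> float:
    """General Atwater factors: 4 kcal/g protein, 9 kcal/g fat, 4 kcal/g carbohydrate."""
    return 4 * protein_g + 9 * fat_g + 4 * carbs_g


def needs_review(stated_kcal, protein_g, fat_g, carbs_g, tolerance=0.10) -> bool:
    """Flag entries whose stated calories deviate from the Atwater value by more than the tolerance."""
    calculated = atwater_kcal(protein_g, fat_g, carbs_g)
    if calculated == 0:
        return stated_kcal != 0
    return abs(stated_kcal - calculated) / calculated > tolerance


flagged = [name for name, kcal, p, f, c in SAMPLE if needs_review(kcal, p, f, c)]
print(flagged)
# prints: ['Kimchi, Traditional Napa Cabbage']
```

A flag here means "send to manual review", not "reject". Low-energy, fiber-rich foods such as kimchi can trip the plain formula because fiber is counted in total carbohydrate but yields less energy than 4 kcal/g.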

Human Review

Entries flagged by automated checks go through manual review by our data team, which includes credentialed nutritionists and food scientists. Approximately 12% of entries require some form of manual correction before they are approved.

Ongoing Maintenance

The dataset is not a one-time dump. We re-verify entries on a rolling basis, prioritizing high-volume foods (those most frequently logged by Nutrola users) and entries whose source data has been updated. When a food manufacturer reformulates a product, we catch the change through our barcode monitoring system and update the entry accordingly.

Update Frequency

We publish new versions of the open dataset quarterly. Each release includes:

  • New food entries added since the previous version
  • Corrections to existing entries identified through our quality monitoring
  • Updated nutrition data for reformulated products
  • Expanded micronutrient coverage where new source data becomes available

The current version is v3.0, released in March 2026. Version history and changelogs are available in the GitHub repository.

If you need data that is updated more frequently than quarterly, our Nutrition Data API reflects changes within 48 hours.

License

The Nutrola Open Food Nutrition Dataset is released under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

This means you are free to:

  • Share — copy and redistribute the dataset in any medium or format
  • Adapt — remix, transform, and build upon the dataset for any purpose, including commercial use

Under the following terms:

  • Attribution — you must give appropriate credit to Nutrola, provide a link to the license, and indicate if changes were made
  • ShareAlike — if you remix, transform, or build upon the dataset, you must distribute your contributions under the same CC BY-SA 4.0 license

We chose CC BY-SA 4.0 because it strikes the right balance between openness and ensuring that improvements flow back to the community. If you build a better version of this data, the license ensures that your improvements remain available to everyone else too.

How It Compares to Other Datasets

There are several publicly available nutrition datasets. Here is how the Nutrola Open Food Nutrition Dataset compares to the two most widely used alternatives.

vs. USDA FoodData Central

USDA FoodData Central is the gold standard for nutrition data in the United States. It is thorough, well-documented, and backed by laboratory analysis. However, it has limitations that the Nutrola dataset addresses:

Dimension | USDA FoodData Central | Nutrola Open Dataset
--- | --- | ---
Total entries | ~400,000 (Foundation, SR Legacy, Branded combined) | 500,000+
Geographic coverage | Primarily United States | 47 countries
Branded products | US brands only, often outdated | International brands, verified quarterly
Data format | Multiple incompatible file formats, complex relational structure | Single CSV or JSON file, flat structure
Serving sizes | Inconsistent across sub-databases | Standardized format with up to 3 servings per food
Ease of use | Requires significant data engineering to merge sub-databases | Download one file and start working
Update frequency | Varies by sub-database (annually for some) | Quarterly

If your work is focused exclusively on US foods and you need the deepest possible nutrient profile (USDA covers 150+ nutrients for Foundation foods), FoodData Central is the better choice. If you need international coverage, consistent formatting, and a dataset that works out of the box, the Nutrola dataset is the stronger option.

The two datasets are complementary. Many researchers use USDA Foundation data for detailed US nutrient analysis and supplement it with Nutrola data for international coverage and branded products.

vs. Open Food Facts

Open Food Facts is a crowdsourced database with over 3 million entries. It has impressive scale and covers products from many countries. However, its crowdsourced nature introduces data quality challenges:

Dimension | Open Food Facts | Nutrola Open Dataset
--- | --- | ---
Total entries | 3M+ | 500,000+
Data quality | Variable: crowdsourced with automated checks | Verified: multi-source, human-reviewed
Completeness | Many entries missing macro/micro data | All entries have complete macro data; 90%+ have full micro profiles
Serving sizes | Inconsistent, often missing | Standardized, always present
Category taxonomy | Crowdsourced tags, inconsistent | Hierarchical, curated taxonomy
Nutrient coverage | Varies widely per entry | Consistent 40+ nutrients across all entries
Data format | MongoDB dump, complex nested JSON | Clean CSV and JSON
License | Open Database License (ODbL) | CC BY-SA 4.0

Open Food Facts excels at breadth — if you need to look up a specific obscure product by barcode, they likely have it. The Nutrola dataset excels at depth and consistency — every entry meets the same quality bar, making it more reliable for quantitative analysis where data gaps or errors can skew results.

If you are building a barcode scanner app and need maximum product coverage, Open Food Facts is a good starting point. If you are training a machine learning model, conducting statistical research, or building an app where nutrition accuracy matters, the Nutrola dataset's verified data will give you a stronger foundation.

Getting Started

Once you have downloaded the dataset, here is a quick example of loading and exploring it in Python:

import pandas as pd

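# Load the dataset
df = pd.read_csv(
    "nutrola-open-food-dataset-v3.csv",
    dtype={"barcode": str},   # keep leading zeros in UPC/EAN codes
    keep_default_na=False,    # do not read the country code "NA" (Namibia) as null
    na_values=[""],           # nulls are stored as empty fields
)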

# Basic overview
print(f"Total entries: {len(df):,}")
print(f"Countries covered: {df['country'].nunique()}")
print(f"Food categories (L1): {df['category_l1'].nunique()}")

# Find high-protein, low-calorie foods
high_protein = df[
    (df["protein_g"] > 20) &
    (df["calories_per_100g"] < 150)
].sort_values("protein_g", ascending=False)

print(high_protein[["food_name", "calories_per_100g", "protein_g"]].head(10))

# Analyze average macros by food category
category_macros = df.groupby("category_l1").agg({
    "calories_per_100g": "mean",
    "protein_g": "mean",
    "fat_total_g": "mean",
    "carbs_total_g": "mean"
}).round(1)

print(category_macros.sort_values("calories_per_100g", ascending=False))

More examples — including R scripts, SQL import guides, and Jupyter notebooks — are available in the scripts/ directory of the GitHub repository.

Frequently Asked Questions

Is the dataset really free to use?

Yes. The Nutrola Open Food Nutrition Dataset is released under the CC BY-SA 4.0 license, which permits commercial and non-commercial use. The only requirements are that you credit Nutrola as the source and that any derivative datasets you distribute use the same license. There are no API keys, no usage limits, and no registration required to download the files.

How often is the dataset updated?

We publish new versions quarterly. Each release adds new food entries, corrects any errors identified since the previous version, and updates entries for products that have been reformulated. The GitHub repository's Releases page has the full version history, and you can watch the repository to be notified when new versions are published.

Can I use this dataset to build a commercial app?

Yes. The CC BY-SA 4.0 license explicitly allows commercial use. You can use the data in a paid app, a SaaS product, or any other commercial context. You must include attribution to Nutrola in your app or documentation, and if you distribute a modified version of the dataset itself, the modified version must also be licensed under CC BY-SA 4.0. Using the data within your app (without redistributing the raw dataset) does not trigger the ShareAlike requirement.

Why only 500K entries when Nutrola's full database has 3 million+?

The open dataset contains entries that we can release under an open license without restrictions. Our full database includes data from proprietary sources — direct manufacturer partnerships, licensed laboratory data, and other sources with contractual limitations on redistribution. The 500K entries in the open dataset come from government databases, our own laboratory analysis, and community submissions where contributors agreed to open licensing. If you need access to the full database, our Nutrition Data API provides it under separate commercial terms.

What should I do if I find an error in the dataset?

Open an issue on the GitHub repository with the food_id of the affected entry and a description of the error. Include a source link if you have one (e.g., a manufacturer's website showing different nutrition facts). Our data team reviews reported issues weekly, and confirmed corrections are included in the next quarterly release. For urgent corrections, we may push a patch release between quarterly updates.

How does this relate to the Nutrola Nutrition Data API?

The open dataset is a static quarterly snapshot of a curated subset of our database. The API provides real-time access to the full 3 million+ entry database with search, filtering, barcode lookup, and other features. Think of the open dataset as the foundation for offline or batch use cases, and the API as the solution for production applications that need live data. Many developers start with the open dataset for prototyping and migrate to the API when they go to production.
