Dask DataFrame unhashable type: 'numpy.ndarray'

Hi everyone. I have just read a very large file using dask.dataframe.read_parquet(). The problem I now have is related to the dataset: when I run the code below, I get the error unhashable type: 'numpy.ndarray'. Snippet:

I understand that my data contains a NumPy array. The question then is, how do I load it? Should I avoid calling compute()? Here is a link to the file that I am using: QCDToGGQQ_IMGjet_RH1all_jet0_run1_n47540.test.snappy.parquet - Google Drive

Any form of help will be highly appreciated. Thanks!

Hi @rsohaljr_14 ,

I think you should have waited a bit before opening a new thread, and preferably completed the other one.

This code will try to count the unique values of each Series in your DataFrame, which is not possible on a column that holds NumPy arrays: counting requires hashing each value, and NumPy arrays are not hashable. Hence this error.

I am not sure I understood your response. Are you saying that I cannot use .value_counts()? Should I use .to_delayed() instead?

value_counts and to_delayed have completely different goals and results. value_counts is just not compatible with 'numpy.ndarray' columns.

Again, the question is what are you trying to achieve with this data?

Dask “Column assignment doesn’t support type numpy.ndarray”

I’m trying to use Dask instead of pandas since the data size I’m analyzing is quite large. I wanted to add a flag column based on several conditions.

But then I got the following error message. The code above works perfectly with np.where on a pandas DataFrame, but does not work with dask.array.where.

If NumPy works and the operation is row-wise, then one solution is to use .map_partitions:

Dask from_array converts types to object

I have the following code that creates a Dask DataFrame from an array. The problem is that all the types are converted to object. I tried to specify the metadata but couldn't find a way. How do I specify meta in from_array?

This throws AttributeError: 'list' object has no attribute '_constructor'

Asked by ps0604

  • The AttributeError is because meta should not be a list. A pandas DataFrame is suggested in the docs as one thing that can be used for meta –  darthbith Commented Jan 24, 2022 at 16:35
  • I was trying to avoid using the empty pandas dataframe, is there any other way? –  ps0604 Commented Jan 24, 2022 at 17:36

2 Answers

You could specify the NumPy array as a structured array:

Answered by Alexandra Dudkina

  • why do you have to wrap datetime with np.datetime64 ? –  ps0604 Commented Jan 24, 2022 at 17:50
  • Otherwise the dtype for date1 would be object. If that's OK, no wrapping is needed –  Alexandra Dudkina Commented Jan 24, 2022 at 17:56

Look at your b array

Answered by hpaulj

  • I don't want to specify in meta float, float, float, datetime , how to do that? only with an empty Pandas dataframe? –  ps0604 Commented Jan 24, 2022 at 17:35
  • pandas seems to do better at inferring dtypes if given a list (or lists) rather than an array. –  hpaulj Commented Jan 24, 2022 at 17:42
  • This is not a pandas question, it is a Dask one –  ps0604 Commented Jan 24, 2022 at 19:07
  • Using a structured array as proposed in the accepted answer also works with pandas. –  hpaulj Commented Jan 24, 2022 at 19:33

“Column Assignment Doesn't Support Timestamp” #3159

MaxPowerWasTaken commented Feb 12, 2018

Issue filed here based on MRocklin's suggestion at ...

Reproducible example below:

Versions

'0.17.0'

'0.22.0'

mrocklin commented Feb 12, 2018

The relevant code is in dask/dataframe/core.py

    @derived_from(pd.DataFrame)
    def assign(self, **kwargs):
        for k, v in kwargs.items():
            if not (isinstance(v, (Series, Scalar, pd.Series)) or
                    callable(v) or np.isscalar(v)):
                raise TypeError("Column assignment doesn't support type "
                                "{0}".format(type(v).__name__))
        pairs = list(sum(kwargs.items(), ()))

Which was added in

It looks like some sort of check is necessary here, but that the check should be expanded somewhat. Exactly what the check should be is a bit beyond me. That commit has a good example of where this came about. It looks like people were trying to assign lists of data hoping that things would partition out magically. I think that we want the check to be "is either already a dask series or else is something that pandas considers a scalar". I don't know of a good way to check for pandas-scalar values.
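As an illustration of why the existing check rejects a Timestamp, the sketch below compares np.isscalar with pandas.api.types.is_scalar. Using is_scalar as the "pandas-scalar" check is an assumption here; this thread does not name the function that was eventually used:

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp("2018-02-12")
arr = np.array([1, 2, 3])

# np.isscalar only recognises NumPy scalars and built-in numbers/strings,
# so a pandas Timestamp is rejected.
numpy_accepts_ts = np.isscalar(ts)

# pandas' own helper also treats datetime-like values as scalars.
pandas_accepts_ts = pd.api.types.is_scalar(ts)

# Both agree that an ndarray is not a scalar.
numpy_accepts_arr = np.isscalar(arr)
pandas_accepts_arr = pd.api.types.is_scalar(arr)
```

Under that assumption, swapping the NumPy check for the pandas one would let Timestamps through while still rejecting arrays and lists.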

TomAugspurger commented Feb 12, 2018

should do it for recent enough (0.18ish?) pandas.

Grand. Any interest in submitting a PR? Presumably we just need to replace with the Pandas variant above. We should probably also add a test.

Hi MRocklin, yes I can submit a PR for this, thanks to you and Tom for finding the issue and likely fix so fast. I probably won't have a chance to turn to this before tonight though.

MaxPowerWasTaken commented Feb 12, 2018

Hey Matt and Tom,

So I tried cloning the dask repo, which seems to be the first step towards contributing per the 'Development Guidelines' site, but got an error related to permissions...

How should I proceed? Sorry if I'm missing something that should be obvious.

martindurant commented Feb 12, 2018

I don't know about your specific error, but typically people use HTTP:

that did it. Thanks!

Maybe that should be the default suggestion at
instead of:

?

...or maybe not. Anyway thanks again Martin. I'll move forward on this tonight.

Yeah, I think that to make that work, you have to have set up appropriate SSH keys. I know I did this at some point, but I can't remember how.

mrocklin commented Feb 17, 2018

I believe that this has been resolved. Merging.

COMMENTS

  1. DASK: Typerrror: Column assignment doesn't support type numpy.ndarray

    This answer isn't elegant but is functional. I found the select function was about 20 seconds quicker on an 11m row dataset in pandas. I also found that even if I performed the same function in dask that the result would return a numpy (pandas) array.

  2. Dask "Column assignment doesn't support type numpy.ndarray"

    You can use dask.dataframe.Series.where to achieve the same result but without computing. Or better yet, you could make use of the fact that True/False values can be converted directly into 1/0 by simply promoting the type to int (see below).. Both of these options have the advantage of keeping all operations native to dask.dataframe and thereby giving the scheduler more visibility into the ...

  3. TypeError: Column assignment doesn't support type DataFrame ...

    Hi, from looking into the available resources irt to adding a new column to dask dataframe from an array I figured sth like this should work import dask.dataframe as dd import dask.array as da w = dd.from_dask_array(da.from_npy_stack('/h...

  4. create a new column on existing dataframe #1426

    Basically I create a column group in order to make the groupby on consecutive elements. Using a dask data frame instead directly does not work: TypeError: Column assignment doesn't support type ndarray which I can understand. I have tried to create a dask array instead but as my divisions are not representative of the length I don't know how to determine the chunks.

  5. Compatibility with numpy functions

    Direct implementation are direct calls to numpy functions. Element-wise implementations are derived from numpy but applied element-wise: the argument should be a dask array. Dask equivalent are Dask implementations, which may lack or add parameters with respect to the numpy function.

  7. dask.dataframe.DataFrame.astype

    DataFrame.astype(dtype) Cast a pandas object to a specified dtype. This docstring was copied from pandas.core.frame.DataFrame.astype. Some inconsistencies with the Dask version may exist. Parameters. dtype: str, data type, Series or Mapping of column name -> data type. Use a str, numpy.dtype, pandas.ExtensionDtype or Python type to cast ...

  8. DataFrame.assign with lambda · Issue #2509 · dask/dask

    Example: import pandas as pd import dask.dataframe as dd In [18]: concrete_df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}) In [19]: concrete_df.assign(new_col ...

  10. dask.dataframe.DataFrame.assign

    The callable must not change input DataFrame (though pandas doesn't check it). If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned. Returns DataFrame. A new DataFrame with the new columns in addition to all the existing columns. Notes. Assigning multiple columns within the same assign is possible. Later ...

  11. DataFrame.assign doesn't work in dask? Trying to create new column

    You are trying to assign an object of type dask.....DataFrame to a column. A column needs a 1-D data structure such as a Series or list. This may be a quirk of how dask does things so you could try explicitly converting your assigned value to a Series before assigning it.

  12. Assign a column based on a dask.dataframe.from_array with ...

    import pandas as pd import numpy as np import dask.dataframe as dd from datetime import datetime from datetime import timedelta ## Create dummy data and save to CSV file columns = [] for i in range(0, 87): columns.append(str.format("Column{}", i)) dummy_csv_data = pd.DataFrame(np.random.randint(0,100,size=(605033, 87)), columns=columns) dummy_csv_data.to_csv("testdata.csv") ## Load it and set ...

  13. dask.dataframe.to_numeric

    dask.dataframe.to_numeric ¶. Convert argument to a numeric type. This docstring was copied from pandas.to_numeric. Some inconsistencies with the Dask version may exist. Return type depends on input. Delayed if scalar, otherwise same as input. For errors, only "raise" and "coerce" are allowed. The default return dtype is float64 or ...

  15. dask.dataframe.core

    These parameters are deprecated in all dask.dataframe reduction methods and will be soon completely disallowed. However, these methods must continue accepting 'out=None' and/or 'dtype=None' indefinitely in order to support numpy dispatchers. For example, ``np.mean (df)`` calls ``df.mean (out=None, dtype=None)``.

  16. "ValueError: Not all divisions are known, can't align ...

    I am trying to perform following on a dask dataframe: newcolumn = np.log(df[column_name]) return df.assign(newcolumn_name=newcolumn) I expected it to work according to example given in documentation. However I'm getting following error: ...

  17. Dask from_array converts types to object

    Creating dask dataframe from array doesn't keep column types. Related. 1. Python : Change dtype of dask array. 10. Dask Array from DataFrame. 2. ... DASK: Typerrror: Column assignment doesn't support type numpy.ndarray whereas Pandas works fine. 1. Dask warning provide an explicit output types. 2.

  18. "Column Assignment Doesn't Support Timestamp" #3159

    # library imports import pandas as pd from sklearn import datasets from dask import dataframe as dd # Load toy data iris = datasets.load_iris() DF = pd.DataFrame(iris.data, columns = iris.feature_names) # Convert Pands DataFrame to Dask DataFrame ddf = dd.from_pandas(DF, npartitions = 2) # Add a date column months_ago = 50 some_date = pd.datetime.today() - pd.DateOffset(months=train_months ...